A new framework for focused Web crawling

Research paper by Peng Tao, He Fengling, Zuo Wanli

Indexed on: 01 Sep '06; Published on: 01 Sep '06; Published in: Wuhan University Journal of Natural Sciences



Abstract

Focused crawlers are important tools for supporting applications such as specialized Web portals, online searching, and Web search engines. A topic-driven crawler chooses the best URLs and relevant pages to pursue during Web crawling, but dealing with irrelevant pages is difficult. This paper presents a novel focused crawler framework in which we propose a method to overcome some of the limitations in dealing with irrelevant pages. We also describe the implementation of our focused crawler and present several important metrics and an evaluation function for ranking page relevance. Experimental results show that our crawler obtains more “important” pages and achieves high precision and recall.
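
For readers unfamiliar with the terms used in the abstract, the following is a minimal sketch of a generic best-first focused crawler together with the standard precision and recall measures. It is not the framework proposed in the paper, whose details the abstract does not give. The term-overlap `relevance` score, the `fetch` callback, the `threshold` value, and all function names are assumptions made for illustration only.

```python
import heapq
from typing import Callable, Iterable, List, Set, Tuple


def relevance(text: str, topic_terms: Set[str]) -> float:
    """Hypothetical score: fraction of topic terms that occur in the page text."""
    if not topic_terms:
        return 0.0
    words = set(text.lower().split())
    return len(words & topic_terms) / len(topic_terms)


def focused_crawl(
    seeds: Iterable[str],
    fetch: Callable[[str], Tuple[str, List[str]]],  # assumed: url -> (text, out-links)
    topic_terms: Set[str],
    threshold: float = 0.5,
    max_pages: int = 100,
) -> List[str]:
    """Best-first crawl: links found on higher-scoring pages are visited first."""
    seeds = list(seeds)
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    relevant: List[str] = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        fetched += 1
        score = relevance(text, topic_terms)
        if score >= threshold:
            relevant.append(url)
        # Links from irrelevant pages are kept, but at low priority,
        # so a relevant page behind an irrelevant one can still be reached.
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return relevant


def precision_recall(retrieved: Iterable[str], truth: Set[str]) -> Tuple[float, float]:
    """Precision = hits / retrieved pages; recall = hits / truly relevant pages."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & truth)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall
```

In this sketch, a page reachable only through an irrelevant page is still crawled eventually, which is one simple way to avoid losing relevant content behind off-topic links. Whether the paper's method works this way is not stated in the abstract.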