A LITERATURE SURVEY ON WEB CRAWLERS

Abstract

Author(s): V. Rajapriya

The web holds a vast amount of data spread across innumerable websites, which are traversed and monitored by a program known as a crawler. Forum Crawler Under Supervision is a supervised, web-scale forum crawler whose goal is to crawl relevant forum content from the web with minimal overhead. Although forums have different layouts or styles and are powered by different forum software packages, they share similar implicit navigation paths, connected by specific URL types, that lead users from entry pages to thread pages. This observation reduces the web forum crawling problem to a URL type recognition problem. The approach learns accurate and effective regular expression patterns of implicit navigation paths from automatically created training sets, using aggregated results from weak page type classifiers. These page type classifiers can be trained once and then applied to a large set of unseen forums. The approach is reported to be highly effective, addresses the scalability issue, and incorporates sentiment analysis. This paper describes web crawlers and their challenges and presents a survey of four papers.
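To make the idea of URL type recognition concrete, the following is a minimal sketch in Python. It assumes a single hypothetical forum whose URLs follow simple path conventions. The type names (`index`, `thread`, `page_flipping`), the patterns, and the helper functions are illustrative assumptions and are not taken from the surveyed papers. In the approach described above, such patterns would be learned from automatically created training sets rather than written by hand.

```python
import re

# Hypothetical, hand-written patterns for one imaginary forum.
# Order matters: the more specific page-flipping pattern is tried first.
URL_TYPE_PATTERNS = [
    ("page_flipping", re.compile(r"/(?:forum|thread)/\d+\?page=\d+$")),
    ("index", re.compile(r"/forum/\d+$")),
    ("thread", re.compile(r"/thread/\d+$")),
]


def classify_url(url: str) -> str:
    """Return the first URL type whose pattern matches, or "other"."""
    for url_type, pattern in URL_TYPE_PATTERNS:
        if pattern.search(url):
            return url_type
    return "other"


def select_urls_to_follow(urls):
    """Keep only URLs that lie on a navigation path toward thread pages."""
    return [u for u in urls if classify_url(u) != "other"]
```

A crawler built along these lines would follow only the links that `select_urls_to_follow` returns, skipping login, profile, and similar pages, which is one way the overhead of crawling irrelevant content could be kept low.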