The capability to quickly fetch a large number of Web pages into a local repository and to index them based on keywords is required in many applications. Large-scale programs that fetch tens of thousands of Web pages per second are called crawlers, spiders, Web robots, or bots. Crawling is usually performed so that the fetched documents can subsequently be indexed; together, a crawler and an index form a key component of a Web search engine. Crawling the Web has been the principal mechanism for retrieving information from the Internet, and because of the sheer size and dynamism of the Internet, searching depends largely on crawlers. Researchers have been developing new crawlers to address the problems of relevant web page retrieval, computation cost, and efficiency. In this paper we compare depth-first and breadth-first crawlers with our proposed modified Max-Min ant system (MMM) crawler in terms of computation cost and reliability. We simulate the crawlers using Jgraph on different data sets and show that our proposed MMM crawler incurs a lower computation cost as the data size increases.
Key words: Breadth-first crawler, Depth-first crawler, Modified Max-Min ant crawler, Web crawler
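To make the comparison concrete, the sketch below (in Java, since the simulations are built on Jgraph) illustrates the only difference between the two baseline crawlers: a breadth-first crawler expands the oldest URL on the frontier, while a depth-first crawler expands the most recently discovered one. The link graph, seed URL, and page limit are illustrative assumptions and do not correspond to the data sets used in the paper; the proposed MMM crawler is not reproduced here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Minimal sketch of the two baseline crawlers compared in the paper.
 * The link graph, seed URL, and page limit are hypothetical examples.
 */
public class FrontierDemo {

    /** Traverse the link graph, treating the frontier as a queue (BFS) or a stack (DFS). */
    static List<String> crawl(Map<String, List<String>> links, String seed,
                              boolean breadthFirst, int maxPages) {
        Deque<String> frontier = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        List<String> order = new java.util.ArrayList<>();
        frontier.add(seed);
        while (!frontier.isEmpty() && order.size() < maxPages) {
            // BFS takes the oldest frontier URL; DFS takes the most recently added one.
            String url = breadthFirst ? frontier.pollFirst() : frontier.pollLast();
            if (!visited.add(url)) continue;  // skip URLs already crawled
            order.add(url);
            for (String out : links.getOrDefault(url, List.of())) {
                if (!visited.contains(out)) frontier.addLast(out);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical link graph standing in for fetched pages.
        Map<String, List<String>> links = Map.of(
                "A", List.of("B", "C"),
                "B", List.of("D"),
                "C", List.of("E"),
                "D", List.of(),
                "E", List.of());
        System.out.println("BFS order: " + crawl(links, "A", true, 10));
        System.out.println("DFS order: " + crawl(links, "A", false, 10));
    }
}
```

On this toy graph, main prints the breadth-first visit order A, B, C, D, E and the depth-first order A, C, E, B, D; only the frontier discipline differs, which is why the two baselines diverge in computation cost as the crawl graph grows.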