资源说明:
udacity-search-engine ################################## This search engine is separated into two parts. 1. engine_spider.py 2. engine_search.py We can crawl new things to enlarge the data, and search word without a crawling process. engine_spider: Crawling web pages using thread pool model. At the same time, indexing all the words with crawled page(word stemming in it). After crawling and indexing, based on all pages' outgoing link, compute every page's ranking. Dump the ranking and indexing variables into the hard disk. Load the ranking and indexing variables from old data on hard disk. Whenever need to stop the spider,use Ctrl+C signal, program will dump the variables from memory to disk automatically. Auto detecte the webpages' encoding, and decode them. engine_search: Load ranking and indexing variables from hard disk. Search the words or a sentence. Inputing words from the console, then stemming and searching. Showing the result in smart way.(quick sort and count the NO. of different word in one url) Caching the query result.
本源码包内暂不包含可直接显示的源代码文件,请下载源码包。