A Technical Analysis of Google: The Research Paper by Founders Sergey Brin and Lawrence Page

3.1 Information Retrieval

Work in information retrieval systems goes back many years and is well developed [Witten 94]. However, most of the research on information retrieval systems is on small, well-controlled, homogeneous collections such as collections of scientific papers or news stories on a related topic. Indeed, the primary benchmark for information retrieval, the Text Retrieval Conference [TREC 96], uses a fairly small, well-controlled collection for its benchmarks. The "Very Large Corpus" benchmark is only 20GB, compared to the 147GB from our crawl of 24 million web pages. Things that work well on TREC often do not produce good results on the web. For example, the standard vector space model tries to return the document that most closely approximates the query, given that both query and document are vectors defined by their word occurrence. On the web, this strategy often returns very short documents that are the query plus a few words. For example, we have seen a major search engine return a page containing only "Bill Clinton Sucks" and a picture from a "Bill Clinton" query. Some argue that on the web, users should specify more accurately what they want and add more words to their query. We disagree vehemently with this position. If a user issues a query like "Bill Clinton", they should get reasonable results, since there is an enormous amount of high-quality information available on this topic. Given examples like these, we believe that the standard information retrieval work needs to be extended to deal effectively with the web.
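
The short-document effect described above can be reproduced with a minimal sketch of vector space ranking. The sketch assumes raw word counts as vector components and cosine similarity as the closeness measure, which is one common formulation and not necessarily the one used by the search engine mentioned in the text. The function names and the two sample documents are hypothetical and exist only for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a word-occurrence vector: lowercase word -> count."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    # Counter returns 0 for missing words, so only shared words contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, document) pairs sorted from best to worst match."""
    q = term_vector(query)
    scored = [(cosine_similarity(q, term_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical documents: one is the query plus a single word,
# the other is longer and mentions the query terms once each.
documents = [
    "bill clinton sucks",
    "bill clinton biography with speeches policy record and photographs",
]

ranking = rank("bill clinton", documents)
for score, doc in ranking:
    print(f"{score:.3f}  {doc}")
```

Under these assumptions the three-word page ranks first, because every extra word in the longer document increases its vector length and lowers its cosine score even though both documents contain the query terms exactly once.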