A Technical Anatomy of Google: The Research Paper by Founders Sergey Brin and Lawrence Page
9 Appendix B: Scalability
9.1 Scalability of Google
We have designed Google to be scalable in the near term to a goal of 100 million web pages. We have just received disks and machines to handle roughly that amount. All of the time-consuming parts of the system are parallelizable and run in roughly linear time. These include the crawlers, indexers, and sorters. We also think that most of the data structures will deal gracefully with the expansion. However, at 100 million web pages we will be very close to all sorts of limits in the common operating systems (currently we run on both Solaris and Linux). These include addressable memory, the number of open file descriptors, network sockets and bandwidth, and many others. We believe that expanding to a lot more than 100 million pages would greatly increase the complexity of our system.
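As a small illustration of the operating system limits mentioned above, the following sketch reads two of them (open file descriptors and addressable memory) for the current process. It assumes a Unix-like system such as Linux, where Python's standard `resource` module is available; the function name `current_limits` and the dictionary labels are hypothetical and not from the paper.

```python
import resource

def current_limits():
    """Return (soft, hard) limits for two per-process OS resources.

    A value equal to resource.RLIM_INFINITY means "no limit set".
    """
    names = {
        "open_files": resource.RLIMIT_NOFILE,   # max open file descriptors
        "address_space": resource.RLIMIT_AS,    # max addressable memory, in bytes
    }
    return {label: resource.getrlimit(code) for label, code in names.items()}

if __name__ == "__main__":
    for label, (soft, hard) in current_limits().items():
        print(f"{label}: soft={soft} hard={hard}")
```

The soft limit is the one a process actually hits first; a system that keeps many index files or sockets open at once would run into it long before the hard limit.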
9.2 Scalability of Centralized Indexing Architectures
As the capabilities of computers increase, it becomes possible to index a very large amount of text for a reasonable cost. Of course, other more bandwidth-intensive media such as video are likely to become more pervasive. But because the cost of producing text is low compared to media like video, text is likely to remain very pervasive. Also, it is likely that we will soon have speech recognition that does a reasonable job of converting speech into text, expanding the amount of text available. All of this provides amazing possibilities for centralized indexing.
Here is an illustrative example. We assume we want to index everything everyone in the US has written for a year. We assume that there are 250 million people in the US and that they write an average of 10k per day. That works out to about 850 terabytes. Also assume that indexing a terabyte can be done now for a reasonable cost. We also assume that the indexing methods used over the text are linear, or nearly linear, in their complexity. Given all these assumptions, we can compute how long it would take before we could index our 850 terabytes for a reasonable cost, assuming certain growth factors. Moore's Law was defined in 1965 as a doubling every 18 months in processor power. It has held remarkably true, not just for processors, but for other important system parameters such as disk as well. If we assume that Moore's Law holds for the future, we need only 10 more doublings, or 15 years, to reach our goal of indexing everything everyone in the US has written for a year for a price that a small company could afford. Of course, hardware experts are somewhat concerned that Moore's Law may not continue to hold for the next 15 years, but there are certainly a lot of interesting centralized applications even if we only get part of the way to our hypothetical example.
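The arithmetic in this example can be checked with a short sketch. It assumes that "10k" means 10 KiB and that "terabyte" is counted in binary units (2^40 bytes), which reproduces the figure of about 850; with decimal units the same inputs give roughly 912 terabytes. The function names are hypothetical, and the one-terabyte baseline comes from the text's assumption that a terabyte can be indexed today for a reasonable cost.

```python
import math

KIB = 1024          # bytes per KiB (assumed meaning of "10k")
TIB = 1024 ** 4     # bytes per TiB (assumed meaning of "terabyte")

def yearly_text_tib(people=250_000_000, kib_per_person_per_day=10, days=365):
    """Text written per year by the whole population, in TiB."""
    total_bytes = people * kib_per_person_per_day * KIB * days
    return total_bytes / TIB

def doublings_needed(target_tib, baseline_tib=1.0):
    """Whole doublings of capacity needed to grow from baseline to target."""
    return math.ceil(math.log2(target_tib / baseline_tib))

def years_until_affordable(doublings, months_per_doubling=18):
    """Years required if capacity per unit cost doubles every 18 months."""
    return doublings * months_per_doubling / 12

if __name__ == "__main__":
    volume = yearly_text_tib()
    n = doublings_needed(volume)
    print(f"yearly volume: {volume:.1f} TiB")   # roughly 850
    print(f"doublings needed: {n}")             # 10
    print(f"years: {years_until_affordable(n)}")  # 15.0
```

Because the estimate depends only on the base-2 logarithm of the volume, it is not very sensitive to the inputs: doubling the assumed amount of text written per person adds just one more doubling, or 18 months.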