Friday, November 26, 2010

Muddiest point 22/11

Does citing (linking to) an article increase a webpage’s probability of being retrieved by a crawler? I’m asking in the context of a site’s ‘hubness’.
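‘Hubness’ is usually understood through Kleinberg’s HITS algorithm. A page’s hub score is the sum of the authority scores of the pages it links to, and a page’s authority score is the sum of the hub scores of the pages linking to it. The minimal sketch that follows shows that a page citing more well-cited pages ends up with a higher hub score. The page names and link graph are hypothetical, and the sketch says nothing about crawl order: whether a particular crawler prioritizes high-hub pages is an assumption, not something HITS itself decides.

```python
import math


def hits(links, iterations=50):
    """Compute HITS hub and authority scores.

    links: dict mapping each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of pages that link in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub: sum of the authority scores of pages linked to.
        hub = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                hub[src] += auth[dst]
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Hypothetical link graph: "blog" cites three papers, "news" cites one.
links = {
    "blog": ["paper1", "paper2", "paper3"],
    "news": ["paper1"],
    "paper1": [],
    "paper2": [],
    "paper3": [],
}
hub, auth = hits(links)
```

Here "blog" gets the larger hub score because it links to a superset of the pages "news" links to, and "paper1" gets the largest authority score because both hubs point to it.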

Saturday, November 20, 2010

Comment links-Unit 11

Reading notes_Unit 11: Web Search and OAI Protocol

Web Search Engines: Parts 1 & 2
While reading the article and considering the data-processing power of today’s search engines, I was reminded of Moore’s Law and the steady growth in computing capacity. The content was fairly easy to follow, and it clarified some points for me about how search engines operate. I found the section on spam rejection especially interesting; the ‘policing task’ seems to exist everywhere, and spam evolves like everything else.
The second article dealt with the more involved task of actually processing queries. The speed needed to crawl the Web is incredible and makes you realize how vast the network really is. The use of caching, which increases capacity and reduces cost, really emphasizes the number of separate processes that take place behind the scenes on the Web.
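The caching idea can be illustrated with a minimal sketch, assuming a least-recently-used (LRU) eviction policy and a small fixed capacity. The article may describe different caching levels or policies; `QueryResultCache` and `slow_search` are hypothetical names. Repeated queries are answered from memory instead of re-running the search, which is where the cost saving comes from.

```python
from collections import OrderedDict


class QueryResultCache:
    """LRU cache mapping a normalized query string to its result list."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._store = OrderedDict()
        self.hit_count = 0
        self.miss_count = 0

    def get(self, query, compute):
        # Normalize so trivially different queries share one entry.
        key = " ".join(query.lower().split())
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            self.hit_count += 1
            return self._store[key]
        self.miss_count += 1
        results = compute(key)  # the expensive search, run only on a miss
        self._store[key] = results
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used
        return results

    def cached_queries(self):
        return list(self._store)


calls = []


def slow_search(q):
    # Hypothetical stand-in for a full index lookup.
    calls.append(q)
    return [f"doc-for-{q}"]


cache = QueryResultCache(capacity=2)
cache.get("Moore's Law", slow_search)
cache.get("moore's  law", slow_search)   # same normalized key: served from cache
cache.get("spam rejection", slow_search)
cache.get("deep web", slow_search)       # evicts "moore's law"
cache.get("Moore's Law", slow_search)    # must be recomputed
```

Only four of the five requests reach `slow_search`; on a real engine with many repeated popular queries the saving is proportionally larger.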
OAI
The aims and goals of the OAI, as they relate to metadata harvesting, will have far-reaching consequences for the digital mining of information. The fact that harvesting will allow search engines to trawl parts of the ‘hidden web’ means exposure to previously hidden information sources. For libraries, the implication is a deeper information mine for users and a more extensive resource pool.
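Metadata harvesting under OAI-PMH works by sending simple HTTP requests such as `verb=ListRecords` with `metadataPrefix=oai_dc` (unqualified Dublin Core) to a repository’s base URL, then following any `resumptionToken` to fetch the next batch. The minimal sketch that follows builds those request URLs and parses a simplified, made-up response offline. The base URL and record are hypothetical, and a real harvester would also need to handle errors, date ranges and sets.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    # A resumptionToken replaces metadataPrefix on follow-up requests.
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return [(identifier, title), ...] and the resumption token, if any."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        records.append((identifier, title))
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or None)


# Simplified, hypothetical response (real ones also carry responseDate etc.).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

BASE = "https://repository.example.org/oai"
first_url = list_records_url(BASE)
records, token = parse_list_records(SAMPLE)
next_url = list_records_url(BASE, token)
```

A harvester would loop, requesting `next_url` until no resumption token comes back, which is how a search service can gather a repository’s records in bulk.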
The Deep Web

To make the most of these information sources, we would need better search tools. There are constant upgrades, but the statistics quoted suggest a disconnect between the sources that exist and those actually being unearthed.