Sunday, January 20, 2008

Yahoo adds del.icio.us data to search results

I'm a big fan of the social bookmarking site del.icio.us. It contains a lot of really useful information on what's popular online. Users only tag webpages that they find interesting, so the fact that a page is bookmarked on del.icio.us is a pretty good indication of its quality, and the more people who have tagged the same page, the more likely it is to be interesting.

Yahoo bought del.icio.us a while back and (according to this post on TechCrunch) is finally starting to incorporate this wealth of information into its own search results. Currently it adds a line to each result's summary showing the number of users who have bookmarked the page on del.icio.us. It is hard to tell whether the data is also being used internally to improve the ranking, for example by pushing highly tagged pages further up the list.

A recent and excellent paper by Paul Heymann, Georgia Koutrika and Hector Garcia-Molina called "Can Social Bookmarking Improve Web Search?" suggests that social bookmarking data may indeed (albeit in certain limited cases) be useful for improving search results.

They find that URLs in del.icio.us are disproportionately common in search results, given the relative size of del.icio.us compared to Yahoo's search engine index (which they estimate to be at least two orders of magnitude larger). Looking at the results for some 30,000 popular queries, they found that 19% of the top 10 results and 9% of the top 100 results were present in del.icio.us. These statistics are hardly surprising given that users must first find a web page in order to add it to del.icio.us - and they do that using a search engine!
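To make the overlap statistic concrete, here is a minimal sketch of how such a figure could be computed. It is not the authors' code: the function name `overlap_at_k`, the data layout, and the example URLs are all made up for illustration.

```python
def overlap_at_k(results_by_query, bookmarked_urls, k):
    """Fraction of top-k result slots, over all queries, whose URL is bookmarked.

    results_by_query: dict mapping a query string to its ranked list of URLs
    bookmarked_urls:  set of URLs known to be in the bookmarking site
    """
    hits = 0
    total = 0
    for ranked_urls in results_by_query.values():
        top = ranked_urls[:k]
        total += len(top)
        hits += sum(1 for url in top if url in bookmarked_urls)
    return hits / total if total else 0.0


# Hypothetical toy data: two queries with four ranked results each.
results_by_query = {
    "python tutorial": ["a.example/py", "b.example/learn", "c.example/docs", "d.example/misc"],
    "css layout": ["e.example/css", "a.example/py", "f.example/grid", "g.example/old"],
}
bookmarked = {"a.example/py", "e.example/css", "f.example/grid"}

top2 = overlap_at_k(results_by_query, bookmarked, 2)  # 3 of 4 slots are bookmarked
top4 = overlap_at_k(results_by_query, bookmarked, 4)  # 4 of 8 slots are bookmarked
```

In this toy data the overlap drops as k grows, which is the same shape as the 19% (top 10) versus 9% (top 100) figures quoted above.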

Far more interesting to me is the question of whether users are adding useful information by bookmarking a page once they've found it. The search engines record which results users click on and can already use this information to improve their algorithms. Presumably, however, a user will often click on lots of different results and save only the very best (most relevant) ones to del.icio.us. So the fact that a page appears in del.icio.us should carry more meaning than a click on a particular search result, and may therefore be useful for reordering search results.
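As a rough illustration of what "reordering" could mean, here is a sketch that blends a search engine's relevance score with a damped bookmark count. This is purely my own assumption about one way it might be done, not anything Yahoo or the paper describes; the function name `rerank`, the `weight` parameter, and the logarithmic damping are all hypothetical choices.

```python
import math


def rerank(results, bookmark_counts, weight=0.1):
    """Reorder (url, relevance_score) pairs using bookmark counts as a boost.

    The boost is weight * log(1 + count), so going from 0 to 10 bookmarks
    matters more than going from 1000 to 1010. Both the formula and the
    default weight are illustrative assumptions.
    """
    def blended(item):
        url, score = item
        return score + weight * math.log1p(bookmark_counts.get(url, 0))

    # sorted() is stable, so ties keep their original search order.
    return sorted(results, key=blended, reverse=True)


# Hypothetical results, already ordered by the engine's own score.
results = [("x.example/a", 1.0), ("x.example/b", 0.9), ("x.example/c", 0.8)]
bookmark_counts = {"x.example/b": 500, "x.example/c": 2}

reordered = [url for url, _ in rerank(results, bookmark_counts)]
unchanged = [url for url, _ in rerank(results, bookmark_counts, weight=0.0)]
```

Here the heavily bookmarked page `x.example/b` moves to the top, while the page with only two bookmarks stays below the unbookmarked first result. Setting `weight` to zero leaves the original order untouched, which makes it easy to compare the two rankings.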

The authors don't deal directly with the question of whether a page's bookmark count is on its own a good indication of quality. They do, however, study the freshness of URLs on del.icio.us and conclude that the stream of URLs being added to del.icio.us is at least as new as search engine results. The authors also estimate that 12.5% of URLs posted to del.icio.us were not present in Yahoo's index at the time they were added, implying that del.icio.us could serve as a small source of highly relevant content for seeding Web crawls.

(Thanks to Greg Linden for pointing out the paper on his blog.)
