Thursday, July 10, 2008

Search rankings based on user navigation

Mikhail Bilenko and Ryen White at Microsoft Research performed an interesting study on the utility of website navigation data for improving search results. They first recorded users' post-search browsing activity using the (opt-in) functionality of the Microsoft toolbar plugin. They then analyzed the navigation data, turning it into features that they provided to a "learning to rank" system. The resulting retrieval function significantly improved search results on their testbed. The paper is called "Mining the Search Trails of Surfing Crowds: Identifying Relevant Websites From User Activity".

Their basic idea was to record the trace of websites visited by a user from when he/she first issues a query to a search engine up until he/she stops navigating (by closing the browser window, remaining idle on a page for a period, or issuing a new query). The presence of a website in the trace is assumed to provide some indication of its relevance to the initial query. The intuition here is that users issue queries to search engines to find "jumping off" pages, from which they begin their own navigation in order to find the truly relevant document for their current "information need".
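To make the trail definition concrete, here is a minimal sketch that splits one user's time-ordered browsing log into search trails using the three stopping conditions above. It is an illustration under stated assumptions, not the authors' code: the `Event` and `Trail` records, their field names, and the 30-minute `IDLE_TIMEOUT` are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical log record; field names are illustrative assumptions.
@dataclass
class Event:
    timestamp: float  # seconds since some epoch
    kind: str         # "query", "visit", or "close"
    value: str = ""   # query text for "query", URL for "visit"

@dataclass
class Trail:
    query: str
    pages: List[str] = field(default_factory=list)

IDLE_TIMEOUT = 30 * 60  # assumed idle cutoff in seconds (not taken from the paper)

def segment_trails(events: List[Event], idle_timeout: float = IDLE_TIMEOUT) -> List[Trail]:
    """Split one user's time-ordered events into search trails.

    A trail starts at a query and ends when the browser is closed, the user
    stays idle longer than idle_timeout, or a new query is issued.
    """
    trails: List[Trail] = []
    current: Optional[Trail] = None
    last_time: Optional[float] = None
    for ev in events:
        # An idle gap closes the open trail before this event is handled.
        if current is not None and last_time is not None \
                and ev.timestamp - last_time > idle_timeout:
            trails.append(current)
            current = None
        if ev.kind == "query":
            if current is not None:  # a new query ends the previous trail
                trails.append(current)
            current = Trail(query=ev.value)
        elif ev.kind == "visit":
            if current is not None:  # visits with no open trail are ignored
                current.pages.append(ev.value)
        elif ev.kind == "close":
            if current is not None:  # closing the browser ends the trail
                trails.append(current)
                current = None
        last_time = ev.timestamp
    if current is not None:
        trails.append(current)
    return trails
```

For example, a log of query "jaguar", visits to a.com and b.com, a visit to c.com after more than 30 idle minutes, then query "puma", a visit to d.com and a browser close yields two trails: "jaguar" with [a.com, b.com] and "puma" with [d.com]. The c.com visit is dropped because the idle gap already closed the first trail.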

In order to deal with the sparsity of query data (most queries to a search engine are seen infrequently, so little click/navigation data is available for each), they proposed a very nice random walk idea that can generalize the click data across different query terms. Suppose a page (let's call it page0) that is navigated to after a user submits a query containing term0 is also visited by another user after submitting a different term, say term1. If term1 also resulted in clicks on other pages (page1, page2, ...), we might consider those pages to be relevant, to a lesser extent, to the original query term term0. The random walk quantifies this intuition by deciding how much weight these additional pages receive for term0. As an aside, performing random walks over click-through data seems to be a popular idea at the moment in Web Mining research; see this post by Greg Linden.
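The propagation from term0 through page0 and term1 to page1, page2, ... can be sketched as a random walk with restart on a bipartite graph of terms and pages, with edges weighted by click counts. This is a generic formulation chosen for illustration; the paper's exact graph construction, weighting and parameters may differ, and the function name `random_walk_scores`, the restart probability of 0.3 and the iteration count are assumptions.

```python
from collections import defaultdict
from typing import Dict, Tuple

def random_walk_scores(
    clicks: Dict[Tuple[str, str], float],
    start_term: str,
    restart: float = 0.3,   # assumed probability of jumping back to start_term
    iterations: int = 50,
) -> Dict[str, float]:
    """Score pages for start_term with a random walk with restart over a
    bipartite term-page graph whose edge weights are click counts."""
    term_out: Dict[str, Dict[str, float]] = defaultdict(dict)  # term -> {page: weight}
    page_out: Dict[str, Dict[str, float]] = defaultdict(dict)  # page -> {term: weight}
    for (term, page), w in clicks.items():
        term_out[term][page] = w
        page_out[page][term] = w

    start = ("term", start_term)
    prob: Dict[Tuple[str, str], float] = {start: 1.0}
    for _ in range(iterations):
        nxt: Dict[Tuple[str, str], float] = defaultdict(float)
        nxt[start] += restart  # teleport back to the original query term
        for (kind, node), p in prob.items():
            neighbours = term_out[node] if kind == "term" else page_out[node]
            total = sum(neighbours.values())
            if total == 0:
                nxt[start] += (1 - restart) * p  # dead end: return to start
                continue
            other = "page" if kind == "term" else "term"
            for nb, w in neighbours.items():
                # Step to a neighbour in proportion to its click count.
                nxt[(other, nb)] += (1 - restart) * p * w / total
        prob = nxt
    return {node: p for (kind, node), p in prob.items() if kind == "page"}
```

With clicks term0→page0, term1→page0, term1→page1 and term1→page2 (one click each), page0 gets the highest score for term0, while page1 and page2 receive equal, smaller but non-zero scores, even though nobody clicked them after querying term0. The restart probability controls how quickly credit decays as the walk moves away from the original term.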

In general, I like papers that go beyond analyzing the static Web graph (of hyperlinked webpages) to investigate the dynamic Web graph (of hyperlinks weighted by traffic volume) in order to improve search results. I recently read a somewhat related article on using the dynamic Web graph to calculate a better query-independent document prior (i.e. a better version of Google's PageRank). The article is called "Ranking Websites with Real User Traffic" by Meiss, Menczer, Fortunato, Flammini and Vespignani. The authors measured all of the navigation activity of (suitably anonymized) students and employees at Indiana University. (See this other post by Greg Linden.)

The PageRank algorithm takes the static graph as input and relies on a rather strong assumption (that a random surfer chooses among a page's outgoing links with uniform probability) to try to predict the dynamic graph (the amount of user traffic arriving at each webpage). The article looks at surfer data for all Internet users at Indiana University over a period of seven months and finds that the random surfer model of PageRank does a relatively poor job of predicting which pages attract the most traffic.
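The contrast between the random surfer's prediction and real traffic can be sketched in a few lines: textbook PageRank by power iteration, with uniform choice among outgoing links, next to the observed share of link traversals arriving at each page. The function names, the damping factor of 0.85 and the toy graph and click counts are assumptions invented purely to illustrate how the two rankings can disagree; they are not data or methods from the Meiss et al. study.

```python
from typing import Dict, List, Tuple

def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Classic PageRank by power iteration: the random surfer follows each
    outgoing link with equal probability, or teleports to a random page."""
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: (1 - damping) / n for v in nodes}
        for u in nodes:
            outs = links.get(u, [])
            if outs:
                share = damping * rank[u] / len(outs)  # uniform over outlinks
                for v in outs:
                    nxt[v] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                for v in nodes:
                    nxt[v] += damping * rank[u] / n
        rank = nxt
    return rank

def traffic_share(clicks: Dict[Tuple[str, str], int]) -> Dict[str, float]:
    """Fraction of all observed link traversals that arrive at each page."""
    total = sum(clicks.values())
    share: Dict[str, float] = {}
    for (_, dst), count in clicks.items():
        share[dst] = share.get(dst, 0.0) + count / total
    return share

# Invented toy data for illustration only (not from the study).
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
clicks = {("A", "B"): 90, ("A", "C"): 10, ("B", "C"): 5,
          ("C", "A"): 20, ("D", "C"): 1}

pr = pagerank(links)
ts = traffic_share(clicks)
```

On this toy graph, C ranks highest under PageRank because it has the most inlinks, while B receives the largest share of the (made-up) traffic because users overwhelmingly follow the A→B link. The uniform-outlink assumption cannot represent that kind of skew, which is the weakness the traffic data exposes.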

Of course, user navigation data can be very useful for a number of purposes apart from improving search results, such as e-commerce site optimization and, more recently, spam webpage detection. See, for example, this recent paper on the subject: "Identifying Web Spam with User Behavior Analysis" by Yiqun Liu, Rongwei Cen, Min Zhang, Liyun Ru and Shaoping Ma.
