Quality in Ranking

A good ranking is what every searcher wants back from a query.

Two widely used measures for prediction quality are Precision and Recall.

In a few words:

Precision measures the hit rate over the number of shots you take. The more you shoot, the more easily you hit something, but your Precision drops.
Recall measures the hit rate over the targets that could be hit. You reach maximal Recall by hitting every target, no matter how many shots it took.

It is quite obvious that by shooting a lot you can get good Recall with bad Precision; on the opposite side, firing only a few well-aimed slugs gives you good Precision but low Recall.

OK, but the ranking problem is quite different, for a couple of reasons:

Quantity: the results are far more numerous than the few the user will ever look at.

Quality: you have no hint about what the user actually likes or dislikes.

The idea that started Google is quite simple (in hindsight… like the light bulb): imagine the web as a… network, where every site is connected to other sites by links.
Imagine being a random surfer, starting somewhere on the net and moving along the links.
Count how many times you visit each site: that is, more or less, basic PageRank.

The hypothesis behind this: “links are valuable man-made choices”.
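
Here is a minimal sketch of that random-surfer simulation; the tiny link graph and the 0.85 damping factor are illustrative assumptions, not something from the original idea as described above:

    import random

    # A made-up toy link graph: site -> sites it links to.
    links = {
        "a": ["b", "c"],
        "b": ["c"],
        "c": ["a"],
    }

    def random_surfer(links, steps=100_000, damping=0.85):
        """Count the visits of a random surfer; more visits ~ higher PageRank."""
        visits = {site: 0 for site in links}
        site = random.choice(list(links))
        for _ in range(steps):
            visits[site] += 1
            if links[site] and random.random() < damping:
                site = random.choice(links[site])  # follow a random outgoing link
            else:
                site = random.choice(list(links))  # random jump (handles dead ends)
        return {s: count / steps for s, count in visits.items()}

    print(random_surfer(links))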

Let’s take a step forward.

Imagine a basket suggestion system in a supermarket; you can use some basic algorithms to make a suggestion:

Popularity: suggest only common stuff (milk, bread, fruit…), and no one will think “What an intelligent suggestion!”

Random: suggest random stuff; sometimes you will suggest something really uncommon but interesting… but everyone will think “What a mad system is this?”

Let’s say that Google uses something based on Popularity, while AltaVista was using something closer to Random.
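
A toy sketch of the two strategies, with invented purchase counts, just to make the contrast concrete:

    import random

    # Made-up purchase counts for illustration.
    purchase_counts = {"milk": 950, "bread": 900, "fruit": 700,
                       "avocado": 40, "tamari": 5}

    def suggest_by_popularity(counts, k=3):
        # Always the same safe, common items.
        return sorted(counts, key=counts.get, reverse=True)[:k]

    def suggest_at_random(counts, k=3):
        # Occasionally interesting, usually just odd.
        return random.sample(sorted(counts), k)

    print(suggest_by_popularity(purchase_counts))  # ['milk', 'bread', 'fruit']
    print(suggest_at_random(purchase_counts))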

Is there room for something better?

Let’s go back to Precision and Recall, and move them towards a ranking quality measure.

It’s easy to see that many human phenomena, like popularity on the net, are modeled by a Zipf-like law:

– A few Internet sites are known to the majority of users.

– A very large number of Internet sites are known only to a few users.
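
In its simplest form (this is the standard statement of Zipf’s law, added here for reference), the popularity f of the site at popularity rank k falls off as a power of the rank:

\[
f(k) \propto \frac{1}{k^{s}}, \qquad s \approx 1
\]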

Let to:predict be the set of sites a user wants to find.

And let suggestion be the set of sites we suggest to the user.

Standard Precision and Recall can be computed by:

\[
\mathrm{Precision} = \frac{|\text{suggestion} \cap \text{to:predict}|}{|\text{suggestion}|}
\]

\[
\mathrm{Recall} = \frac{|\text{suggestion} \cap \text{to:predict}|}{|\text{to:predict}|}
\]
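
In code this is plain set arithmetic; a minimal sketch, with the two sets invented for illustration:

    def precision(suggestion, to_predict):
        return len(suggestion & to_predict) / len(suggestion)

    def recall(suggestion, to_predict):
        return len(suggestion & to_predict) / len(to_predict)

    # Invented example sets.
    suggestion = {"milk", "bread", "tamari"}
    to_predict = {"milk", "avocado", "tamari"}
    print(precision(suggestion, to_predict))  # 0.67: 2 hits out of 3 suggestions
    print(recall(suggestion, to_predict))     # 0.67: 2 hits out of 3 wanted sites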

What does this mean? We are counting hits regardless of their popularity: suggesting “bread” has the same value as suggesting “tamari”. The same goes for misses: missing “milk” costs the same as missing “avocado”.

Let’s order all the sites on the net by descending popularity into a list popdesc, and give each element the value of its position in that list.

We will have something like milk=4 (common, early in the list) and tamari=453 (uncommon, late in the list).
Considering The Long Tail, where many almost equally uncommon elements end up at very different positions, we can correct this behavior with a log() function (at least for now).

\[
\mathrm{Precision}_R = \frac{\log\left(|\text{suggestion} \cap \text{to:predict}|_R\right)}{\log\left(|\text{suggestion}|_R\right)}
\]

\[
\mathrm{Recall}_R = \frac{\log\left(|\text{suggestion} \cap \text{to:predict}|_R\right)}{\log\left(|\text{to:predict}|_R\right)}
\]

with \(|S|_R = \sum_{e \in S} \mathrm{popdesc}(e)\), i.e. the sum of the positions that the elements of S have in popdesc.
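
A minimal sketch of these rank-weighted measures, assuming an invented popdesc and 1-based positions (the empty-intersection case, where the log is undefined, is not handled):

    import math

    popdesc = ["bread", "milk", "fruit", "avocado", "tamari"]  # descending popularity
    rank = {site: i + 1 for i, site in enumerate(popdesc)}     # 1-based positions

    def size_r(items):
        """|S|_R: the sum of the positions of S's elements in popdesc."""
        return sum(rank[e] for e in items)

    def precision_r(suggestion, to_predict):
        return math.log(size_r(suggestion & to_predict)) / math.log(size_r(suggestion))

    def recall_r(suggestion, to_predict):
        return math.log(size_r(suggestion & to_predict)) / math.log(size_r(to_predict))

    suggestion = {"milk", "tamari"}             # one common hit, one tail hit
    to_predict = {"milk", "avocado", "tamari"}
    print(precision_r(suggestion, to_predict))  # 1.0: every suggestion is a hit
    print(recall_r(suggestion, to_predict))     # ~0.81: the tail hit weighs a lot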

These measures are Precision- and Recall-like, extended with some statistical information about the training data.

Precision_R considers it a more serious error to wrongly suggest an uncommon element than a common one.

Recall_R considers a correct suggestion coming from the tail more valuable.

Since a good ranking is needed for good classification, having a solid way to measure the quality of a ranking before starting to search for a good classifier can be an interesting approach.
