Thesis

In this work I study the well-known Item-User-Tag space that characterizes many so-called Web 2.0 applications, such as del.icio.us, and I’ll show how much information is “hidden” in this structure.

###Introduction

The Internet is a huge archive of information. Nowadays we find what we are looking for mainly through search engines, but at the same time more and more sites are using “tags”.

Tags are simple labels describing the subject of an item. Usually a tag is a common word, like “news”, “software”, or “economy”, but sometimes tags are personal, like “toread” or “interesting”, representing a personal task or opinion.

A social bookmarking site offers a way to archive bookmarks with associated tags. Every User of a social bookmarking site has a number of bookmarked sites (Items) and a number of Tags.

Every social bookmarking site therefore has an archive of Users, Items and Tags.

Let’s try to interpret this structure:

* Users with common Items perhaps have common interests.
* Users using common Tags are perhaps like-minded.
* Items saved by common Users are perhaps similar in the opinion they express, or in style.
* Items tagged with common Tags are perhaps about related subjects.
* Tags used by common Users are perhaps a sort of slang of that group of Users.
* Tags used on common Items perhaps have a related meaning.
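
The relations listed above can be made concrete with a small sketch. It assumes that each bookmark is stored as a (user, item, tag) triple; the sample data, the `project` helper and the variable names are hypothetical and only serve as illustration.

```python
from collections import defaultdict

# Hypothetical sample data: each bookmark is a (user, item, tag) triple.
bookmarks = [
    ("alice", "example.org/news", "news"),
    ("alice", "example.org/tool", "software"),
    ("bob", "example.org/tool", "software"),
    ("bob", "example.org/tool", "toread"),
]

# Positions of each kind of element inside a triple.
USER, ITEM, TAG = 0, 1, 2


def project(triples, key, value):
    """Map each element found at position `key` to the set of
    elements found at position `value` in the same triples."""
    index = defaultdict(set)
    for triple in triples:
        index[triple[key]].add(triple[value])
    return dict(index)


items_by_user = project(bookmarks, USER, ITEM)
tags_by_user = project(bookmarks, USER, TAG)
users_by_item = project(bookmarks, ITEM, USER)

# Items and Tags that two Users have in common.
common_items = items_by_user["alice"] & items_by_user["bob"]
common_tags = tags_by_user["alice"] & tags_by_user["bob"]
```

The same helper gives the other projections (Tags by Item, Users by Tag, Items by Tag) by changing the two position arguments.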

###Dive into the tag space

A common way of teaching a computer to measure the similarity of two text documents is to build, for each document, an index of the words it uses together with their number of occurrences, and then to compare the two indices.

A smart way of doing this is to imagine a space with one axis for every word, and to think of a document as a vector whose projection along each direction (word) is the number of occurrences of that word.

The similarity of two documents is then related to their distance in this space: the closer the two vectors, the more similar the documents. This is a distance that a computer can easily compute, even in a space with many dimensions.
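
As a minimal sketch of this idea, the following builds word-count vectors and measures how far apart they are. The text does not say which distance is meant, so the Euclidean distance is used here as an assumption; the function names and the naive whitespace tokenization are also illustrative choices.

```python
import math
from collections import Counter


def word_vector(text):
    """Count the occurrences of each lowercase, whitespace-separated word."""
    return Counter(text.lower().split())


def euclidean_distance(a, b):
    """Euclidean distance between two sparse word-count vectors.

    A Counter returns 0 for a missing word, so words that appear in
    only one of the documents are handled without special cases.
    """
    words = set(a) | set(b)
    return math.sqrt(sum((a[w] - b[w]) ** 2 for w in words))


doc1 = word_vector("tags tags news")
doc2 = word_vector("tags news news")
doc3 = word_vector("economy economy economy")

near = euclidean_distance(doc1, doc2)  # documents sharing all their words
far = euclidean_distance(doc1, doc3)   # documents sharing no words
```

Here `near` is smaller than `far`, which matches the intuition that documents using the same words are more similar.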

This old idea is almost perfect as the mathematical foundation for the ideas expressed in the introduction.

We can compute two distances for any pair of Items, of Users, or of Tags:

* Users have a distance in the Tag and in the Item spaces.
* Items have a distance in the User and in the Tag spaces.
* Tags have a distance in the User and in the Item spaces.
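
To illustrate the first bullet, the sketch below computes both distances for a pair of Users. It assumes that bookmarks are stored as (user, item, tag) triples, that each User is represented by a vector of co-occurrence counts, and that the Euclidean distance is used; none of these choices is prescribed by the text, and all names and data are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sample data: (user, item, tag) triples.
bookmarks = [
    ("alice", "a.example", "news"),
    ("alice", "b.example", "software"),
    ("bob", "b.example", "software"),
    ("bob", "c.example", "news"),
]

# Positions of each kind of element inside a triple.
USER, ITEM, TAG = 0, 1, 2


def vector(triples, entity_axis, entity, space_axis):
    """Count how often `entity` co-occurs with each element of the chosen space."""
    return Counter(t[space_axis] for t in triples if t[entity_axis] == entity)


def distance(a, b):
    """Euclidean distance between two sparse count vectors."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in set(a) | set(b)))


def user_distances(triples, u1, u2):
    """Return (distance in Tag space, distance in Item space) for two Users."""
    in_tags = distance(vector(triples, USER, u1, TAG),
                       vector(triples, USER, u2, TAG))
    in_items = distance(vector(triples, USER, u1, ITEM),
                        vector(triples, USER, u2, ITEM))
    return in_tags, in_items


tag_dist, item_dist = user_distances(bookmarks, "alice", "bob")
```

In this sample the two Users use exactly the same Tags but share only one Item, so they coincide in the Tag space while remaining apart in the Item space. The distances for Items and for Tags follow by passing different positions to `vector`.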