Similarity and diversity: two sides of the same coin in the evaluation of data streams
Information systems are the primary instrument of growth for companies
that operate in the so-called e-commerce environment. The data streams
generated by the users who interact with their websites are the primary source
for defining user behavioral models.
Prominent examples of services integrated in these websites are Recommender
Systems, where these models are exploited to generate recommendations
of items of potential interest to users; User Segmentation Systems,
where the models are used to group users on the basis of their preferences;
and Fraud Detection Systems, where the models are exploited to
determine the legitimacy of a financial transaction.
Even though the literature considers diversity and similarity to be two sides
of the same coin, almost all approaches take them into account in a mutually
exclusive manner, rather than jointly. The aim of this thesis is to demonstrate that
considering both sides of this coin is instead essential to overcome some
well-known problems that afflict the state-of-the-art approaches used to implement these services, thereby improving their performance.
Its contributions are the following: with regard to recommender systems,
the detection of diversity in a user profile is used to discard incoherent items,
improving accuracy, while the similarity of the predicted
items is exploited to re-rank the recommendations, improving their effectiveness; with
regard to user segmentation systems, the detection of diversity overcomes
the problem of unreliable data sources, while the exploitation of
similarity reduces the problems of understandability and triviality of the obtained
segments; lastly, concerning fraud detection systems, the joint use of both
diversity and similarity in the evaluation of a new transaction overcomes the problems
of data scarcity and of non-stationary and unbalanced class
distributions.
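
As an illustration of the recommender-system contribution, the following is a minimal sketch, not the method used in the thesis. It assumes that each item is described by a numeric feature vector, that cosine similarity is the similarity measure, and that an item is "incoherent" when its mean similarity to the rest of the profile falls below a threshold. The function names `cosine`, `coherent_items`, and `rerank`, the threshold value, and the toy data are all hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coherent_items(profile, threshold=0.3):
    """Keep the items whose mean similarity to the other profile items
    is at least `threshold` (assumed definition of coherence)."""
    kept = {}
    for item, vec in profile.items():
        others = [v for k, v in profile.items() if k != item]
        if not others:
            # A single-item profile has nothing to be incoherent with.
            kept[item] = vec
            continue
        mean_sim = sum(cosine(vec, o) for o in others) / len(others)
        if mean_sim >= threshold:
            kept[item] = vec
    return kept


def rerank(candidates, profile):
    """Order candidate item ids by decreasing mean similarity to the profile."""
    def mean_sim(vec):
        if not profile:
            return 0.0
        return sum(cosine(vec, p) for p in profile.values()) / len(profile)

    return sorted(candidates, key=lambda c: mean_sim(candidates[c]), reverse=True)


# Toy data (hypothetical): item "c" differs from the rest of the profile.
profile = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
candidates = {"x": [0.0, 1.0], "y": [1.0, 0.1]}

cleaned = coherent_items(profile)      # diversity step: discard outlier items
ranking = rerank(candidates, cleaned)  # similarity step: re-rank predictions
```

In this sketch the diversity step runs first, so that the re-ranking compares candidates only against the items judged coherent; both the order of the steps and the threshold are design assumptions made for illustration.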