26 research outputs found

    Subprofile-aware diversification of recommendations

    Get PDF
    A user of a recommender system is more likely to be satisfied by one or more of the recommendations if each individual recommendation is relevant to her and, additionally, if the set of recommendations is diverse. The most common approach to recommendation diversification uses re-ranking: the recommender system scores a set of candidate items for relevance to the user; it then re-ranks the candidates so that the subset that it will recommend achieves a balance between relevance and diversity. Ordinarily, we expect a trade-off between relevance and diversity: the diversity of the set of recommendations increases by including items that have lower relevance scores but which are different from the items already in the set. In early work, the diversity of a set of recommendations was given by the average of their distances from one another, according to some semantic distance metric defined on item features such as movie genres. More recent intent-aware diversification methods formulate diversity in terms of coverage and relevance of aspects. The aspects are most commonly defined in terms of item features. By trying to ensure that the aspects of a set of recommended items cover the aspects of the items in the user’s profile, the level of diversity is more personalized. In offline experiments on pre-collected datasets, intent-aware diversification using item features as aspects sometimes defies the relevance/diversity trade-off: there are configurations in which the recommendations exhibit increases in both relevance and diversity. In this paper, we present a new form of intent-aware diversification, which we call SPAD (Subprofile-Aware Diversification), and a variant called RSPAD (Relevance-based SPAD). In SPAD, the aspects are not item features; they are subprofiles of the user’s profile. We present and compare a number of different ways to extract subprofiles from a user’s profile. None of them is defined in terms of item features. Therefore, SPAD is useful even in domains where item features are not available or are of low quality. On three pre-collected datasets from three different domains (movies, music artists and books), we compare SPAD and RSPAD to intent-aware methods in which aspects are item features. We find on these datasets that SPAD and RSPAD suffer even less from the relevance/diversity trade-off: across all three datasets, they increase both relevance and diversity for even more configurations than other approaches to diversification. Moreover, we find that SPAD and RSPAD are the most accurate systems across all three datasets.
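
    As an illustration of the greedy re-ranking loop this abstract describes, the sketch below scores each remaining candidate by a weighted mix of its relevance and its marginal coverage of not-yet-covered aspects, with subprofiles playing the role of aspects. It is a minimal sketch under stated assumptions (the names, the coverage function item_in_sub and the trade-off weight lam are illustrative), not the paper's exact SPAD objective:

        # Hedged sketch of intent-aware greedy re-ranking with subprofiles as
        # aspects. All names and the trade-off weight lam are assumptions.
        def rerank(candidates, rel, subprofiles, sub_weight, item_in_sub, k, lam=0.5):
            """candidates: item ids; rel: item -> relevance score;
            sub_weight: subprofile -> importance; item_in_sub: (item, sub) -> [0, 1]."""
            selected = []
            uncovered = {s: 1.0 for s in subprofiles}   # remaining need per subprofile
            while candidates and len(selected) < k:
                def gain(i):
                    # Marginal diversity: weighted coverage of still-uncovered subprofiles.
                    div = sum(sub_weight[s] * item_in_sub(i, s) * uncovered[s]
                              for s in subprofiles)
                    return (1 - lam) * rel[i] + lam * div
                best = max(candidates, key=gain)
                selected.append(best)
                candidates.remove(best)
                for s in subprofiles:
                    # A subprofile covered by the chosen item counts less later on.
                    uncovered[s] *= 1 - item_in_sub(best, s)
            return selected

    With lam = 0 the sketch reduces to plain relevance ranking; raising lam trades relevance score for subprofile coverage, which is the trade-off the abstract discusses.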

    Subprofile aware diversification of recommendations

    Get PDF
    A user of a recommender system is more likely to be satisfied by one or more of the recommendations if each individual recommendation is relevant to her and, additionally, if the set of recommendations is diverse. The most common approach to recommendation diversification uses re-ranking: the recommender system scores a set of candidate items for relevance to the user; it then re-ranks the candidates so that the subset that it will recommend achieves a balance between relevance and diversity. Ordinarily, we expect a trade-off between relevance and diversity: the diversity of the set of recommendations increases by including items that have lower relevance scores but which are different from the items already in the set. In early work, the diversity of a set of recommendations was given by an aggregate of their distances from one another, according to some semantic distance metric defined on item features such as movie genres. More recent intent-aware diversification methods formulate diversity in terms of coverage and relevance of aspects. The aspects are most commonly defined in terms of item features. By trying to ensure that the aspects of a set of recommended items cover the aspects of the items in the user’s profile, the level of diversity is more personalized. In offline experiments on pre-collected datasets, intent-aware diversification using item features as aspects sometimes defies the relevance/diversity trade-off: there are configurations in which the recommendations exhibit increases in both relevance and diversity. In this thesis, we present a new form of intent-aware diversification, which we call SPAD (Subprofile-Aware Diversification). In SPAD and its variants, the aspects are not item features; they are subprofiles of the user’s profile. We present a number of different ways to extract subprofiles from a user’s profile. None of them is defined in terms of item features. Therefore, SPAD and its variants are useful even in domains where item features are not available or are of low quality. On several pre-collected datasets from different domains (movies, music, books, and a social network), we compare SPAD and its variants to intent-aware methods in which aspects are item features. We also compare them to calibrated recommendations, which are related to intent-aware recommendations. We find on these datasets that SPAD and its variants suffer even less from the relevance/diversity trade-off: across all datasets, they increase both relevance and diversity for even more configurations than other approaches. Moreover, we apply SPAD to the task of automatic playlist continuation (APC), in which relevance is the main goal, not diversity. We find that, even when applied to APC, SPAD increases both relevance and diversity.
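
    The thesis compares several ways of extracting subprofiles; as a rough illustration of the idea of splitting a profile without item features, the sketch below clusters the user's profile items by an item-item cosine similarity mined from all users' interactions. This heuristic is an assumption for illustration, not necessarily one of the extraction methods actually evaluated in the thesis:

        # Illustrative heuristic: subprofiles as similarity clusters of the
        # user's profile items, using co-consumption rather than item features.
        import numpy as np
        from scipy.cluster.hierarchy import fcluster, linkage
        from scipy.spatial.distance import squareform

        def extract_subprofiles(interactions, profile_items, n_sub=4):
            """interactions: 0/1 users x items array; profile_items: column indices."""
            sub = interactions[:, profile_items].astype(float)
            co = sub.T @ sub                                   # co-consumption counts
            norm = np.sqrt(np.outer(np.diag(co), np.diag(co)))
            sim = np.divide(co, norm, out=np.zeros_like(co), where=norm > 0)
            dist = 1.0 - sim
            dist = (dist + dist.T) / 2                         # enforce exact symmetry
            np.fill_diagonal(dist, 0.0)                        # required by squareform
            labels = fcluster(linkage(squareform(dist), method="average"),
                              t=n_sub, criterion="maxclust")
            return [[it for it, c in zip(profile_items, labels) if c == cl]
                    for cl in sorted(set(labels))]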

    Extending Faceted Search to the Open-Domain Web

    Get PDF
    Faceted search enables users to navigate a multi-dimensional information space by combining keyword search with drill-down options in each facet. For example, when searching for “computer monitor” in an e-commerce site, users can select brands and monitor types from the provided facets {“Samsung”, “Dell”, “Acer”, ...} and {“LED-Lit”, “LCD”, “OLED”, ...}. It has been used successfully for many vertical applications, including e-commerce and digital libraries. However, this idea is not well explored for general web search in an open-domain setting, even though it holds great potential for assisting multi-faceted queries and exploratory search. The goal of this work is to explore this potential by extending faceted search into the open-domain web setting, which we call Faceted Web Search. We address three fundamental issues in Faceted Web Search, namely: how to automatically generate facets (facet generation); how to re-organize search results with users' selections on facets (facet feedback); and how to evaluate generated facets and entire Faceted Web Search systems. In most previous work on conventional faceted search, facets are generated in advance for an entire corpus, either manually or semi-automatically, and then recommended for particular queries. However, this approach is difficult to extend to the entire web due to the web's large and heterogeneous nature. We instead propose a query-dependent approach, which extracts facets for queries from their web search results. We further improve our facet generation model under a more practical scenario, in which users care more about the precision of the presented facets than their recall. The dominant facet feedback method in conventional faceted search is Boolean filtering, which filters search results by users' selections on facets. However, our investigation shows that Boolean filtering is too strict when extended to the open-domain setting. Thus, we propose soft ranking models for Faceted Web Search, which expand original queries with users' selections on facets to re-rank search results. Our experiments show that the soft ranking models are more effective than Boolean filtering models for Faceted Web Search. To evaluate Faceted Web Search, we propose both intrinsic evaluation, which evaluates facet generation on its own, and extrinsic evaluation, which evaluates an entire Faceted Web Search system by its utility in assisting search clarification. We also design a method for building reusable test collections for such evaluations. Our experiments show that using the Faceted Web Search interface can significantly improve the original ranking if sufficient time is allowed for user feedback on facets.
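
    The soft-ranking idea for facet feedback can be pictured with a short sketch: instead of Boolean-filtering out documents that miss a selected facet value, the original query score is interpolated with facet evidence. The scoring functions and the weight alpha below are placeholders, not the models actually proposed in this work:

        # Hedged sketch of soft ranking for facet feedback: interpolate instead
        # of filter. base_score and term_score are assumed, illustrative inputs.
        def soft_rank(docs, base_score, facet_terms, term_score, alpha=0.7):
            """base_score: doc -> original-query score;
            facet_terms: user selections, e.g. ["Samsung", "LCD"];
            term_score: (doc, term) -> how well the doc matches the facet term."""
            def combined(d):
                facet = sum(term_score(d, t) for t in facet_terms) / max(len(facet_terms), 1)
                return alpha * base_score[d] + (1 - alpha) * facet   # soft, not Boolean
            return sorted(docs, key=combined, reverse=True)

    A document missing one selected facet value is demoted rather than discarded, which is the behavioural difference from Boolean filtering that the experiments above measure.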

    Building a semantic search engine with games and crowdsourcing

    Get PDF
    Semantic search engines aim at improving conventional search with semantic information, or meta-data, on the data searched for and/or on the searchers. So far, approaches to semantic search exploit characteristics of the searchers like age, education, or spoken language for selecting and/or ranking search results. Such data allow building up a semantic search engine as an extension of a conventional search engine. The crawlers of well-established search engines like Google, Yahoo! or Bing can index documents but, so far, their capabilities to recognize the intentions of searchers are still rather limited. Indeed, taking into account characteristics of the searchers considerably extends both the quantity of data to analyse and the dimensionality of the search problem. Well-established search engines therefore still focus on general search, that is, "search for all", not on specialized search, that is, "search for a few". This thesis reports on techniques that have been adapted or conceived, deployed, and tested for building a semantic search engine for the very specific context of artworks. In contrast to, for example, the interpretation of X-ray images, the interpretation of artworks is far from being fully automatable. Therefore artwork interpretation has been based on Human Computation, that is, a software-based gathering of contributions by many humans. The approach reported on in this thesis first relies on so-called Games With A Purpose, or GWAPs, for this gathering: casual games provide an incentive for a potentially unlimited community of humans to contribute their appreciations of artworks. Designing suitable incentives is less trivial than it might seem at first. An ecosystem of games is needed to collect the intended meta-data on artworks: one game generates data that can serve as input to another game. This results in semantically rich meta-data that can be used for building up a successful semantic search engine. Thus, a first part of this thesis reports on a "game ecosystem" specifically designed from one known game and including several novel games belonging to the following game classes: (1) Description Games for collecting obvious and trivial meta-data, basically the well-known ESP (for extra-sensorial perception) game of Luis von Ahn, (2) the Dissemination Game Eligo generating translations, (3) the Diversification Game Karido aiming at sharpening differences between the objects, that is, the artworks, interpreted, and (4) the Integration Games Combino, Sentiment and TagATag that generate structured meta-data. Secondly, the approach to building a semantic search engine reported on in this thesis relies on Higher-Order Singular Value Decomposition (SVD). More precisely, the data and meta-data on artworks gathered with the aforementioned GWAPs are collected in a tensor, that is, a mathematical structure generalising matrices to more than only two dimensions, columns and rows. The dimensions considered are the artwork descriptions, the players, and the artworks themselves. A Higher-Order SVD of this tensor is first used for noise reduction, following the method of so-called Latent Semantic Analysis (LSA); this thesis thus also reports on deploying a Higher-Order LSA. The parallel Higher-Order SVD algorithm applied for the Higher-Order LSA and its implementation have been validated on an application related to, but independent from, the semantic search engine for artworks striven for: image compression. This thesis reports on the surprisingly good image compression which can be achieved with Higher-Order SVD.
    While other compression methods apply a matrix SVD to each color separately, the approach reported on in this thesis relies on one single (higher-order) SVD of the whole tensor. This results both in better quality of the compressed image and in a significant reduction of the memory space needed. Higher-Order SVD is extremely time-consuming, which calls for parallel computation. Thus, a step towards automating the construction of a semantic search engine for artworks was parallelizing the higher-order SVD method used and running the resulting parallel algorithm on a super-computer. This thesis reports on using Hestenes’ method and R-SVD for parallelising the higher-order SVD. This method is an unconventional choice which is explained and motivated. As for the super-computer needed, this thesis reports on turning the web browsers of the players or searchers into a distributed parallel computer. This is done by a novel specific system and a novel implementation of the MapReduce framework for data parallelism. Harnessing the web browsers of the players or searchers saves computational power on the server side. It also scales extremely well with the number of players or searchers because both playing with and searching for artworks require human reflection and therefore result in idle local processors that can be brought together into a distributed super-computer.
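
    The role of the Higher-Order SVD can be made concrete with a small NumPy sketch of a truncated HOSVD (one SVD per mode of the tensor, then projection onto the leading subspaces), applied here to the image-compression use case mentioned above; the ranks and names are illustrative assumptions, and the thesis' parallel Hestenes/R-SVD machinery is not reproduced:

        # Minimal truncated HOSVD sketch: one SVD per tensor mode, instead of a
        # separate matrix SVD per color channel. Ranks are illustrative.
        import numpy as np

        def unfold(T, mode):
            """Mode-n matricization of tensor T."""
            return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

        def hosvd(T, ranks):
            factors = []
            for n, r in enumerate(ranks):
                # Leading left singular vectors of the mode-n unfolding.
                U, _, _ = np.linalg.svd(unfold(T, n), full_matrices=False)
                factors.append(U[:, :r])
            core = T
            for n, U in enumerate(factors):
                # core = core x_n U^T: project mode n onto its subspace.
                core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, n, 0), axes=1), 0, n)
            return core, factors

        def reconstruct(core, factors):
            T = core
            for n, U in enumerate(factors):
                T = np.moveaxis(np.tensordot(U, np.moveaxis(T, n, 0), axes=1), 0, n)
            return T

        # e.g. an RGB image as a (height, width, 3) tensor:
        # core, facs = hosvd(img.astype(float), ranks=(50, 50, 3))
        # approx = reconstruct(core, facs)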

    Review of Intent Diversity in Information Retrieval : Approaches, Models and Trends

    Get PDF
    The fast-increasing volume of information in databases makes it difficult for users to find the information they need. It is important for researchers to find the best methods for addressing this problem. User intention detection can be used to increase the relevance of the information delivered by an information retrieval system. This research used a systematic mapping process to identify which areas, approaches, and models have been used most often to detect user intention in information retrieval over the last four years. The results identify the item-based approach as the approach most often studied by researchers to capture intent diversity in information retrieval, and its use kept increasing from 2015 until 2017. 34% of the papers used topic models in their research, which indicates that topic models remain the principal models explored by the researchers in this area.

    A hybrid approach for item collection recommendations : an application to automatic playlist continuation

    Get PDF
    Current recommender systems aim mainly to generate accurate item recommendations, without properly evaluating the multiple dimensions of the recommendation problem. However, in many domains, like music, where items are rarely consumed in isolation, users would rather need a set of items designed to work well together, while exhibiting some cognitive properties as a whole, related to the users’ perception of quality and satisfaction. In this thesis, a hybrid case-based recommendation approach for item collections is proposed. In particular, an application to automatic playlist continuation, addressing similar cognitive concepts rather than similar users, is presented. Playlists, which are sets of music items designed to be consumed as a sequence, with a specific purpose and within a specific context, are treated as cases. The proposed recommender system is based on a meta-level hybridization. First, Latent Dirichlet Allocation is applied to the set of past playlists, described as distributions over music styles, to identify their underlying concepts. Then, for a started playlist, its semantic characteristics, like its latent concept and the styles of the included items, are inferred, and Case-Based Reasoning is applied to the set of past playlists addressing the same concept, to construct and recommend a relevant playlist continuation. A graph-based item model is used to overcome the semantic gap between songs’ signal-based descriptions and users’ high-level preferences, and to efficiently capture the playlists’ structures and the similarity of the music items within them. As the proposed method bases its reasoning on previous playlists, it does not require the construction of complex user profiles to generate accurate recommendations. Furthermore, apart from relevance, support for parameters beyond accuracy, like increased coherence or support for diverse items, is provided to deliver a more complete user experience. Experiments on real music datasets have revealed improved results compared to other state-of-the-art techniques, while achieving a “good trade-off” between the recommendations’ relevance, diversity and coherence. Finally, although it actually focuses on playlist continuation, the designed approach could easily be adapted to serve other recommendation domains with similar characteristics.
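
    The meta-level pipeline sketched in this abstract (LDA over past playlists described as style distributions, then reasoning over same-concept playlists) might look roughly as follows; note that a simple frequency vote stands in here for the thesis' Case-Based Reasoning and graph-based similarity, and all names are assumptions:

        # Rough sketch: LDA finds latent playlist concepts; candidates are then
        # ranked using only past playlists of the started playlist's concept.
        from sklearn.decomposition import LatentDirichletAllocation

        def continue_playlist(style_counts, started_styles, playlists, n_concepts=20, k=10):
            """style_counts: playlists x styles count matrix (row i describes playlists[i]);
            started_styles: 1 x styles count row for the playlist being continued;
            playlists: list of past playlists, each a list of track ids."""
            lda = LatentDirichletAllocation(n_components=n_concepts, random_state=0)
            theta = lda.fit_transform(style_counts)          # concept mixture per playlist
            concept = lda.transform(started_styles)[0].argmax()
            same = [p for p, t in zip(playlists, theta) if t.argmax() == concept]
            votes = {}                                       # stand-in for CBR adaptation
            for p in same:
                for track in p:
                    votes[track] = votes.get(track, 0) + 1
            return sorted(votes, key=votes.get, reverse=True)[:k]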

    Digital user's decision journey

    Full text link
    The landscape of the Internet is continually evolving. This creates huge opportunities for different industries to optimize vital channels online, resulting in various forms of new Internet services. As a result, digital users interact with many digital systems and exhibit dynamic behaviors. Their shopping behaviors are drastically different today than they used to be, with offline and online shopping interacting with each other. Users have many channels through which to access online media, but their consumption patterns on different channels are quite different. They do philanthropy online to help others, but their heterogeneous motivations and different fundraising campaigns lead to distinct paths-to-contribution. Understanding the decision-making process behind digital users’ dynamic behaviors as they interact with various digital systems is critical for firms that want to improve the user experience and their bottom line. In this thesis, I study digital users’ decision journeys and the corresponding digital technology firms’ strategies using interdisciplinary approaches that combine econometrics, economic structural modeling, and machine learning. The uncovered decision journeys not only offer empirical managerial insights but also provide guidelines for introducing interventions to better serve digital users.

    Modeling Users' Information Needs in a Document Recommender for Meetings

    Get PDF
    People are surrounded by an unprecedented wealth of information. Access to it depends on the availability of suitable search engines, but even when these are available, people often do not initiate a search, because their current activity does not allow them to, or because they are not aware of the existence of this information. Just-in-time retrieval brings a radical change to the process of query-based retrieval, by proactively retrieving documents relevant to users' current activities, in an easily accessible and non-intrusive manner. This thesis presents a novel set of methods intended to improve the relevance of a just-in-time retrieval system, specifically a document recommender system designed for conversations, in terms of precision and diversity of results. Additionally, we designed an evaluation protocol to compare the methods proposed in the thesis with other ones, using crowdsourcing. In contrast to previous systems, which model users' information needs by extracting keywords from clean and well-structured texts, this system models them from conversation transcripts, which contain noise from automatic speech recognition (ASR) and have a free structure, often switching between several topics. To deal with these issues, we first propose a novel keyword extraction method which preserves both the relevance and the diversity of the topics of the conversation, to properly capture possible user needs with minimal ASR noise. Implicit queries are then built from these keywords. However, the presence of multiple unrelated topics in one query introduces significant noise into the retrieval results. To reduce this effect, we separate users' needs by topically clustering keyword sets into several subsets, or implicit queries. We introduce a merging method which combines the results of the multiple queries prepared from a user's conversation to generate a concise, diverse and relevant list of documents. This method ensures that the system does not distract its users from their current conversation by frequently recommending them a large number of documents. Moreover, we address the problem of explicit queries that may be asked by users during a conversation. We introduce a query refinement method which leverages the conversation context to answer the users' information needs without asking for additional clarifications and therefore, again, avoids distracting users during their conversation. Finally, we implemented the end-to-end document recommender system by integrating the ideas proposed in this thesis, and we propose an evaluation scenario with human users in a brainstorming meeting.
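
    The divide-and-merge idea in this abstract (diverse keywords, topical clustering into implicit queries, then one concise merged list) can be outlined as below. The vectorizer, the clustering choice and the round-robin merge are illustrative assumptions, not the thesis' actual extraction, clustering or merging models:

        # Hedged sketch: keywords -> topic clusters -> implicit queries -> merge.
        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.feature_extraction.text import TfidfVectorizer

        def implicit_queries(transcript_sentences, n_keywords=12, n_queries=3):
            vec = TfidfVectorizer(stop_words="english")
            X = vec.fit_transform(transcript_sentences)
            vocab = np.array(vec.get_feature_names_out())
            scores = np.asarray(X.sum(axis=0)).ravel()       # overall term weight
            top = np.argsort(scores)[::-1][:n_keywords]      # strongest keywords
            # Cluster keywords by their sentence co-occurrence profiles: one
            # topically coherent implicit query per cluster.
            profiles = X.toarray().T[top]
            labels = KMeans(n_clusters=n_queries, n_init=10, random_state=0).fit_predict(profiles)
            return [" ".join(vocab[top[labels == c]]) for c in range(n_queries)]

        def merge_results(ranked_lists, k=5):
            """Round-robin merge of per-query rankings into one short list."""
            merged, seen = [], set()
            for rank in range(max(map(len, ranked_lists))):
                for lst in ranked_lists:
                    if rank < len(lst) and lst[rank] not in seen:
                        seen.add(lst[rank])
                        merged.append(lst[rank])
            return merged[:k]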

    REVEALING AND ADDRESSING BIAS IN RECOMMENDER SYSTEMS

    Get PDF
    In the past decades, recommenders have achieved outstanding success in delivering personalized and accurate recommendations. Highly customized recommendations bring individuals great convenience, while helping content providers to connect with interested consumers accurately. However, they may also lead to undesirable outcomes, often through the introduction of bias in the training, deployment, and maintenance of recommender systems. For example, recommenders may impose unfair burdens on certain user groups, disadvantaging their prominence on job-based recommenders. And they may narrow down a user’s interest areas, raising concerns about echo chambers, fairness, and diversity. In this dissertation, we ground our work in gaps in the literature on identifying and addressing bias in recommender systems, then introduce four approaches for mitigating biases from multiple perspectives, listed below:
    • First, we identify an inherent bias in many recommendation algorithms which optimize for the head (or popular portion) of the rating distribution and thus lead to large estimation errors for tail ratings. We conduct a data-driven investigation and theoretical analysis of the challenges posed by traditional latent factor models for estimating such tail ratings. With these challenges in mind, we propose a new multi-latent representation method designed specifically to estimate these tail ratings better, to reduce the bias associated with tail ratings in recommender systems.
    • Second, we address another unexplored bias: target customer distribution distortion. Traditional recommender systems typically aim to optimize an engagement metric without considering the overall distribution of target customers, thereby leading to serious distortion problems. Through a data-driven study, we reveal several distortions that arise from conventional recommenders. Toward overcoming these issues, we propose a target customer re-ranking algorithm to adjust the population distribution and composition in the Top-k target customers of an item while maintaining recommendation quality.
    • Third, we focus on mitigating the next unexplored bias: user taste distortion. We show how existing approaches assume a static view of users’ tastes, so that previously proposed calibrated recommenders poorly model the evolution of a user’s taste. We empirically identify the taste distortion problem through a data-driven study over multiple datasets, and we propose a taste-enhanced calibrated recommender system designed with the shifts and trends of users’ taste preferences in mind, which results in improved taste distribution estimation and recommendation results.
    • Last but not least, we study distribution bias in a dynamic recommendation environment. Previous studies of such distribution-aware recommendation have focused exclusively on static scenarios, ignoring important challenges that manifest in real-world dynamics and lead to poor performance in practice. Hence, we present the first study of distribution bias in dynamic recommendations, and propose new methods to mitigate this bias even in the presence of feedback loops and other dynamics.
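
    As a concrete picture of distribution-aware re-ranking in the spirit of the target-customer and taste-calibration approaches above, the sketch below greedily keeps the recommended list's category distribution close (in KL divergence) to a target distribution while retaining relevance; the objective, names and weight lam are illustrative assumptions, not the dissertation's exact methods:

        # Hedged sketch: greedy re-ranking that balances relevance against the
        # KL divergence between the list's category mix and a target mix.
        import math

        def calibrated_rerank(candidates, rel, cat, target, k, lam=0.5):
            """rel: item -> relevance; cat: item -> category;
            target: category -> desired share of the final list."""
            selected, counts = [], {c: 0.0 for c in target}
            def kl_if_added(item):
                counts[cat[item]] += 1
                n = len(selected) + 1
                kl = sum(p * math.log(p / max(counts[c] / n, 1e-6))
                         for c, p in target.items() if p > 0)
                counts[cat[item]] -= 1
                return kl
            while candidates and len(selected) < k:
                best = max(candidates,
                           key=lambda i: (1 - lam) * rel[i] - lam * kl_if_added(i))
                selected.append(best)
                candidates.remove(best)
                counts[cat[best]] += 1
            return selected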