11 research outputs found

    Representation learning in heterogeneous information networks for user modeling and recommendations

    Doctor of Philosophy, Department of Computer Science, William H. Hsu. Current research in the field of recommender systems takes into consideration the interaction between users and items; we call this the homogeneous setting. In most real-world systems, however, these interactions are heterogeneous, i.e., apart from users and items there are other types of entities present within the system, and the interaction between users and items occurs in multiple contexts and scenarios. The presence of multiple types of entities within a heterogeneous information network opens up new interaction modalities for generating recommendations to users. The key contribution of the proposed dissertation is representation learning in heterogeneous information networks for the recommendation task. Query-based information retrieval is one of the primary ways in which meaningful nuggets of information are retrieved from large amounts of data. Here the query represents a user's information need. In a homogeneous setting, in the absence of type and contextual side information, the retrieval context for a user boils down to the user's preferences over observed items. In a heterogeneous setting, information regarding entity types and preference context is available. Thus query-based contextual recommendations are possible in a heterogeneous network. The contextual query could be type-based (e.g., directors, actors, movies, books), value-based (e.g., based on tag values or genre values such as "Comedy" and "Romance"), or a combination of types and values. Exemplar-based information retrieval is another technique for filtering information, where the objective is to retrieve similar entities based on a set of examples. This dissertation proposes approaches for recommendation tasks in heterogeneous networks, based on these retrieval mechanisms from the traditional information retrieval domain.

    A Personalized Dense Retrieval Framework for Unified Information Access

    Developing a universal model that can efficiently and effectively respond to a wide range of information access requests -- from retrieval to recommendation to question answering -- has been a long-standing goal in the information retrieval community. This paper argues that the flexibility, efficiency, and effectiveness brought by recent developments in dense retrieval and approximate nearest neighbor search have smoothed the path towards achieving this goal. We develop a generic and extensible dense retrieval framework, called \framework, that can handle a wide range of (personalized) information access requests, such as keyword search, query by example, and complementary item recommendation. Our proposed approach extends the capabilities of dense retrieval models for ad-hoc retrieval tasks by incorporating user-specific preferences through a personalized attentive network. This allows for a more tailored and accurate personalized information access experience. Our experiments on real-world e-commerce data suggest the feasibility of developing universal information access models by demonstrating significant improvements even compared to competitive baselines specifically developed for each of these individual information access tasks. This work opens up a number of fundamental research directions for future exploration. Comment: Accepted to SIGIR 202
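    The general idea of personalized dense retrieval described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the item names, the three-dimensional embeddings, the `personalize` helper, and the `alpha` blending weight are hypothetical, and the simple linear blend is only a stand-in for the paper's personalized attentive network, which the abstract does not specify. Exact inner-product search is used here in place of an approximate nearest neighbor index.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)


def personalize(query_vec, user_vec, alpha=0.3):
    """Blend query and user embeddings.

    Hypothetical stand-in for a learned personalization component:
    a fixed linear interpolation followed by normalization.
    """
    mixed = [(1 - alpha) * q + alpha * u for q, u in zip(query_vec, user_vec)]
    return normalize(mixed)


def retrieve(query_vec, item_vecs, k=2):
    """Return the names of the k items with the highest inner product.

    This is exact brute-force search; a real system would typically
    swap in an approximate nearest neighbor index.
    """
    ranked = sorted(
        item_vecs.items(),
        key=lambda kv: dot(query_vec, kv[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy item embeddings (made up for this sketch).
ITEMS = {
    "running_shoes": normalize([0.9, 0.1, 0.0]),
    "hiking_boots": normalize([0.6, 0.0, 0.8]),
    "dress_shoes": normalize([0.5, 0.85, 0.0]),
}

if __name__ == "__main__":
    query = normalize([1.0, 0.0, 0.0])   # e.g. a keyword query for "shoes"
    outdoor_user = [0.0, 0.0, 1.0]       # user with an outdoor preference
    print(retrieve(query, ITEMS, k=1))
    print(retrieve(personalize(query, outdoor_user, alpha=0.5), ITEMS, k=1))
```

    In this toy setup the same query ranks a different item first once the user vector is blended in, which is the behavior the abstract attributes to personalization; the same `retrieve` call could serve query by example by passing an item embedding as the query.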

    Overview of the TREC 2023 Product Product Search Track

    This is the first year of the TREC Product Search track. The focus this year was the creation of a reusable collection and the evaluation of the impact of metadata and multi-modal data on retrieval accuracy. This year we leverage the new product search corpus, which includes contextual metadata. Our analysis shows that in the product search domain, traditional retrieval systems are highly effective and commonly outperform general-purpose pretrained embedding models. Our analysis also evaluates the impact of using simplified and metadata-enhanced collections, finding no clear trend in the impact of the expanded collection. We also see some surprising outcomes: despite their widespread adoption and competitive performance on other tasks, single-stage dense retrieval runs are commonly noncompetitive or generate low-quality results in both the zero-shot and fine-tuned settings. Comment: 14 pages, 4 figures, 11 tables - TREC 202

    Data aggregation in sensor networks

    Master of Science, Department of Computing and Information Sciences, Gurdip Singh. Severe energy constraints and the limited computing abilities of the nodes present a major challenge in the design and deployment of a wireless sensor network. This thesis presents energy-efficient algorithms for data fusion and information aggregation in a sensor network. The data fusion methodologies presented in this thesis aim to reduce data traffic within a network by mapping the sensor network application task graph onto a sensor network topology. Partitioning an application into sub-tasks that can be mapped onto the nodes of a sensor network offers opportunities to reduce the overall energy consumption of the network. The first approach proposes grid-based coordinated incremental data fusion and routing with heterogeneous nodes of varied computational abilities. In this approach, high-performance nodes arranged in a mesh-like structure spanning the network topology are present amongst the resource-constrained nodes. The sensor network protocol performance, measured in terms of hop count, is analysed for various grid sizes of the high-performance nodes. To reduce network traffic and increase energy efficiency in a randomly deployed sensor network, distributed clustering strategies that consider network density and structure similarity are applied to the network topology. The clustering methods aim to improve the energy efficiency of the sensor network by dividing the network into logical clusters and mapping the fusion points onto the clusters. Routing of network information is performed by inter-cluster and intra-cluster routing.

    Drug Review Dataset (Druglib.com)


    Drug Review Dataset (Drugs.com)


    Clause Topic Classification in German and English Standard Form Contracts

    So-called standard form contracts, i.e., contracts that are drafted unilaterally by one party, like the terms and conditions of online shops or the terms of service of social networks, are cornerstones of our modern economy. Their processing is, therefore, of significant practical value. Often, the sheer size of these contracts allows the drafting party to hide unfavourable terms from the other party. In this paper, we compare different approaches for automatically classifying the topics of clauses in standard form contracts, based on a data-set of more than 6,000 clauses from more than 170 contracts, which we collected from German and English online shops and annotated based on a taxonomy of clause topics that we developed together with legal experts. We show that, in our comparison of seven approaches, from simple keyword matching to transformer language models, BERT performed best with an F1-score of up to 0.91; however, much simpler and computationally cheaper models like logistic regression achieved similarly good results of up to 0.87.
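    To make the simplest end of that comparison concrete, the following is a minimal sketch of a keyword-matching clause classifier together with an F1 computation. It is not the authors' implementation: the topic names, keyword lists, and example clauses are hypothetical, and macro-averaging is an assumption, since the abstract does not state which F1 variant was reported.

```python
# Hypothetical topic -> keyword lists; the paper's taxonomy is not reproduced here.
KEYWORDS = {
    "payment": ["pay", "invoice", "price"],
    "delivery": ["deliver", "shipping", "dispatch"],
    "liability": ["liable", "liability", "damages"],
}


def classify(clause, default="other"):
    """Assign the topic whose keywords occur most often in the clause.

    Matching is case-insensitive substring counting; clauses with no
    keyword hits fall back to the default label.
    """
    text = clause.lower()
    counts = {
        topic: sum(text.count(word) for word in words)
        for topic, words in KEYWORDS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else default


def f1_for_label(gold, pred, label):
    """F1 score for a single label, defined as 0.0 when there are no true positives."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    return sum(f1_for_label(gold, pred, label) for label in labels) / len(labels)


if __name__ == "__main__":
    clauses = [
        "The customer shall pay the invoice within 14 days.",
        "We are not liable for indirect damages.",
        "This agreement is governed by German law.",
    ]
    gold = ["payment", "liability", "other"]
    pred = [classify(c) for c in clauses]
    print(pred, macro_f1(gold, pred))
```

    A baseline of this kind needs no training data, which is what makes it a useful lower bound against learned models such as logistic regression or BERT.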