71 research outputs found

    User Modeling and User Profiling: A Comprehensive Survey

    The integration of artificial intelligence (AI) into daily life, particularly through information retrieval and recommender systems, has necessitated advanced user modeling and profiling techniques to deliver personalized experiences. These techniques aim to construct accurate user representations based on the rich amounts of data generated through interactions with these systems. This paper presents a comprehensive survey of the current state, evolution, and future directions of user modeling and profiling research. We provide a historical overview, tracing the development from early stereotype models to the latest deep learning techniques, and propose a novel taxonomy that encompasses all active topics in this research area, including recent trends. Our survey highlights the paradigm shifts towards more sophisticated user profiling methods, emphasizing implicit data collection, multi-behavior modeling, and the integration of graph data structures. We also address the critical need for privacy-preserving techniques and the push towards explainability and fairness in user modeling approaches. By examining the definitions of core terminology, we aim to clarify ambiguities and foster a clearer understanding of the field by proposing two novel encyclopedic definitions of the main terms. Furthermore, we explore the application of user modeling in various domains, such as fake news detection, cybersecurity, and personalized education. This survey serves as a comprehensive resource for researchers and practitioners, offering insights into the evolution of user modeling and profiling and guiding the development of more personalized, ethical, and effective AI systems. Comment: 71 pages

    Evaluating Information Retrieval and Access Tasks

    This open access book summarizes the first two decades of the NII Testbeds and Community for Information access Research (NTCIR). NTCIR is a series of evaluation forums run by a global team of researchers and hosted by the National Institute of Informatics (NII), Japan. The book is unique in that it discusses not just what was done at NTCIR, but also how it was done and the impact it has achieved. For example, in some chapters the reader sees the early seeds of what eventually grew to be the search engines that provide access to content on the World Wide Web, today’s smartphones that can tailor what they show to the needs of their owners, and the smart speakers that enrich our lives at home and on the move. We also get glimpses into how new search engines can be built for mathematical formulae, or for the digital record of a lived human life. Key to the success of the NTCIR endeavor was early recognition that information access research is an empirical discipline and that evaluation therefore lay at the core of the enterprise. Evaluation is thus at the heart of each chapter in this book. They show, for example, how the recognition that some documents are more important than others has shaped thinking about evaluation design. The thirty-three contributors to this volume speak for the many hundreds of researchers from dozens of countries around the world who together shaped NTCIR as organizers and participants. This book is suitable for researchers, practitioners, and students—anyone who wants to learn about past and present evaluation efforts in information retrieval, information access, and natural language processing, as well as those who want to participate in an evaluation task or even to design and organize one

    Eight Biennial Report : April 2005 – March 2007


    Multi-Dimensional Joins

    We present three novel algorithms for performing multi-dimensional joins and an in-depth survey and analysis of a low-dimensional spatial join. The first algorithm, the Iterative Spatial Join, performs a spatial join on low-dimensional data and is based on a plane-sweep technique. As we show analytically and experimentally, the Iterative Spatial Join performs well when internal memory is limited, compared to competing methods. This suggests that the Iterative Spatial Join would be useful for very large data sets or in situations where internal memory is a shared resource and is therefore limited, such as with today's database engines, which share internal memory amongst several queries. Furthermore, the performance of the Iterative Spatial Join is predictable and has no parameters which need to be tuned, unlike other algorithms. The second algorithm, the Quickjoin algorithm, performs a higher-dimensional similarity join in which pairs of objects that lie within a certain distance epsilon of each other are reported. The Quickjoin algorithm overcomes drawbacks of competing methods, such as requiring embedding methods on the data first or using multi-dimensional indices, which limit the ability to discriminate between objects in each dimension, thereby degrading performance. A formal analysis is provided of the Quickjoin method, and experiments show that the Quickjoin method significantly outperforms competing methods. The third algorithm adapts incremental join techniques to improve the speed of calculating the Hausdorff distance, which is used in applications such as image matching, image analysis, and surface approximations. The nearest neighbor incremental join technique for indices that are based on hierarchical containment uses a priority queue of index node pairs and bounds on the distance values between pairs, both of which need to be modified in order to calculate the Hausdorff distance. Results of experiments are described that confirm the performance improvement. Finally, a survey is provided that, instead of just summarizing the literature and presenting each technique in its entirety, describes the distinct components of the different techniques and decomposes each technique into an overall framework for performing a spatial join
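
To make the two operations named in this abstract concrete, here is a minimal brute-force sketch in Python. It is not the Quickjoin algorithm or the incremental join technique described above; it only illustrates what a similarity join and the Hausdorff distance compute. The function names, the Euclidean metric, and the choice of "distance <= eps" as the meaning of "within epsilon" are assumptions made for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def epsilon_join(points, eps):
    """Brute-force similarity join (hypothetical helper, O(n^2)).

    Returns all index pairs (i, j) with i < j whose points lie
    within distance eps of each other (assumed here to mean <= eps).
    """
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if dist(p, q) <= eps
    ]


def hausdorff(a, b):
    """Brute-force Hausdorff distance between two non-empty point sets."""

    def directed(xs, ys):
        # Largest distance from a point in xs to its nearest neighbor in ys.
        return max(min(dist(x, y) for y in ys) for x in xs)

    return max(directed(a, b), directed(b, a))


pairs = epsilon_join([(0, 0), (1, 0), (5, 5)], eps=1.5)
h = hausdorff([(0, 0), (1, 0)], [(0, 0), (4, 0)])
```

Both functions compare every pair of points, which is the quadratic cost that index-based or partitioning join methods aim to avoid.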

    Joint Discourse-aware Concept Disambiguation and Clustering

    This thesis addresses the tasks of concept disambiguation and clustering. Concept disambiguation is the task of linking common nouns and proper names in a text – henceforth called mentions – to their corresponding concepts in a predefined inventory. Concept clustering is the task of clustering mentions, so that all mentions in one cluster denote the same concept. In this thesis, we investigate concept disambiguation and clustering from a discourse perspective and propose a discourse-aware approach for joint concept disambiguation and clustering in the framework of Markov logic. The contributions of this thesis are fourfold: Joint Concept Disambiguation and Clustering. In previous approaches, concept disambiguation and concept clustering have been considered as two separate tasks (Schütze, 1998; Ji & Grishman, 2011). We analyze the relationship between concept disambiguation and concept clustering and argue that these two tasks can mutually support each other. We propose the – to our knowledge – first joint approach for concept disambiguation and clustering. Discourse-Aware Concept Disambiguation. One of the determining factors for concept disambiguation and clustering is the context definition. Most previous approaches use the same context definition for all mentions (Milne & Witten, 2008b; Kulkarni et al., 2009; Ratinov et al., 2011, inter alia). We approach the question of which context is relevant to disambiguate a mention from a discourse perspective and state that different mentions require different notions of context. We state that the context that is relevant to disambiguate a mention depends on its embedding into discourse. However, how a mention is embedded into discourse depends on its denoted concept. Hence, the identification of the denoted concept and the relevant context mutually depend on each other. We propose a binwise approach with three different context definitions and model the selection of the context definition and the disambiguation jointly. Modeling Interdependencies with Markov Logic. To model the interdependencies between concept disambiguation and concept clustering as well as the interdependencies between the context definition and the disambiguation, we use Markov logic (Domingos & Lowd, 2009). Markov logic combines first-order logic with probabilities and allows us to concisely formalize these interdependencies. We investigate how we can balance linguistic appropriateness and time efficiency and propose a hybrid approach that combines joint inference with aggregation techniques. Concept Disambiguation and Clustering beyond English: Multi- and Cross-linguality. Given the vast amount of text written in different languages, the capability to extend an approach to cope with languages other than English is essential. We thus analyze how our approach copes with languages other than English and show that our approach largely scales across languages, even without retraining. Our approach is evaluated on multiple data sets originating from different sources (e.g. news, web) and across multiple languages. As an inventory, we use Wikipedia. We compare our approach to other approaches and show that it achieves state-of-the-art results. Furthermore, we show that joint concept disambiguation and clustering as well as joint context selection and disambiguation lead to significant improvements ceteris paribus

    Interim research assessment 2003-2005 - Computer Science

    This report primarily serves as a source of information for the 2007 Interim Research Assessment Committee for Computer Science at the three technical universities in the Netherlands. The report also provides information for others interested in our research activities

    Spatial and temporal resolution of sensor observations

    Observation is a core concept of geoinformatics. For phenomena such as climate change, mass movements (e.g., slope movements), and demographic change, observations serve to monitor these phenomena, to develop models of them, and to simulate them. Resolution is a central property of observations. Using observations of different resolutions leads to (potentially) different decisions, because the resolution of the observations influences the detection of structures during the data analysis phase. The main contribution of this work is a theory of the spatial and temporal resolution of observations that is applicable both to technical sensors (e.g., a camera) and to human sensors. The consistency of the theory was evaluated using the Haskell language, and its practical applicability was illustrated using observations from the web portal Flickr

    Geographic information extraction from texts

    A large volume of unstructured text containing valuable geographic information is available online. This information – provided implicitly or explicitly – is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although considerable progress has been achieved in geographic information extraction from texts, there are still unsolved challenges and issues, ranging from methods, systems, and data to applications and privacy. Therefore, this workshop will provide a timely opportunity to discuss recent advances, new ideas, and concepts, and to identify research gaps in geographic information extraction