
    Hierarchical topic structuring: from dense segmentation to topically focused fragments via burst analysis

    International audience. Topic segmentation traditionally relies on lexical cohesion, measured through word re-occurrences, to output a dense segmentation, either linear or hierarchical. In this paper, we propose a novel organization of the topical structure of textual content. Rather than searching for topic shifts to yield a dense segmentation, we propose an algorithm that extracts topically focused fragments organized hierarchically. This is achieved by leveraging the temporal distribution of word re-occurrences, searching for bursts, to skirt the limits imposed by a global count of lexical re-occurrences within segments. Comparison with a reference dense segmentation on varied datasets indicates that we can achieve better topic focus while retrieving all of the important aspects of a text.
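The abstract does not spell out how bursts are detected. The sketch below illustrates only the general idea of working from the temporal distribution of a word's occurrences rather than from a global count per segment. It uses a simple gap-threshold heuristic: consecutive occurrences of a word that are at most `max_gap` tokens apart are chained into one burst, and chains with fewer than `min_count` occurrences are discarded. This is an assumption for illustration, not the authors' algorithm. State-automaton models such as Kleinberg's are a common alternative, and the function names and parameters here are hypothetical.

```python
from typing import Dict, List, Tuple


def find_bursts(positions: List[int], max_gap: int, min_count: int) -> List[Tuple[int, int]]:
    """Chain sorted token positions of one word into bursts (hypothetical heuristic).

    Occurrences at most max_gap tokens apart join the same burst; bursts with
    fewer than min_count occurrences are dropped. Returns (start, end) positions.
    """
    bursts: List[Tuple[int, int]] = []
    if not positions:
        return bursts
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev <= max_gap:
            count += 1  # still inside the current burst
        else:
            if count >= min_count:
                bursts.append((start, prev))  # close the current burst
            start, count = pos, 1  # open a new candidate burst
        prev = pos
    if count >= min_count:
        bursts.append((start, prev))
    return bursts


def word_bursts(tokens: List[str], max_gap: int = 10, min_count: int = 3) -> Dict[str, List[Tuple[int, int]]]:
    """Compute bursts for every word from its positions in the token sequence."""
    positions: Dict[str, List[int]] = {}
    for i, tok in enumerate(tokens):
        positions.setdefault(tok.lower(), []).append(i)
    result: Dict[str, List[Tuple[int, int]]] = {}
    for word, pos in positions.items():
        bursts = find_bursts(pos, max_gap, min_count)
        if bursts:
            result[word] = bursts
    return result
```

For example, `find_bursts([0, 2, 4, 30, 31, 60], max_gap=5, min_count=2)` yields two bursts, `(0, 4)` and `(30, 31)`. The isolated occurrence at 60 is ignored. In a fuller system, overlapping bursts of different words would then be grouped into topically focused fragments. That step is not shown here.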

    Evaluation of a novel hierarchical thematic structuring of texts in the framework of text summarization and anchor detection for video hyperlinking

    National audience. This paper investigates the potential of a novel topical structure of text-like data in the context of summarization and anchor detection for video hyperlinking. The structure is produced by an algorithm that exploits the temporal distributions of words through word burst analysis to generate a hierarchy of topically focused fragments. The hierarchy aims to filter out non-critical content, retaining only the salient information at various levels of detail. For the tasks on which we evaluate the structure, the loss of important information is highly damaging. We show that the structure can improve summarization results, or at least maintain state-of-the-art results, while for anchor detection it yields the best precision in the context of the Search and Anchoring in Video Archives task at MediaEval. The experiments were carried out on written text and on a more challenging corpus of automatic transcripts of TV shows.
KEYWORDS: burst analysis, hierarchy of topical fragments, text summarization, anchor detection.
FIGURE 1: Generic representations of (a) a linear topic segmentation, (b) a classical dense hierarchical topic segmentation, versus (c) a hierarchy of topically focused fragments. Dashed vertical lines mark the boundaries of topics and subtopics.

    Personalized Expert Recommendation: Models and Algorithms

    Many large-scale information sharing systems, including social media systems, question-answering sites, and rating and reviewing applications, have been growing rapidly, allowing millions of human participants to generate and consume information on an unprecedented scale. To manage this growth in information generation, there is a need to personalize information resources for users: to surface high-quality content and feeds, to provide personally relevant suggestions, and so on. A fundamental task in creating and supporting user-centered personalization systems is building rich user profiles that aid recommendation and improve user experience. Therefore, in this dissertation research, we propose models and algorithms to facilitate the creation of new crowd-powered personalized information sharing systems. Specifically, we first give a principled framework to enable personalization of resources, so that information seekers can be matched with customized knowledgeable users based on their historical actions and contextual information. We then focus on creating rich user models that allow accurate and comprehensive modeling of profiles for long-tail users, including discovering a user's known-for profile, opinion bias, and geo-topic profile. In particular, this dissertation research makes two unique contributions. First, we introduce the problem of personalized expert recommendation and propose the first principled framework for addressing it. To overcome the sparsity issue, we investigate the use of users' contextual information to build robust models of personal expertise. We study how spatial preference for personally valuable expertise varies across regions, across topics, and across underlying social communities. We then integrate these different forms of preference into a matrix factorization-based personalized expert recommender.
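The abstract names a matrix factorization-based expert recommender but gives no formulation. As a rough illustration of the underlying technique only, here is plain matrix factorization trained by stochastic gradient descent on (user, expert, score) triples. It omits the contextual and spatial preferences the dissertation integrates. The names `factorize` and `predict`, the hyperparameters, and the toy scores are all hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def factorize(ratings: List[Tuple[int, int, float]], n_users: int, n_experts: int,
              k: int = 2, lr: float = 0.01, reg: float = 0.01,
              epochs: int = 2000, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Learn user factors P and expert factors Q so that P[u] . Q[e] approximates each score."""
    rng = random.Random(seed)  # seeded for reproducible toy runs
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_experts)]
    for _ in range(epochs):
        for u, e, r in ratings:
            err = r - sum(P[u][f] * Q[e][f] for f in range(k))
            for f in range(k):
                pu, qe = P[u][f], Q[e][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qe - reg * pu)
                Q[e][f] += lr * (err * pu - reg * qe)
    return P, Q


def predict(P: Matrix, Q: Matrix, u: int, e: int) -> float:
    """Predicted affinity of user u for expert e; rank unseen experts by this score."""
    return sum(p * q for p, q in zip(P[u], Q[e]))
```

After training on observed interactions, unobserved (user, expert) pairs can be scored with `predict` and sorted to produce a recommendation list. Sparsity, the problem the abstract highlights, is exactly where this plain model struggles. That difficulty motivates adding contextual signals.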
Second, to support personalized expert recommendation, we focus on modeling and inferring user profiles in online information sharing systems. To tap the knowledge of the vast majority of users, we provide frameworks and algorithms that accurately and comprehensively create user models by discovering a user's known-for profile, opinion bias, and geo-topic profile, each described briefly as follows: (1) We develop a probabilistic model called Bayesian Contextual Poisson Factorization to discover what users are known for by others. The model takes as input a small fraction of users whose known-for profiles are already known and the vast majority of users for whom we have little (or no) information. It learns the implicit relationships between users' known-for profiles and their contextual signals, and finally predicts known-for profiles for that majority. (2) We explore users' topic-sensitive opinion bias and propose a lightweight semi-supervised system called "BiasWatch" to semi-automatically infer the opinion bias of long-tail users. We also demonstrate how opinion bias can be exploited to recommend other users with similar opinions in social networks. (3) As the last step of the dissertation toward rich user profiles, we study how a user's topical profile varies geo-spatially and how to model a user's geo-spatial known-for profile. We propose a multi-layered Bayesian hierarchical user factorization to overcome user heterogeneity. We also propose an enhanced model that alleviates the sparsity issue by integrating user contexts into the two-layered hierarchical user model, to better represent a user's geo-topic preference by others.
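The abstract describes BiasWatch only as lightweight and semi-supervised: it starts from a few users with known opinion bias and infers bias for long-tail users. As a stand-in for that general idea, not BiasWatch's actual method, the sketch below runs plain harmonic label propagation on a user graph. Seed users keep a known score (here -1.0 or +1.0), and every other user repeatedly takes the mean score of its neighbors. The graph, user names, and scores are hypothetical.

```python
from typing import Dict, List


def propagate_bias(edges: Dict[str, List[str]], seeds: Dict[str, float],
                   iterations: int = 50) -> Dict[str, float]:
    """Harmonic label propagation (illustrative stand-in, not BiasWatch).

    edges must list every user as a key (undirected graph, both directions).
    Seed users keep their known score; others take the mean of their neighbors.
    """
    scores = {u: seeds.get(u, 0.0) for u in edges}
    for _ in range(iterations):
        updated: Dict[str, float] = {}
        for u, nbrs in edges.items():
            if u in seeds:
                updated[u] = seeds[u]  # clamp known labels
            elif nbrs:
                updated[u] = sum(scores[v] for v in nbrs) / len(nbrs)
            else:
                updated[u] = scores[u]  # isolated user: no evidence
        scores = updated
    return scores
```

On a chain `a - x - y - b` with `a` seeded at +1.0 and `b` at -1.0, the scores converge to +1/3 for `x` and -1/3 for `y`. Each unlabeled user leans toward the side it is closer to in the graph. Users with similar scores could then be recommended to each other, loosely mirroring the use case the abstract mentions.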

    Modeling Search Behaviors Based on User Interactions

    Users of information systems normally divide tasks into a sequence of multiple steps to solve them. In particular, users divide search tasks into sequences of queries, interacting with search systems to carry out the information seeking process. User interactions are recorded in search query logs, enabling the development of models that automatically learn search patterns from users' interactions with search systems. These models underpin multiple user-assisting applications that help search systems be more interactive, user-friendly, and coherent. User-assisting applications include query suggestion, task-based ranking of search results, query reformulation analysis, e-commerce applications, advertisement retrieval, query-term prediction, mapping of queries to search tasks, and so on. Consequently, we propose the following contributions:
    (1) a neural model for learning to detect search task boundaries in query logs;
    (2) a recurrent deep clustering architecture that simultaneously learns query representations through self-training and clusters queries into groups of search tasks;
    (3) Multilingual Graph-Based Clustering, an unsupervised, user-agnostic model for search task identification supporting queries in sixteen languages; and
    (4) the Language-agnostic Search Task Model, an unsupervised approach that simultaneously models user search intent and search tasks.
    The proposed models improve on existing methods for modeling user interactions, taking into account user privacy, real-time response, and language accessibility.
User privacy is a major concern in the ethics of intelligent systems, while fast responses are critical for search systems that interact with users in real time, particularly in conversational search. At the same time, language accessibility is essential for assisting users worldwide, who interact with search systems in many languages. The proposed contributions can benefit many user-assisting applications, helping users better solve their search tasks when accessing search systems to fulfill their information needs.
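To make "detecting search task boundaries in query logs" concrete, the sketch below splits a time-stamped query sequence into tasks using two simple signals. A new task starts when the time gap since the previous query exceeds a timeout, or when lexical overlap (Jaccard similarity of query terms) drops below a threshold. This is only a heuristic in the spirit of the time-out and lexical-overlap baselines often used in query-log analysis. It is not the neural boundary detector or the graph-based clustering described above. The thresholds and function names are hypothetical.

```python
from typing import List, Optional, Tuple


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase term sets of two queries."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def split_tasks(queries: List[Tuple[float, str]], max_gap_s: float = 1800.0,
                min_sim: float = 0.2) -> List[List[str]]:
    """Group (timestamp_seconds, query) pairs into tasks (hypothetical baseline)."""
    tasks: List[List[str]] = []
    prev_time: Optional[float] = None
    prev_query: Optional[str] = None
    for t, q in queries:
        boundary = (
            prev_query is None                   # first query opens a task
            or t - prev_time > max_gap_s         # long pause: assume a new task
            or jaccard(q, prev_query) < min_sim  # little term overlap: topic change
        )
        if boundary:
            tasks.append([q])
        else:
            tasks[-1].append(q)
        prev_time, prev_query = t, q
    return tasks
```

With the queries "cheap flights paris" (t=0), "flights paris march" (t=60), "python list sort" (t=120), and "python sort key" (t=5000), the first two share a task. The third starts a new task because it has no term overlap, and the fourth starts another because of the long pause. Baselines like this ignore interleaved tasks and multilingual queries, which is the kind of limitation learned, language-agnostic models aim to address.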