A Trio Neural Model for Dynamic Entity Relatedness Ranking
Measuring entity relatedness is a fundamental task for many natural language
processing and information retrieval applications. Prior work often studies
entity relatedness in static settings and in an unsupervised manner. However,
entities in the real world are often involved in many different relationships;
consequently, entity relations are very dynamic over time. In this work, we
propose a neural network-based approach for dynamic entity relatedness,
leveraging the collective attention as supervision. Our model is capable of
learning rich and different entity representations in a joint framework.
Through extensive experiments on large-scale datasets, we demonstrate that our
method achieves better results than competitive baselines.
Comment: In Proceedings of CoNLL 201
Recuperação multimodal e interativa de informação orientada por diversidade (Multimodal and interactive diversity-oriented information retrieval)
Advisor: Ricardo da Silva Torres. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação.
Abstract: Information retrieval methods, especially for multimedia data, have evolved towards integrating multiple sources of evidence when analyzing the relevance of items for a given user search task.
In this context, to narrow the semantic gap between low-level features extracted from the content of digital objects and high-level semantic concepts (objects, categories, etc.), and to make systems adaptive to different user needs, interactive models have brought the user closer to the retrieval loop, allowing user-system interaction mainly through implicit or explicit relevance feedback. Analogously, diversity promotion has emerged as an alternative for tackling ambiguous or underspecified queries. Additionally, several works have addressed the issue of minimizing the user effort required to provide relevance assessments while keeping an acceptable overall effectiveness. This thesis discusses, proposes, and experimentally analyzes multimodal and interactive diversity-oriented information retrieval methods. It comprehensively covers the interactive information retrieval literature and discusses recent advances, the major research challenges, and promising research opportunities. We have proposed and evaluated two workflows for improving the relevance-diversity trade-off, which integrate multiple kinds of information from images, such as visual features, textual metadata, geographic information, and user credibility descriptors. In turn, as an integration of interactive retrieval and diversity promotion techniques, for maximizing the coverage of multiple query interpretations/aspects and speeding up the information transfer between the user and the system, we have proposed and evaluated a multimodal learning-to-rank method trained with relevance feedback over diversified results. Our experimental analysis shows that the joint usage of multiple information sources positively impacted the relevance-diversity balancing algorithms. Our results also suggest that the integration of multimodal-relevance-based filtering and reranking was effective in improving result relevance and also boosted diversity promotion methods.
Beyond that, through a thorough experimental analysis, we have investigated several research questions related to the possibility of improving result diversity while keeping or even improving relevance in interactive search sessions. Moreover, we analyze how much the diversification effort affects overall search session results and how different diversification approaches behave for the different data modalities. By analyzing the overall and per-feedback-iteration effectiveness, we show that introducing diversity may harm initial results, whereas it significantly enhances the overall session effectiveness, considering not only relevance and diversity but also how early the user is exposed to the same amount of relevant items and diversity.
Doctorate. Computer Science. Doctor of Computer Science. P-4388/2010, 140977/2012-0. CAPES, CNP
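The relevance-diversity trade-off described in this abstract can be illustrated with a minimal sketch of Maximal Marginal Relevance (MMR), a standard greedy diversification technique. This is not the thesis's method, which the abstract does not specify; the function name `mmr_rerank`, the weight `lam`, and the toy scores are assumptions made only for illustration.

```python
def mmr_rerank(relevance, similarity, k, lam=0.7):
    """Greedy MMR selection (illustrative sketch, not the thesis's method).

    relevance:  list of relevance scores, one per item
    similarity: n x n list of lists with pairwise item similarities
    k:          number of items to select
    lam:        assumed trade-off weight; 1.0 means relevance only
    """
    candidates = list(range(len(relevance)))
    selected = []
    while candidates and len(selected) < k:
        def mmr(i):
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: items 0 and 1 are near-duplicates, item 2 is different.
relevance = [0.9, 0.85, 0.5]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
diverse = mmr_rerank(relevance, similarity, k=2, lam=0.5)
relevance_only = mmr_rerank(relevance, similarity, k=2, lam=1.0)
```

With `lam=0.5` the near-duplicate is skipped in favour of the less relevant but different item, which mirrors the abstract's observation that diversification can lower early relevance while broadening coverage.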
A probabilistic approach for cluster based polyrepresentative information retrieval
A thesis submitted to the University of Bedfordshire in
partial fulfilment of the requirements for the degree of
Doctor of Philosophy.
Document clustering in information retrieval (IR) is considered an alternative to rank-based retrieval approaches, because of its potential to support user interactions
beyond just typing in queries. Similarly, the Principle of Polyrepresentation (multi-evidence: combining multiple cognitively and/or functionally different information need or information object representations for improving
an IR system's performance) is an established approach in cognitive IR with plausible applicability in the domain of information seeking and retrieval. The combination of these two approaches can assimilate their respective individual
strengths in order to further improve the performance of IR systems.
The main goal of this study is to combine cognitive and cluster-based IR approaches for improving the effectiveness of (interactive) information retrieval systems. In order to achieve this goal, polyrepresentative information retrieval
strategies for cluster browsing and retrieval have been designed, focusing on the evaluation aspect of such strategies.
This thesis addresses the challenge of designing and evaluating an Optimum Clustering Framework (OCF) based model, implementing probabilistic document clustering for interactive IR. Thus, polyrepresentative cluster browsing
strategies have been devised. With these strategies a simulated user based method has been adopted for evaluating the polyrepresentative cluster browsing
and searching strategies.
The proposed approaches are evaluated for information need based polyrepresentative clustering as well as document based polyrepresentation and the combination thereof. For document-based polyrepresentation, the notion of citation
context is exploited, which has special applications in scientometrics and bibliometrics for science literature modelling. The information need polyrepresentation,
on the other hand, utilizes the various aspects of user information need, which is crucial for enhancing the retrieval performance.
Besides describing a probabilistic framework for polyrepresentative document clustering, one of the main findings of this work is that the proposed combination
of the Principle of Polyrepresentation with document clustering has the potential of enhancing the user interactions with an IR system, provided that the various representations of information need and information objects are utilized.
The thesis also explores interactive IR approaches in the context of polyrepresentative interactive information retrieval when it is combined with document clustering methods. Experiments suggest there is a potential in the proposed
cluster-based polyrepresentation approach, since statistically significant improvements were found when comparing the approach to a BM25-based baseline in an ideal scenario. Further marginal improvements were observed when cluster-based re-ranking and cluster-ranking based comparisons were made.
The performance of the approach depends on the underlying information object and information need representations used, which confirms findings of previous studies where the Principle of Polyrepresentation was applied in different ways.
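For readers unfamiliar with the BM25 baseline mentioned above, the following is a minimal sketch of the widely used Okapi BM25 scoring formula. The abstract does not state the thesis's exact configuration, so the parameter values `k1=1.2` and `b=0.75`, the smoothed IDF variant, the function name `bm25_scores`, and the toy documents are all assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    k1 and b are assumed, commonly used defaults, not values from the thesis.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF variant that is never negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy collection.
docs = [
    "document clustering for interactive retrieval".split(),
    "citation context in bibliometrics".split(),
    "probabilistic document clustering and cluster browsing".split(),
]
scores = bm25_scores("document clustering".split(), docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

The first and third documents both contain each query term once, so the shorter one ranks higher because of length normalization; the second document shares no query term and scores zero.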
Disparity between General Symptom Relief and Remission Criteria in the Positive and Negative Syndrome Scale (PANSS): A Post-treatment Bifactor Item Response Theory Model.
Objective: Total scale scores derived by summing ratings from the 30-item PANSS are commonly used in clinical trial research to measure overall symptom severity, and percentage reductions in the total scores are sometimes used to document the efficacy of treatment. Acknowledging that some patients may have substantial changes in PANSS total scores but still be sufficiently symptomatic to warrant diagnosis, ratings on a subset of 8 items, referred to here as the "Remission set," are sometimes used to determine if patients' symptoms no longer satisfy diagnostic criteria. An unanswered question remains: is the goal of treatment better conceptualized as reduction in overall symptom severity, or reduction in symptoms below the threshold for diagnosis? We evaluated the psychometric properties of PANSS total scores, to assess whether having low symptom severity post-treatment is equivalent to attaining Remission. Design: We applied a bifactor item response theory (IRT) model to post-treatment PANSS ratings of 3,647 subjects diagnosed with schizophrenia assessed at the termination of 11 clinical trials. The bifactor model specified one general dimension to reflect overall symptom severity, and five domain-specific dimensions. We assessed how PANSS item discrimination and information parameters varied across the range of overall symptom severity (θ), with a special focus on low levels of symptoms (i.e., θ < -1), which we refer to as "Relief" from symptoms. A score of θ = -1 corresponds to an expected PANSS item score of 1.83, a rating between "Absent" and "Minimal" for a PANSS symptom. Results: The application of the bifactor IRT model revealed: (1) 88% of total score variation was attributable to variation in general symptom severity, and only 8% reflected secondary domain factors.
This implies that a general factor may provide a good indicator of symptom severity, and that interpretation is not overly complicated by multidimensionality; (2) Post-treatment, 534 individuals (about 15% of the whole sample) scored in the "Relief" range of general symptom severity, but more than twice that number (n = 1351) satisfied Remission criteria (37%). Two in three Remitted patients had scores that were not in a low symptom range (corresponding to Absent or Minimal item scores); (3) PANSS items vary greatly in their ability to measure the general symptom severity dimension; while many items are highly discriminating and relatively "pure" indicators of general symptom severity (delusions, conceptual disorganization), others are better indicators of specific dimensions (blunted affect, depression). The utility of a given PANSS item for assessing a patient depended on the illness level of the patient. Conclusion: Satisfying conventional Remission criteria was not strongly associated with low levels of symptoms. The items providing the most information for patients in the symptom Relief range were Delusions, Preoccupation, Suspiciousness/Persecution, Unusual Thought Content, Conceptual Disorganization, Stereotyped Thinking, Active Social Avoidance, and Lack of Judgment and Insight. Lower scores on these items (item scores ≤ 2) were strongly associated with having a low latent trait θ or experiencing overall symptom relief. The inter-rater agreement between Remission and Relief subjects suggested that these criteria identified different subsets of patients. Alternative subsets of items may offer better indicators of general symptom severity and provide better discrimination (and lower standard errors) for scaling individuals and judging symptom relief, where the "best" subset of items ultimately depends on the illness range and treatment phase being evaluated.
People on Drugs: Credibility of User Statements in Health Communities
Online health communities are a valuable source of information for patients
and physicians. However, such user-generated resources are often plagued by
inaccuracies and misinformation. In this work we propose a method for
automatically establishing the credibility of user-generated medical statements
and the trustworthiness of their authors by exploiting linguistic cues and
distant supervision from expert sources. To this end we introduce a
probabilistic graphical model that jointly learns user trustworthiness,
statement credibility, and language objectivity. We apply this methodology to
the task of extracting rare or unknown side-effects of medical drugs --- this
being one of the problems where large scale non-expert data has the potential
to complement expert medical knowledge. We show that our method can reliably
extract side-effects and filter out false statements, while identifying
trustworthy users that are likely to contribute valuable medical information.
Tailored deep learning techniques for information retrieval
Information retrieval aims to find documents relevant to a query. Many traditional information retrieval models have been proposed. They either encode the query and documents as vectors in term space and estimate relevance by computing the similarity of the two vectors, or estimate relevance with probabilistic models. However, for vector space models, encoding the query and documents in term space has its limitations: for example, it is difficult to identify document terms whose meaning is similar to that of the exact query terms. It is also difficult to represent the text at different levels of abstraction to better match the information need expressed in the query.
With the fast development of deep learning techniques, it is possible to learn useful representations through a series of neural layers, which paves the way to learning better representations in a dense latent space rather than in the term space, and may help to match terms that are similar in meaning but not identical. It also allows us to create different layers of representation for the query and the document, thereby enabling matching between query and documents at different levels of abstraction, which may better serve the information needs behind different types of queries. Finally, deep learning techniques also make it possible to learn a better ranking function.
In this thesis, we explore several deep learning techniques to deal with the above problems.
First, we study the effectiveness of building multiple abstraction layers between query and document, for representation- and interaction-based models. Then we propose a model that allows cross-matching of query and document representations at different layers, to better serve the need for term-phrase matching. Finally, we propose an integrated framework for jointly learning the ranking function and the neural features of the query and document.
Experiments on public datasets demonstrate that the methods we propose in this thesis are more effective than the existing ones.
Customer Ranking Model for Project Businesses: A Case Study from the Automotive Industry
For technology-orientated enterprises that operate project-based businesses, the goal-oriented allocation of scarce marketing resources has great potential to help consolidate their competitive position. An important precondition for goal-oriented management is the identification of the most valuable customers. This enables technology-orientated enterprises to segment markets in order to make tactical marketing decisions. This theory-based paper aims to develop and test a holistic customer ranking model. By deploying the five steps presented in this paper, customer relationship managers are better able to identify and rank their customers in project-based businesses. A case study from the automotive industry in Austria provides an example of the application of the method. The experiences derived from this case study show that using a customer ranking framework is a crucial factor for enterprises in narrow technology markets to be successful and to achieve their corporate goals.
Evaluation Methodologies for Visual Information Retrieval and Annotation
Performance assessment plays a major role in the research on Information
Retrieval (IR) systems. Starting with the Cranfield experiments in the
early 1960s, methodologies for the system-based performance assessment
emerged and established themselves, resulting in an active research field
with a number of successful benchmarking activities. With the rise of the
digital age, procedures of text retrieval evaluation were often transferred
to multimedia retrieval evaluation without questioning their direct
applicability. This thesis investigates the problem of system-based
performance assessment of annotation approaches in generic image
collections. It addresses three important parts of annotation evaluation,
namely user requirements for the retrieval of annotated visual media,
performance measures for multi-label evaluation, and visual test
collections. Using the example of multi-label image annotation evaluation,
I discuss which concepts to employ for indexing, how to obtain a reliable
ground truth at moderate cost, and which evaluation measures are
appropriate. This is accompanied by a thorough analysis of related work on
system-based performance assessment in Visual Information Retrieval (VIR).
Traditional performance measures are classified into four dimensions and
investigated according to their appropriateness for visual annotation
evaluation. One of the main ideas in this thesis questions the common
assumption on the binary nature of the score prediction dimension in
annotation evaluation. However, the predicted concepts and the set of true
indexed concepts interrelate with each other. This work will show how to
utilise these semantic relationships for a fine-grained evaluation
scenario. The outcomes of this thesis are a user model for concept-based
image retrieval, a fully assessed image annotation test collection, and a
number of novel performance measures for image annotation evaluation.
Learning to Rank: Online Learning, Statistical Theory and Applications.
Learning to rank is a supervised machine learning problem, where the output space is the special structured space of permutations. Learning to rank has diverse application areas, spanning information retrieval, recommendation systems, computational biology and others.
In this dissertation, we make contributions to some of the exciting directions of research in learning to rank. In the first part, we extend the classic online perceptron algorithm for classification to learning to rank, giving a loss bound which is reminiscent of Novikoff's famous convergence theorem for classification. In the second part, we give strategies for learning ranking functions in an online setting, with a novel feedback model, where feedback is restricted to labels of top-ranked items. The second part of our work is divided into two sub-parts: one without side information and one with side information. In the third part, we provide novel generalization error bounds for algorithms applied to various Lipschitz and/or smooth ranking surrogates. In the last part, we apply ranking losses to learn policies for personalized advertisement recommendations, partially overcoming the problem of click sparsity. We conduct experiments on various simulated and commercial datasets, comparing our strategies with baseline strategies for online learning to rank and personalized advertisement recommendation.
PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/133334/1/sougata_1.pd
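To make the idea of a perceptron for ranking concrete, here is a minimal sketch of a generic pairwise perceptron with a linear scoring function. It is not the algorithm analyzed in the dissertation and carries none of its loss bounds; the function names, the data layout, and the toy examples are assumptions made only for illustration.

```python
def train_pairwise_perceptron(queries, n_features, epochs=10):
    """Generic pairwise perceptron for ranking (illustrative sketch only).

    queries: list of queries, each a list of (feature_vector, relevance_label).
    Returns a weight vector w; items are ranked by the dot product w . x.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        mistakes = 0
        for items in queries:
            for xi, yi in items:
                for xj, yj in items:
                    if yi <= yj:
                        continue  # only pairs where item i should outrank item j
                    diff = [a - b for a, b in zip(xi, xj)]
                    margin = sum(wk * dk for wk, dk in zip(w, diff))
                    if margin <= 0:
                        # Misordered pair: move w towards the difference vector.
                        w = [wk + dk for wk, dk in zip(w, diff)]
                        mistakes += 1
        if mistakes == 0:
            break  # every training pair is ordered correctly
    return w


def rank(w, items):
    """Return item indices sorted by decreasing score."""
    return sorted(
        range(len(items)),
        key=lambda i: -sum(wk * xk for wk, xk in zip(w, items[i])),
    )


# Hypothetical training data: two queries with graded relevance labels.
queries = [
    [([1.0, 0.0], 2), ([0.5, 0.5], 1), ([0.0, 1.0], 0)],
    [([0.9, 0.2], 1), ([0.1, 0.8], 0)],
]
w = train_pairwise_perceptron(queries, n_features=2)
order = rank(w, [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
```

The update fires only on misordered pairs, which is the ranking analogue of the mistake-driven update in the classification perceptron.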