Search CORE

7,900 research outputs found

From Frequency to Meaning: Vector Space Models of Semantics

Author: Pantel Patrick
Turney Peter D.
Publication venue: 'AI Access Foundation'
Publication date: 01/01/2010
Field of study

Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field

arXiv.org e-Print Archive

CiteSeerX

NRC Publications Archive

Crossref

Synchronous collaborative information retrieval: techniques and evaluation

Author: A.F. Smeaton
I.J. Aalbersberg
J. Pickens
M.R. Morris
N. Craswell
R.W. White
S.E. Robertson
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009
Field of study

Synchronous Collaborative Information Retrieval refers to systems that support multiple users searching together at the same time in order to satisfy a shared information need. To date most SCIR systems have focussed on providing various awareness tools in order to enable collaborating users to coordinate the search task. However, requiring users to both search and coordinate the group activity may prove too demanding. On the other hand without effective coordination policies the group search may not be effective. In this paper we propose and evaluate novel system-mediated techniques for coordinating a group search. These techniques allow for an effective division of labour across the group whereby each group member can explore a subset of the search space.We also propose and evaluate techniques to support automated sharing of knowledge across searchers in SCIR, through novel collaborative and complementary relevance feedback techniques. In order to evaluate these techniques, we propose a framework for SCIR evaluation based on simulations. To populate these simulations we extract data from TREC interactive search logs. This work represent the first simulations of SCIR to date and the first such use of this TREC data

CiteSeerX

Crossref

Irish Universities

DCU Online Research Access Service

mARC: Memory by Association and Reinforcement of Contexts

Author: Descourt Patrice
Rimoux Norbert
Publication venue
Publication date: 10/12/2013
Field of study

This paper introduces the memory by Association and Reinforcement of Contexts (mARC). mARC is a novel data modeling technology rooted in the second quantization formulation of quantum mechanics. It is an all-purpose incremental and unsupervised data storage and retrieval system which can be applied to all types of signal or data, structured or unstructured, textual or not. mARC can be applied to a wide range of information clas-sification and retrieval problems like e-Discovery or contextual navigation. It can also for-mulated in the artificial life framework a.k.a Conway "Game Of Life" Theory. In contrast to Conway approach, the objects evolve in a massively multidimensional space. In order to start evaluating the potential of mARC we have built a mARC-based Internet search en-gine demonstrator with contextual functionality. We compare the behavior of the mARC demonstrator with Google search both in terms of performance and relevance. In the study we find that the mARC search engine demonstrator outperforms Google search by an order of magnitude in response time while providing more relevant results for some classes of queries

arXiv.org e-Print Archive

CiteSeerX

A framework for clustering and adaptive topic tracking on evolving text and social media data streams.

Author: Nutakki Gopi Chand
Publication venue: ThinkIR: The University of Louisville\u27s Institutional Repository
Publication date: 01/12/2017
Field of study

Recent advances and widespread usage of online web services and social media platforms, coupled with ubiquitous low cost devices, mobile technologies, and increasing capacity of lower cost storage, has led to a proliferation of Big data, ranging from, news, e-commerce clickstreams, and online business transactions to continuous event logs and social media expressions. These large amounts of online data, often referred to as data streams, because they get generated at extremely high throughputs or velocity, can make conventional and classical data analytics methodologies obsolete. For these reasons, the issues of management and analysis of data streams have been researched extensively in recent years. The special case of social media Big Data brings additional challenges, particularly because of the unstructured nature of the data, specifically free text. One classical approach to mine text data has been Topic Modeling. Topic Models are statistical models that can be used for discovering the abstract ``topics\u27\u27 that may occur in a corpus of documents. Topic models have emerged as a powerful technique in machine learning and data science, providing a great balance between simplicity and complexity. They also provide sophisticated insight without the need for real natural language understanding. However they have not been designed to cope with the type of text data that is abundant on social media platforms, but rather for traditional medium size corpora consisting of longer documents, adhering to a specific language and typically spanning a stable set of topics. Unlike traditional document corpora, social media messages tend to be very short, sparse, noisy, and do not adhere to a standard vocabulary, linguistic patterns, or stable topic distributions. They are also generated at high velocity that impose high demands on topic modeling; and their evolving or dynamic nature, makes any set of results from topic modeling quickly become stale in the face of changes in the textual content and topics discussed within social media streams. In this dissertation, we propose an integrated topic modeling framework built on top of an existing stream-clustering framework called Stream-Dashboard, which can extract, isolate, and track topics over any given time period. In this new framework, Stream Dashboard first clusters the data stream points into homogeneous groups. Then data from each group is ushered to the topic modeling framework which extracts finer topics from the group. The proposed framework tracks the evolution of the clusters over time to detect milestones corresponding to changes in topic evolution, and to trigger an adaptation of the learned groups and topics at each milestone. The proposed approach to topic modeling is different from a generic Topic Modeling approach because it works in a compartmentalized fashion, where the input document stream is split into distinct compartments, and Topic Modeling is applied on each compartment separately. Furthermore, we propose extensions to existing topic modeling and stream clustering methods, including: an adaptive query reformulation approach to help focus on the topic discovery with time; a topic modeling extension with adaptive hyper-parameter and with infinite vocabulary; an adaptive stream clustering algorithm incorporating the automated estimation of dynamic, cluster-specific temporal scales for adaptive forgetting to help facilitate clustering in a fast evolving data stream. Our experimental results show that the proposed adaptive forgetting clustering algorithm can mine better quality clusters; that our proposed compartmentalized framework is able to mine topics of better quality compared to competitive baselines; and that the proposed framework can automatically adapt to focus on changing topics using the proposed query reformulation strategy

University of Louisville

Explicit diversification of event aspects for temporal summarization

Author: Macdonald Craig
McCreadie Richard
Ounis Iadh
Santos Rodrygo L.T.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/04/2018
Field of study

During major events, such as emergencies and disasters, a large volume of information is reported on newswire and social media platforms. Temporal summarization (TS) approaches are used to automatically produce concise overviews of such events by extracting text snippets from related articles over time. Current TS approaches rely on a combination of event relevance and textual novelty for snippet selection. However, for events that span multiple days, textual novelty is often a poor criterion for selecting snippets, since many snippets are textually unique but are semantically redundant or non-informative. In this article, we propose a framework for the diversification of snippets using explicit event aspects, building on recent works in search result diversification. In particular, we first propose two techniques to identify explicit aspects that a user might want to see covered in a summary for different types of event. We then extend a state-of-the-art explicit diversification framework to maximize the coverage of these aspects when selecting summary snippets for unseen events. Through experimentation over the TREC TS 2013, 2014, and 2015 datasets, we show that explicit diversification for temporal summarization significantly outperforms classical novelty-based diversification, as the use of explicit event aspects reduces the amount of redundant and off-topic snippets returned, while also increasing summary timeliness

Enlighten

Interactive information retrieval

Author: Allan
Barry
Bates
Beaulieu
Beaulieu
Belkin
Belkin
Bhavnani
Blair
Borgman
Borgman
Brajnik
Broder
Buyukkokten
Byström
Campbell
Case
Chen
Cove
Crestani
Crouch
Downie
Dumais
Eastman
Efthimiadis
Ellis
Ellis
Fidel
Ford
Ford
Foster
Fox
Hansen
Harper
Hearst
Hearst
Hearst
Heinström
Hill
Ingwersen
Ingwersen
Jansen
Jansen
Jones
Jones
Kang
Kelly
Kelly
Kim
Konstan
Kruschwitz
Kuhlthau
Legg
Lin
Lin
Lorigo
Lynch
López-Ostenero
Maña-López
Niemi
Norman
Over
Pirkola
Pu
Radev
Reid
Reid
Riedl
Rieh
Robertson
Rosenfeld
Roussinov
Ruthven
Ruthven
Savolainen
Shipman
Shneiderman
Sihvonen
Slone
Smeaton
Spink
Spink
Spink
Spink
Spink
Spink
Spärck Jones
Spärck Jones
Sweeney
Tombros
Tombros
Toms
Topi
Topi
Vakkari
Vakkari
Vakkari
Vakkari
van der Eijk
Vechtomova
Voorhees
White
White
White
White
Wiesman
Wu
Xie
Publication venue: 'Wiley'
Publication date: 01/11/2008
Field of study

Crossref

University of Strathclyde Institutional Repository

CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

Author: Bardeli Rolf
Boujemaa Nozha
Compañó Ramón
Doch Christoph
Geurts Joost
Gouraud Henri
Joly Alexis
Karlgren Jussi
King Paul
Kompatsiaris Yiannis
Köhler Joachim
Le Moine Jean-Yves
Ortgies Robert
Point Jean-Charles
Rotenberg Boris
Rudström Åsa
Schreer Oliver
Sebe Nicu
Snoek Cees
Publication venue: Chorus Project Consortium
Publication date: 01/01/2008
Field of study

After addressing the state-of-the-art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we have identified and analyzed gaps within European research effort during our second year. In this period we focused on three directions, notably technological issues, user-centred issues and use-cases and socio- economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of functional breakdown of generic multimedia search engine, and secondly, a representative use-cases descriptions with the related discussion on requirement for technological challenges. Both studies have been carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations in international conferences, and surveys addressed to EU projects coordinators as well as National initiatives coordinators. Based on the obtained feedback we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges, but have impact on innovation progress. New socio-economic trends are presented as well as emerging legal challenges

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive