22,346 research outputs found
Structured Translation for Cross-Language Information Retrieval
The paper introduces a query translation model that re ects the structure of the cross-language information retrieval task. The model is based on a structured bilingual dictionary in which the translations of each term are clustered into groups with distinct meanings. Query translation is modeled as a two-stage process, with the system rst determining the intended meaning of a query term and then selecting translations appropriate to that meaning that might appear in the document collection. An implementation of structured translation based on automatic dictionary clustering is described and evaluated by using Chinese queries to retrieve English documents. Structured translation achieved an average precision that was statistically indistinguishable from Pirkola's technique for very short queries, but Pirkola's technique outperformed structured translation on long queries. The paper concludes with some observations on future work to improve retrieval e ectiveness and on other potential uses of structured translation in interactive cross-language retrieval applications. 1
Building user interest profiles from wikipedia clusters
Users of search systems are often reluctant to explicitly build profiles to indicate their search interests. Thus automatically building user profiles is an important research area for personalized search. One difficult component of doing this is accessing a knowledge system which provides broad coverage of user search interests. In this work, we describe a
method to build category id based user profiles from a user's
historical search data. Our approach makes significant use
of Wikipedia as an external knowledge resource
Growing Story Forest Online from Massive Breaking News
We describe our experience of implementing a news content organization system
at Tencent that discovers events from vast streams of breaking news and evolves
news story structures in an online fashion. Our real-world system has distinct
requirements in contrast to previous studies on topic detection and tracking
(TDT) and event timeline or graph generation, in that we 1) need to accurately
and quickly extract distinguishable events from massive streams of long text
documents that cover diverse topics and contain highly redundant information,
and 2) must develop the structures of event stories in an online manner,
without repeatedly restructuring previously formed stories, in order to
guarantee a consistent user viewing experience. In solving these challenges, we
propose Story Forest, a set of online schemes that automatically clusters
streaming documents into events, while connecting related events in growing
trees to tell evolving stories. We conducted extensive evaluation based on 60
GB of real-world Chinese news data, although our ideas are not
language-dependent and can easily be extended to other languages, through
detailed pilot user experience studies. The results demonstrate the superior
capability of Story Forest to accurately identify events and organize news text
into a logical structure that is appealing to human readers, compared to
multiple existing algorithm frameworks.Comment: Accepted by CIKM 2017, 9 page
Concept Extraction and Clustering for Topic Digital Library Construction
This paper is to introduce a new approach to build
topic digital library using concept extraction and
document clustering. Firstly, documents in a special
domain are automatically produced by document
classification approach. Then, the keywords of each
document are extracted using the machine learning
approach. The keywords are used to cluster the
documents subset. The clustered result is the taxonomy
of the subset. Lastly, the taxonomy is modified to the
hierarchical structure for user navigation by manual
adjustments. The topic digital library is constructed
after combining the full-text retrieval and hierarchical
navigation function
Large scale evaluations of multimedia information retrieval: the TRECVid experience
Information Retrieval is a supporting technique which underpins a broad range of content-based applications including retrieval, filtering, summarisation, browsing, classification, clustering, automatic linking, and others. Multimedia information retrieval (MMIR) represents those applications when applied to multimedia information such as image, video, music, etc. In this presentation and extended abstract we are primarily concerned with MMIR as applied to information in digital video format. We begin with a brief overview of large scale evaluations of IR tasks in areas such as text, image and music, just to illustrate that this phenomenon is not just restricted to MMIR on video. The main contribution, however, is a set of pointers and a summarisation of the work done as part of TRECVid, the annual benchmarking exercise for video retrieval tasks
- …