Search CORE

4,162 research outputs found

Recommended from our members

Learning to Diversify Web Search Results with a Document Repulsion Model

Author: Li Jingfei
Song Dawei
Wang Benyou
Wu Yue
Zhang Peng
Publication venue: 'Elsevier BV'
Publication date: 01/10/2017
Field of study

Search diversification (also called diversity search), is an important approach to tackling the query ambiguity problem in information retrieval. It aims to diversify the search results that are originally ranked according to their probabilities of relevance to a given query, by re-ranking them to cover as many as possible different aspects (or subtopics) of the query. Most existing diversity search models heuristically balance the relevance ranking and the diversity ranking, yet lacking an efficient learning mechanism to reach an optimized parameter setting. To address this problem, we propose a learning-to-diversify approach which can directly optimize the search diversification performance (in term of any effectiveness metric). We first extend the ranking function of a widely used learning-to-rank framework, i.e., LambdaMART, so that the extended ranking function can correlate relevance and diversity indicators. Furthermore, we develop an effective learning algorithm, namely Document Repulsion Model (DRM), to train the ranking function based on a Document Repulsion Theory (DRT). DRT assumes that two result documents covering similar query aspects (i.e., subtopics) should be mutually repulsive, for the purpose of search diversification. Accordingly, the proposed DRM exerts a repulsion force between each pair of similar documents in the learning process, and includes the diversity effectiveness metric to be optimized as part of the loss function. Although there have been existing learning based diversity search methods, they often involve an iterative sequential selection process in the ranking process, which is computationally complex and time consuming for training, while our proposed learning strategy can largely reduce the time cost. Extensive experiments are conducted on the TREC diversity track data (2009, 2010 and 2011). The results demonstrate that our model significantly outperforms a number of baselines in terms of effectiveness and robustness. Further, an efficiency analysis shows that the proposed DRM has a lower computational complexity than the state of the art learning-to-diversify methods

Open Research Online (The Open University)

An effective approach for topicspecific opinion summarization

Author: GAO Wei
LI Binyang
WEI Zhongyu
WONG Kam-Fai
ZHOU Lanjun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/12/2011
Field of study

Institutional Knowledge at Singapore Management University

Document re-ranking using cluster validation and label propagation

Author: Donghong Ji
Guodong Zhou
Guozheng Xiao
Lingpeng Yang
Yu Nie
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2006
Field of study

This paper proposes a novel document re-ranking approach in information retrieval, which is done by a label propagation-based semi-supervised learning algorithm to utilize the intrinsic structure underlying in the large document data. Since no labeled relevant or irrelevant documents are generally available in IR, our approach tries to extract some pseudo labeled documents from the ranking list of the initial retrieval. For pseudo relevant documents, we determine a cluster of documents from the top ones via cluster validation-based k-means clustering; for pseudo irrelevant ones, we pick a set of documents from the bottom ones. Then the ranking of the documents can be conducted via label propagation. Evaluation on benchmark corpora shows that the approach can achieve significant improvement over standard baselines and performs better than other related approaches

CiteSeerX

Crossref

Korean-Chinese Person Name Translation for Cross Language Information Retrieval

Author: Hsu Wen-Lian
Lee Yi-Hsun
Lin Chu-Cheng
Tsai Tzong-Han Richard
Wang Yu-Chun
Publication venue: The Korean Society for Language and Information (KSLI)
Publication date: 01/01/2007
Field of study

PACLIC 21 / Seoul National University, Seoul, Korea / November 1-3, 200

Waseda University Repository

Relating Dependent Terms in Information Retrieval

Author: Shi Lixin
Publication venue
Publication date: 01/11/2015
Field of study

Les moteurs de recherche font partie de notre vie quotidienne. Actuellement, plus d’un tiers de la population mondiale utilise l’Internet. Les moteurs de recherche leur permettent de trouver rapidement les informations ou les produits qu'ils veulent. La recherche d'information (IR) est le fondement de moteurs de recherche modernes. Les approches traditionnelles de recherche d'information supposent que les termes d'indexation sont indépendants. Pourtant, les termes qui apparaissent dans le même contexte sont souvent dépendants. L’absence de la prise en compte de ces dépendances est une des causes de l’introduction de bruit dans le résultat (résultat non pertinents). Certaines études ont proposé d’intégrer certains types de dépendance, tels que la proximité, la cooccurrence, la contiguïté et de la dépendance grammaticale. Dans la plupart des cas, les modèles de dépendance sont construits séparément et ensuite combinés avec le modèle traditionnel de mots avec une importance constante. Par conséquent, ils ne peuvent pas capturer correctement la dépendance variable et la force de dépendance. Par exemple, la dépendance entre les mots adjacents "Black Friday" est plus importante que celle entre les mots "road constructions". Dans cette thèse, nous étudions différentes approches pour capturer les relations des termes et de leurs forces de dépendance. Nous avons proposé des méthodes suivantes: ─ Nous réexaminons l'approche de combinaison en utilisant différentes unités d'indexation pour la RI monolingue en chinois et la RI translinguistique entre anglais et chinois. En plus d’utiliser des mots, nous étudions la possibilité d'utiliser bi-gramme et uni-gramme comme unité de traduction pour le chinois. Plusieurs modèles de traduction sont construits pour traduire des mots anglais en uni-grammes, bi-grammes et mots chinois avec un corpus parallèle. Une requête en anglais est ensuite traduite de plusieurs façons, et un score classement est produit avec chaque traduction. Le score final de classement combine tous ces types de traduction. Nous considérons la dépendance entre les termes en utilisant la théorie d’évidence de Dempster-Shafer. Une occurrence d'un fragment de texte (de plusieurs mots) dans un document est considérée comme représentant l'ensemble de tous les termes constituants. La probabilité est assignée à un tel ensemble de termes plutôt qu’a chaque terme individuel. Au moment d’évaluation de requête, cette probabilité est redistribuée aux termes de la requête si ces derniers sont différents. Cette approche nous permet d'intégrer les relations de dépendance entre les termes. Nous proposons un modèle discriminant pour intégrer les différentes types de dépendance selon leur force et leur utilité pour la RI. Notamment, nous considérons la dépendance de contiguïté et de cooccurrence à de différentes distances, c’est-à-dire les bi-grammes et les paires de termes dans une fenêtre de 2, 4, 8 et 16 mots. Le poids d’un bi-gramme ou d’une paire de termes dépendants est déterminé selon un ensemble des caractères, en utilisant la régression SVM. Toutes les méthodes proposées sont évaluées sur plusieurs collections en anglais et/ou chinois, et les résultats expérimentaux montrent que ces méthodes produisent des améliorations substantielles sur l'état de l'art.Search engine has become an integral part of our life. More than one-third of world populations are Internet users. Most users turn to a search engine as the quick way to finding the information or product they want. Information retrieval (IR) is the foundation for modern search engines. Traditional information retrieval approaches assume that indexing terms are independent. However, terms occurring in the same context are often dependent. Failing to recognize the dependencies between terms leads to noise (irrelevant documents) in the result. Some studies have proposed to integrate term dependency of different types, such as proximity, co-occurrence, adjacency and grammatical dependency. In most cases, dependency models are constructed apart and then combined with the traditional word-based (unigram) model on a fixed importance proportion. Consequently, they cannot properly capture variable term dependency and its strength. For example, dependency between adjacent words “black Friday” is more important to consider than those of between “road constructions”. In this thesis, we try to study different approaches to capture term relationships and their dependency strengths. We propose the following methods for monolingual IR and Cross-Language IR (CLIR): We re-examine the combination approach by using different indexing units for Chinese monolingual IR, then propose the similar method for CLIR. In addition to the traditional method based on words, we investigate the possibility of using Chinese bigrams and unigrams as translation units. Several translation models from English words to Chinese unigrams, bigrams and words are created based on a parallel corpus. An English query is then translated in several ways, each producing a ranking score. The final ranking score combines all these types of translations. We incorporate dependencies between terms in our model using Dempster-Shafer theory of evidence. Every occurrence of a text fragment in a document is represented as a set which includes all its implied terms. Probability is assigned to such a set of terms instead of individual terms. During query evaluation phase, the probability of the set can be transferred to those of the related query, allowing us to integrate language-dependent relations to IR. We propose a discriminative language model that integrates different term dependencies according to their strength and usefulness to IR. We consider the dependency of adjacency and co-occurrence within different distances, i.e. bigrams, pairs of terms within text window of size 2, 4, 8 and 16. The weight of bigram or a pair of dependent terms in the final model is learnt according to a set of features. All the proposed methods are evaluated on several English and/or Chinese collections, and experimental results show these methods achieve substantial improvements over state-of-the-art baselines

Dépôt Institutionnel Numérique

Fisheries marketing systems and consumer preferences in Puttalam District Sri-Lanka

Author: Little D.C.
Murray F.J.
Publication venue: Institute of Aquaculture, University of Stirling
Publication date: 01/01/2000
Field of study

The aim of this study was to understand the current and historic market situation for inland fish and it’s substitutes in order to identify which of the various production opportunities presented by the seasonal tank resource might have greatest relevance for marginal communities in the Dry-zone. Regional and sub-regional market networks for fish and meat products were investigated, ranking and scoring exercises used to characterise consumer demand in rain-fed areas of North West Province and secondary data sources were used to assess historic patterns of demand and supply [PDF contains 57 pages

Aquatic Commons

Search Result Diversification in Short Text Streams

Author: Croft WB
De Rijke M
Liang S
Shen H
Yilmaz E
Publication venue: ASSOC COMPUTING MACHINERY
Publication date: 01/08/2017
Field of study

We consider the problem of search result diversification for streams of short texts. Diversifying search results in short text streams is more challenging than in the case of long documents, as it is difficult to capture the latent topics of short documents. To capture the changes of topics and the probabilities of documents for a given query at a specific time in a short text stream, we propose a dynamic Dirichlet multinomial mixture topic model, called D2M3, as well as a Gibbs sampling algorithm for the inference. We also propose a streaming diversification algorithm, SDA, that integrates the information captured by D2M3 with our proposed modified version of the PM-2 (Proportionality-based diversification Method -- second version) diversification algorithm. We conduct experiments on a Twitter dataset and find that SDA statistically significantly outperforms state-of-the-art non-streaming retrieval methods, plain streaming retrieval methods, as well as streaming diversification methods that use other dynamic topic models

UCL Discovery

Recommender Systems

Author: Adamic
Adomavicius
Agarwal
Albert
Anderson
Arndt
Balabanović
Barabási
Basu
Bell
Berge
Billsus
Blattner
Blei
Blei
Boccaletti
Bollobás
Bollobás
Bollé
Bollé
Bone
Bonhard
Bouchaud
Breiman
Brin
Brynjolfsson
Buckley
Buckley
Burkard
Burke
Burke
Burke
Cacheda
Caldarelli
Campos
Candés
Candés
Carlin
Castellano
Castells
Cattuto
Cattuto
Chebotarev
Chen
Chevalier
Chi Ho Yeung
Cho
Chou
Cimini
Clauset
Claypool
Cooke
Costa
Dellarocas
Dellarocas
Ding
Dorogovtsev
Ellero
Erdös
Esslimani
Euler
Fortunato
Fouss
Franceschet
Gao
Geman
Gemulla
Ghoshal
Golbeck
Goldberg
Goldberg
Goldstein
Griffiths
Grujić
Gualdi
Gualdi
Guo
Gupta
Hagel
Hanely
He
Herlocker
Herlocker
Herlocker
Herr
Hofmann
Hofmann
Holme
Holmes
Hotho
Hu
Huang
Huang
Huang
Hurley
Hwang
Hwang
Jaccard
Jamali
Jansen
Jeh
Jeong
Jia
Jin
Järvelin
Jøsang
Katz
Kendall
Keshavan
Keshavan
Klamt
Klein
Kobsa
Kolda
Kong
Koren
Koren
Koren
Kwak
Laherrère
Lam
Lambiotte
Lambiotte
Lathia
Lathia
Latora
Laureti
Leicht
Leskovec
Liben-Nowell
Linden
Linyuan Lü
Liu
Liu
Liu
Liu
Liu
Liu
Liu
Liu
Liu
Liu
Liu
Lü
Lü
Lü
Lü
Ma
Mantegna
Maslov
Massa
Massa
Matúš Medo
Mcnee
Medo
Medo
Medo
Melville
Mika
Milgram
Min
Mobasher
Moffat
Moreno
Newman
Newman
Newman
Newman
Newman
Newman
Newman
Newman
Newman
Palla
Pan
Pan
Pastor-Satorras
Pastor-Satorras
Pazzani
Pazzani
Pazzani
Phelps
Popescul
Qiu
Quillian
Ravasz
Ren
Resnick
Resnick
Rodgers
Romero
Sabater
Salganik
Salter
Salton
Schafer
Schein
Shang
Shang
Shang
Shang
Shardanand
Si
Simmel
Smyth
Song
Song
Spearman
Stojmirović
Su
Sun
Symeonidis
Symeonidis
Sørensen
Tang
Tao Zhou
Taramasco
Tong
Tribus
Tso
Turner
van Rijsbergen
Vazquez
Vespignani
Vig
Vázquez
Vázquez
Walter
Wang
Wang
Wang
Wasserman
Watts
Watts
Wei
Weibull
Witten
Wu
Xiang
Xuan
Yang
Yao
Yedidia
Yeung
Yeung
Yi-Cheng Zhang
Yin
Yu
Zeng
Zeng
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhang
Zhao
Zheng
Zhou
Zhou
Zhou
Zhou
Zhou
Zhou
Zhou
Zhou
Zhou
Zhou
Zi-Ke Zhang
Ziegler
Ziegler
Zlatić
Publication venue: 'Elsevier BV'
Publication date: 06/02/2012
Field of study

The ongoing rapid expansion of the Internet greatly increases the necessity of effective recommender systems for filtering the abundant information. Extensive research for recommender systems is conducted by a broad range of communities including social and computer scientists, physicists, and interdisciplinary researchers. Despite substantial theoretical and practical achievements, unification and comparison of different approaches are lacking, which impedes further advances. In this article, we review recent developments in recommender systems and discuss the major challenges. We compare and evaluate available algorithms and examine their roles in the future developments. In addition to algorithms, physical aspects are described to illustrate macroscopic behavior of recommender systems. Potential impacts and future directions are discussed. We emphasize that recommendation has a great scientific depth and combines diverse research fields which makes it of interests for physicists as well as interdisciplinary researchers.Comment: 97 pages, 20 figures (To appear in Physics Reports

arXiv.org e-Print Archive

Crossref

Aston Publications Explorer

RERO DOC Digital Library