Search CORE

1,634 research outputs found

Neural Ranking Models with Weak Supervision

Author: Bing Lidong
Bromley Jane
Diaz Fernando
Han Xianpei
Hoffmann Raphael
Huang Po-Sen
Kingma Diederik
Lu Zhengdong
Quoc
Severyn Aliaksei
Shen Yelong
Wauthier Fabian L.
Zamani Hamed
Publication venue
Publication date: 01/01/2017
Field of study

Despite the impressive improvements achieved by unsupervised deep neural networks in computer vision and NLP tasks, such improvements have not yet been observed in ranking for information retrieval. The reason may be the complexity of the ranking problem, as it is not obvious how to learn from queries and documents when no supervised signal is available. Hence, in this paper, we propose to train a neural ranking model using weak supervision, where labels are obtained automatically without human annotators or any external resources (e.g., click data). To this end, we use the output of an unsupervised ranking model, such as BM25, as a weak supervision signal. We further train a set of simple yet effective ranking models based on feed-forward neural networks. We study their effectiveness under various learning scenarios (point-wise and pair-wise models) and using different input representations (from encoding query-document pairs into dense/sparse vectors to using word embedding representations). We train our networks using tens of millions of training instances and evaluate them on two standard collections: a homogeneous news collection (Robust) and a heterogeneous large-scale web collection (ClueWeb). Our experiments indicate that employing proper objective functions and letting the networks learn the input representation based on weakly supervised data leads to impressive performance, with over 13% and 35% MAP improvements over the BM25 model on the Robust and ClueWeb collections, respectively. Our findings also suggest that supervised neural ranking models can greatly benefit from pre-training on large amounts of weakly labeled data that can be easily obtained from unsupervised IR models. Comment: In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017).
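The pair-wise weak-supervision idea in this abstract can be illustrated with a small sketch. It is not the authors' model: it assumes Okapi BM25 (k1=1.2, b=0.75) as the weak labeler, replaces the feed-forward network with a linear scorer over two hypothetical hand-made features, and trains it with a pairwise hinge loss max(0, eps - (S(q, d+) - S(q, d-))) on document pairs ordered by their BM25 scores. All function names, features, and the toy corpus are illustrative assumptions.

```python
import math
from collections import Counter
from itertools import combinations

def bm25(query, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of `doc` for `query`; every text is a list of tokens."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in docs if term in d)
        f = tf[term]
        if df == 0 or f == 0:
            continue
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

def features(query, doc):
    """Hypothetical features: matched query terms, and that count per doc token."""
    overlap = sum(1 for t in set(query) if t in doc)
    return [float(overlap), overlap / len(doc)]

def model_score(w, x):
    """Linear stand-in (an assumption) for the feed-forward scorer S(q, d)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def weak_pairs(queries, docs):
    """(x_better, x_worse) feature pairs ordered by the BM25 weak label; ties skipped."""
    pairs = []
    for q in queries:
        for d1, d2 in combinations(docs, 2):
            s1, s2 = bm25(q, d1, docs), bm25(q, d2, docs)
            if s1 == s2:
                continue
            if s1 < s2:
                d1, d2 = d2, d1
            pairs.append((features(q, d1), features(q, d2)))
    return pairs

def hinge_loss(w, pairs, eps=1.0):
    """Sum over pairs of max(0, eps - (S(q, d+) - S(q, d-)))."""
    return sum(max(0.0, eps - (model_score(w, a) - model_score(w, b))) for a, b in pairs)

def train(pairs, epochs=50, lr=0.1, eps=1.0):
    """Plain SGD on the pairwise hinge loss, starting from zero weights."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for a, b in pairs:
            if model_score(w, a) - model_score(w, b) < eps:  # hinge is active
                w = [wi + lr * (ai - bi) for wi, ai, bi in zip(w, a, b)]
    return w

docs = [s.split() for s in [
    "neural ranking models learn from weak labels",
    "bm25 is an unsupervised ranking model",
    "the weather today is sunny",
    "weak supervision for neural ranking",
]]
queries = [["neural", "ranking"], ["weak", "supervision"], ["unsupervised", "model"]]
pairs = weak_pairs(queries, docs)
w = train(pairs)
print(len(pairs), hinge_loss([0.0, 0.0], pairs), hinge_loss(w, pairs))
```

No human labels enter the loop: BM25 alone decides which document of each pair should score higher, which is the sense in which the supervision is "weak".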

arXiv.org e-Print Archive

Crossref

UvA-DARE

International Migration, Integration and Social Cohesion online publications

BCS SGAI SMA 2013: the BCS SGAI workshop on social media analysis

Author
Publication venue: M. Jeusfeld
Publication date: 01/01/2013
Field of study

Portsmouth University Research Portal (Pure)

An Exploration of Parliamentary Speeches in the Irish Parliament Using Topic Modeling

Author: Leheny Fiona
Publication venue: Technological University Dublin
Publication date: 01/01/2018
Field of study

The only resource available in the public domain that highlights parliamentary activity is parliamentary questions. Until the last decade, manual content analysis was carried out to classify these. More recently, machine learning techniques have been used to automatically classify and analyse these data sets. This study analyses the verbal parliamentary speeches in the Irish Parliament (known as the Dáil) over a ten-year period using unsupervised machine learning. It does so by applying a less utilised topic modeling technique, known as Non-negative Matrix Factorisation (NMF), to detect the latent themes in these speeches. A two-layer dynamic approach using NMF is applied to extract the themes raised in these speeches at a point in time and over the entire period. The findings suggest that the themes raised vary from very niche subject matter areas to more general areas and have evolved over time. The trend in the topics raised over the entire period gives an indication of what the political agenda was during these Dáil terms. Furthermore, reviewing the topics at a party and individual TD level demonstrates what their political priorities are. Conversely, reviewing the topics that parties and TDs are not discussing gives an insight into the themes that they have no interest in.
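As a rough illustration of how NMF exposes latent themes, the sketch below factors a small non-negative speech-by-term matrix V into W (speech-topic weights) and H (topic-term weights) with the standard Lee-Seung multiplicative updates. The two-layer function is an assumption about how such a dynamic approach can work (NMF per time window, then NMF again on the stacked window topics); it is not the thesis's code, and all names and data are hypothetical.

```python
import random

def matmul(A, B):
    """Plain-Python product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) as W (n x k) @ H (k x m) using
    Lee-Seung multiplicative updates for the squared Frobenius error."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return W, H

def error(V, W, H):
    """Squared Frobenius reconstruction error ||V - WH||^2."""
    R = matmul(W, H)
    return sum((v - r) ** 2 for vr, rr in zip(V, R) for v, r in zip(vr, rr))

def top_terms(H, vocab, n=2):
    """Highest-weighted vocabulary terms for each topic (row of H)."""
    return [[vocab[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:n]]
            for row in H]

def dynamic_topics(windows, k_window, k_dynamic, iters=200):
    """Assumed two-layer scheme: NMF on each time window (same vocabulary),
    then NMF on the stacked window-topic rows to find themes spanning windows."""
    stacked = []
    for V in windows:
        _, H = nmf(V, k_window, iters)
        stacked.extend(H)
    return nmf(stacked, k_dynamic, iters)[1]

vocab = ["health", "hospital", "tax", "budget"]
V = [[3.0, 3.0, 0.0, 0.0], [2.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 3.0], [0.0, 0.0, 2.0, 2.0]]
W, H = nmf(V, 2)
print(top_terms(H, vocab))
```

The multiplicative form keeps every entry of W and H non-negative, which is what makes each row of H readable as an additive "theme" over terms.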

Arrow@TUDublin

Exploring Pattern Mining Algorithms for Hashtag Retrieval Problem

Author: Belhadi Asma
Cano Alberto
Djenouri Youcef
Lin Jerry Chun-Wei
Zhang Chongsheng
Publication venue: VCU Scholars Compass
Publication date: 01/01/2020
Field of study

Hashtags are an iconic feature for retrieving the hot topics of discussion on Twitter and other social networks. This paper incorporates pattern mining approaches to improve the accuracy of retrieving relevant information and to speed up search. A novel algorithm called PM-HR (Pattern Mining for Hashtag Retrieval) is designed to first transform the set of tweets into a transactional database by considering two different strategies (trivial and temporal). Next, the set of relevant patterns is discovered and then used as a knowledge-based system for finding the relevant tweets based on users' queries in a similarity search process. Extensive experiments are carried out on large and varied tweet collections, and the proposed PM-HR outperforms baseline hashtag retrieval approaches in terms of runtime while remaining very competitive in terms of accuracy.
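The abstract describes PM-HR only at a high level, so the following is a hedged sketch of the general pipeline rather than the published algorithm. It assumes the "trivial" strategy turns each tweet into one transaction of its tokens (the temporal strategy is not shown), that the relevant patterns are frequent itemsets mined Apriori-style with an assumed minimum support, and that the similarity search ranks patterns by Jaccard similarity to the query. The function names and the toy tweets are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=3):
    """Apriori-style level-wise mining: every itemset contained in at least
    `min_support` transactions, returned as {frozenset: support}."""
    current = [frozenset([i]) for i in {i for t in transactions for i in t}]
    result, size = {}, 1
    while current and size <= max_size:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: s for c, s in counts.items() if s >= min_support}
        result.update(frequent)
        size += 1
        # Join frequent itemsets into larger candidates; keep those whose subsets are all frequent.
        joined = {a | b for a, b in combinations(frequent, 2) if len(a | b) == size}
        current = [c for c in joined
                   if all(frozenset(s) in frequent for s in combinations(c, size - 1))]
    return result

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def retrieve(query, tweets, patterns, top_k=1):
    """Hypothetical similarity search: keep the top_k patterns most similar to the
    query (ties broken by support), then return indices of tweets containing one."""
    q = frozenset(query)
    ranked = sorted(patterns, key=lambda p: (-jaccard(q, p), -patterns[p], sorted(p)))
    best = ranked[:top_k]
    return [i for i, t in enumerate(tweets) if any(p <= t for p in best)]

# "Trivial" strategy as assumed here: each tweet's tokens form one transaction.
tweets = [frozenset(s.split()) for s in [
    "#covid vaccine dose",
    "#covid vaccine booster",
    "#euro2020 final goal",
    "#euro2020 goal penalty",
    "#covid lockdown",
]]
patterns = frequent_itemsets(tweets, min_support=2)
print(retrieve(["#covid", "vaccine"], tweets, patterns))
```

Matching the query against a small set of mined patterns instead of every tweet is one plausible source of the runtime advantage the abstract reports; this sketch does not measure that.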

SINTEF Open

VCU Scholars Compass

NORA - Norwegian Open Research Archives

Exploring Pattern Mining Algorithms for Hashtag Retrieval Problem

Author: Alberto Cano
Asma Belhadi
Chongsheng Zhang
Djenouri Youcef
Jerry Chun-Wei Lin
Publication venue: IEEE
Publication date: 01/01/2020
Field of study

Hashtags are an iconic feature for retrieving the hot topics of discussion on Twitter and other social networks. This paper incorporates pattern mining approaches to improve the accuracy of retrieving relevant information and to speed up search. A novel algorithm called PM-HR (Pattern Mining for Hashtag Retrieval) is designed to first transform the set of tweets into a transactional database by considering two different strategies (trivial and temporal). Next, the set of relevant patterns is discovered and then used as a knowledge-based system for finding the relevant tweets based on users' queries in a similarity search process. Extensive experiments are carried out on large and varied tweet collections, and the proposed PM-HR outperforms baseline hashtag retrieval approaches in terms of runtime while remaining very competitive in terms of accuracy.

SINTEF Open

Exploratory search over semi-structured documents

Author: Azarbonyad H.
Publication venue
Publication date: 01/01/2019
Field of study

International Migration, Integration and Social Cohesion online publications

Proceedings of the 9th Dutch-Belgian Information Retrieval Workshop

Author: den Hamer Ida
Publication venue: Centre for Telematics and Information Technology (CTIT)
Publication date: 01/02/2009
Field of study

University of Twente Research Information

Document Meta-Information as Weak Supervision for Machine Translation

Author: Jehl Laura Elisabeth
Publication venue
Publication date: 01/01/2019
Field of study

Data-driven machine translation has advanced considerably since the first pioneering work in the 1990s, with recent systems claiming human parity on sentence translation for high-resource tasks. However, performance degrades for low-resource domains with no available sentence-parallel training data. Machine translation systems also rarely incorporate document context beyond the sentence level, ignoring knowledge that is essential in some situations. In this thesis, we aim to address these two issues by examining ways to incorporate document-level meta-information into data-driven machine translation. Examples of document meta-information include document authorship and categorization information, as well as cross-lingual correspondences between documents, such as hyperlinks or citations. As this meta-information is much more coarse-grained than reference translations, it constitutes a source of weak supervision for machine translation. We present four cumulatively conducted case studies in which we devise and evaluate methods to exploit these sources of weak supervision, both in low-resource scenarios where no task-appropriate supervision from parallel data exists, and in a full supervision scenario where weak supervision from document meta-information is used to supplement supervision from sentence-level reference translations. All case studies show improved translation quality when incorporating document meta-information.

Heidelberger Dokumentenserver

Multi-National Topics Maps for Parliamentary Debate Analysis

Author: Davis Enno
Mueller Roland M.
Schaal Markus
Publication venue: HICSS Conference Office
Publication date: 03/01/2022
Field of study

In recent years, automated political text processing has become an indispensable requirement for providing automatic access to political debate. During the worldwide Covid-19 pandemic, this need became visible not only in the social sciences but also in public opinion. We provide a path to operationalize this need in a multilingual, topic-oriented manner. Using a publicly available data set of parliamentary speeches, we create a novel process pipeline to identify a good reference model and to link national topics to cross-national topics. We use design science research to create this process pipeline as an artifact.

ScholarSpace at University of Hawai'i at Manoa

AIS Electronic Library (AISeL)