Neural Ranking Models with Weak Supervision
Despite the impressive improvements achieved by unsupervised deep neural
networks in computer vision and NLP tasks, such improvements have not yet been
observed in ranking for information retrieval. The reason may be the complexity
of the ranking problem, as it is not obvious how to learn from queries and
documents when no supervised signal is available. Hence, in this paper, we
propose to train a neural ranking model using weak supervision, where labels
are obtained automatically without human annotators or any external resources
(e.g., click data). To this end, we use the output of an unsupervised ranking
model, such as BM25, as a weak supervision signal. We further train a set of
simple yet effective ranking models based on feed-forward neural networks. We
study their effectiveness under various learning scenarios (point-wise and
pair-wise models) and using different input representations (i.e., from
encoding query-document pairs into dense/sparse vectors to using word embedding
representation). We train our networks using tens of millions of training
instances and evaluate them on two standard collections: a homogeneous news
collection (Robust) and a heterogeneous large-scale web collection (ClueWeb).
Our experiments indicate that employing proper objective functions and letting
the networks learn the input representation from weakly supervised data
leads to impressive performance, with MAP improvements of over 13% and 35% over
the BM25 model on the Robust and ClueWeb collections, respectively. Our findings also
suggest that supervised neural ranking models can greatly benefit from
pre-training on large amounts of weakly labeled data that can be easily
obtained from unsupervised IR models.
Comment: In Proceedings of the 40th International ACM SIGIR Conference on
Research and Development in Information Retrieval (SIGIR 2017).
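The core idea, using an unsupervised ranker's scores as training labels, can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the authors' implementation: BM25 supplies weak pairwise preferences, and a linear scorer over two hypothetical hand-built features (`coverage`, `density`) stands in for the paper's feed-forward networks and learned input representations. All function names, the toy corpus, and the hyperparameters (`k1=1.2`, `b=0.75`, `margin`, `lr`) are illustrative choices.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized doc for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores


def features(query, doc):
    """Hypothetical 2-d input (not the paper's representation):
    fraction of query terms present, and query-term density in the doc."""
    tf = Counter(doc)
    q = set(query)
    coverage = sum(1 for t in q if tf[t] > 0) / len(q)
    density = sum(tf[t] for t in query) / max(len(doc), 1)
    return [coverage, density]


def score(w, x):
    """Linear scorer w . x (a stand-in for a feed-forward network)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(query, docs, epochs=100, lr=0.1, margin=1.0):
    """Fit the linear scorer with a pairwise hinge loss on BM25 weak labels."""
    weak = bm25_scores(query, docs)
    xs = [features(query, d) for d in docs]
    # Weak preference (i, j): BM25 ranks doc i strictly above doc j.
    n = len(docs)
    pairs = [(i, j) for i in range(n) for j in range(n) if weak[i] > weak[j]]
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        for i, j in pairs:
            diff = [a - c for a, c in zip(xs[i], xs[j])]
            # Loss max(0, margin - w.(x_i - x_j)); when it is active, a
            # gradient step on w adds lr * (x_i - x_j).
            if margin - score(w, diff) > 0:
                w = [wk + lr * dk for wk, dk in zip(w, diff)]
    return w


# Toy usage with an invented three-document corpus.
docs = [
    "neural ranking with weak supervision".split(),
    "ranking documents for a query".split(),
    "cooking pasta at home".split(),
]
query = "neural ranking".split()
w = train_pairwise(query, docs)
print([round(score(w, features(query, d)), 3) for d in docs])
```

On a toy corpus like this one the learned scorer simply reproduces the BM25 ordering; the point of the sketch is the data flow (unsupervised scores, then weak preferences, then a trained ranker), not the model capacity.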
An Exploration of Parliamentary Speeches in the Irish Parliament Using Topic Modeling
The only resource in the public domain that highlights parliamentary activity is parliamentary questions. Until about ten years ago, these were classified by manual content analysis; more recently, machine learning techniques have been used to classify and analyse these data sets automatically. This study analyses the verbal parliamentary speeches in the Irish Parliament (known as the Dáil) over a ten-year period using unsupervised machine learning. It does so by applying a less commonly used topic modeling technique, Non-negative Matrix Factorisation (NMF), to detect the latent themes in these speeches. A two-layer dynamic approach using NMF is applied to extract the themes raised in these speeches both at a point in time and over the entire period. The findings suggest that the themes raised range from very niche subject areas to more general ones and have evolved over time. The trend in the topics raised over the entire period gives an indication of the political agenda during these Dáil terms. Furthermore, reviewing the topics at the party and individual TD level shows what their political priorities are. Conversely, reviewing the topics that parties and TDs are not discussing gives an insight into the themes in which they have no interest.
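A single NMF layer, the building block of the two-layer approach described above, can be sketched in dependency-free Python using the Lee and Seung multiplicative updates for squared reconstruction error. This is an illustrative assumption about the solver, not the study's code: in the dynamic setting the factorisation would be run once per time window and then again over the stacked window topics, and that second layer is not shown. The toy document-term matrix, vocabulary, and helper names are invented.

```python
import random


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def squared_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factorise non-negative V (docs x terms) into W (docs x k) and
    H (k x terms) with multiplicative updates; eps avoids division by zero."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num = matmul(wt, v)             # W^T V
        den = matmul(matmul(wt, w), h)  # W^T W H
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        ht = transpose(h)
        num = matmul(v, ht)             # V H^T
        den = matmul(w, matmul(h, ht))  # W H H^T
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def top_terms(h, vocab, n=2):
    """The n highest-weighted vocabulary terms for each topic (row of H)."""
    return [[vocab[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:n]]
            for row in h]


# Toy usage: four "speeches" over a four-term vocabulary.
vocab = ["health", "hospital", "budget", "tax"]
V = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 3, 2], [0, 0, 2, 3]]
W, H = nmf(V, k=2)
print(top_terms(H, vocab))
```

Because every update multiplies by a ratio of non-negative quantities, W and H stay non-negative, which is what makes each row of H readable as a topic.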
Exploring Pattern Mining Algorithms for Hashtag Retrieval Problem
Hashtags are an iconic feature for retrieving hot topics of discussion on Twitter and other social networks. This paper incorporates pattern mining approaches to improve the accuracy of retrieving relevant information and to speed up search. A novel algorithm called PM-HR (Pattern Mining for Hashtag Retrieval) first transforms the set of tweets into a transactional database using one of two strategies (trivial and temporal). The set of relevant patterns is then discovered and used as a knowledge-based system for finding the relevant tweets for users' queries through a similarity search process. Extensive experiments are carried out on large and diverse tweet collections; the proposed PM-HR outperforms the baseline hashtag retrieval approaches in terms of runtime and is very competitive in terms of accuracy.
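The pipeline described above (tweets to transactions, pattern mining, similarity search over patterns) can be illustrated with a toy sketch. Everything concrete here is our assumption rather than PM-HR itself: each tweet becomes a transaction of its lowercased hashtags (a guess at the "trivial" strategy; the temporal strategy is not shown), patterns are frequent itemsets found by a simple Apriori-style level-wise search, and tweets are ranked by the Jaccard similarity between the query and the best-matching pattern they contain. `to_transactions`, `frequent_patterns`, `retrieve`, `min_support`, `max_len`, and `threshold` are hypothetical names.

```python
def to_transactions(tweets):
    """Hypothetical 'trivial' strategy: one transaction per tweet,
    holding the tweet's lowercased hashtags."""
    return [frozenset(w.lower() for w in t.split() if w.startswith("#"))
            for t in tweets]


def frequent_patterns(transactions, min_support=2, max_len=3):
    """Level-wise (Apriori-style) search for itemsets contained in at least
    min_support transactions; returns {itemset: support}."""
    patterns = {}
    candidates = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    size = 1
    while candidates and size <= max_len:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = [c for c, n in counts.items() if n >= min_support]
        patterns.update((c, counts[c]) for c in frequent)
        size += 1
        # Every frequent k-itemset is the union of two frequent (k-1)-itemsets.
        candidates = list({a | b for a in frequent for b in frequent if len(a | b) == size})
    return patterns


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def retrieve(query, transactions, patterns, threshold=0.5):
    """Rank tweet indices by the best Jaccard similarity between the query
    and any sufficiently similar pattern that the tweet contains."""
    q = frozenset(query)
    matched = [p for p in patterns if jaccard(q, p) >= threshold]
    hits = {}
    for idx, t in enumerate(transactions):
        for p in matched:
            if p <= t:
                hits[idx] = max(hits.get(idx, 0.0), jaccard(q, p))
    return sorted(hits, key=lambda i: (-hits[i], i))


# Toy usage with invented tweets.
tweets = [
    "New #Covid #vaccine data from the #health ministry",
    "#covid #vaccine rollout starts today",
    "What a match! #football #worldcup",
    "Late #goal in the #football #WorldCup final",
    "Another #covid #lockdown announced",
]
transactions = to_transactions(tweets)
patterns = frequent_patterns(transactions, min_support=2)
print(retrieve(["#covid", "#vaccine"], transactions, patterns))
```

Searching the (usually much smaller) set of mined patterns instead of every tweet is what could make this style of retrieval fast; the actual runtime behaviour of PM-HR is reported in the paper, not reproduced here.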
Document Meta-Information as Weak Supervision for Machine Translation
Data-driven machine translation has advanced considerably since the first pioneering work
in the 1990s, with recent systems claiming human parity on sentence translation for high-resource tasks. However, performance degrades for low-resource domains with no available
sentence-parallel training data. Machine translation systems also rarely incorporate the
document context beyond the sentence level, ignoring knowledge which is essential in
some situations. In this thesis, we aim to address both of these issues by
examining ways to incorporate document-level meta-information into data-driven machine
translation. Examples of document meta-information include document authorship and
categorization information, as well as cross-lingual correspondences between documents,
such as hyperlinks or citations between documents. As this meta-information is much more
coarse-grained than reference translations, it constitutes a source of weak supervision for
machine translation. We present four cumulatively conducted case studies where we devise
and evaluate methods to exploit these sources of weak supervision both in low-resource
scenarios where no task-appropriate supervision from parallel data exists, and in a full
supervision scenario where weak supervision from document meta-information is used to
supplement supervision from sentence-level reference translations. All case studies show
improved translation quality when incorporating document meta-information.
Multi-National Topics Maps for Parliamentary Debate Analysis
In recent years, automated political text processing has become indispensable for providing automatic access to political debate. During the worldwide Covid-19 pandemic, this need became visible not only in the social sciences but also in public opinion. We provide a path to operationalize this need in a multilingual, topic-oriented manner. Using a publicly available data set of parliamentary speeches, we create a novel process pipeline that identifies a good reference model and links national topics to cross-national topics. We use design science research to create this process pipeline as an artifact.
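The abstract does not say how national topics are linked to cross-national ones. One plausible approach, shown here purely as an assumption, represents every topic as a term-weight vector over a shared (for example, translated) vocabulary and links each national topic to the most cosine-similar topic of the reference model, leaving it unlinked when no similarity reaches a threshold. `link_topics`, `min_sim`, and all topic data below are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def link_topics(national, reference, min_sim=0.2):
    """Map each national topic id to its most similar reference topic id,
    or to None if no reference topic reaches min_sim (hypothetical rule)."""
    links = {}
    for nid, nvec in national.items():
        best_id, best_sim = None, 0.0
        for rid, rvec in reference.items():
            sim = cosine(nvec, rvec)
            if sim > best_sim:
                best_id, best_sim = rid, sim
        links[nid] = best_id if best_sim >= min_sim else None
    return links


# Toy usage: invented reference (cross-national) and national topics,
# assumed to share an English vocabulary after translation.
reference = {
    "health": {"hospital": 0.6, "vaccine": 0.4},
    "economy": {"tax": 0.5, "budget": 0.5},
}
national = {
    "de_3": {"vaccine": 0.7, "hospital": 0.2, "mask": 0.1},
    "ie_1": {"budget": 0.6, "tax": 0.3, "housing": 0.1},
    "fr_2": {"fishing": 1.0},
}
print(link_topics(national, reference))
```

Leaving weakly matching topics unlinked, rather than forcing every national topic onto some reference topic, keeps genuinely country-specific themes visible.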