Vector Space Proximity Based Document Retrieval For Document Embeddings Built By Transformers
Internet publications stay on top of local and international events, generating hundreds,
sometimes thousands of news articles per day, making it difficult for readers to navigate this stream
of information without assistance. Competition for the reader’s attention has never been greater.
One strategy to keep readers’ attention on a specific article and help them better understand its
content is news recommendation, which automatically provides readers with references to relevant
complementary articles. However, to be effective, news recommendation needs to select from a
large collection of candidate articles only a handful of articles that are relevant yet provide diverse
information.
In this thesis, we propose and experiment with three methods for news recommendation and
evaluate them in the context of the NIST News Track. Our first approach is based on the classic
BM25 information retrieval model and assumes that relevant articles will share common keywords
with the current article. Our second approach is based on novel document embedding representations
and uses various proximity measures to retrieve the closest documents. For this approach,
we experimented with a substantial number of models, proximity measures, and hyperparameters,
yielding a total of 47,332 distinct models. Finally, our third approach combines the BM25 and the
embedding models to increase the diversity of the results.
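The three approaches described above can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: it assumes pre-tokenized documents, the common Okapi BM25 formulation with default-style parameters `k1` and `b`, cosine similarity as one possible proximity measure, and a hypothetical round-robin `interleave` helper as the combination step, since the abstract does not say how the two rankings are merged.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized document against a tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def interleave(first, second, k=5):
    """Hypothetical combination step: alternate two rankings, skipping duplicates."""
    merged = []
    for a, b in zip(first, second):
        for doc_id in (a, b):
            if doc_id not in merged:
                merged.append(doc_id)
    return merged[:k]
```

In this sketch, documents would be ranked once by `bm25_scores` and once by `cosine` against the current article's embedding, and the two ranked lists of document identifiers would then be passed to `interleave`.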
The results on the 2020 TREC News Track show that the performance of the BM25 model
(nDCG@5 of 0.5924) greatly exceeds the TREC median performance (nDCG@5 of 0.5250) and
achieves the highest score at the shared task. The performance of the embedding model alone
(nDCG@5 of 0.4541) is lower than the TREC median and BM25. The performance of the combined
model (nDCG@5 of 0.5873) is close to that of the BM25 model; however, an analysis of the
results shows that the recommended articles differ from those proposed by BM25, so the combined
model may be a promising way to achieve diversity without much loss in relevance.
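For readers unfamiliar with the nDCG@5 metric reported above, the following is a minimal sketch of how it can be computed. It assumes linear gains (the graded relevance value itself) and a log2 rank discount, and it takes the ideal ranking from the same list of judged gains; the official track evaluation may use a different gain function, so treat this as an illustration only.

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k graded relevance values."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))

def ndcg_at_k(gains, k=5):
    """DCG of the ranking divided by the DCG of the ideal (descending) ranking."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0
```

Here `gains` is the list of relevance grades of the recommended articles in ranked order; a ranking that already places the most relevant articles first scores 1.0.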
Big Brother: A Drop-In Website Interaction Logging Service
Fine-grained logging of interactions in user studies is important for studying user behaviour, among other reasons. However, in many research scenarios, the way interactions are logged is tied to a monolithic system. We present a generic, application-independent service for logging interactions in web pages, specifically targeting user studies. Our service, Big Brother, can be dropped into existing user interfaces with almost no configuration required by researchers. Big Brother has already been used in several user studies to record interactions in a number of research scenarios, such as lab-based and crowdsourcing environments. We further demonstrate the ability of Big Brother to scale to very large user studies through benchmarking experiments. Big Brother also provides a number of additional tools for visualising and analysing interactions.
Big Brother significantly lowers the barrier to entry for logging user interactions by providing researchers and practitioners of user studies with a minimal but powerful service that requires no configuration and can scale to thousands of concurrent sessions. We have made the source code and releases for Big Brother available for download at https://github.com/hscells/bigbro
Geographic information extraction from texts
A large volume of unstructured texts containing valuable geographic information is available online. This information, whether provided implicitly or explicitly, is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although considerable progress has been made in geographic information extraction from texts, there are still unsolved challenges and issues, ranging from methods, systems, and data to applications and privacy. Therefore, this workshop will provide a timely opportunity to discuss recent advances, new ideas, and concepts, and also to identify research gaps in geographic information extraction.
Grammar and Corpora 2016
In recent years, the availability of large annotated corpora, together with a new interest in the empirical foundation and validation of linguistic theory and description, has sparked a surge of novel work using corpus methods to study the grammar of natural languages. This volume presents recent developments and advances, firstly, in corpus-oriented grammar research with a special focus on Germanic, Slavic, and Romance languages and, secondly, in corpus linguistic methodology as well as the application of corpus methods to grammar-related fields. The volume results from the sixth international conference Grammar and Corpora (GaC 2016), which took place at the Institute for the German Language (IDS) in Mannheim, Germany, in November 2016.
Nostratic Dictionary
A revised edition can be found at http://www.dspace.cam.ac.uk/handle/1810/244080.
Aharon Dolgopolsky is the leading authority on the Nostratic macrofamily. His 'Nostratic Dictionary' presented here is, of course, something very much more than a dictionary. It is the most thorough and extensive demonstration and documentation so far of what may be termed the Nostratic hypothesis: that several of the world's best-known language families are related in their origin, their grammar and their lexicon, and that they belong together in a larger unit, of earlier origin, the Nostratic macrofamily. It should at once be noted that several elements of this enterprise are controversial. For while the Nostratic hypothesis has many supporters, it has been criticized on rather fundamental grounds by a number of distinguished linguists. The matter was reviewed some years ago in a symposium held at the McDonald Institute, and positions remain very much polarized. It was a result of that meeting that the decision was taken to invite Aharon Dolgopolsky to publish his Dictionary - a much more substantial treatise than any work hitherto undertaken on the subject - at the McDonald Institute. For it became clear that the diversities of view expressed at that symposium were not likely to be resolved by further polemical exchanges. Instead, a substantial body of data was required, whose examination and evaluation could subsequently lead to more mature judgments. Those data are presented here, and that more mature evaluation can now proceed.
McDonald Institute for Archaeological Research
Alfred P. Sloan Foundatio