TK: The Twitter Top-K Keywords Benchmark
Information retrieval from textual data focuses on the construction of
vocabularies that contain weighted term tuples. Such vocabularies can then be
exploited by various text analysis algorithms to extract new knowledge, e.g.,
top-k keywords, top-k documents, etc. Top-k keywords are commonly used for
various purposes and are often computed on the fly, so they must be computed
efficiently. To compare competing weighting schemes and database implementations,
benchmarking is customary. To the best of our knowledge, no benchmark currently
addresses these problems. Hence, in this paper, we present a top-k keywords
benchmark, TK, which features a real tweet dataset and queries with
various complexities and selectivities. TK helps evaluate weighting
schemes and database implementations in terms of computing performance. To
illustrate TK's relevance and genericity, we successfully performed
tests on the TF-IDF and Okapi BM25 weighting schemes, on the one hand, and on
different relational (Oracle, PostgreSQL) and document-oriented (MongoDB)
database implementations, on the other.
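The two weighting schemes the benchmark exercises can be sketched in a few lines. The following is a minimal illustration, not the benchmark's implementation; the toy corpus and the BM25 parameter values (k1 = 1.2, b = 0.75) are assumptions on my part.

```python
import math
from collections import Counter

# Toy corpus standing in for the tweet dataset (illustrative only).
docs = [
    "benchmark benchmark top keywords queries",
    "weighting schemes tf idf and bm25",
    "database implementations for keyword queries",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)
df = Counter(t for doc in tokenized for t in set(doc))  # document frequency
avgdl = sum(len(d) for d in tokenized) / N              # average document length

def tf_idf(term, doc):
    tf = doc.count(term) / len(doc)
    return tf * math.log(N / df[term])

def bm25(term, doc, k1=1.2, b=0.75):
    tf = doc.count(term)
    idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))

def top_k_keywords(doc, k, weight):
    scores = {t: weight(t, doc) for t in set(doc)}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Under either scheme, the repeated and corpus-rare term "benchmark" ranks first for the first toy document; the benchmark's point is to measure how fast such rankings can be produced at scale.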
The scholarly impact of TRECVid (2003-2009)
This paper reports on an investigation into the scholarly impact of the TRECVid (TREC Video Retrieval Evaluation) benchmarking conferences between 2003 and 2009. The contribution of TRECVid to research in video retrieval is assessed by analyzing publication content to show the development of techniques and approaches over time and by analyzing publication impact through publication numbers and citation analysis. Popular conference and journal venues for TRECVid publications are identified in terms of number of citations received. For a selection of participants at different career stages, the relative importance of TRECVid publications in terms of citations vis-à-vis their other publications is investigated. TRECVid, as an evaluation conference, provides data on which research teams ‘scored’ highly against the evaluation criteria, and the relationship between ‘top scoring’ teams at TRECVid and the ‘top scoring’ papers in terms of citations is analysed. A strong relationship was found between ‘success’ at TRECVid and ‘success’ in citations, both for high scoring and low scoring teams. The implications of the study in terms of the value of TRECVid as a research activity, and the value of bibliometric analysis as a research evaluation tool, are discussed.
Automatic identification methods on a corpus of twenty-five fine-grained Arabic dialects
This research deals with Arabic dialect identification, a challenging issue related to Arabic NLP. Indeed, the increasing use of Arabic dialects in written form, especially in social media, generates new needs in the area of Arabic dialect processing. For discriminating between dialects in a multi-dialect context, we use different approaches based on machine learning techniques. To this end, we explored several methods. We used a classification method based on symmetric Kullback-Leibler divergence, and we experimented with classical classification methods such as Naive Bayes classifiers and more sophisticated methods like Word2Vec and Long Short-Term Memory neural networks. We tested our approaches on a large database of 25 Arabic dialects in addition to MSA.
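As a rough illustration of the symmetric Kullback-Leibler approach mentioned above, one can model each dialect by a smoothed character distribution and pick the dialect minimizing the symmetric divergence to the test text. The tiny transliterated training strings below are invented placeholders, not the actual 25-dialect corpus.

```python
import math
from collections import Counter

# Invented, transliterated snippets standing in for per-dialect training data.
train = {
    "dialect_a": "ana rayeh lel souk daba",
    "dialect_b": "ana mashi ila souk daba daba",
}

def char_dist(text, alphabet, eps=1e-6):
    counts = Counter(text)
    total = sum(counts.values())
    # Additive smoothing keeps every probability positive, so the KL stays finite.
    return {c: (counts[c] + eps) / (total + eps * len(alphabet)) for c in alphabet}

def sym_kl(p, q):
    # Symmetric Kullback-Leibler divergence: KL(p||q) + KL(q||p).
    return sum((p[c] - q[c]) * math.log(p[c] / q[c]) for c in p)

def identify(text):
    alphabet = sorted(set("".join(train.values()) + text))
    models = {d: char_dist(t, alphabet) for d, t in train.items()}
    q = char_dist(text, alphabet)
    return min(models, key=lambda d: sym_kl(models[d], q))
```

A real system would use word or character n-gram distributions over far more data; the sketch only shows why the divergence is made symmetric and why smoothing is needed.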
Evaluation of a Bayesian inference network for ligand-based virtual screening
Background
Bayesian inference networks enable the computation of the probability that an event will occur. They have been used previously to rank textual documents in order of decreasing relevance to a user-defined query. Here, we modify the approach to enable a Bayesian inference network to be used for chemical similarity searching, where a database is ranked in order of decreasing probability of bioactivity.
Results
Bayesian inference networks were implemented using two different types of network and four different types of belief function. Experiments with the MDDR and WOMBAT databases show that a Bayesian inference network can be used to provide effective ligand-based screening, especially when the active molecules being sought have a high degree of structural homogeneity; in such cases, the network substantially outperforms a conventional, Tanimoto-based similarity searching system. However, the effectiveness of the network is much less when structurally heterogeneous sets of actives are being sought.
Conclusion
A Bayesian inference network provides an interesting alternative to existing tools for ligand-based virtual screening.
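The conventional Tanimoto baseline the network is compared against is easy to sketch: with each molecule encoded as the set of on-bits in a binary fingerprint, the database is ranked by Tanimoto coefficient to the query. The fingerprints below are made up for illustration.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two binary fingerprints,
    given as sets of on-bit indices: |a∩b| / |a∪b|."""
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)

# Hypothetical on-bit sets; a real screen would use e.g. 1024-bit fingerprints.
query = {1, 3, 5, 8}
database = {
    "mol_a": {1, 3, 5, 9},
    "mol_b": {2, 4, 6},
    "mol_c": {1, 3, 5, 8, 10},
}

# Rank the database in decreasing similarity to the query molecule.
ranked = sorted(database, key=lambda m: tanimoto(query, database[m]), reverse=True)
```

The Bayesian network replaces this single similarity score with belief propagation over a document-style network, which is what gives it an edge on structurally homogeneous actives.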
Discovery of Novel Term Associations in a Document Collection
Non peer reviewed.
An Arabic Corpus of Fake News: Collection, Analysis and Classification
Over the last years, with the explosive growth of social media, huge amounts of rumors have rapidly spread on the Internet. Indeed, the proliferation of malicious misinformation and nasty rumors in social media can have harmful effects on individuals and society. In this paper, we investigate the content of fake news in the Arabic world through the information posted on YouTube. Our contribution is threefold. First, we introduce a novel Arabic corpus for the task of fake news analysis, covering the topics most affected by rumors. We describe the corpus and the data collection process in detail. Second, we present several exploratory analyses of the harvested data in order to retrieve some useful knowledge about the transmission of rumors for the studied topics. Third, we test the possibility of discriminating between rumor and non-rumor comments using three machine learning classifiers, namely Support Vector Machine (SVM), Decision Tree (DT) and Multinomial Naïve Bayes (MNB).
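Of the three classifiers tested, Multinomial Naïve Bayes is simple enough to sketch from scratch. The English toy comments below are placeholders for the Arabic YouTube data, and the helper names are illustrative, not the paper's.

```python
import math
from collections import Counter

# Toy labeled comments; the real corpus consists of Arabic YouTube comments.
train = [
    ("rumor", "this is fake totally fake"),
    ("rumor", "fake story do not believe"),
    ("not_rumor", "official report confirms event"),
    ("not_rumor", "confirmed by official sources"),
]

def fit(data):
    class_counts = Counter(label for label, _ in data)
    word_counts = {c: Counter() for c in class_counts}
    for label, text in data:
        word_counts[label].update(text.split())
    vocab = {w for c in word_counts for w in word_counts[c]}
    return class_counts, word_counts, vocab

def predict(text, class_counts, word_counts, vocab):
    total = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for c in class_counts:
        lp = math.log(class_counts[c] / total)            # log prior
        denom = sum(word_counts[c].values()) + len(vocab)  # Laplace smoothing
        for w in text.split():
            lp += math.log((word_counts[c][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = c, lp
    return best

model = fit(train)
```

In practice the paper's setting would add Arabic-specific tokenization and compare this baseline against SVM and Decision Tree classifiers.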
Bias-variance analysis in estimating true query model for information retrieval
The estimation of the query model is an important task in language modeling (LM) approaches to information retrieval (IR). The ideal estimation is expected to be not only effective in terms of high mean retrieval performance over all queries, but also stable in terms of low variance of retrieval performance across different queries. In practice, however, improving effectiveness can sacrifice stability, and vice versa. In this paper, we propose to study this tradeoff from a new perspective, i.e., the bias-variance tradeoff, which is a fundamental concept in statistics. We formulate the notion of bias-variance regarding retrieval performance and estimation quality of query models. We then investigate several estimated query models, by analyzing when and why the bias-variance tradeoff occurs, and how the bias and variance can be reduced simultaneously. A series of experiments on four TREC collections has been conducted to systematically evaluate our bias-variance analysis. Our approach and results will potentially form an analysis framework and a novel evaluation strategy for query language modeling.
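The bias-variance tradeoff the paper builds on is the standard statistical decomposition MSE = bias² + variance. A minimal simulation, using an arbitrary shrinkage estimator chosen here purely for illustration, makes the decomposition concrete.

```python
import random
import statistics

random.seed(0)
true_value = 1.0

def estimate(sample):
    # Deliberately biased estimator: shrink the sample mean toward zero.
    # Shrinking raises bias but lowers variance -- the tradeoff in miniature.
    return 0.8 * statistics.mean(sample)

# Repeat the estimation over many resampled "queries" of 10 noisy observations.
estimates = [
    estimate([random.gauss(true_value, 1.0) for _ in range(10)])
    for _ in range(2000)
]

mean_est = statistics.mean(estimates)
bias_sq = (mean_est - true_value) ** 2
variance = statistics.pvariance(estimates)
mse = statistics.mean((e - true_value) ** 2 for e in estimates)
# mse equals bias_sq + variance (up to floating-point error)
```

In the paper's setting, "sample" corresponds to the evidence used to estimate a query model and "error" to the resulting retrieval performance; the decomposition is what lets bias and variance be analyzed separately.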
Automatic summarisation: 25 Years On
This is an accepted manuscript of an article published by Cambridge University Press (CUP) in Natural Language Engineering on 19/09/2019, available online: https://doi.org/10.1017/S1351324919000524
The accepted version of the publication may differ from the final published version.

Automatic text summarisation is a topic that has been receiving attention from the research community since the early days of computational linguistics, but it really took off around 25 years ago. This article presents the main developments of the last 25 years. It starts by defining what a summary is and how its definition has changed over time as a result of the interest in processing new types of documents. The article continues with a brief history of the field and highlights the main challenges posed by the evaluation of summaries. It finishes with some thoughts about the future of the field.