6,722 research outputs found
Searching Spontaneous Conversational Speech
The ACM SIGIR Workshop on Searching Spontaneous Conversational Speech was held as part of the 2007 ACM SIGIR Conference in Amsterdam.\ud
The workshop program was a mix of elements, including a keynote speech, paper presentations and panel discussions. This brief report describes the organization of this workshop and summarizes the discussions
Report on the Second International Workshop on the Evaluation on Collaborative Information Seeking and Retrieval (ECol'2017 @ CHIIR)
The 2nd workshop on the evaluation of collaborative information retrieval and seeking (ECol) was held in conjunction with the ACM SIGIR Conference on Human Information Interaction & Retrieval (CHIIR) in Oslo, Norway. The workshop focused on discussing the challenges and difficulties of researching and studying collaborative information retrieval and seeking (CIS/CIR). After an introductory and scene setting overview of developments in CIR/CIS, participants were challenged with devising a range of possible CIR/CIS tasks that could be used for evaluation purposes. Through the brainstorming and discussions, valuable insights regarding the evaluation of CIR/CIS tasks become apparent ? for particular tasks efficiency and/or effectiveness is most important, however for the majority of tasks the success and quality of outcomes along with knowledge sharing and sense-making were most important ? of which these latter attributes are much more difficult to measure and evaluate. Thus the major challenge for CIR/CIS research is to develop methods, measures and methodologies to evaluate these high order attributes
Lucene4IR: Developing information retrieval evaluation resources using Lucene
The workshop and hackathon on developing Information Retrieval Evaluation Resources using Lucene (L4IR) was held on the 8th and 9th of September, 2016 at the University of Strathclyde in Glasgow, UK and funded by the ESF Elias Network. The event featured three main elements: (i) a series of keynote and invited talks on industry, teaching and evaluation; (ii) planning, coding and hacking where a number of groups created modules and infrastructure to use Lucene to undertake TREC based evaluations; and (iii) a number of breakout groups discussing challenges, opportunities and problems in bridging the divide between academia and industry, and how we can use Lucene for teaching and learning Information Retrieval (IR). The event was composed of a mix and blend of academics, experts and students wanting to learn, share and create evaluation resources for the community. The hacking was intense and the discussions lively creating the basis of many useful tools but also raising numerous issues. It was clear that by adopting and contributing to most widely used and supported Open Source IR toolkit, there were many benefits for academics, students, researchers, developers and practitioners - providing a basis for stronger evaluation practices, increased reproducibility, more efficient knowledge transfer, greater collaboration between academia and industry, and shared teaching and training resources
From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This
profoundly limits our ability to give instructions to computers, the ability of
computers to explain their actions to us, and the ability of computers to
analyse and process text. Vector space models (VSMs) of semantics are beginning
to address these limits. This paper surveys the use of VSMs for semantic
processing of text. We organize the literature on VSMs according to the
structure of the matrix in a VSM. There are currently three broad classes of
VSMs, based on term-document, word-context, and pair-pattern matrices, yielding
three classes of applications. We survey a broad range of applications in these
three categories and we take a detailed look at a specific open source project
in each category. Our goal in this survey is to show the breadth of
applications of VSMs for semantics, to provide a new perspective on VSMs for
those who are already familiar with the area, and to provide pointers into the
literature for those who are less familiar with the field
Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions
Translating verbose information needs into crisp search queries is a
phenomenon that is ubiquitous but hardly understood. Insights into this process
could be valuable in several applications, including synthesizing large
privacy-friendly query logs from public Web sources which are readily available
to the academic research community. In this work, we take a step towards
understanding query formulation by tapping into the rich potential of community
question answering (CQA) forums. Specifically, we sample natural language (NL)
questions spanning diverse themes from the Stack Exchange platform, and conduct
a large-scale conversion experiment where crowdworkers submit search queries
they would use when looking for equivalent information. We provide a careful
analysis of this data, accounting for possible sources of bias during
conversion, along with insights into user-specific linguistic patterns and
search behaviors. We release a dataset of 7,000 question-query pairs from this
study to facilitate further research on query understanding.Comment: ECIR 2020 Short Pape
Neural Vector Spaces for Unsupervised Information Retrieval
We propose the Neural Vector Space Model (NVSM), a method that learns
representations of documents in an unsupervised manner for news article
retrieval. In the NVSM paradigm, we learn low-dimensional representations of
words and documents from scratch using gradient descent and rank documents
according to their similarity with query representations that are composed from
word representations. We show that NVSM performs better at document ranking
than existing latent semantic vector space methods. The addition of NVSM to a
mixture of lexical language models and a state-of-the-art baseline vector space
model yields a statistically significant increase in retrieval effectiveness.
Consequently, NVSM adds a complementary relevance signal. Next to semantic
matching, we find that NVSM performs well in cases where lexical matching is
needed.
NVSM learns a notion of term specificity directly from the document
collection without feature engineering. We also show that NVSM learns
regularities related to Luhn significance. Finally, we give advice on how to
deploy NVSM in situations where model selection (e.g., cross-validation) is
infeasible. We find that an unsupervised ensemble of multiple models trained
with different hyperparameter values performs better than a single
cross-validated model. Therefore, NVSM can safely be used for ranking documents
without supervised relevance judgments.Comment: TOIS 201
- …