The State-of-the-arts in Focused Search
The continuous influx of diverse text data on the Web requires search engines to improve their ability to retrieve more specific information. The need for results relevant to a user’s topic of interest has gone beyond search for domain-specific or type-specific documents to more focused results (e.g. document fragments or answers to a query). The introduction of XML provides a standard format for data representation, storage, and exchange. It allows focused search to be carried out at different granularities of a structured document with XML markup. This report reviews the state of the art in focused search, particularly techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking. It concludes by highlighting open problems.
Phase Transitions in Phase Retrieval
Consider a scenario in which an unknown signal is transformed by a known
linear operator, and then the pointwise absolute value of the unknown output
function is reported. This scenario appears in several applications, and the
goal is to recover the unknown signal -- this is called phase retrieval. Phase
retrieval has been a popular subject of research in the last few years, both in
determining whether complete information is available with a given linear
operator, and in finding efficient and stable phase retrieval algorithms in the
cases where complete information is available. Interestingly, there are a few
ways to measure information completeness, and each way appears to be governed
by a phase transition of sorts. This chapter will survey the state of the art
with some of these phase transitions, and identify a few open problems for
further research.
Comment: Book chapter, survey of recent literature, submitted to Excursions in
Harmonic Analysis: The February Fourier Talks at the Norbert Wiener Center
Modeling Spoken Information Queries for Virtual Assistants: Open Problems, Challenges and Opportunities
Virtual assistants are becoming increasingly important speech-driven
Information Retrieval platforms that assist users with various tasks.
We discuss open problems and challenges with respect to modeling spoken
information queries for virtual assistants, and list opportunities where
Information Retrieval methods and research can be applied to improve the
quality of virtual assistant speech recognition.
We discuss how query domain classification, knowledge graphs and user
interaction data, and query personalization can be helpful to improve the
accurate recognition of spoken information domain queries. Finally, we also
provide a brief overview of current problems and challenges in speech
recognition.
Comment: SIGIR '23. The 46th International ACM SIGIR Conference on Research &
Development in Information Retrieval
Metadata Augmentation for Semantic- and Context- Based Retrieval of Digital Cultural Objects
Cultural objects are increasingly stored and generated in digital form, yet effective methods for their indexing and retrieval remain an open area of research. The main problem arises from the disconnect between the content-based indexing approach used by computer scientists and the description-based approach used by information scientists. There is also a lack of representational schemes that allow semantics and context to be aligned with the keywords and low-level features that can be automatically extracted from the content of these cultural objects. This paper presents an integrated approach to address these problems, taking advantage of both computer science and information science approaches. The focus is on the rationale and conceptual design of the system and its various components. In particular, we discuss techniques for augmenting commonly used metadata with visual features and domain knowledge to generate high-level abstract metadata, which in turn can be used for semantic and context-based indexing and retrieval. We use a sample collection of Vietnamese traditional woodcuts to demonstrate the usefulness of this approach.
La cibermetría en la recuperación de información en el Web (Cybermetrics in information retrieval on the Web)
The exponential growth of the Web and the characteristics of its distributed data (high volatility, lack of structure, redundancy, and strong heterogeneity) have introduced new problems in information retrieval processes. It is therefore necessary to open new avenues of research that allow good levels of accuracy to be obtained. Work that exploits the hypertext features of the Web is gaining prominence. Cybermetrics currently offers many interesting options for working with links, and many of its techniques may be useful in information retrieval on the Web.
Dataset search: a survey
Generating value from data requires the ability to find, access and make
sense of datasets. There are many efforts underway to encourage data sharing
and reuse, from scientific publishers asking authors to submit data alongside
manuscripts to data marketplaces, open data portals and data communities.
Google recently beta released a search service for datasets, which allows users
to discover data stored in various online repositories via keyword queries.
These developments foreshadow an emerging research field around dataset search
or retrieval that broadly encompasses frameworks, methods and tools that help
match a user data need against a collection of datasets. Here, we survey the
state of the art of research and commercial systems in dataset retrieval. We
identify what makes dataset search a research field in its own right, with
unique challenges and methods and highlight open problems. We look at
approaches and implementations from related areas dataset search is drawing
upon, including information retrieval, databases, entity-centric and tabular
search in order to identify possible paths to resolve these open problems as
well as immediate next steps that will take the field forward.
Comment: 20 pages, 153 references
Lucene4IR: Developing information retrieval evaluation resources using Lucene
The workshop and hackathon on developing Information Retrieval Evaluation Resources using Lucene (L4IR) was held on the 8th and 9th of September, 2016 at the University of Strathclyde in Glasgow, UK and funded by the ESF Elias Network. The event featured three main elements: (i) a series of keynote and invited talks on industry, teaching and evaluation; (ii) planning, coding and hacking, where a number of groups created modules and infrastructure to use Lucene to undertake TREC-based evaluations; and (iii) a number of breakout groups discussing challenges, opportunities and problems in bridging the divide between academia and industry, and how we can use Lucene for teaching and learning Information Retrieval (IR). The event brought together a mix of academics, experts and students wanting to learn, share and create evaluation resources for the community. The hacking was intense and the discussions lively, creating the basis of many useful tools but also raising numerous issues. It was clear that adopting and contributing to the most widely used and supported Open Source IR toolkit offered many benefits for academics, students, researchers, developers and practitioners, providing a basis for stronger evaluation practices, increased reproducibility, more efficient knowledge transfer, greater collaboration between academia and industry, and shared teaching and training resources.