
    The State-of-the-arts in Focused Search

    The continuous influx of text data on the Web requires search engines to improve their ability to retrieve more specific information. The need for results relevant to a user’s topic of interest has gone beyond searching for domain- or type-specific documents to more focused results (e.g. document fragments or answers to a query). The introduction of XML provides a standard format for data representation, storage, and exchange. It allows focused search to be carried out at different granularities of a structured document with XML markup. This report reviews the state of the art in focused search, particularly techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking. It concludes by highlighting open problems.

    Phase Transitions in Phase Retrieval

    Consider a scenario in which an unknown signal is transformed by a known linear operator, and then the pointwise absolute value of the unknown output function is reported. This scenario appears in several applications, and the goal is to recover the unknown signal -- this is called phase retrieval. Phase retrieval has been a popular subject of research in the last few years, both in determining whether complete information is available with a given linear operator, and in finding efficient and stable phase retrieval algorithms in the cases where complete information is available. Interestingly, there are a few ways to measure information completeness, and each way appears to be governed by a phase transition of sorts. This chapter will survey the state of the art with some of these phase transitions, and identify a few open problems for further research. Comment: Book chapter, survey of recent literature, submitted to Excursions in Harmonic Analysis: The February Fourier Talks at the Norbert Wiener Center

    Modeling Spoken Information Queries for Virtual Assistants: Open Problems, Challenges and Opportunities

    Virtual assistants are becoming increasingly important speech-driven Information Retrieval platforms that assist users with various tasks. We discuss open problems and challenges with respect to modeling spoken information queries for virtual assistants, and list opportunities where Information Retrieval methods and research can be applied to improve the quality of virtual assistant speech recognition. We discuss how query domain classification, knowledge graphs and user interaction data, and query personalization can be helpful to improve the accurate recognition of spoken information domain queries. Finally, we also provide a brief overview of current problems and challenges in speech recognition. Comment: SIGIR '23. The 46th International ACM SIGIR Conference on Research & Development in Information Retrieval

    Metadata Augmentation for Semantic- and Context- Based Retrieval of Digital Cultural Objects

    Cultural objects are increasingly stored and generated in digital form, yet effective methods for their indexing and retrieval remain an open area of research. The main problem arises from the disconnection between the content-based indexing approach used by computer scientists and the description-based approach used by information scientists. There is also a lack of representational schemes that allow the alignment of semantics and context with the keywords and low-level features that can be automatically extracted from the content of these cultural objects. This paper presents an integrated approach to address these problems, taking advantage of both computer science and information science approaches. The focus is on the rationale and conceptual design of the system and its various components. In particular, we discuss techniques for augmenting commonly used metadata with visual features and domain knowledge to generate high-level abstract metadata, which in turn can be used for semantic and context-based indexing and retrieval. We use a sample collection of Vietnamese traditional woodcuts to demonstrate the usefulness of this approach.

    La cibermetría en la recuperación de información en el Web

    The exponential growth of the Web and the characteristics of its distributed data (high volatility, lack of structure, redundancy, and great heterogeneity) have introduced new problems for information retrieval. It is therefore necessary to open new avenues of research that allow good levels of accuracy to be obtained. Work based on exploiting the hypertext features of the Web is gaining prominence. Cybermetrics provides many options for working with links, and many of its techniques may be useful for information retrieval on the Web.

    Dataset search: a survey

    Generating value from data requires the ability to find, access and make sense of datasets. There are many efforts underway to encourage data sharing and reuse, from scientific publishers asking authors to submit data alongside manuscripts to data marketplaces, open data portals and data communities. Google recently beta released a search service for datasets, which allows users to discover data stored in various online repositories via keyword queries. These developments foreshadow an emerging research field around dataset search or retrieval that broadly encompasses frameworks, methods and tools that help match a user data need against a collection of datasets. Here, we survey the state of the art of research and commercial systems in dataset retrieval. We identify what makes dataset search a research field in its own right, with unique challenges and methods, and highlight open problems. We look at approaches and implementations from related areas dataset search is drawing upon, including information retrieval, databases, entity-centric and tabular search, in order to identify possible paths to resolve these open problems as well as immediate next steps that will take the field forward. Comment: 20 pages, 153 references

    Lucene4IR: Developing information retrieval evaluation resources using Lucene

    The workshop and hackathon on developing Information Retrieval Evaluation Resources using Lucene (L4IR) was held on the 8th and 9th of September, 2016 at the University of Strathclyde in Glasgow, UK and funded by the ESF Elias Network. The event featured three main elements: (i) a series of keynote and invited talks on industry, teaching and evaluation; (ii) planning, coding and hacking, where a number of groups created modules and infrastructure to use Lucene to undertake TREC-based evaluations; and (iii) a number of breakout groups discussing challenges, opportunities and problems in bridging the divide between academia and industry, and how we can use Lucene for teaching and learning Information Retrieval (IR). The event brought together a mix of academics, experts and students wanting to learn, share and create evaluation resources for the community. The hacking was intense and the discussions lively, creating the basis of many useful tools but also raising numerous issues. It was clear that adopting and contributing to the most widely used and supported Open Source IR toolkit offered many benefits for academics, students, researchers, developers and practitioners: a basis for stronger evaluation practices, increased reproducibility, more efficient knowledge transfer, greater collaboration between academia and industry, and shared teaching and training resources.