11 research outputs found

    Overview of the NTCIR-14 Lifelog-3 task

    Lifelog-3 was the third instance of the lifelog task at NTCIR. At NTCIR-14, the Lifelog-3 task explored three different lifelog data access challenges: the search challenge, the annotation challenge, and the insights challenge. In this paper we review the activities of the participating teams who took part in the challenges, and we suggest next steps for the community.

    Experiments in lifelog organisation and retrieval at NTCIR

    Lifelogging can be described as the process by which individuals use various software and hardware devices to gather large archives of multimodal personal data from multiple sources and store them in a personal data archive, called a lifelog. The Lifelog task at NTCIR was a comparative benchmarking exercise with the aim of encouraging research into the organisation and retrieval of data from multimodal lifelogs. The Lifelog task ran for over 4 years, from NTCIR-12 until NTCIR-14 (2015.02–2019.06); it invited participants to submit to five subtasks, each tackling a different challenge related to lifelog retrieval. In this chapter, a motivation is given for the Lifelog task and a review of progress since NTCIR-12 is presented. Finally, the lessons learned and challenges within the domain of lifelog retrieval are presented.

    Overview of NTCIR-15 MART

    MART (Micro-activity Retrieval Task) was an NTCIR-15 collaborative benchmarking pilot task. The NTCIR-15 MART pilot aimed to motivate the development of first-generation techniques for high-precision micro-activity detection and retrieval, to support the identification and retrieval of activities that occur over short time-scales such as minutes, rather than the long-duration event segmentation tasks of past work. Participating researchers developed and benchmarked approaches to retrieve micro-activities from rich time-aligned multi-modal sensor data. Groups were ranked in decreasing order of micro-activity retrieval accuracy using mAP (mean Average Precision). The dataset used for the task consisted of a detailed lifelog of activities gathered using a controlled protocol of real-world activities (e.g. using a computer, eating, daydreaming, etc.). The data included a lifelog camera data stream, biosignal activity (EOG, HR), and computer interactions (mouse movements, screenshots, etc.). This task presented a novel set of challenging micro-activity based topics.
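    Since participants were ranked by mAP, the metric is worth making concrete. The following is a minimal sketch of mean Average Precision over per-topic ranked runs; the run and qrels formats are illustrative, not the official MART evaluation code.

```python
# Hedged sketch of mAP (mean Average Precision) for ranked retrieval runs.
# Run/qrels formats are illustrative assumptions, not the MART tooling.

def average_precision(ranked_ids, relevant_ids):
    """AP for one topic: mean of precision@k taken at each relevant hit,
    divided by the total number of relevant items."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits, precisions = 0, []
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant)

def mean_average_precision(run, qrels):
    """run: topic -> ranked list of ids; qrels: topic -> set of relevant ids.
    mAP is the mean of the per-topic AP values."""
    aps = [average_precision(run.get(topic, []), rel)
           for topic, rel in qrels.items()]
    return sum(aps) / len(aps)
```

    A higher mAP rewards systems that place relevant micro-activity segments early in the ranking, which matches the high-precision goal stated above.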

    Advances in lifelog data organisation and retrieval at the NTCIR-14 Lifelog-3 task

    Lifelogging refers to the process of digitally capturing a continuous and detailed trace of life activities in a passive manner. To help the research community make progress in the organisation and retrieval of data from lifelog archives, a lifelog task has been organised at NTCIR since NTCIR-12. Lifelog-3 was the third running of the lifelog task (at NTCIR-14), and it explored three different lifelog data access challenges: the search challenge, the annotation challenge, and the insights challenge. In this paper we review the dataset created for this activity and the activities of the participating teams, and we highlight lessons for the community from the NTCIR-Lifelog challenges.

    Evaluating Information Retrieval and Access Tasks

    This open access book summarizes the first two decades of the NII Testbeds and Community for Information access Research (NTCIR). NTCIR is a series of evaluation forums run by a global team of researchers and hosted by the National Institute of Informatics (NII), Japan. The book is unique in that it discusses not just what was done at NTCIR, but also how it was done and the impact it has achieved. For example, in some chapters the reader sees the early seeds of what eventually grew to be the search engines that provide access to content on the World Wide Web, today's smartphones that can tailor what they show to the needs of their owners, and the smart speakers that enrich our lives at home and on the move. We also get glimpses into how new search engines can be built for mathematical formulae, or for the digital record of a lived human life. Key to the success of the NTCIR endeavor was early recognition that information access research is an empirical discipline and that evaluation therefore lay at the core of the enterprise. Evaluation is thus at the heart of each chapter in this book. The chapters show, for example, how the recognition that some documents are more important than others has shaped thinking about evaluation design. The thirty-three contributors to this volume speak for the many hundreds of researchers from dozens of countries around the world who together shaped NTCIR as organizers and participants. This book is suitable for researchers, practitioners, and students: anyone who wants to learn about past and present evaluation efforts in information retrieval, information access, and natural language processing, as well as those who want to participate in an evaluation task or even to design and organize one.

    SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval

    Legal case retrieval, which aims to find relevant cases for a query case, plays a core role in intelligent legal systems. Despite the success that pre-training has achieved in ad-hoc retrieval tasks, effective pre-training strategies for legal case retrieval remain to be explored. Compared with general documents, legal case documents are typically long text sequences with intrinsic logical structures. However, most existing language models have difficulty understanding the long-distance dependencies between different structures. Moreover, in contrast to general retrieval, relevance in the legal domain is sensitive to key legal elements: even subtle differences in key legal elements can significantly affect the judgement of relevance. However, existing pre-trained language models designed for general purposes are not equipped to handle legal elements. To address these issues, in this paper we propose SAILER, a new Structure-Aware pre-traIned language model for LEgal case Retrieval. It is distinguished in three aspects: (1) SAILER fully utilizes the structural information contained in legal case documents and pays more attention to key legal elements, similar to how legal experts browse legal case documents. (2) SAILER employs an asymmetric encoder-decoder architecture to integrate several different pre-training objectives. In this way, rich semantic information across tasks is encoded into dense vectors. (3) SAILER has powerful discriminative ability, even without any legal annotation data: it can accurately distinguish legal cases with different charges. Extensive experiments over publicly available legal benchmarks demonstrate that our approach can significantly outperform previous state-of-the-art methods in legal case retrieval.
    Comment: 10 pages, accepted by SIGIR 202

    Constructing Tree-based Index for Efficient and Effective Dense Retrieval

    Recent studies have shown that Dense Retrieval (DR) techniques can significantly improve the performance of first-stage retrieval in IR systems. Despite its empirical effectiveness, the application of DR is still limited. In contrast to statistical retrieval models that rely on highly efficient inverted-index solutions, DR models build dense embeddings that are difficult to pre-process with most existing search indexing systems. To avoid the expensive cost of brute-force search, Approximate Nearest Neighbor (ANN) algorithms and corresponding indexes are widely applied to speed up the inference process of DR models. Unfortunately, while ANN can improve the efficiency of DR models, it usually comes at a significant cost in retrieval performance. To solve this issue, we propose JTR, which stands for Joint optimization of TRee-based index and query encoding. Specifically, we design a new unified contrastive learning loss to train the tree-based index and query encoder in an end-to-end manner. A tree-based negative sampling strategy is applied to give the tree the maximum heap property, which supports the effectiveness of beam search well. Moreover, we treat the cluster assignment as an optimization problem to update the tree-based index, which allows overlapped clustering. We evaluate JTR on numerous popular retrieval benchmarks. Experimental results show that JTR achieves better retrieval performance while retaining high system efficiency compared with widely adopted baselines. It provides a potential solution for balancing efficiency and effectiveness in neural retrieval system designs.
    Comment: 10 pages, accepted at SIGIR 202
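    The beam search over a tree-based index mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the node layout, embeddings, and inner-product scorer are illustrative assumptions, and JTR additionally learns the tree jointly with the query encoder.

```python
# Hedged sketch: beam search over a cluster tree for dense retrieval.
# Each internal node holds a representative embedding; leaves hold doc ids.

import numpy as np

class Node:
    def __init__(self, embedding, children=None, doc_ids=None):
        self.embedding = np.asarray(embedding, dtype=float)
        self.children = children or []   # non-empty for internal nodes
        self.doc_ids = doc_ids or []     # non-empty only at leaves

def beam_search(root, query, beam_width=2):
    """Descend the tree level by level, keeping only the beam_width
    highest-scoring nodes (inner product with the query embedding);
    collect doc ids from any leaves reached."""
    frontier = [root]
    candidates = []
    while frontier:
        leaves = [n for n in frontier if not n.children]
        candidates.extend(d for n in leaves for d in n.doc_ids)
        expanded = [c for n in frontier for c in n.children]
        if not expanded:
            break
        expanded.sort(key=lambda n: float(query @ n.embedding), reverse=True)
        frontier = expanded[:beam_width]
    return candidates
```

    Pruning to a fixed beam width is what makes the search sub-linear in collection size; the maximum heap property described above is what makes that pruning safe, since a child should never score higher than its parent.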

    Temporal multimodal video and lifelog retrieval

    The past decades have seen exponential growth in both the consumption and production of data, with multimedia such as images and videos contributing significantly to that growth. The widespread proliferation of smartphones has given everyday users the ability to consume and produce such content easily. As the complexity and diversity of multimedia data have grown, so has the need for more complex retrieval models that address the information needs of users. Finding relevant multimedia content is central in many scenarios, from internet search engines and medical retrieval to querying one's personal multimedia archive, also called a lifelog. Traditional retrieval models have often focused on queries targeting small units of retrieval, yet users usually remember temporal context and expect results to reflect it. However, there is little research into supporting these information needs in interactive multimedia retrieval. In this thesis, we aim to close this research gap by making several contributions to multimedia retrieval, focusing on two scenarios: video and lifelog retrieval. We provide a retrieval model for complex information needs with temporal components, including a data model for multimedia retrieval, a query model for complex information needs, and a modular and adaptable query execution model that includes novel algorithms for result fusion. The concepts and models are implemented in vitrivr, an open-source multimodal multimedia retrieval system that covers all aspects from extraction to query formulation and browsing. vitrivr has proven its usefulness in evaluation campaigns and is now used in two large-scale interdisciplinary research projects. We show the feasibility and effectiveness of our contributions in two ways: first, through results from user-centric evaluations which pit different user-system combinations against one another; second, through a system-centric evaluation based on a new dataset for temporal information needs in video and lifelog retrieval, with which we quantitatively evaluate our models. The results show significant benefits for systems that enable users to specify more complex information needs with temporal components. Participation in interactive retrieval evaluation campaigns over multiple years provides insight into possible future developments and challenges of such campaigns.
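    Temporal result fusion of the kind described in the abstract can be sketched minimally for a two-part query ("X, then Y"). The pairing and scoring scheme below is an illustrative assumption, not vitrivr's actual fusion algorithm.

```python
# Hedged sketch: temporal fusion of two ranked result lists, where a hit
# for the second query part must follow a hit for the first within a
# bounded time gap. Result tuples are (segment_id, start_time, score).

def temporal_fuse(results_a, results_b, max_gap=60.0):
    """Pair each hit for part A with any later hit for part B that starts
    within max_gap seconds, scoring the pair by the sum of scores, and
    return pairs sorted by fused score (descending)."""
    fused = []
    for id_a, t_a, s_a in results_a:
        for id_b, t_b, s_b in results_b:
            if 0 < t_b - t_a <= max_gap:
                fused.append(((id_a, id_b), s_a + s_b))
    return sorted(fused, key=lambda pair: pair[1], reverse=True)
```

    The key property is that temporal order becomes a hard constraint while the per-part relevance scores are combined softly, so a system can surface sequences of moments rather than isolated segments.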