40 research outputs found

    An Asynchronous Scheme for the Distributed Evaluation of Interactive Multimedia Retrieval

    Full text link
    Evaluation campaigns for interactive multimedia retrieval, such as the Video Browser Shodown (VBS) or the Lifelog Search Challenge (LSC), so far imposed constraints on both simultaneity and locality of all participants, requiring them to solve the same tasks in the same place, at the same time and under the same conditions. These constraints are in contrast to other evaluation campaigns that do not focus on interactivity, where participants can process the tasks in any place at any time. The recent travel restrictions necessitated the relaxation of the locality constraint of interactive campaigns, enabling participants to take place from an arbitrary location. Born out of necessity, this relaxation turned out to be a boon since it greatly simplified the evaluation process and enabled organisation of ad-hoc evaluations outside of the large campaigns. However, it also introduced an additional complication in cases where participants were spread over several time zones. In this paper, we introduce an evaluation scheme for interactive retrieval evaluation that relaxes both the simultaneity and locality constraints, enabling participation from any place at any time within a predefined time frame. This scheme, as implemented in the Distributed Retrieval Evaluation Server (DRES), enables novel ways of conducting interactive retrieval evaluation and bridged the gap between interactive campaigns and non-interactive ones

    EXPLOITING BERT FOR MALFORMED SEGMENTATION DETECTION TO IMPROVE SCIENTIFIC WRITINGS

    Get PDF
    Writing a well-structured scientific documents, such as articles and theses, is vital for comprehending the document's argumentation and understanding its messages. Furthermore, it has an impact on the efficiency and time required for studying the document. Proper document segmentation also yields better results when employing automated Natural Language Processing (NLP) manipulation algorithms, including summarization and other information retrieval and analysis functions. Unfortunately, inexperienced writers, such as young researchers and graduate students, often struggle to produce well-structured professional documents. Their writing frequently exhibits improper segmentations or lacks semantically coherent segments, a phenomenon referred to as "mal-segmentation." Examples of mal-segmentation include improper paragraph or section divisions and unsmooth transitions between sentences and paragraphs. This research addresses the issue of mal-segmentation in scientific writing by introducing an automated method for detecting mal-segmentations, and utilizing Sentence Bidirectional Encoder Representations from Transformers (sBERT) as an encoding mechanism. The experimental results section shows a promising results for the detection of mal-segmentation using the sBERT technique

    Audiovisual annotation procedure for multi-view field recordings

    Get PDF
    Audio and video parts of an audiovisual document interact to produce an audiovisual, or multi-modal, perception. Yet, automatic analysis on these documents are usually based on separate audio and video annotations. Regarding the audiovisual content, these annotations could be incomplete, or not relevant. Besides, the expanding possibilities of creating audiovisual documents lead to consider different kinds of contents, including videos filmed in uncontrolled conditions (i.e. fields recordings), or scenes filmed from different points of view (multi-view). In this paper we propose an original procedure to produce manual annotations in different contexts, including multi-modal and multi-view documents. This procedure, based on using both audio and video annotations, ensures consistency considering audio or video only, and provides additionally audiovisual information at a richer level. Finally, different applications are made possible when considering such annotated data. In particular, we present an example application in a network of recordings in which our annotations allow multi-source retrieval using mono or multi-modal queries

    VieLens,: an interactive search engine for LSC2019

    Get PDF
    With the appearance of many wearable devices like smartwatches, recording glasses (such as Google glass), smart phones, digital personal profiles have become more readily available nowadays. However, searching and navigating these multi-source, multi-modal, and often unstructured data to extract useful information is still a relatively challenging task. Therefore, the LSC2019 competition has been organized so that researchers can demonstrate novel search engines, as well as exchange ideas and collaborate on these types of problems. We present in this paper our approach for supporting interactive searches of lifelog data by employing a new retrieval system called VieLens, which is an interactive retrieval system enhanced by natural language processing techniques to extend and improve search results mainly in the context of a user’s activities in their daily life

    MultiVENT: Multilingual Videos of Events with Aligned Natural Text

    Full text link
    Everyday news coverage has shifted from traditional broadcasts towards a wide range of presentation formats such as first-hand, unedited video footage. Datasets that reflect the diverse array of multimodal, multilingual news sources available online could be used to teach models to benefit from this shift, but existing news video datasets focus on traditional news broadcasts produced for English-speaking audiences. We address this limitation by constructing MultiVENT, a dataset of multilingual, event-centric videos grounded in text documents across five target languages. MultiVENT includes both news broadcast videos and non-professional event footage, which we use to analyze the state of online news videos and how they can be leveraged to build robust, factually accurate models. Finally, we provide a model for complex, multilingual video retrieval to serve as a baseline for information retrieval using MultiVENT

    Multimodal Automated Fact-Checking: A Survey

    Full text link
    Misinformation is often conveyed in multiple modalities, e.g. a miscaptioned image. Multimodal misinformation is perceived as more credible by humans, and spreads faster than its text-only counterparts. While an increasing body of research investigates automated fact-checking (AFC), previous surveys mostly focus on text. In this survey, we conceptualise a framework for AFC including subtasks unique to multimodal misinformation. Furthermore, we discuss related terms used in different communities and map them to our framework. We focus on four modalities prevalent in real-world fact-checking: text, image, audio, and video. We survey benchmarks and models, and discuss limitations and promising directions for future researchComment: The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP): Finding
    corecore