40 research outputs found
An Asynchronous Scheme for the Distributed Evaluation of Interactive Multimedia Retrieval
Evaluation campaigns for interactive multimedia retrieval, such as the Video Browser Showdown (VBS) or the Lifelog Search Challenge (LSC), have so far imposed constraints on both the simultaneity and the locality of all participants, requiring them to solve the same tasks in the same place, at the same time, and under the same conditions. These constraints are in contrast to other evaluation campaigns that do not focus on interactivity, where participants can process the tasks in any place at any time. The recent travel restrictions necessitated the relaxation of the locality constraint of interactive campaigns, enabling participants to take part from an arbitrary location. Born out of necessity, this relaxation turned out to be a boon, since it greatly simplified the evaluation process and enabled the organisation of ad-hoc evaluations outside of the large campaigns. However, it also introduced an additional complication in cases where participants were spread over several time zones. In this paper, we introduce an evaluation scheme for interactive retrieval evaluation that relaxes both the simultaneity and locality constraints, enabling participation from any place at any time within a predefined time frame. This scheme, as implemented in the Distributed Retrieval Evaluation Server (DRES), enables novel ways of conducting interactive retrieval evaluation and bridges the gap between interactive and non-interactive campaigns.
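As a rough illustration of the relaxed constraints described above, the hypothetical Python sketch below models a task that accepts submissions from any location, at any time within a predefined time frame. It is not the actual DRES API; all names and the window logic are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class TaskWindow:
    """A hypothetical evaluation task that stays open for a fixed wall-clock window."""
    name: str
    opens_at: datetime   # start of the predefined time frame (UTC)
    duration: timedelta  # how long the task accepts submissions

    def accepts(self, submitted_at: datetime) -> bool:
        """True if a submission timestamp falls inside the window,
        regardless of the participant's location or time zone."""
        return self.opens_at <= submitted_at < self.opens_at + self.duration

# A task open for 48 hours: teams in any time zone may attempt it
# whenever they like within that frame.
task = TaskWindow(
    name="known-item-search-01",
    opens_at=datetime(2024, 3, 1, 0, 0, tzinfo=timezone.utc),
    duration=timedelta(hours=48),
)
print(task.accepts(datetime(2024, 3, 2, 13, 30, tzinfo=timezone.utc)))  # True
```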
Exploiting BERT for Malformed Segmentation Detection to Improve Scientific Writings
Writing well-structured scientific documents, such as articles and theses, is vital for comprehending a document's argumentation and understanding its messages. Furthermore, it has an impact on the efficiency and time required for studying the document. Proper document segmentation also yields better results when employing automated Natural Language Processing (NLP) algorithms, including summarization and other information retrieval and analysis functions. Unfortunately, inexperienced writers, such as young researchers and graduate students, often struggle to produce well-structured professional documents. Their writing frequently exhibits improper segmentation or lacks semantically coherent segments, a phenomenon referred to as "mal-segmentation." Examples of mal-segmentation include improper paragraph or section divisions and unsmooth transitions between sentences and paragraphs. This research addresses the issue of mal-segmentation in scientific writing by introducing an automated method for detecting mal-segmentation, utilizing Sentence Bidirectional Encoder Representations from Transformers (sBERT) as an encoding mechanism. The experimental results section shows promising results for the detection of mal-segmentation using the sBERT technique.
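The abstract does not give implementation details, but as a hedged illustration of how sentence embeddings can surface one mal-segmentation symptom (an abrupt topic shift between adjacent sentences), the sketch below uses the sentence-transformers library. The model name and the similarity threshold are illustrative choices, not the paper's.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small sentence-BERT model

def boundary_scores(sentences):
    """Cosine similarity between each pair of consecutive sentences."""
    emb = model.encode(sentences, convert_to_tensor=True)
    return [float(util.cos_sim(emb[i], emb[i + 1])) for i in range(len(emb) - 1)]

sentences = [
    "We evaluate three retrieval models on the benchmark.",
    "The transformer baseline outperforms both alternatives.",
    "Giraffes are the tallest living terrestrial animals.",  # abrupt topic shift
]
for i, s in enumerate(boundary_scores(sentences)):
    # Threshold of 0.3 is an arbitrary illustrative cut-off.
    flag = "possible missing segment break" if s < 0.3 else "coherent"
    print(f"sentences {i}-{i + 1}: sim={s:.2f} ({flag})")
```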
Audiovisual annotation procedure for multi-view field recordings
The audio and video parts of an audiovisual document interact to produce an audiovisual, or multi-modal, perception. Yet, automatic analyses of these documents are usually based on separate audio and video annotations. With respect to the audiovisual content, these annotations can be incomplete or irrelevant. Moreover, the expanding possibilities for creating audiovisual documents lead us to consider different kinds of content, including videos filmed in uncontrolled conditions (i.e. field recordings) and scenes filmed from different points of view (multi-view). In this paper we propose an original procedure for producing manual annotations in different contexts, including multi-modal and multi-view documents. This procedure, based on using both audio and video annotations, ensures consistency when considering audio or video alone, and additionally provides audiovisual information at a richer level. Finally, different applications become possible when considering such annotated data. In particular, we present an example application in a network of recordings in which our annotations allow multi-source retrieval using mono- or multi-modal queries.
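As a purely hypothetical sketch of the kind of record such a procedure might produce, the snippet below keeps separate audio and video annotation layers alongside a richer joint audiovisual layer for one point of view in a multi-view setup. All field names are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    start: float  # seconds from recording start
    end: float
    label: str

@dataclass
class AudiovisualSegment:
    recording_id: str  # one point of view in a multi-view setup
    audio: list[Annotation] = field(default_factory=list)
    video: list[Annotation] = field(default_factory=list)
    audiovisual: list[Annotation] = field(default_factory=list)

seg = AudiovisualSegment(recording_id="cam02")
seg.audio.append(Annotation(3.2, 5.8, "car engine"))
seg.video.append(Annotation(3.0, 6.1, "car passing"))
# The joint layer records the multi-modal event the two cues describe together.
seg.audiovisual.append(Annotation(3.0, 6.1, "car drives past camera"))
```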
VieLens: an interactive search engine for LSC2019
With the appearance of many wearable devices like smartwatches, recording glasses (such as Google Glass), and smartphones, digital personal profiles have become more readily available nowadays. However, searching and navigating these multi-source, multi-modal, and often unstructured data to extract useful information is still a relatively challenging task. Therefore, the LSC2019 competition has been organized so that researchers can demonstrate novel search engines, as well as exchange ideas and collaborate on these types of problems. In this paper we present our approach for supporting interactive searches of lifelog data by employing a new retrieval system called VieLens, an interactive retrieval system enhanced by natural language processing techniques to extend and improve search results, mainly in the context of a user's activities in their daily life.
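VieLens's exact techniques are not detailed in this abstract, but as one hedged example of NLP-based query extension in the spirit described, the sketch below expands query terms with WordNet synonyms before retrieval. It is an illustration, not the VieLens implementation.

```python
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)  # one-time corpus download

def expand_query(query: str) -> set[str]:
    """Return the original query terms plus WordNet synonyms for each term."""
    terms = set(query.lower().split())
    for term in list(terms):
        for synset in wn.synsets(term):
            for lemma in synset.lemmas():
                terms.add(lemma.name().replace("_", " "))
    return terms

# Expanding a lifelog-style activity query adds synonyms such as 'feeding',
# broadening recall for retrieval over daily-life activities.
print(expand_query("eating breakfast"))
```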
MultiVENT: Multilingual Videos of Events with Aligned Natural Text
Everyday news coverage has shifted from traditional broadcasts towards a wide range of presentation formats such as first-hand, unedited video footage. Datasets that reflect the diverse array of multimodal, multilingual news sources available online could be used to teach models to benefit from this shift, but existing news video datasets focus on traditional news broadcasts produced for English-speaking audiences. We address this limitation by constructing MultiVENT, a dataset of multilingual, event-centric videos grounded in text documents across five target languages. MultiVENT includes both news broadcast videos and non-professional event footage, which we use to analyze the state of online news videos and how they can be leveraged to build robust, factually accurate models. Finally, we provide a model for complex, multilingual video retrieval to serve as a baseline for information retrieval using MultiVENT.
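The baseline model itself is not specified in this abstract; as a generic sketch of the retrieval step such a baseline implies, the snippet below ranks videos by cosine similarity between a text-query embedding and precomputed video embeddings in a shared space. Random arrays stand in for a real encoder's output, which is out of scope here.

```python
import numpy as np

def rank_videos(query_emb: np.ndarray, video_embs: np.ndarray) -> np.ndarray:
    """Return video indices sorted by descending cosine similarity to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    v = video_embs / np.linalg.norm(video_embs, axis=1, keepdims=True)
    return np.argsort(v @ q)[::-1]

rng = np.random.default_rng(0)
videos = rng.normal(size=(1000, 512))  # stand-in for 1000 video embeddings
query = rng.normal(size=512)           # stand-in for an embedded multilingual text query
print(rank_videos(query, videos)[:5])  # top-5 candidate videos
```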
Multimodal Automated Fact-Checking: A Survey
Misinformation is often conveyed in multiple modalities, e.g. a miscaptioned image. Multimodal misinformation is perceived as more credible by humans, and spreads faster than its text-only counterparts. While an increasing body of research investigates automated fact-checking (AFC), previous surveys mostly focus on text. In this survey, we conceptualise a framework for AFC including subtasks unique to multimodal misinformation. Furthermore, we discuss related terms used in different communities and map them to our framework. We focus on four modalities prevalent in real-world fact-checking: text, image, audio, and video. We survey benchmarks and models, and discuss limitations and promising directions for future research.
Comment: The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP): Findings