73 research outputs found

    Multimodal on-the-fly news media exploration

    Information is presented to us in many ways, and one of the most popular and trustworthy sources of information is the news media. Every day, news events from around the world are broadcast through digital platforms; they cover a wide range of topics, are divided into different categories, and are written by a diverse range of authors. They are presented online as text but also as images, which help us visually contextualize the event and "witness" it with our own eyes. This way of presenting news results in a multimodal news article format. Most news sites show the latest and most popular news on their landing page and allow users to search for specific topics. However, given the large number of articles, especially on topics such as "COVID-19" or "War in Ukraine", enabling users to get a complete picture of the events and their origins in a dynamic and effective way becomes a particularly difficult task. Having a complete picture of the events also makes users less susceptible to biased interpretations. This thesis investigates zero-shot deep multimodal approaches for the news domain that, given an image or a relevant text from a news article, can analyze and aggregate related news pieces on the fly. Deep neural text and image processing transforms the text and images into the embeddings needed to reach the desired topic through context. We collected relevant information from news sources, resulting in approximately 4 million documents, processed the multimodal information to enable embedding-based searches, and then provided aggregations of news according to topics and visualizations selected by the user, through an interface that enables the exploration of unfolding events. The outcome was a zero-shot news pipeline that made multimodal news readily available to browse in a semantic and efficient way.
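    The abstract describes embedding-based search without naming the encoders or the index used. As a hedged illustration only, the sketch below assumes that text and images have already been encoded into vectors in one shared space (the toy 3-dimensional vectors and article ids are hypothetical) and ranks stored articles by cosine similarity to a query vector, which is one common way to implement embedding-based retrieval.

```python
import math

# Minimal embedding-search sketch. ASSUMPTION: documents and queries are already
# encoded into one shared vector space; the encoders and vectors are hypothetical.


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query_vec, index, k=3):
    """Return the k (doc_id, score) pairs most similar to query_vec, best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical toy index: article id -> embedding.
    index = {
        "article-covid": [0.9, 0.1, 0.0],
        "article-ukraine": [0.0, 0.8, 0.6],
        "article-sports": [0.1, 0.0, 0.9],
    }
    query = [1.0, 0.0, 0.0]  # e.g. an embedded query image or sentence
    for doc_id, score in top_k(query, index, k=2):
        print(doc_id, round(score, 3))
```

    Under the shared-space assumption, the query vector may come from either an image or a text encoder, so the same ranking function serves both query types. At the scale of millions of documents, an approximate nearest-neighbour index would typically replace this linear scan.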

    Spoken Corpora Good Practice Guide 2006

    There is currently a vast amount of fundamental or applied research based on the exploitation of oral corpora (organized recorded collections of oral and multimodal language productions). Created as a result of linguists becoming aware of the importance of ensuring the durability of sources and diversified access to the oral documents they produce, this Guide to good practice mainly deals with "oral corpora" created for and used by linguists. However, the questions raised by the creation and documentary exploitation of these corpora arise in numerous disciplines: ethnology, anthropology, sociology, psychology, demography, and oral history notably use oral surveys, testimonies, interviews, and life stories. Although based on a linguistic approach, this Guide also touches on the concerns of other researchers who use oral corpora (for example, in speech synthesis and recognition), even if their specific needs are not consistently addressed in the present document.

    Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme

    Computational Linguistics; Germanic Languages; Artificial Intelligence (incl. Robotics); Computing Methodologies

    A connected history of audiovisual translation: Elements for consideration

    Why do we need a history of audiovisual translation? The elements of such a history cannot be tackled without context, especially outside a history of cinema, understood both as an art made of techniques and as a business. And what kind of history do we need? We try here to define the conditions and resources for a connected and comparative history and to deal with a few methodological challenges.

    Advances in Functional Discourse Grammar

    Corpus Linguistics software: Understanding their usages and delivering two new tools

    The increasing availability of computers to ordinary users in the last few decades has led to an exponential increase in the use of Corpus Linguistics (CL) methodologies. The people exploring this data come from a variety of backgrounds and, in many cases, are not proficient corpus linguists. Despite the ongoing development of new tools, there is still an immense gap between what CL can offer and what researchers are currently doing. This study has two outcomes. It (a) identifies the gap between potential and actual uses of CL methods and tools, and (b) enhances the usability of CL software and complements statistical applications through data visualization and user-friendly interfaces. The first outcome is achieved through (i) an investigation of how CL methods are reported in academic publications; (ii) a systematic observation of users of CL software as they perform routine tasks; and (iii) a review of four well-established pieces of software used for corpus exploration. Based on the findings, two new, highly usable statistical tools for CL studies were developed and implemented in an existing system, CQPweb. The Advanced Dispersion tool allows users to graphically explore how queries are distributed in a corpus, which makes the concept of dispersion easier to understand. The tool also provides accurate dispersion measures. The Parlink Tool was designed primarily for beginners interested in translation studies and second language education. Its primary function is to make it easier for users to see possible translations for corpus queries in parallel concordances, without needing external resources such as translation memories.
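    The abstract mentions dispersion measures without naming them. As a hedged illustration, the sketch below computes Gries's DP (deviation of proportions), one widely used dispersion measure in corpus linguistics; it is not necessarily what the Advanced Dispersion tool implements, and the corpus part sizes in the usage lines are hypothetical.

```python
# Sketch of Gries's DP ("deviation of proportions") dispersion measure.
# ASSUMPTION: illustrative only; the abstract does not say which measures the
# Advanced Dispersion tool in CQPweb actually computes.


def dispersion_dp(freqs, part_sizes):
    """Return DP for a query's frequencies across corpus parts.

    freqs[i]      -- occurrences of the query in part i
    part_sizes[i] -- size of part i (e.g. in tokens)
    DP is 0 when occurrences are spread exactly in proportion to part sizes
    and increases as they concentrate in fewer parts (its maximum is below 1).
    """
    if len(freqs) != len(part_sizes):
        raise ValueError("freqs and part_sizes must have the same length")
    total_freq = sum(freqs)
    corpus_size = sum(part_sizes)
    if total_freq == 0 or corpus_size == 0:
        raise ValueError("need at least one occurrence and a non-empty corpus")
    observed = [f / total_freq for f in freqs]        # share of hits per part
    expected = [s / corpus_size for s in part_sizes]  # share of corpus per part
    return 0.5 * sum(abs(o - e) for o, e in zip(observed, expected))


if __name__ == "__main__":
    sizes = [1000, 1000, 1000]  # hypothetical: three equal-sized corpus parts
    print(dispersion_dp([10, 10, 10], sizes))  # evenly spread hits
    print(dispersion_dp([30, 0, 0], sizes))    # all hits in one part
```

    DP compares each part's share of the query's hits with that part's share of the corpus, so a query whose hits all fall in one third of a corpus scores higher than one spread evenly across it.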

    CLARIN

    The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure (CLARIN) for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and the challenges that CLARIN will tackle in the future. The book was published 10 years after CLARIN was established as a European Research Infrastructure Consortium.