
    Video Storytelling: Textual Summaries for Events

    Bridging vision and natural language is a longstanding goal in computer vision and multimedia research. While earlier works focus on generating a single-sentence description for visual content, recent works have studied paragraph generation. In this work, we introduce the problem of video storytelling, which aims at generating coherent and succinct stories for long videos. Video storytelling introduces new challenges, mainly due to the diversity of the story and the length and complexity of the video. We propose novel methods to address these challenges. First, we propose a context-aware framework for multimodal embedding learning, where we design a Residual Bidirectional Recurrent Neural Network to leverage contextual information from the past and future. Second, we propose a Narrator model to discover the underlying storyline. The Narrator is formulated as a reinforcement learning agent which is trained by directly optimizing the textual metric of the generated story. We evaluate our method on the Video Story dataset, a new dataset that we have collected to enable this study. We compare our method with multiple state-of-the-art baselines, and show that our method achieves better performance in terms of both quantitative measures and a user study. Comment: Published in IEEE Transactions on Multimedia.

    Mapping Science Based on Research Content Similarity

    Maps of science representing the structure of science help us understand science and technology development. Research in scientometrics has therefore developed techniques for analyzing research activities and measuring their relationships; however, navigating the recent scientific landscape remains challenging, since conventional inter-citation and co-citation analysis is difficult to apply to recently published articles and ongoing projects. Therefore, to characterize what is being attempted in the current scientific landscape, this article proposes a content-based method of locating research articles/projects in a multi-dimensional space using word/paragraph embedding. Specifically, to address the problem of an otherwise unclustered map, we introduce cluster vectors based on the information entropies of technical concepts. The experimental results showed that our method formed a clustered map from approx. 300 k IEEE articles and NSF projects from 2012 to 2016. Finally, we confirmed that the formation of specific research areas can be captured as changes in the network structure.
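The locating step this abstract describes — embedding each article as a vector and assigning it to the nearest cluster vector — can be sketched as follows. This is a minimal illustration, not the paper's actual method: the 3-d vectors, cluster labels, and the `assign_to_cluster` helper are hypothetical stand-ins for real word/paragraph embeddings and entropy-based cluster vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def assign_to_cluster(doc_vec, cluster_vecs):
    """Return the index of the cluster vector most similar to doc_vec."""
    return max(range(len(cluster_vecs)),
               key=lambda i: cosine(doc_vec, cluster_vecs[i]))

# Toy 3-d embeddings standing in for document and cluster vectors.
clusters = [[1.0, 0.0, 0.0],   # e.g. a "machine learning" region of the map
            [0.0, 1.0, 0.0]]   # e.g. a "photonics" region
doc = [0.9, 0.1, 0.0]
print(assign_to_cluster(doc, clusters))  # → 0
```

In the paper's setting the document vectors would come from a trained embedding model and the cluster vectors from information entropies of technical concepts; only the nearest-vector assignment is shown here.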

    Content-based Map of Science using Cross-lingual Document Embedding - A Comparison of US-Japan Funded Projects

    Maps depicting the structure of science help us understand the development of science and technology. However, as it is difficult to apply inter-citation and co-citation analysis to recently published papers and ongoing projects that have few or no references, our previous work developed a content-based map by locating research papers and funding projects using word/document embedding. Because difficulties arise when comparing content-based maps in different languages, this paper improves our content-based map by developing a method for generating multi-dimensional vectors in the same space from cross-lingual (English and Japanese) documents. Using 1,000 IEEE papers, we confirmed a similarity of 0.76 for matching bilingual contents. Finally, we constructed a map from 34,000 projects of the National Science Foundation and the Japan Society for the Promotion of Science from 2012 to 2015, and we present the findings obtained from a comparison of the US- and Japan-funded projects.
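The bilingual matching score reported above (0.76 over 1,000 papers) amounts to averaging the similarity of aligned English/Japanese document pairs once both are projected into one shared vector space. A minimal sketch, assuming toy 2-d vectors in place of the paper's cross-lingual embeddings; `mean_pair_similarity` is an illustrative helper, not the authors' code:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_pair_similarity(en_vecs, ja_vecs):
    """Average cosine similarity over aligned bilingual document pairs."""
    sims = [cosine(e, j) for e, j in zip(en_vecs, ja_vecs)]
    return sum(sims) / len(sims)

# Toy vectors standing in for English/Japanese versions of the same papers,
# already projected into a single shared space.
en = [[1.0, 0.0], [0.6, 0.8]]
ja = [[0.9, 0.1], [0.5, 0.9]]
print(mean_pair_similarity(en, ja))
```

A high mean similarity indicates the two language versions of the same document land close together in the shared space, which is the property the paper evaluates.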

    Academia/Industry DynAmics (AIDA): A Knowledge Graph within the scholarly domain and its applications

    Scholarly knowledge graphs are a form of knowledge representation that aims to capture and organize the information and knowledge contained in scholarly publications, such as research papers, books, patents, and datasets. They can provide a comprehensive and structured view of the scholarly domain, covering aspects such as authors, affiliations, research topics, methods, results, citations, and impact, and they can enable applications and services that facilitate and enhance scholarly communication, such as information retrieval, data analysis, recommendation systems, semantic search, and knowledge discovery. However, constructing and maintaining scholarly knowledge graphs is a challenging task that requires dealing with large-scale, heterogeneous, and dynamic data sources. Moreover, extracting and integrating the relevant information and knowledge from unstructured or semi-structured text is not trivial, as it involves natural language processing, machine learning, ontology engineering, and semantic web technologies. Furthermore, ensuring the quality and validity of scholarly knowledge graphs is essential for their usability and reliability.
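At its core, the knowledge representation described here is a set of (subject, predicate, object) statements that can be queried by pattern. A minimal sketch — the entity names, predicates, and the `query` helper are illustrative, not AIDA's actual schema or API:

```python
def query(triples, s=None, p=None, o=None):
    """Return all triples matching an optional (subject, predicate, object)
    pattern; None acts as a wildcard."""
    return [(ts, tp, to) for ts, tp, to in triples
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]

# Toy scholarly knowledge graph: papers, authors, topics, affiliations.
kg = [("paper:1", "hasAuthor", "author:smith"),
      ("paper:1", "hasTopic", "knowledge_graphs"),
      ("author:smith", "affiliatedWith", "org:acme")]

print(query(kg, p="hasTopic"))  # → [('paper:1', 'hasTopic', 'knowledge_graphs')]
```

Real scholarly knowledge graphs use RDF stores and SPARQL for this kind of pattern matching; the list comprehension above only shows the idea at toy scale.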

    MapIntel: A visual analytics platform for competitive intelligence

    Silva, D., & Bação, F. (2023). MapIntel: A visual analytics platform for competitive intelligence. Expert Systems, [e13445]. Preprint: https://doi.org/10.22541/au.166785335.50477185. Published version: https://doi.org/10.1111/exsy.13445. Funding: This work was supported by the Fundação para a Ciência e Tecnologia of the Ministério da Ciência, Tecnologia e Ensino Superior (research grant under the DSAIPA/DS/0116/2019 project). --- Competitive Intelligence allows an organization to keep up with market trends and foresee business opportunities. This practice is mainly performed by analysts scanning for any piece of valuable information in a myriad of dispersed and unstructured sources. Here we present MapIntel, a system for acquiring intelligence from vast collections of text data by representing each document as a multidimensional vector that captures its own semantics. The system is designed to handle complex natural language queries and visual exploration of the corpus, potentially aiding overburdened analysts in finding meaningful insights to help decision-making. The system's searching module uses a retriever and re-ranker engine that first finds the closest neighbours to the query embedding and then sifts the results through a cross-encoder model that identifies the most relevant documents. The browsing or visualization module also leverages the embeddings by projecting them onto two dimensions while preserving the multidimensional landscape, resulting in a map where semantically related documents form topical clusters, which we capture using topic modelling. This map aims at promoting a fast overview of the corpus while allowing a more detailed exploration and interactive information encountering process. We evaluate the system and its components on the 20 newsgroups data set, using the semantic document labels provided, and demonstrate the superiority of Transformer-based components.
    Finally, we present a prototype of the system in Python and show how some of its features can be used to acquire intelligence from a news article corpus we collected during a period of 8 months.
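The retriever and re-ranker pipeline described in this abstract is a common two-stage search pattern: a cheap embedding similarity pass produces a shortlist, and a more expensive joint scoring model re-orders it. A minimal sketch, assuming toy 2-d embeddings and a dictionary of mock scores in place of MapIntel's real bi-encoder and cross-encoder models:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve_then_rerank(query_vec, doc_vecs, rerank_score, k=3):
    """Two-stage search: embedding retrieval, then re-ranking.
    `rerank_score(doc_id)` stands in for a cross-encoder that scores the
    (query, document) pair jointly."""
    # Stage 1: shortlist the k nearest neighbours of the query embedding.
    candidates = sorted(range(len(doc_vecs)),
                        key=lambda i: cosine(query_vec, doc_vecs[i]),
                        reverse=True)[:k]
    # Stage 2: re-order the shortlist with the stronger, slower model.
    return sorted(candidates, key=rerank_score, reverse=True)

docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [0.9, 0.1]]
scores = {0: 0.2, 2: 0.5, 3: 0.9}   # mock cross-encoder scores per doc id
print(retrieve_then_rerank([1.0, 0.0], docs, scores.get))  # → [3, 2, 0]
```

The design point is that the cross-encoder is far too slow to score the whole corpus, so it is only applied to the retriever's shortlist.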

    From Free Text to Clusters of Content in Health Records: An Unsupervised Graph Partitioning Approach

    Electronic healthcare records contain large volumes of unstructured data in different forms. Free text constitutes a large portion of such data, yet this source of richly detailed information often remains under-used in practice because of a lack of suitable methodologies to extract interpretable content in a timely manner. Here we apply network-theoretical tools to the analysis of free text in Hospital Patient Incident reports in the English National Health Service, to find clusters of reports in an unsupervised manner and at different levels of resolution based directly on the free text descriptions contained within them. To do so, we combine recently developed deep neural network text-embedding methodologies based on paragraph vectors with multi-scale Markov Stability community detection applied to a similarity graph of documents obtained from sparsified text vector similarities. We showcase the approach with the analysis of incident reports submitted in Imperial College Healthcare NHS Trust, London. The multiscale community structure reveals levels of meaning with different resolution in the topics of the dataset, as shown by relevant descriptive terms extracted from the groups of records, as well as by comparing a posteriori against hand-coded categories assigned by healthcare personnel. Our content communities exhibit good correspondence with well-defined hand-coded categories, yet our results also provide further medical detail in certain areas as well as revealing complementary descriptors of incidents beyond the external classification. We also discuss how the method can be used to monitor reports over time and across different healthcare providers, and to detect emerging trends that fall outside of pre-existing categories. Comment: 25 pages, 2 tables, 8 figures and 5 supplementary figures.
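A key step in this pipeline is building the sparsified similarity graph of documents before community detection: each document keeps edges only to its most similar neighbours. The sketch below shows that graph-construction step only, on toy 2-d vectors; the multi-scale Markov Stability community detection itself is considerably more involved and is not reproduced here. The `knn_graph` helper is illustrative, not the authors' code:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_graph(vecs, k=2):
    """Sparsified similarity graph: keep, for each node, undirected edges
    to its k most similar neighbours."""
    edges = set()
    for i, vi in enumerate(vecs):
        neighbours = sorted((j for j in range(len(vecs)) if j != i),
                            key=lambda j: cosine(vi, vecs[j]),
                            reverse=True)[:k]
        for j in neighbours:
            edges.add((min(i, j), max(i, j)))  # store each edge once
    return edges

# Two tight pairs of "documents": sparsification keeps only within-pair edges.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(knn_graph(docs, k=1))  # → {(0, 1), (2, 3)}
```

Community detection on such a graph would then group nodes 0–1 and 2–3 into separate content communities, which is the structure the paper extracts at multiple resolutions.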

    Use What You Have: Video Retrieval Using Representations From Collaborative Experts

    The rapid growth of video on the internet has made searching for video content using natural language queries a significant challenge. Human-generated queries for video datasets 'in the wild' vary a lot in terms of degree of specificity, with some queries describing specific details such as the names of famous identities, content from speech, or text available on the screen. Our goal is to condense the multi-modal, extremely high dimensional information from videos into a single, compact video representation for the task of video retrieval using free-form text queries, where the degree of specificity is open-ended. For this we exploit existing knowledge in the form of pre-trained semantic embeddings, which include 'general' features such as motion, appearance, and scene features from visual content. We also explore the use of more 'specific' cues from ASR and OCR which are intermittently available for videos, and find that these signals remain challenging to use effectively for retrieval. We propose a collaborative experts model to aggregate information from these different pre-trained experts and assess our approach empirically on five retrieval benchmarks: MSR-VTT, LSMDC, MSVD, DiDeMo, and ActivityNet. Code and data can be found at www.robots.ox.ac.uk/~vgg/research/collaborative-experts/. Comment: This update contains a correction to previously reported results.
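The aggregation problem this abstract describes — fusing several pre-trained expert embeddings into one compact video vector, when some experts (ASR, OCR) are only intermittently available — can be sketched as a masked weighted average. This is a simplified stand-in, not the collaborative-experts architecture itself, which learns the fusion; the names and weights below are illustrative:

```python
def aggregate_experts(expert_vecs, weights):
    """Fuse per-modality expert embeddings into one video vector.
    Missing experts (value None) are skipped and the remaining
    weights are renormalised so the result stays comparable."""
    dims = len(next(v for v in expert_vecs.values() if v is not None))
    total = sum(weights[name]
                for name, v in expert_vecs.items() if v is not None)
    fused = [0.0] * dims
    for name, vec in expert_vecs.items():
        if vec is None:          # e.g. a video with no on-screen text for OCR
            continue
        w = weights[name] / total
        for d in range(dims):
            fused[d] += w * vec[d]
    return fused

# Toy 2-d expert embeddings; the OCR expert is unavailable for this video.
experts = {"motion": [1.0, 0.0], "appearance": [0.0, 1.0], "ocr": None}
weights = {"motion": 1.0, "appearance": 1.0, "ocr": 1.0}
print(aggregate_experts(experts, weights))  # → [0.5, 0.5]
```

Handling the intermittent availability of 'specific' experts without biasing the fused representation is exactly the difficulty the paper reports for ASR and OCR signals.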

    Visual Storytelling: Captioning of Image Sequences

    In the space of automated captioning, visual storytelling is a further dimension. Given a sequence of images as input, visual storytelling (VIST) is the task of automatically generating a textual narrative as output. Automatically producing stories for an ordered set of pictures or video frames has several potential applications in diverse domains, ranging from multimedia consumption to autonomous systems. The task has evolved over recent years and is moving into adolescence. The availability of a dedicated VIST dataset has mainstreamed research on visual storytelling and its related sub-tasks. This thesis systematically reports the developments of standard captioning as a parent task, with accompanying facets like dense captioning, and gradually delves into the domain of visual storytelling. Existing models proposed for VIST are described by examining their respective characteristics and scope. All the methods for VIST adapt the typical encoder-decoder design, owing to its success on the standard image captioning task. Several subtle differences in the underlying intentions of these methods are subsequently summarized. Additionally, alternate perspectives on the existing approaches are explored by re-modeling and modifying their learning mechanisms. Experiments with different objective functions are reported with subjective comparisons and relevant results. Finally, the sub-field of character relationships within storytelling is studied, and a novel idea called character-centric storytelling is proposed to account for prospective characters across the available data modalities.
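The encoder-decoder design that all the VIST methods above adapt generates text one word at a time, at each step picking the word the decoder scores highest given what has been generated so far. A minimal greedy-decoding sketch, with a lookup table of mock scores standing in for a trained decoder conditioned on the image sequence; the vocabulary and scores are illustrative, not from any VIST model:

```python
def greedy_decode(step_scores, vocab, max_len=10, eos="<eos>"):
    """Greedy decoding loop of an encoder-decoder captioner.
    `step_scores(prefix)` stands in for the decoder: given the words
    generated so far, it returns a score for every vocabulary word."""
    prefix = []
    for _ in range(max_len):
        scores = step_scores(prefix)
        word = max(vocab, key=lambda w: scores[w])
        if word == eos:          # the model chose to end the sentence
            break
        prefix.append(word)
    return prefix

# Mock decoder: a table mapping each prefix to per-word scores.
table = {
    ():                  {"a": 1.0, "dog": 0.0, "runs": 0.0, "<eos>": 0.0},
    ("a",):              {"a": 0.0, "dog": 1.0, "runs": 0.0, "<eos>": 0.0},
    ("a", "dog"):        {"a": 0.0, "dog": 0.0, "runs": 1.0, "<eos>": 0.0},
    ("a", "dog", "runs"): {"a": 0.0, "dog": 0.0, "runs": 0.0, "<eos>": 1.0},
}
vocab = ["a", "dog", "runs", "<eos>"]
print(greedy_decode(lambda p: table[tuple(p)], vocab))  # → ['a', 'dog', 'runs']
```

Real VIST decoders condition those per-step scores on the encoded image sequence (and often use beam search rather than greedy selection), but the generation loop has this shape.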