
    Textual summarization of data using linguistic protoform summaries

    With the unprecedented quantities of data being generated in all walks of life, ranging from social media to the health domain, new techniques for better understanding this information are imperative. In contrast to popular methods such as visualization or statistical summarization, which require the user to adapt to the technology, summarizing this information in language suited to the audience has recently gained considerable traction. This thesis deals with textual summarization of data using Linguistic Protoform Summaries (LPS). We start by studying the existing techniques in the literature for producing LPS, propose a new method, and demonstrate its robustness with a mathematical proof. The usefulness of LPS is then illustrated with a novel application in the healthcare domain, where the textual summaries are tailored to a clinical population. This is followed by an extensive study on the use of LPS as features for processing data. There, we present our thoughts on how to handle LPS as data features and justify this choice. We illustrate this with a real-data example in which we find a prototypical set of days of activity for people living in an eldercare facility. Throughout the thesis we design various sets of experiments with synthetic data to explain the details of the techniques presented.
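
    As a concrete illustration of the formalism this thesis builds on, the sketch below evaluates a classic Yager-style protoform summary of the form "Q of the days were A" over synthetic activity data. The membership function for "high activity" and the fuzzy quantifier "most" are illustrative assumptions, not the thesis's proposed method.

```python
# A minimal sketch of a Yager/Zadeh-style linguistic protoform summary:
# the truth of "Most days had high activity". The thresholds below are
# hypothetical, chosen only for illustration.

def mu_high_activity(steps, lo=4000.0, hi=8000.0):
    """Fuzzy membership of a day's step count in the set 'high activity'."""
    if steps <= lo:
        return 0.0
    if steps >= hi:
        return 1.0
    return (steps - lo) / (hi - lo)

def q_most(r, a=0.3, b=0.8):
    """Fuzzy quantifier 'most': maps a fraction r to a degree of truth."""
    if r <= a:
        return 0.0
    if r >= b:
        return 1.0
    return (r - a) / (b - a)

def summary_truth(daily_steps):
    """Truth value of 'Most days had high activity' (Zadeh's calculus)."""
    r = sum(mu_high_activity(s) for s in daily_steps) / len(daily_steps)
    return q_most(r)

# Synthetic week of step counts -> degree of truth of the summary.
print(summary_truth([9500, 7200, 3100, 8800, 6400, 10200, 2500]))
```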

    Text-based Semantic Annotation Service for Multimedia Content in the Esperonto project

    Within the Esperonto project, NLP, ontologies, and other knowledge bases are being integrated with the goal of implementing a semantic annotation service that upgrades the current Web toward the emerging Semantic Web. Research is currently being conducted on how to apply the Esperonto semantic annotation service to text material associated with still images in web pages. In doing so, the project will not only allow for semantic querying of still images on the web, but also (automatically) create a large set of text-based semantic annotations of still images, which the Multimedia community can use to support content indexing of image material, possibly combining the Esperonto annotations with annotations resulting from image analysis.

    Semantics-Consistent Cross-domain Summarization via Optimal Transport Alignment

    Multimedia summarization with multimodal output (MSMO) is a recently explored application in language grounding. It plays an essential role in real-world applications, e.g., automatically generating cover images and titles for news articles or providing introductions to online videos. However, existing methods extract features from the whole video and article and use fusion methods to select a representative one, thus usually ignoring critical structure and varying semantics. In this work, we propose a Semantics-Consistent Cross-domain Summarization (SCCS) model based on optimal transport alignment with visual and textual segmentation. Specifically, our method first decomposes both the video and the article into segments to capture their structural semantics. SCCS then follows a cross-domain alignment objective with an optimal transport distance, which leverages multimodal interaction to match and select the visual and textual summary. We evaluated our method on three recent multimodal datasets and demonstrated its effectiveness in producing high-quality multimodal summaries.
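
    As a rough illustration of the alignment step, the sketch below matches toy video-segment and text-segment embeddings with entropic optimal transport (Sinkhorn iterations). The embeddings, uniform marginals, and cosine cost here are assumptions for illustration, not SCCS's exact formulation.

```python
# A minimal sketch of cross-domain segment alignment via entropic
# optimal transport; all data below is synthetic.
import numpy as np

def sinkhorn(cost, reg=0.1, n_iters=200):
    """Entropic OT between uniform marginals; returns the transport plan."""
    m, n = cost.shape
    a, b = np.full(m, 1.0 / m), np.full(n, 1.0 / n)
    K = np.exp(-cost / reg)
    u = np.ones(m)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# Toy segment embeddings: 3 video segments vs. 4 article segments.
rng = np.random.default_rng(0)
video = rng.normal(size=(3, 16))
text = rng.normal(size=(4, 16))

# Cosine-distance cost matrix between the two segment sets.
vn = video / np.linalg.norm(video, axis=1, keepdims=True)
tn = text / np.linalg.norm(text, axis=1, keepdims=True)
cost = 1.0 - vn @ tn.T

plan = sinkhorn(cost)
# For each video segment, the best-matching text segment under the plan.
print(plan.argmax(axis=1))
```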

    Semantic Analysis Based Text Summarization

    Automatic summarization has become an important part of natural language processing research since the turn of the 21st century, as the majority of online data is textual. Text summarization reduces the volume of data while preserving its context, and performing it automatically also reduces human effort. Summarization is the process of generating a summary of an input text by extracting its representative sentences. In this project, we present a novel technique for summarizing domain-specific text using semantic analysis, a subfield of natural language processing.
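
    A minimal sketch of one common semantic-analysis approach to extractive summarization: score sentences by their weight in latent semantic topics (LSA over TF-IDF) and return the top-ranked ones. The scikit-learn pipeline here is an assumption, not necessarily this project's exact method.

```python
# LSA-style extractive summarization: rank sentences by their strength
# in the top latent semantic topics, keep the k strongest.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
import numpy as np

def lsa_summarize(sentences, k=2, n_topics=2):
    tfidf = TfidfVectorizer(stop_words="english")
    X = tfidf.fit_transform(sentences)            # sentences x terms
    svd = TruncatedSVD(n_components=min(n_topics, X.shape[1] - 1))
    topic_weights = svd.fit_transform(X)          # sentences x topics
    scores = np.linalg.norm(topic_weights, axis=1)
    top = sorted(np.argsort(scores)[::-1][:k])    # keep original order
    return [sentences[i] for i in top]

# Hypothetical usage on a toy document.
docs = [
    "The system parses the input document into sentences.",
    "Semantic analysis maps each sentence into a latent topic space.",
    "Cats are cute animals.",
    "The summary keeps the sentences most central to those topics.",
]
print(lsa_summarize(docs, k=2))
```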

    MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos

    Multimodal summarization with multimodal output (MSMO) has emerged as a promising research direction. Nonetheless, existing public MSMO datasets have numerous limitations, including insufficient maintenance, data inaccessibility, limited size, and the absence of proper categorization, which pose significant challenges. To address these challenges and provide a comprehensive dataset for this new direction, we have meticulously curated the MMSum dataset. Our new dataset features (1) human-validated summaries for both video and textual content, providing superior human instruction and labels for multimodal learning; (2) comprehensively and meticulously arranged categorization, spanning 17 principal categories and 170 subcategories to encapsulate a diverse array of real-world scenarios; and (3) benchmark tests performed on the proposed dataset to assess various tasks and methods, including video summarization, text summarization, and multimodal summarization. To champion accessibility and collaboration, we will release the MMSum dataset and the data collection tool as fully open-source resources, fostering transparency and accelerating future developments. Our project website can be found at https://mmsum-dataset.github.io/.

    Hierarchical3D Adapters for Long Video-to-text Summarization

    In this paper, we focus on video-to-text summarization and investigate how to best utilize multimodal information for summarizing long inputs (e.g., an hour-long TV show) into long outputs (e.g., a multi-sentence summary). We extend SummScreen (Chen et al., 2021), a dialogue summarization dataset consisting of transcripts of TV episodes with reference summaries, and create a multimodal variant by collecting the corresponding full-length videos. We incorporate multimodal information into a pre-trained textual summarizer efficiently using adapter modules augmented with a hierarchical structure while tuning only 3.8% of model parameters. Our experiments demonstrate that incorporating multimodal information yields superior performance over more memory-heavy, fully fine-tuned textual summarization methods.
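
    For context, the sketch below shows a plain bottleneck adapter of the kind tuned in adapter-based summarizers (in the style of Houlsby et al., 2019): a small residual module trained while the backbone stays frozen. The paper's hierarchical 3D structure is not reproduced here, and the dimensions are assumptions.

```python
# A minimal bottleneck adapter: a small trainable residual module
# inserted after a frozen transformer sublayer. Dimensions are
# hypothetical, chosen only for illustration.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Residual down-project / up-project adapter."""
    def __init__(self, d_model=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.GELU()

    def forward(self, hidden):
        # Only these small projections are trained; the backbone stays frozen,
        # which is how adapter tuning keeps the trainable fraction small.
        return hidden + self.up(self.act(self.down(hidden)))

# Trainable parameter count of one adapter, versus millions in the backbone.
adapter = Adapter()
print(sum(p.numel() for p in adapter.parameters()))
```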