95 research outputs found
Textual summarization of data using linguistic protoform summaries
With the unprecedented quantities of data being generated in all walks of life, from social media to the health domain, new techniques for better understanding this information are imperative. In contrast to popular methods such as visualization or statistical summarization, which require the user to adapt to the technology, summarizing this information in language suited to the audience has recently gained a lot of traction. This thesis deals with textual summarization of data using Linguistic Protoform Summaries (LPS). We start by studying the existing techniques in the literature for producing LPS, propose a new method, and demonstrate its robustness with a mathematical proof. The usefulness of LPS is then illustrated with a novel application in the healthcare domain, where the textual summaries are tailored to a clinical population. This is followed by an extensive study of the use of LPS as features for processing data. There, we present our thoughts on how to handle LPS as data features and provide the reasoning behind this choice. We illustrate this with a real-data example in which we find a prototypical set of days in the activity of people living in an eldercare facility. Throughout this thesis we design various sets of experiments with synthetic data to explain the details of the techniques presented.
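As background for readers unfamiliar with protoforms, a minimal sketch of the classic Zadeh-style truth computation for a summary of the form "Q of the data are A" follows. The membership functions `mu_high` and `mu_most` and the activity values are illustrative stand-ins, not the thesis's proposed method or data.

```python
# Sketch of a linguistic protoform summary "Q of the data are A":
# the truth degree is the fuzzy quantifier Q applied to the mean
# membership of the data in the fuzzy predicate A (Zadeh's form).

def mu_high(x):
    """Hypothetical membership function for the summarizer 'high' on [0, 100]."""
    if x <= 50:
        return 0.0
    if x >= 80:
        return 1.0
    return (x - 50) / 30.0

def mu_most(p):
    """Hypothetical fuzzy quantifier 'most' over a proportion p in [0, 1]."""
    if p <= 0.3:
        return 0.0
    if p >= 0.8:
        return 1.0
    return (p - 0.3) / 0.5

def truth_of_summary(data, quantifier, predicate):
    """Truth degree of the protoform 'Q of the data are A'."""
    proportion = sum(predicate(x) for x in data) / len(data)
    return quantifier(proportion)

activity = [85, 90, 72, 60, 95, 88, 40]   # toy daily activity scores
t = truth_of_summary(activity, mu_most, mu_high)
print(f"'Most days show high activity' has truth degree {t:.2f}")
```

In practice the summarizer generates many candidate protoforms (different quantifiers and predicates) and reports those with the highest truth degree.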
Text-based Semantic Annotation Service for Multimedia Content in the Esperonto project
Within the Esperonto project, NLP is being integrated with ontologies and other knowledge bases with the goal of implementing a semantic annotation service that upgrades the current Web towards the emerging Semantic Web. Research is currently being conducted on how to apply the Esperonto semantic annotation service to text material associated with still images in web pages. In doing so, the project will not only allow for semantic querying of still images on the Web, but also (automatically) create a large set of text-based semantic annotations of still images. These can be used by the multimedia community to support the task of content indexing of image material, possibly combining the Esperonto type of annotations with annotations resulting from image analysis.
Semantics-Consistent Cross-domain Summarization via Optimal Transport Alignment
Multimedia summarization with multimodal output (MSMO) is a recently explored
application in language grounding. It plays an essential role in real-world
applications, e.g., automatically generating cover images and titles for news
articles or providing introductions to online videos. However, existing methods
extract features from the whole video and article and use fusion methods to
select a representative one, thus usually ignoring the critical structure and
varying semantics. In this work, we propose a Semantics-Consistent Cross-domain
Summarization (SCCS) model based on optimal transport alignment with visual and
textual segmentation. Specifically, our method first decomposes the video and
the article into segments in order to capture their structural semantics. SCCS
then follows a cross-domain alignment objective with an optimal transport
distance, which leverages multimodal interaction to match and select the visual
and textual summary. We evaluated our method on three recent multimodal
datasets and demonstrated the effectiveness of our method in producing
high-quality multimodal summaries.
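To make the cross-domain alignment idea concrete, here is a hedged toy sketch of entropic optimal transport (Sinkhorn iterations) between video-segment and text-segment feature vectors. The 2-dimensional features, the cosine cost, and the uniform marginals are illustrative assumptions, not the SCCS model's actual encoders or objective.

```python
import math

def cosine_cost(u, v):
    """1 - cosine similarity as a transport cost between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def sinkhorn(cost, eps=0.1, iters=200):
    """Entropic OT plan between uniform marginals over the two segment sets."""
    n, m = len(cost), len(cost[0])
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    a, b = 1.0 / n, 1.0 / m
    for _ in range(iters):
        u = [a / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

video = [[1.0, 0.1], [0.0, 1.0]]               # toy video-segment features
text = [[0.9, 0.2], [0.1, 1.0], [0.5, 0.5]]    # toy text-segment features
cost = [[cosine_cost(vs, ts) for ts in text] for vs in video]
plan = sinkhorn(cost)

# High-mass entries of the transport plan indicate matched segment pairs.
for i, row in enumerate(plan):
    j = max(range(len(row)), key=row.__getitem__)
    print(f"video segment {i} best matches text segment {j}")
```

In a full model the transport plan (or distance) would be used inside a learned objective; here it simply reveals which segment pairs align most strongly.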
Semantic Analysis Based Text Summarization
Automatic summarization has become an important part of the study of natural language processing since the advent of the 21st century, as the majority of data online is textual. Summarizing text reduces the volume of data while maintaining its context, and automating this activity also reduces human effort. Summarization is the process of generating a summary of an input text by extracting its representative sentences. In this project, we present a novel technique for summarizing domain-specific text using semantic analysis, a subset of natural language processing.
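The extract-representative-sentences step can be illustrated with a simple word-frequency baseline. This is a hedged sketch of extractive summarization in general, not the project's semantic-analysis method; the stopword list and example document are invented for the demonstration.

```python
# Toy extractive summarizer: score each sentence by the average corpus
# frequency of its content words, then keep the top-scoring sentences
# in their original order.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "is", "in", "and", "to", "it"}

def extract_summary(text, n_sentences=1):
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        tokens = [w for w in re.findall(r"[a-z]+", sentence.lower())
                  if w not in STOPWORDS]
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    chosen = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    return " ".join(s for s in sentences if s in chosen)

doc = ("Summarization reduces text. Summarization keeps the main context. "
       "Weather was pleasant today.")
print(extract_summary(doc, n_sentences=1))
```

Semantic-analysis approaches replace the raw frequency score with a measure of how well each sentence covers the document's meaning, but the select-and-reorder skeleton is the same.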
MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
Multimodal summarization with multimodal output (MSMO) has emerged as a
promising research direction. Nonetheless, numerous limitations exist within
existing public MSMO datasets, including insufficient maintenance, data
inaccessibility, limited size, and the absence of proper categorization, which
pose significant challenges. To address these challenges and provide a
comprehensive dataset for this new direction, we have meticulously curated the
\textbf{MMSum} dataset. Our new dataset features (1) Human-validated summaries
for both video and textual content, providing superior human instruction and
labels for multimodal learning. (2) Comprehensively and meticulously arranged
categorization, spanning 17 principal categories and 170 subcategories to
encapsulate a diverse array of real-world scenarios. (3) Benchmark tests
performed on the proposed dataset to assess various tasks and methods,
including \textit{video summarization}, \textit{text summarization}, and
\textit{multimodal summarization}. To champion accessibility and collaboration,
we will release the \textbf{MMSum} dataset and the data collection tool as
fully open-source resources, fostering transparency and accelerating future
developments. Our project website can be found
at~\url{https://mmsum-dataset.github.io/}.
Hierarchical3D Adapters for Long Video-to-text Summarization
In this paper, we focus on video-to-text summarization and investigate how to
best utilize multimodal information for summarizing long inputs (e.g., an
hour-long TV show) into long outputs (e.g., a multi-sentence summary). We
extend SummScreen (Chen et al., 2021), a dialogue summarization dataset
consisting of transcripts of TV episodes with reference summaries, and create a
multimodal variant by collecting corresponding full-length videos. We
incorporate multimodal information into a pre-trained textual summarizer
efficiently using adapter modules augmented with a hierarchical structure while
tuning only 3.8\% of model parameters. Our experiments demonstrate that
multimodal information yields superior performance compared to more memory-heavy,
fully fine-tuned textual summarization methods.
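The parameter-efficiency claim rests on the standard bottleneck-adapter pattern: a small down-project / nonlinearity / up-project block with a residual connection, inserted into an otherwise frozen pre-trained model. The sketch below illustrates that pattern with invented dimensions; it is not the paper's hierarchical architecture.

```python
# Bottleneck adapter sketch: h = x + W_up @ relu(W_down @ x).
# Only the adapter weights are trained; the backbone stays frozen.
import random

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def adapter(x, W_down, W_up):
    """Residual bottleneck adapter applied to one hidden vector x."""
    h = [max(0.0, v) for v in matvec(W_down, x)]   # down-project + ReLU
    delta = matvec(W_up, h)                        # up-project back
    return [xi + di for xi, di in zip(x, delta)]   # residual connection

d_model, d_bottleneck = 16, 2   # bottleneck width << model width
random.seed(0)
W_down = [[random.gauss(0, 0.02) for _ in range(d_model)]
          for _ in range(d_bottleneck)]
W_up = [[0.0] * d_bottleneck for _ in range(d_model)]  # zero-init: starts as identity

x = [random.gauss(0, 1) for _ in range(d_model)]
# At initialization the adapter leaves the hidden state unchanged.

# Trainable fraction: adapter weights vs. a stand-in frozen block.
adapter_params = d_model * d_bottleneck * 2
frozen_params = 4 * d_model * d_model
print(f"tuning {adapter_params / (adapter_params + frozen_params):.1%} of parameters")
```

Because the bottleneck width is much smaller than the model width, the trainable fraction stays tiny, which is how figures like the paper's 3.8\% arise.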
- …