429 research outputs found
Recommended from our members
Multimodal News Summarization, Tracking and Annotation Incorporating Tensor Analysis of Memes
We demonstrate four novel multimodal methods for efficient video summarization and comprehensive cross-cultural news video understanding.
First, For video quick browsing, we demonstrate a multimedia event recounting system. Based on nine people-oriented design principles, it summarizes YouTube-like videos into short visual segments (812sec) and textual words (less than 10 terms). In the 2013 Trecvid Multimedia Event Recounting competition, this system placed first in recognition time efficiency, while remaining above average in description accuracy.
Secondly, we demonstrate the summarization of large amounts of online international news videos. In order to understand an international event such as Ebola virus, AirAsia Flight 8501 and Zika virus comprehensively, we present a novel and efficient constrained tensor factorization algorithm that first represents a video archive of multimedia news stories concerning a news event as a sparse tensor of order 4. The dimensions correspond to extracted visual memes, verbal tags, time periods, and cultures. The iterative algorithm approximately but accurately extracts coherent quad-clusters, each of which represents a significant summary of an important independent aspect of the news event. We give examples of quad-clusters extracted from tensors with at least 108 entries derived from international news coverage. We show the method is fast, can be tuned to give preferences to any subset of its four dimensions, and exceeds three existing methods in performance.
Thirdly, noting that the co-occurrence of visual memes and tags in our summarization result is sparse, we show how to model cross-cultural visual meme influence based on normalized PageRank, which more accurately captures the rates at which visual memes are reposted in a specified time period in a specified culture.
Lastly, we establish the correspondences of videos and text descriptions in different cultures by reliable visual cues, detect culture-specific tags for visual memes and then annotate videos in a cultural settings. Starting with any video with less text or no text in one culture (say, US), we select candidate annotations in the text of another culture (say, China) to annotate US video. Through analyzing the similarity of images annotated by those candidates, we can derive a set of proper tags from the viewpoints of another culture (China). We illustrate cultural-based annotation examples by segments of international news. We evaluate the generated tags by cross-cultural tag frequency, tag precision, and user studies
Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Multimodal machine learning is a vibrant multi-disciplinary research field
that aims to design computer agents with intelligent capabilities such as
understanding, reasoning, and learning through integrating multiple
communicative modalities, including linguistic, acoustic, visual, tactile, and
physiological messages. With the recent interest in video understanding,
embodied autonomous agents, text-to-image generation, and multisensor fusion in
application domains such as healthcare and robotics, multimodal machine
learning has brought unique computational and theoretical challenges to the
machine learning community given the heterogeneity of data sources and the
interconnections often found between modalities. However, the breadth of
progress in multimodal research has made it difficult to identify the common
themes and open questions in the field. By synthesizing a broad range of
application domains and theoretical frameworks from both historical and recent
perspectives, this paper is designed to provide an overview of the
computational and theoretical foundations of multimodal machine learning. We
start by defining two key principles of modality heterogeneity and
interconnections that have driven subsequent innovations, and propose a
taxonomy of 6 core technical challenges: representation, alignment, reasoning,
generation, transference, and quantification covering historical and recent
trends. Recent technical achievements will be presented through the lens of
this taxonomy, allowing researchers to understand the similarities and
differences across new approaches. We end by motivating several open problems
for future research as identified by our taxonomy
A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and point-of-interests, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on daily basis. Due to the world-wide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts are spent on dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim at offering an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we also briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions.Comment: Accepted to TKDE. 30 pages, 1 figur
A Survey on Semantic Processing Techniques
Semantic processing is a fundamental research domain in computational
linguistics. In the era of powerful pre-trained language models and large
language models, the advancement of research in this domain appears to be
decelerating. However, the study of semantics is multi-dimensional in
linguistics. The research depth and breadth of computational semantic
processing can be largely improved with new technologies. In this survey, we
analyzed five semantic processing tasks, e.g., word sense disambiguation,
anaphora resolution, named entity recognition, concept extraction, and
subjectivity detection. We study relevant theoretical research in these fields,
advanced methods, and downstream applications. We connect the surveyed tasks
with downstream applications because this may inspire future scholars to fuse
these low-level semantic processing tasks with high-level natural language
processing tasks. The review of theoretical research may also inspire new tasks
and technologies in the semantic processing domain. Finally, we compare the
different semantic processing techniques and summarize their technical trends,
application trends, and future directions.Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN
1566-2535. The equal contribution mark is missed in the published version due
to the publication policies. Please contact Prof. Erik Cambria for detail
- …