Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Multimodal machine learning is a vibrant multi-disciplinary research field
that aims to design computer agents with intelligent capabilities such as
understanding, reasoning, and learning through integrating multiple
communicative modalities, including linguistic, acoustic, visual, tactile, and
physiological messages. With the recent interest in video understanding,
embodied autonomous agents, text-to-image generation, and multisensor fusion in
application domains such as healthcare and robotics, multimodal machine
learning has brought unique computational and theoretical challenges to the
machine learning community given the heterogeneity of data sources and the
interconnections often found between modalities. However, the breadth of
progress in multimodal research has made it difficult to identify the common
themes and open questions in the field. By synthesizing a broad range of
application domains and theoretical frameworks from both historical and recent
perspectives, this paper is designed to provide an overview of the
computational and theoretical foundations of multimodal machine learning. We
start by defining two key principles of modality heterogeneity and
interconnections that have driven subsequent innovations, and propose a
taxonomy of six core technical challenges (representation, alignment, reasoning,
generation, transference, and quantification) covering historical and recent
trends. Recent technical achievements will be presented through the lens of
this taxonomy, allowing researchers to understand the similarities and
differences across new approaches. We end by motivating several open problems
for future research, as identified by our taxonomy.
2017 Annual Research Symposium Abstract Book
2017 annual volume of abstracts for science research projects conducted by students at Trinity College
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Indonesian and Malay are underrepresented in the development of natural language processing (NLP) technologies, and available resources are difficult to find. A clear picture of existing work can invigorate and inform how researchers conceptualise worthwhile projects. Using an education-sector project to motivate the study, we conducted a wide-ranging overview of Indonesian and Malay human language technologies and corpus work. We charted 657 included studies according to Hirschberg and Manning's 2015 description of NLP, concluding that the field was dominated by exploratory corpus work, machine reading of text gathered from the Internet, and sentiment analysis. In this paper, we identify the most-published authors and research hubs, and make a number of recommendations to encourage future collaboration and efficiency within NLP in Indonesian and Malay.
Understanding Health Video Engagement: An Interpretable Deep Learning Approach
Health misinformation on social media devastates physical and mental health,
invalidates health gains, and potentially costs lives. Understanding how health
misinformation is transmitted is an urgent goal for researchers, social media
platforms, health sectors, and policymakers to mitigate those ramifications.
Deep learning methods have been deployed to predict the spread of
misinformation. While achieving state-of-the-art predictive performance,
deep learning methods lack interpretability due to their black-box nature.
To address this gap, this study proposes a novel interpretable deep learning
approach, Generative Adversarial Network based Piecewise Wide and Attention
Deep Learning (GAN-PiWAD), to predict health misinformation transmission in
social media. Improving upon state-of-the-art interpretable methods, GAN-PiWAD
captures the interactions among multi-modal data, offers unbiased estimation of
the total effect of each feature, and models the dynamic total effect of each
feature when its value varies. We select features according to social exchange
theory and evaluate GAN-PiWAD on 4,445 misinformation videos. The proposed
approach outperformed strong benchmarks. Interpretation of GAN-PiWAD indicates
video description, negative video content, and channel credibility are key
features that drive viral transmission of misinformation. This study
contributes to information systems (IS) research with a novel interpretable deep learning method that is
generalizable to understand other human decision factors. Our findings provide
direct implications for social media platforms and policymakers to design
proactive interventions to identify misinformation, control transmissions, and
manage infodemics.
Comment: WITS 2021 Best Paper Award