1,615 research outputs found
Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Multimodal machine learning is a vibrant multi-disciplinary research field
that aims to design computer agents with intelligent capabilities such as
understanding, reasoning, and learning through integrating multiple
communicative modalities, including linguistic, acoustic, visual, tactile, and
physiological messages. With the recent interest in video understanding,
embodied autonomous agents, text-to-image generation, and multisensor fusion in
application domains such as healthcare and robotics, multimodal machine
learning has brought unique computational and theoretical challenges to the
machine learning community given the heterogeneity of data sources and the
interconnections often found between modalities. However, the breadth of
progress in multimodal research has made it difficult to identify the common
themes and open questions in the field. By synthesizing a broad range of
application domains and theoretical frameworks from both historical and recent
perspectives, this paper is designed to provide an overview of the
computational and theoretical foundations of multimodal machine learning. We
start by defining two key principles of modality heterogeneity and
interconnections that have driven subsequent innovations, and propose a
taxonomy of 6 core technical challenges: representation, alignment, reasoning,
generation, transference, and quantification covering historical and recent
trends. Recent technical achievements will be presented through the lens of
this taxonomy, allowing researchers to understand the similarities and
differences across new approaches. We end by motivating several open problems
for future research as identified by our taxonomy
Learning Language from a Large (Unannotated) Corpus
A novel approach to the fully automated, unsupervised extraction of
dependency grammars and associated syntax-to-semantic-relationship mappings
from large text corpora is described. The suggested approach builds on the
authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well
as on a number of prior papers and approaches from the statistical language
learning literature. If successful, this approach would enable the mining of
all the information needed to power a natural language comprehension and
generation system, directly from a large, unannotated corpus.Comment: 29 pages, 5 figures, research proposa
Unsupervised Chunking with Hierarchical RNN
In Natural Language Processing (NLP), predicting linguistic structures, such
as parsing and chunking, has mostly relied on manual annotations of syntactic
structures. This paper introduces an unsupervised approach to chunking, a
syntactic task that involves grouping words in a non-hierarchical manner. We
present a two-layer Hierarchical Recurrent Neural Network (HRNN) designed to
model word-to-chunk and chunk-to-sentence compositions. Our approach involves a
two-stage training process: pretraining with an unsupervised parser and
finetuning on downstream NLP tasks. Experiments on the CoNLL-2000 dataset
reveal a notable improvement over existing unsupervised methods, enhancing
phrase F1 score by up to 6 percentage points. Further, finetuning with
downstream tasks results in an additional performance improvement.
Interestingly, we observe that the emergence of the chunking structure is
transient during the neural model's downstream-task training. This study
contributes to the advancement of unsupervised syntactic structure discovery
and opens avenues for further research in linguistic theory
- …