162 research outputs found
Image sense disambiguation : a multimodal approach
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 131-136). If a picture is worth a thousand words, can a thousand words be worth a training image? Most successful object recognition algorithms require manually annotated images of objects to be collected for training. The amount of human effort required to collect training data has limited most approaches to the several hundred object categories available in the labeled datasets. While human-annotated image data is scarce, additional sources of information can be used as weak labels, reducing the need for human supervision. In this thesis, we use three types of information to learn models of object categories: speech, text and dictionaries. We demonstrate that our use of non-traditional information sources facilitates automatic acquisition of visual object models for arbitrary words without requiring any labeled image examples. Spoken object references occur in many scenarios: interaction with an assistant robot, voice-tagging of photos, etc. Existing reference resolution methods are unimodal, relying either only on image features, or only on speech recognition. We propose a method that uses both the image of the object and the speech segment referring to it to disambiguate the underlying object label. We show that even noisy speech input helps visual recognition, and vice versa. We also explore two sources of linguistic sense information: the words surrounding images on web pages, and dictionary entries for nouns that refer to objects. Keywords that index images on the web have been used as weak object labels, but these tend to produce noisy datasets with many unrelated images. We use unlabeled text, dictionary definitions, and semantic relations between concepts to learn a refined model of image sense. Our model can work with as little supervision as a single English word. 
We apply this model to a dataset of web images indexed by polysemous keywords, and show that it improves both retrieval of specific senses and the resulting object classifiers. by Kate Saenko. Ph.D.
Tune your brown clustering, please
Brown clustering, an unsupervised hierarchical clustering technique based on n-gram mutual information, has proven useful in many NLP applications. However, most uses of Brown clustering employ the same default configuration; the appropriateness of this configuration has gone predominantly unexplored. Accordingly, we present information for practitioners on the behaviour of Brown clustering in order to assist hyper-parameter tuning, in the form of a theoretical model of Brown clustering utility. This model is then evaluated empirically in two sequence labelling tasks over two text types. We explore the dynamic between the input corpus size, chosen number of classes, and quality of the resulting clusters, which affects any approach using Brown clustering. In every scenario that we examine, our results reveal that the values most commonly used for the clustering are sub-optimal.
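As an illustration of the mutual-information criterion mentioned in the abstract above, the sketch below computes the average mutual information between the classes of adjacent tokens for a fixed word-to-class assignment. This is a minimal sketch of the bigram case only, not the paper's model or tuning procedure; the function name `average_mutual_information` and the toy data are assumptions made for this example.

```python
from collections import Counter
from math import log2


def average_mutual_information(tokens, word_to_class):
    """Average mutual information between classes of adjacent tokens.

    Hypothetical helper for illustration: bigram case only, with
    probabilities estimated by relative frequency over adjacent pairs.
    """
    classes = [word_to_class[t] for t in tokens]
    pairs = list(zip(classes, classes[1:]))
    if not pairs:
        return 0.0

    total = len(pairs)
    pair_counts = Counter(pairs)
    left_counts = Counter(left for left, _ in pairs)
    right_counts = Counter(right for _, right in pairs)

    ami = 0.0
    for (left, right), count in pair_counts.items():
        p_joint = count / total
        p_left = left_counts[left] / total
        p_right = right_counts[right] / total
        ami += p_joint * log2(p_joint / (p_left * p_right))
    return ami


# Toy corpus: two words that strictly alternate.
tokens = "a b a b a b".split()
separate = average_mutual_information(tokens, {"a": 0, "b": 1})
merged = average_mutual_information(tokens, {"a": 0, "b": 0})
```

Here `separate` is larger than `merged`: collapsing both words into one class discards all information about which class follows which. The number of classes is the kind of setting the abstract argues should be tuned rather than left at a default.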
Semantic Audio Analysis Utilities and Applications.
PhD
Extraction, representation, organisation and application of metadata about audio recordings
are the concern of semantic audio analysis. Our broad interpretation, aligned with recent
developments in the field, includes methodological aspects of semantic audio, such as
those related to information management, knowledge representation and applications of the
extracted information. In particular, we look at how Semantic Web technologies may be used
to enhance information management practices in two audio related areas: music informatics
and music production.
In the first area, we are concerned with music information retrieval (MIR) and related
research. We examine how structured data may be used to support reproducibility and
provenance of extracted information, and aim to support multi-modality and context adaptation
in the analysis. In creative music production, our goals can be summarised as follows:
Off-the-shelf sound editors do not hold appropriately structured information about the edited
material, thus human-computer interaction is inefficient. We believe that recent developments
in sound analysis and music understanding are capable of bringing about significant improvements
in the music production workflow. Providing visual cues related to music structure can
serve as an example of intelligent, context-dependent functionality.
The central contributions of this work are a Semantic Web ontology for describing recording
studios, including a model of technological artefacts used in music production, methodologies
for collecting data about music production workflows and describing the work of
audio engineers which facilitates capturing their contribution to music production, and finally
a framework for creating Web-based applications for automated audio analysis. This
has applications demonstrating how Semantic Web technologies and ontologies can facilitate
interoperability between music research tools, and the creation of semantic audio software, for
instance, for music recommendation, temperament estimation or multi-modal music tutoring.
Extracting Temporal and Causal Relations between Events
Structured information resulting from temporal information processing is
crucial for a variety of natural language processing tasks, for instance to
generate timeline summarization of events from news documents, or to answer
temporal/causal-related questions about some events. In this thesis we present
a framework for an integrated temporal and causal relation extraction system.
We first develop a robust extraction component for each type of relation, i.e.
temporal order and causality. We then combine the two extraction components
into an integrated relation extraction system, CATENA---CAusal and Temporal
relation Extraction from NAtural language texts---, by utilizing the
presumption about event precedence in causality, that causing events must
happen BEFORE resulting events. Several resources and techniques to improve
our relation extraction systems are also discussed, including word embeddings
and training data expansion. Finally, we report our efforts to adapt
temporal information processing to languages other than English, namely
Italian and Indonesian. Comment: PhD Thesis
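The precedence presumption described in the abstract above (a causing event comes BEFORE its resulting event) can be sketched as a simple post-processing rule. This is an illustrative assumption about how such a rule could be applied, not CATENA's actual implementation; the function name `apply_precedence_presumption`, the label strings, and the data layout are all hypothetical.

```python
def apply_precedence_presumption(temporal, causal):
    """Combine temporal labels with causal links.

    temporal: dict mapping (event_a, event_b) -> temporal label string
    causal:   set of (cause, effect) pairs
    Returns the merged temporal labels and a list of causal links whose
    existing temporal label contradicts cause-BEFORE-effect.
    """
    merged = dict(temporal)
    conflicts = []
    for cause, effect in sorted(causal):
        forward = temporal.get((cause, effect))
        backward = temporal.get((effect, cause))
        if forward is None and backward is None:
            # No temporal label yet: infer it from the causal link.
            merged[(cause, effect)] = "BEFORE"
        elif forward == "BEFORE" or backward == "AFTER":
            # Already consistent with the presumption.
            continue
        else:
            conflicts.append((cause, effect))
    return merged, conflicts


temporal = {("quake", "tsunami"): "BEFORE", ("flood", "rain"): "BEFORE"}
causal = {("quake", "tsunami"), ("rain", "flood"), ("strike", "delay")}
merged, conflicts = apply_precedence_presumption(temporal, causal)
```

In this toy input, the unlabeled pair `("strike", "delay")` receives a BEFORE label, while `("rain", "flood")` is reported as a conflict because the existing temporal label places the flood first.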
- …