Data fusion and matching by maximizing statistical dependencies
The core aim of machine learning is to make a computer program learn from experience. Learning from data is usually defined as the task of learning regularities or patterns in data in order to extract useful information, or to learn the underlying concept. An important sub-field of machine learning is multi-view learning, where the task is to learn from multiple data sets or views describing the same underlying concept. A typical example of such a scenario would be to study a biological concept using several biological measurements such as gene expression, protein expression and metabolic profiles, or to classify web pages based on their content and the contents of their hyperlinks.
In this thesis, novel problem formulations and methods for multi-view learning are presented. The contributions include a linear data fusion approach for exploratory data analysis, a new measure to evaluate different kinds of representations for textual data, and an extension of multi-view learning to novel scenarios where the correspondence of samples in the different views or data sets is not known in advance. In order to infer the one-to-one correspondence of samples between two views, a novel concept of multi-view matching is proposed. The matching algorithm is completely data-driven and is demonstrated in several applications such as matching of metabolites between humans and mice, and matching of sentences between documents in two languages.
Machine learning aims to create computer programs that learn through experience. The task is often to learn regularities from data sets, which yield new information about the phenomenon underlying the data and help to understand it better. One central branch of machine learning deals with learning by combining several data sets that describe the same phenomenon. The goal may be, for example, to understand a cell-level biological phenomenon by examining gene activity measurements, protein concentrations and metabolic activity simultaneously. As another example, web pages can be classified based on both their text content and their hyperlink structure at the same time.
This thesis presents new principles and methods for combining multiple data sources. The main results are a linear method for combining data sets in exploratory analysis, a new method for comparing different representations of text data, and a new combination principle for situations in which the correspondence between the samples of the data sets is not known in advance. The thesis shows how the correspondence can be learned from the data sets themselves, without external supervision. The new method is applied, for example, to finding correspondences between human and mouse metabolic measurements and to finding sentences with the same meaning in texts written in two different languages.
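The idea of inferring a one-to-one correspondence by maximizing statistical dependency can be illustrated at toy scale. The sketch below is not the thesis's algorithm: it assumes two one-dimensional views, uses Pearson correlation as the dependency measure, and searches all permutations by brute force, which is only feasible for a handful of samples. The function names and the data are hypothetical.

```python
from itertools import permutations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_matching(view_a, view_b):
    """Brute-force search for the pairing that maximizes correlation.

    Returns (perm, score), where view_a[i] is paired with view_b[perm[i]].
    Assumes the two views are positively related (an illustrative choice).
    """
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(len(view_b))):
        score = pearson(view_a, [view_b[j] for j in perm])
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score


# Hypothetical data: view_b is roughly 2 * view_a, in shuffled order.
view_a = [1.0, 2.0, 3.0, 4.0]
view_b = [8.1, 2.2, 6.0, 3.9]
perm, score = best_matching(view_a, view_b)
```

With realistic data sizes the exhaustive search would have to be replaced by an assignment solver, and the scalar correlation by a multivariate dependency measure.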
Multiview Learning with Sparse and Unannotated data.
PhD Thesis
Obtaining annotated training data for supervised learning is a bottleneck in many
contemporary machine learning applications. The increasing prevalence of multi-modal
and multi-view data creates both new opportunities for circumventing this issue, and
new application challenges. In this thesis we explore several approaches to alleviating
annotation issues in multi-view scenarios.
We start by studying the problem of zero-shot learning (ZSL) for image recognition,
where class-level annotations are replaced by information transferred
from the text modality. We next look at cross-modal matching, where
paired instances across views provide the supervised label information for learning. We
develop methodology for unsupervised and semi-supervised learning of pairing, thus
eliminating the need for annotation.
We first apply these ideas to unsupervised multi-view matching in the context of
bilingual dictionary induction (BLI), where instances are words in two languages and
finding a correspondence between the words produces a cross-lingual word translation
model. We then return to vision and language and look at learning unsupervised pairing
between images and text. We show that this can be viewed as a limiting case of ZSL
where text-image pairing annotation requirements are completely eliminated.
Overall, these contributions in multi-view learning provide a suite of methods for
reducing annotation requirements in both conventional classification and cross-view
matching settings.
Bandits on graphs and structures
We investigate the structural properties of certain sequential decision-making problems with limited feedback (bandits) in order to bring known algorithmic solutions closer to practical use. In the first part, we put special emphasis on structures that can be represented as graphs on actions. In the second part, we study large action spaces that can be exponential in the number of base actions, or even infinite. We show how to take advantage of structure over the actions and (provably) learn faster.
Multi-instance multi-label learning : algorithms and applications to bird bioacoustics
We consider the problem of supervised classification of bird species from audio recordings in a real-world acoustic monitoring scenario (i.e. audio data is collected in the field with an omnidirectional microphone, without human supervision). Obtaining better data about bird activity can assist conservation efforts and improve our understanding of birds' interactions with the environment and other organisms. However, traditional observation methods are labor-intensive. Most prior work on machine learning for bird song is not applicable to real-world acoustic monitoring, because it assumes recordings contain only a single species of bird, while recordings typically contain multiple simultaneously vocalizing birds. We propose to use the multi-instance multi-label (MIML) framework in machine learning for the species classification problem, where the dataset is viewed as a collection of bags of instances paired with sets of labels. Furthermore, we formalize MIML instance annotation, where the goal is to predict instance labels while learning only from bag label sets. We develop the first MIML representation for audio, and several new algorithms for MIML instance annotation based on support vector machines or classifier chains. The proposed methods classify either the set of species present in a recording, or individual calls, while learning only from recordings paired with a set of species. This form of training data requires less human effort to obtain than individually labeled calls. These methods are successfully applied to audio collected in the field that included multiple simultaneously vocalizing species. The proposed algorithms for MIML classification are general, and are also applied to object recognition in images.
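The MIML view of a dataset described above (bags of instances paired with label sets) can be sketched as a small data structure. This is only an illustration of the data layout, not the authors' representation or algorithms; the class name, feature vectors and species names are hypothetical, and the sketch assumes that every instance's unknown label is one of its bag's labels.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Bag:
    """One recording: a bag of instances plus a set of bag-level labels."""
    instances: List[List[float]]  # one (hypothetical) feature vector per segmented sound
    labels: Set[str]              # species present somewhere in the recording


def label_space(bags: List[Bag]) -> Set[str]:
    """Union of all labels that occur in any bag."""
    space: Set[str] = set()
    for bag in bags:
        space |= bag.labels
    return space


def candidate_labels(bag: Bag) -> List[Set[str]]:
    """Instance annotation setting: the true label of each instance is
    unknown, but it is assumed to be one of the bag's labels."""
    return [set(bag.labels) for _ in bag.instances]


# Hypothetical toy dataset with two recordings.
bags = [
    Bag(instances=[[0.2, 1.1], [0.9, 0.3]], labels={"robin", "wren"}),
    Bag(instances=[[0.8, 0.4]], labels={"wren"}),
]
```

In this layout, a bag whose label set has a single element fixes the labels of all its instances, while larger label sets leave each instance ambiguous until a learner resolves it.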
Uncertainty modeling : fundamental concepts and models
This book series represents a commendable effort in compiling the latest developments on three important Engineering subjects: discrete modeling, inverse methods, and uncertainty structural integrity. Although academic publications on these subjects are plentiful, this book series may be the first time that these modern topics are compiled together, grouped in volumes, and made available for the community. The application of numerical or analytical techniques to model complex Engineering problems, fed by experimental data, usually translated in the form of stochastic information collected from the problem in hand, is much closer to real-world situations than the conventional solution of PDEs. Moreover, inverse problems are becoming almost as common as direct problems, given the need in the industry to maintain current processes working efficiently, as well as to create new solutions based on the immense amount of information available digitally these days. On top of all this, deterministic analysis is slowly giving way to statistically driven structural analysis, delivering upper and lower bound solutions which greatly help the analyst in the decision-making process. All these trends have been topics of investigation for decades, and in recent years the application of these methods in the industry proves that they have achieved the necessary maturity to be definitely incorporated into the roster of modern Engineering tools. The present book series fulfills its role by collecting and organizing these topics, found otherwise scattered in the literature and not always accessible to industry. Moreover, many of the chapters compiled in these books present ongoing research topics conducted by capable fellows from academia and research institutes. They contain novel contributions to several investigation fields and constitute therefore a useful source of bibliographical reference and results repository. The Latin American Journal of Solids and Structures (LAJSS) is honored in supporting the publication of this book series, for it contributes academically and carries technologically significant content in the field of structural mechanics.