
    Bayesian nonparametric clusterings in relational and high-dimensional settings with applications in bioinformatics.

    Recent advances in high-throughput methodologies allow researchers to study complex systems through high-dimensional and multi-relational data. One example is molecular biology, where disparate data (such as gene sequence, gene expression, and interaction information) are available for various snapshots of biological systems. Such high-dimensional, multi-relational data allow unprecedentedly detailed analysis, but also make it challenging to account for all the variability. High-dimensional data often contain a multitude of underlying relationships, each represented by a separate clustering structure, and the number of structures is typically unknown a priori. To address the challenges that traditional clustering methods face on high-dimensional and multi-relational data, we developed three feature selection and cross-clustering methods: 1) the infinite relational model with feature selection (FIRM), which incorporates the rich information of multi-relational data; 2) Bayesian Hierarchical Cross-Clustering (BHCC), a deterministic approximation to the Cross Dirichlet Process mixture (CDPM) and to cross-clustering; and 3) a randomized approximation (RBHCC), based on a truncated hierarchy. An extension of BHCC, Bayesian Congruence Measuring (BCM), is proposed to measure incongruence between genes and to identify sets of congruent loci with identical evolutionary histories. We adapt our BHCC algorithm to inference in BCM, where the intended structure of each view (congruent loci) represents consistent evolutionary processes. We consider an application of FIRM to categorizing mRNA and microRNA, in which the model uses latent structures to encode the expression patterns and the gene ontology annotations. We also apply FIRM to recover the categories of ligands and proteins, and to predict unknown drug-target interactions, where the latent categorization structure encodes drug-target interaction, chemical compound similarity, and amino acid sequence similarity. BHCC and RBHCC are shown to have better predictive performance (in terms of both cluster membership and missing value prediction) than traditional clustering methods. Our results suggest that these novel approaches to integrating multi-relational information have a promising future in the biological sciences, where incorporating data related to varying features is often regarded as a daunting task.
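
As a rough illustration of why Dirichlet process mixtures suit settings where the number of clusters is unknown a priori, the sketch below draws one partition from a Chinese restaurant process, the partition prior induced by a Dirichlet process. It is not FIRM, BHCC, or RBHCC; the function name `sample_crp_partition` and the concentration parameter `alpha` are assumptions introduced here for illustration only.

```python
import random

def sample_crp_partition(n_items, alpha, seed=0):
    """Draw one partition of n_items from a Chinese restaurant process prior."""
    rng = random.Random(seed)
    assignments = []  # cluster index assigned to each item
    sizes = []        # current number of items in each cluster
    for _ in range(n_items):
        # An existing cluster k has weight sizes[k]; a brand-new cluster has weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)   # open a new cluster
        else:
            sizes[k] += 1     # join an existing cluster
        assignments.append(k)
    return assignments, sizes

assignments, sizes = sample_crp_partition(n_items=50, alpha=1.0, seed=1)
print(len(sizes), "clusters for 50 items")
```

The number of clusters is an outcome of the draw rather than an input; larger values of `alpha` tend to produce more clusters.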

    An Unsupervised Approach to Modelling Visual Data

    For very large visual datasets, producing expert ground-truth data for training supervised algorithms can require substantial human effort. In these situations there is scope for unsupervised approaches that can model collections of images and automatically summarise their content. The primary motivation for this thesis comes from the problem of labelling large visual datasets of the seafloor obtained by an Autonomous Underwater Vehicle (AUV) for ecological analysis. Labelling this data is expensive, as it requires taxonomic experts for the specific region, whereas automatically generated summaries can be used to focus the efforts of experts and to inform decisions on additional sampling. The contributions of this thesis arise from modelling this visual data in entirely unsupervised ways to obtain comprehensive visual summaries. Firstly, popular unsupervised image feature learning approaches are adapted to work with large datasets and unsupervised clustering algorithms. Next, using Bayesian models, the performance of rudimentary scene clustering is boosted by sharing clusters between multiple related datasets, such as regular photo albums or AUV surveys. These Bayesian scene clustering models are extended to simultaneously cluster sub-image segments to form unsupervised notions of “objects” within scenes. The frequency distribution of these objects within scenes is used as the scene descriptor for simultaneous scene clustering. Finally, this simultaneous clustering model is extended to use whole-image descriptors, which encode rudimentary spatial information, together with object frequency distributions to describe scenes. This is achieved by unifying the previously presented Bayesian clustering models, which also rectifies some of their weaknesses and limitations. Hence, the final contribution of this thesis is a practical unsupervised algorithm that models images from the super-pixel level to the album level and is applicable to large datasets.
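
To make the idea of an object frequency distribution as a scene descriptor concrete, here is a minimal sketch. It assumes each image segment has already been assigned an integer "object" cluster label, whereas the thesis describes clustering segments and scenes simultaneously; the function name `object_frequency_descriptor` and its arguments are hypothetical.

```python
from collections import Counter

def object_frequency_descriptor(segment_labels, n_object_clusters):
    """Normalised histogram of object-cluster labels for one image."""
    counts = Counter(segment_labels)
    total = len(segment_labels)
    # One entry per object cluster: the fraction of segments assigned to it.
    return [counts[k] / total for k in range(n_object_clusters)]

# Hypothetical image with four segments assigned to object clusters 0, 0, 2 and 1.
descriptor = object_frequency_descriptor([0, 0, 2, 1], n_object_clusters=3)
```

Descriptors of this kind have a fixed length regardless of how many segments an image contains, so images can be compared or clustered directly on them.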

    Real-Time Monophonic and Polyphonic Audio Classification from Power Spectra

    This work addresses the recurring challenge of real-time monophonic and polyphonic audio source classification. The proposed process operates directly on the whole normalized power spectrum (NPS), avoiding complex and unreliable traditional feature extraction. The NPS is also a natural candidate for polyphonic events thanks to its additive property in such cases. The classification task is performed through nonparametric kernel-based generative modeling of the power spectrum. The advantage of this model is twofold: it is almost hypothesis-free, and it yields the maximum a posteriori classification rule for online signals in a straightforward way. Moreover, it uses the monophonic dataset to build the polyphonic one. Then, to reach the real-time target, the complexity of the method can be tuned by a standard hierarchical clustering preprocessing of the prototypes, which gives a particularly efficient trade-off between computation time and classification accuracy. The proposed method, called RARE (for Real-time Audio Recognition Engine), shows encouraging results in both monophonic and polyphonic classification tasks on benchmark datasets and our own datasets, including in the targeted real-time situation. In particular, compared to state-of-the-art methods, it offers a reduced training time, no feature extraction, the ability to control the computation-accuracy trade-off, and no need to train on already mixed sounds for polyphonic classification.
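
The general idea of classifying directly on a normalized power spectrum with a kernel-based generative model can be sketched as follows. This is not the RARE implementation: the Gaussian kernel, the `bandwidth` value, the uniform class prior, the naive DFT, and all function names are assumptions made for illustration.

```python
import cmath
import math

def normalized_power_spectrum(frame):
    """One-sided power spectrum of a frame (naive DFT), scaled to sum to 1."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    return [p / total for p in power]

def map_classify(nps, prototypes, bandwidth=0.05):
    """Return the label whose kernel density at nps is largest.

    prototypes maps each label to a list of NPS vectors. With a uniform
    class prior (assumed here), this is the maximum a posteriori rule.
    """
    def density(vectors):
        total = 0.0
        for proto in vectors:
            sq_dist = sum((a - b) ** 2 for a, b in zip(nps, proto))
            total += math.exp(-sq_dist / (2 * bandwidth ** 2))
        return total / len(vectors)
    return max(prototypes, key=lambda label: density(prototypes[label]))

def tone(freq_bin, n=64, phase=0.0):
    """Synthetic sine frame whose energy falls in a single DFT bin."""
    return [math.sin(2 * math.pi * freq_bin * t / n + phase) for t in range(n)]

# Hypothetical two-class example built from synthetic tones.
prototypes = {
    "low": [normalized_power_spectrum(tone(4))],
    "high": [normalized_power_spectrum(tone(12))],
}
predicted = map_classify(normalized_power_spectrum(tone(12, phase=0.7)), prototypes)
```

Reducing the number of prototypes per class, for example by clustering them beforehand, lowers the cost of each `density` evaluation, which is the kind of computation-accuracy trade-off the abstract describes.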

    Learning motion patterns using hierarchical Bayesian models

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009, by Xiaogang Wang. Cataloged from PDF version of thesis. Includes bibliographical references (p. 163-179).
    In far-field visual surveillance, one of the key tasks is to monitor activities in the scene. By learning the motion patterns of objects, computers can help people understand typical activities, detect abnormal activities, and learn models of semantically meaningful scene structures, such as paths commonly taken by objects. In medical imaging, some issues similar to learning motion patterns arise. Diffusion Tensor Magnetic Resonance Imaging (DT-MRI) is one of the first methods to visualize and quantify the organization of white matter in the brain in vivo. Using methods of tractography segmentation, one can connect local diffusion measurements to create global fiber trajectories, which can then be clustered into anatomically meaningful bundles. This is similar to clustering trajectories of objects in visual surveillance. In this thesis, we develop several unsupervised frameworks to learn motion patterns from complicated and large-scale data sets using hierarchical Bayesian models. We explore their applications to activity analysis in far-field visual surveillance and tractography segmentation in medical imaging. Many existing activity analysis approaches in visual surveillance are ad hoc, relying on predefined rules or simple probabilistic models, which prevents them from modeling complicated activities. Our hierarchical Bayesian models can structure the dependencies among a large number of variables to model complicated activities. Various constraints and prior knowledge can be naturally incorporated into a Bayesian framework as priors. When the number of clusters is not well defined in advance, our nonparametric Bayesian models can learn it from the data using Dirichlet process priors. In this work, several hierarchical Bayesian models are proposed for different types of scenes and different camera settings. If the scenes are crowded, it is difficult to track objects because of frequent occlusions, and difficult to separate different types of co-occurring activities. We jointly model simple activities and complicated global behaviors at different hierarchical levels directly from moving pixels, without tracking objects. If the scene is sparse and there is only a single camera view, we first track objects and then cluster trajectories into different activity categories. At the same time, we learn models of the paths commonly taken by objects. Under the Bayesian framework, using the models of activities learned from historical data as priors, the models of activities can be dynamically updated over time. When multiple camera views are used to monitor a large area, by adding a smoothness constraint as a prior, our hierarchical Bayesian model clusters trajectories in multiple camera views without tracking objects across camera views. The topology of the multiple camera views is assumed to be unknown and arbitrary. In tractography segmentation, our approach can cluster much larger data sets than existing approaches and automatically learn the number of bundles from data. We demonstrate the effectiveness of our approaches on multiple visual surveillance and medical imaging data sets.
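
As background on how a Dirichlet process prior leaves the number of clusters open, the following standard result gives the prior expected number of occupied clusters after n observations. The symbols α (concentration parameter), n, and K_n are introduced here for illustration and do not appear in the abstract; the posterior number of clusters in any of the thesis models additionally depends on the data and the likelihood.

```latex
\mathbb{E}[K_n] \;=\; \sum_{i=1}^{n} \frac{\alpha}{\alpha + i - 1}
\;\approx\; \alpha \log\!\left(1 + \frac{n}{\alpha}\right)
```

The expected count grows roughly logarithmically with n, so the prior allows new clusters as more trajectories are observed without fixing their number in advance.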