Robust correlated and individual component analysis
© 1979-2012 IEEE. Recovering correlated and individual components of two, possibly temporally misaligned, sets of data is a fundamental task in disciplines such as image, vision, and behavior computing, with application to problems such as multi-modal fusion (via correlated components), predictive analysis, and clustering (via the individual ones). Here, we study the extraction of correlated and individual components under real-world conditions, namely i) the presence of gross non-Gaussian noise and ii) temporally misaligned data. In this light, we propose a method for the Robust Correlated and Individual Component Analysis (RCICA) of two sets of data in the presence of gross, sparse errors. We furthermore extend RCICA in order to handle temporal incongruities arising in the data. To this end, two suitable optimization problems are solved. The generality of the proposed methods is demonstrated by applying them to 4 applications, namely i) heterogeneous face recognition, ii) multi-modal feature fusion for human behavior analysis (i.e., audio-visual prediction of interest and conflict), iii) face clustering, and iv) the temporal alignment of facial expressions. Experimental results on 2 synthetic and 7 real-world datasets indicate the robustness and effectiveness of the proposed methods on these application domains, outperforming other state-of-the-art methods in the field.
Integrative clustering by non-negative matrix factorization can reveal coherent functional groups from gene profile data
Recent developments in molecular biology and techniques for genome-wide data acquisition have resulted in an abundance of data to profile genes and predict their function. These data sets may come from diverse sources, and it is an open question how to address them jointly and fuse them into a single prediction model. A prevailing technique to identify groups of related genes that exhibit similar profiles is profile-based clustering. Cluster inference may benefit from consensus across different clustering models. In this paper we propose a technique that develops separate gene clusters from each of the available data sources and then fuses them by means of non-negative matrix factorization. We use gene profile data on the budding yeast S. cerevisiae to demonstrate that this approach can successfully integrate heterogeneous data sets and yields high-quality clusters that could not otherwise be inferred by simply merging the gene profiles prior to clustering.
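The fusion step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the per-source clusterings are summarized in a co-association matrix (the fraction of sources in which two genes share a cluster) and that this matrix is factorized with the standard Lee-Seung multiplicative updates. The function names, the toy labelings, and the rank are all hypothetical.

```python
import random

def coassociation(labelings):
    """Fraction of clusterings in which each pair of items shares a cluster."""
    n = len(labelings[0])
    m = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    m[i][j] += 1.0 / len(labelings)
    return m

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def nmf(v, rank, iters=1000, seed=0, eps=1e-9):
    """Approximate v by w @ h with non-negative factors (multiplicative updates)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[r][j] * num[r][j] / (den[r][j] + eps) for j in range(m)]
             for r in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][r] * num[i][r] / (den[i][r] + eps) for r in range(rank)]
             for i in range(n)]
    return w, h

def fuse(labelings, rank):
    """Assign each item to the factor with the largest weight in its row of w."""
    w, _ = nmf(coassociation(labelings), rank)
    return [max(range(rank), key=lambda r: row[r]) for row in w]

# Three hypothetical data sources clustering the same six genes;
# the third source disagrees about gene 2.
sources = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
]
consensus = fuse(sources, rank=2)
```

The cluster ids in `consensus` are arbitrary; only which genes share an id is meaningful. Working on co-association values rather than raw labels sidesteps the fact that label numbering differs between sources.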
MLCut : exploring Multi-Level Cuts in dendrograms for biological data
Choosing a single similarity threshold for cutting dendrograms is not sufficient for performing hierarchical clustering analysis of heterogeneous data sets. In addition, alternative automated or semi-automated methods that cut dendrograms at multiple levels make assumptions about the data at hand. In an attempt to help the user find patterns in the data and resolve ambiguities in cluster assignments, we developed MLCut: a tool that provides visual support for exploring dendrograms of heterogeneous data sets at different levels of detail. The interactive exploration of the dendrogram is coordinated with a representation of the original data, shown as parallel coordinates. The tool supports three analysis steps. Firstly, a single-height similarity threshold can be applied using a dynamic slider to identify the main clusters. Secondly, a distinctiveness threshold can be applied using a second dynamic slider to identify “weak-edges” that indicate heterogeneity within clusters. Thirdly, the user can drill down to further explore the dendrogram structure - always in relation to the original data - and cut the branches of the tree at multiple levels. Interactive drill-down is supported using mouse events such as hovering, pointing and clicking on elements of the dendrogram. Two prototypes of this tool have been developed in collaboration with a group of biologists for analysing their own data sets. We found that enabling the users to cut the tree at multiple levels, while viewing the effect in the original data, is a promising method for clustering which could lead to scientific discoveries.
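The difference between a single-height cut and a multi-level cut can be sketched in a few lines. This is an illustrative assumption about the idea, not MLCut's code: the dendrogram is a nested tuple `(merge_height, left, right)` with string leaves, and `refinements` is a hypothetical stand-in for the user's drill-down choices. The distinctiveness threshold for weak edges is not modelled.

```python
def leaves(node):
    """Return the leaf names under a dendrogram node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)

def cut(node, threshold):
    """Split the tree wherever a merge happened above `threshold`."""
    if isinstance(node, str) or node[0] <= threshold:
        return [node]
    _, left, right = node
    return cut(left, threshold) + cut(right, threshold)

def multi_level_cut(root, main_threshold, refinements):
    """Cut at `main_threshold`, then re-cut selected main clusters lower.

    `refinements` maps a leaf name to a lower threshold that applies to
    the main cluster containing that leaf.
    """
    result = []
    for sub in cut(root, main_threshold):
        members = leaves(sub)
        local = min(
            (t for leaf, t in refinements.items() if leaf in members),
            default=main_threshold,
        )
        result.extend(sorted(leaves(part)) for part in cut(sub, local))
    return result

# Toy dendrogram: a and b merge at 0.2, c joins at 0.5, d and e merge at 0.4,
# and the two branches merge at 0.9.
tree = (0.9, (0.5, (0.2, "a", "b"), "c"), (0.4, "d", "e"))

single = [sorted(leaves(s)) for s in cut(tree, 0.6)]
# single == [['a', 'b', 'c'], ['d', 'e']]
multi = multi_level_cut(tree, 0.6, {"c": 0.3})
# multi == [['a', 'b'], ['c'], ['d', 'e']]
```

The single threshold keeps `c` with `a` and `b`, while the per-branch refinement separates it without disturbing the `d`, `e` cluster.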
Learning Bayesian Networks with Heterogeneous Agronomic Data Sets via Mixed-Effect Models and Hierarchical Clustering
Research involving diverse but related data sets, where associations between
covariates and outcomes may vary, is prevalent in various fields including
agronomic studies. In these scenarios, hierarchical models, also known as
multilevel models, are frequently employed to assimilate information from
different data sets while accommodating their distinct characteristics.
However, their structure extends beyond simple heterogeneity, as variables often
form complex networks of causal relationships.
Bayesian networks (BNs) provide a powerful framework for modelling such
relationships using directed acyclic graphs to illustrate the connections
between variables. This study introduces a novel approach that integrates
random effects into BN learning. Rooted in linear mixed-effects models, this
approach is particularly well-suited for handling hierarchical data. Results
from a real-world agronomic trial suggest that employing this approach enhances
structural learning, leading to the discovery of new connections and
improved model specification. Furthermore, we observe a
reduction in prediction errors from 28% to 17%. By extending the
applicability of BNs to complex data set structures, this approach contributes
to the effective utilisation of BNs for hierarchical agronomic data. This, in
turn, enhances their value as decision-support tools in the field.
Comment: 28 pages, 5 figures
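For reference, the standard linear mixed-effects model that the abstract says the approach is rooted in can be written as below. The symbols are assumptions chosen here for illustration (data set index j, fixed effects β, random effects b_j); the abstract does not state how the study embeds this form in each node's local distribution.

```latex
% Standard linear mixed-effects model for the observations in data set j.
% All symbols are illustrative and not taken from the paper.
\begin{aligned}
  \mathbf{y}_j &= \mathbf{X}_j \boldsymbol{\beta}
                + \mathbf{Z}_j \mathbf{b}_j
                + \boldsymbol{\varepsilon}_j, \\
  \mathbf{b}_j &\sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_b), \qquad
  \boldsymbol{\varepsilon}_j \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}).
\end{aligned}
```

Here β is shared across all data sets, while b_j lets each data set deviate from the shared effects.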
A Method for Integrating Heterogeneous Datasets based on GO Term Similarity
This thesis presents a method for integrating heterogeneous gene/protein datasets at the functional level based on Gene Ontology (GO) term similarity. Biologists often want to integrate heterogeneous datasets obtained from different biological samples. A major challenge in this process is how to link the datasets. Currently, the most common approach is to link them through common reference database identifiers, which tends to yield only a small number of matching identifiers because there is no standard accession scheme. As a result, biologists may recognize only the biological phenomena revealed by each dataset individually, not those revealed by a combination of the data. We discuss an approach for integrating heterogeneous datasets by computing the similarity among them based on the similarity of their GO annotations. We then group the genes and/or proteins with similar annotations by applying a hierarchical clustering algorithm. The results demonstrate a more comprehensive understanding of the biological processes involved.
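The two steps in this abstract (annotation similarity, then hierarchical clustering) can be sketched as follows. This is a simplified assumption, not the thesis method: similarity is the Jaccard overlap of annotation sets, whereas GO term similarity measures usually also use the ontology structure, and the clustering is single-linkage agglomeration stopped at a threshold. The item names and term ids (`T1`, `T2`, ...) are made-up placeholders, not real GO identifiers.

```python
def jaccard(a, b):
    """Overlap of two annotation sets, between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_by_annotation(annotations, threshold):
    """Single-linkage agglomerative clustering on annotation similarity.

    Repeatedly merges the two most similar clusters until the best
    remaining similarity falls below `threshold`.
    """
    clusters = [[name] for name in sorted(annotations)]

    def linkage(c1, c2):
        return max(jaccard(annotations[x], annotations[y]) for x in c1 for y in c2)

    while len(clusters) > 1:
        pairs = [
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        best, i, j = max(pairs)
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical items from two datasets (A: genes, B: proteins) that share
# no database identifiers but do share annotation terms.
annotations = {
    "geneA1": {"T1", "T2", "T3"},
    "protB1": {"T1", "T2"},
    "geneA2": {"T7", "T8"},
    "protB2": {"T7", "T8", "T9"},
    "geneA3": {"T5"},
}
groups = cluster_by_annotation(annotations, threshold=0.5)
# groups == [['geneA1', 'protB1'], ['geneA2', 'protB2'], ['geneA3']]
```

Items from different datasets end up in the same group purely through shared annotations, which is the linking that identifier matching would miss.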