Applying weighted network measures to microarray distance matrices
In recent work we presented a new approach to the analysis of weighted
networks, by providing a straightforward generalization of any network measure
defined on unweighted networks. This approach is based on the translation of a
weighted network into an ensemble of edges, and is particularly suited to the
analysis of fully connected weighted networks. Here we apply our method to
several such networks including distance matrices, and show that the clustering
coefficient, constructed by using the ensemble approach, provides meaningful
insights into the systems studied. In the particular case of two data sets from
microarray experiments the clustering coefficient identifies a number of
biologically significant genes, outperforming existing identification
approaches.
Comment: Accepted for publication in J. Phys.
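A minimal sketch of how an ensemble-style clustering coefficient might be computed, assuming each weight has already been mapped to a value p_ij in [0, 1] that is read as the probability that edge (i, j) is present, and that the unweighted formula is generalized by substituting p_ij for the adjacency entries. The function name `ensemble_clustering` and the choice to take a ready-made probability matrix (rather than a raw distance matrix) are illustrative assumptions, not details taken from the abstract.

```python
import numpy as np

def ensemble_clustering(p):
    """Clustering coefficient per node with edge probabilities in place of 0/1 entries.

    p is a symmetric matrix with entries in [0, 1] (assumed pre-normalized weights).
    c_i = sum_{j != k} p_ij p_jk p_ki / sum_{j != k} p_ij p_ik
    """
    p = np.asarray(p, dtype=float).copy()
    np.fill_diagonal(p, 0.0)                      # no self-loops
    # Numerator: weighted count of closed triangles through node i.
    num = np.einsum("ij,jk,ki->i", p, p, p)
    # Denominator: weighted count of neighbour pairs (j, k) with j != k.
    s = p.sum(axis=1)
    den = s**2 - (p**2).sum(axis=1)
    # Nodes with no pair of neighbours get coefficient 0.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)
```

For a 0/1 matrix this reduces to the usual unweighted clustering coefficient, which is a useful sanity check before applying it to a fully connected weighted network.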
Non-equilibrium dynamics of gene expression and the Jarzynski equality
In order to express specific genes at the right time, the transcription of
genes is regulated by the presence and absence of transcription factor
molecules. With transcription factor concentrations undergoing constant
changes, gene transcription takes place out of equilibrium. In this paper we
discuss a simple mapping between dynamic models of gene expression and
stochastic systems driven out of equilibrium. Using this mapping, results of
nonequilibrium statistical mechanics such as the Jarzynski equality and the
fluctuation theorem are demonstrated for gene expression dynamics. Applications
of this approach include the determination of regulatory interactions between
genes from experimental gene expression data.
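For reference, the two results named above are usually stated as follows. The symbols are the standard ones and are assumptions here, since the abstract does not fix notation: W is the work done on the system along one realization of the driving protocol, \Delta F is the free-energy difference between the initial and final equilibrium states, \beta = 1/(k_B T), and P_F and P_R are the work distributions under the forward and time-reversed protocols. How these quantities map onto gene expression variables is specific to the paper and is not reproduced here.

```latex
% Jarzynski equality: average over realizations of the nonequilibrium process
\left\langle e^{-\beta W} \right\rangle = e^{-\beta \Delta F}

% Crooks fluctuation theorem: forward vs. reversed work distributions
\frac{P_F(W)}{P_R(-W)} = e^{\beta (W - \Delta F)}
```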
Dynamics of gene expression and the regulatory inference problem
From the response to external stimuli to cell division and death, the
dynamics of living cells is based on the expression of specific genes at
specific times. The decision when to express a gene is implemented by the
binding and unbinding of transcription factor molecules to regulatory DNA.
Here, we construct stochastic models of gene expression dynamics and test them
on experimental time-series data of messenger-RNA concentrations. The models
are used to infer biophysical parameters of gene transcription, including the
statistics of transcription factor-DNA binding and the target genes controlled
by a given transcription factor.
Comment: revised version to appear in Europhys. Lett., new titl
A single-sample method for normalizing and combining full-resolution copy numbers from multiple platforms, labs and analysis methods
Motivation: The rapid expansion of whole-genome copy number (CN) studies brings a demand for increased precision and resolution of CN estimates. Recent studies have obtained CN estimates from more than one platform for the same set of samples, and it is natural to want to combine the different estimates in order to meet this demand. Estimates from different platforms show different degrees of attenuation of the true CN changes. Similar differences can be observed in CNs from the same platform run in different labs, or in the same lab with different analytical methods. For this reason, combining CN estimates from different sources (platforms, labs and analysis methods) is not straightforward.
The Iterative Signature Algorithm for the analysis of large scale gene expression data
We present a new approach for the analysis of genome-wide expression data.
Our method is designed to overcome the limitations of traditional techniques
when applied to large-scale data. Rather than allotting each gene to a single
cluster, we assign both genes and conditions to context-dependent and
potentially overlapping transcription modules. We provide a rigorous definition
of a transcription module as the object to be retrieved from the expression
data. We establish an efficient algorithm that searches for the modules encoded in the
data by iteratively refining sets of genes and conditions until they match this
definition. Each iteration involves a linear map, induced by
the normalized expression matrix, followed by the application of a threshold
function. We argue that our method is in fact a generalization of Singular
Value Decomposition, which corresponds to the special case where no threshold
is applied. We show analytically that for noisy expression data our approach
leads to better classification due to the implementation of the threshold. This
result is confirmed by numerical analyses based on in-silico expression data.
We briefly discuss results obtained by applying our algorithm to expression
data from the yeast S. cerevisiae.
Comment: Latex, 36 pages, 8 figure
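The iteration described in this abstract (a linear map induced by the normalized expression matrix, followed by a threshold) can be sketched roughly as below. This is an illustration under stated assumptions rather than the authors' implementation: the z-score normalization, the "more than t standard deviations above the mean" threshold, the stopping rule, and the names `zscore`, `threshold` and `isa_sketch` are all choices made here for the example.

```python
import numpy as np

def zscore(x, axis):
    """Standardize x along axis; constant slices are mapped to zero."""
    mu = x.mean(axis=axis, keepdims=True)
    sd = x.std(axis=axis, keepdims=True)
    return (x - mu) / np.where(sd == 0, 1.0, sd)

def threshold(scores, t):
    """Keep scores more than t standard deviations above the mean; zero the rest."""
    sd = scores.std()
    if sd == 0:
        return np.zeros_like(scores)
    z = (scores - scores.mean()) / sd
    return np.where(z > t, scores, 0.0)

def isa_sketch(E, seed_genes, t_gene=1.0, t_cond=1.0, max_iter=50):
    """Refine a gene set and a condition set from a genes-by-conditions matrix E.

    Returns boolean masks (genes, conditions) for the candidate module.
    """
    E = np.asarray(E, dtype=float)
    E_g = zscore(E, axis=0)   # each condition normalized over genes (assumed scheme)
    E_c = zscore(E, axis=1)   # each gene normalized over conditions (assumed scheme)
    genes = np.asarray(seed_genes, dtype=float)
    conds = np.zeros(E.shape[1])
    for _ in range(max_iter):
        # Linear map genes -> conditions, then threshold.
        conds = threshold(E_g.T @ genes, t_cond)
        # Linear map conditions -> genes, then threshold.
        new_genes = threshold(E_c @ conds, t_gene)
        norm = np.linalg.norm(new_genes)
        if norm > 0:
            new_genes = new_genes / norm   # keep the scores on a fixed scale
        # Stop once the selected gene set no longer changes (assumed criterion).
        converged = np.array_equal(new_genes != 0, genes != 0)
        genes = new_genes
        if converged:
            break
    return genes != 0, conds != 0
```

Because both genes and conditions are selected by thresholding rather than partitioned, running the sketch from different seed vectors can return overlapping modules, which is the behaviour the abstract contrasts with single-cluster assignment.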
Improved functional prediction of proteins by learning kernel combinations in multilabel settings
Background
We develop a probabilistic model for combining kernel matrices to predict the function of proteins. It extends previous approaches in that it can handle multiple labels which naturally appear in the context of protein function.
Results
Explicit modeling of multilabels significantly improves the capability of learning protein function from multiple kernels. The performance and the interpretability of the inference model are further improved by simultaneously predicting the subcellular localization of proteins and by combining pairwise classifiers into consistent class membership estimates.
Conclusion
For the purpose of functional prediction of proteins, multilabels provide valuable information that should be included adequately in the training process of classifiers. Learning of functional categories gains from co-prediction of subcellular localization. Pairwise separation rules allow very detailed insights into the relevance of different measurements like sequence, structure, interaction data, or expression data. A preliminary version of the software can be downloaded from http://www.inf.ethz.ch/personal/vroth/KernelHMM/.
ISSN:1471-210
SMART: Unique splitting-while-merging framework for gene clustering
Copyright @ 2014 Fa et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Successful clustering algorithms are highly dependent on parameter settings. The clustering performance degrades significantly unless parameters are properly set, and yet, it is difficult to set these parameters a priori. To address this issue, in this paper, we propose a unique splitting-while-merging clustering framework, named “splitting merging awareness tactics” (SMART), which does not require any a priori knowledge of either the number of clusters or even the possible range of this number. Unlike existing self-splitting algorithms, which over-cluster the dataset to a large number of clusters and then merge some similar clusters, our framework has the ability to split and merge clusters automatically during the process and produces the most reliable clustering results, by intrinsically integrating many clustering techniques and tasks. The SMART framework is implemented with two distinct clustering paradigms in two algorithms: competitive learning and finite mixture model. Nevertheless, within the proposed SMART framework, many other algorithms can be derived for different clustering paradigms. The minimum message length algorithm is integrated into the framework as the clustering selection criterion. The usefulness of the SMART framework and its algorithms is tested in demonstration datasets and simulated gene expression datasets. Moreover, two real microarray gene expression datasets are studied using this approach. Based on the performance of many metrics, all numerical results show that SMART is superior to the existing self-splitting algorithms and traditional algorithms it was compared with.
Three main properties of the proposed SMART framework are summarized as follows: (1) it needs no parameters dependent on the respective dataset and no a priori knowledge about the datasets, (2) it is extendible to many different applications, and (3) it offers superior performance compared with counterpart algorithms.
National Institute for Health Researc
An assessment of the Jenkinson and Collison synoptic classification to a continental mid-latitude location
A weather-type catalogue based on the Jenkinson and Collison method was developed for an area in south-west Russia for the period 1961-2010. Gridded sea level pressure data were obtained from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) reanalysis. The resulting catalogue was analysed for frequency of individual types and groups of weather types to characterise long-term atmospheric circulation in this region. Overall, the most frequent type is anticyclonic (A) (23.3%) followed by cyclonic (C) (11.9%); however, there are some key seasonal patterns, with westerly circulation being significantly more common in winter than in summer. The utility of this synoptic classification is evaluated by modelling daily rainfall amounts. A low level of error is found using a simple model based on the prevailing weather type. Finally, characteristics of the circulation classification are compared to those for the original JC British Isles catalogue, and a much more equal distribution of flow types is seen in the former classification.