27 research outputs found
The impact of quantitative optimization of hybridization conditions on gene expression analysis.
BACKGROUND: With the growing availability of entire genome sequences, an increasing number of scientists can exploit oligonucleotide microarrays for genome-scale expression studies. While probe design is a major research area, relatively little work has been reported on the optimization of microarray protocols. RESULTS: As shown in this study, suboptimal conditions can have considerable impact on biologically relevant observations. For example, deviation from the optimal temperature by one degree Celsius led to a loss of up to 44% of the differentially expressed genes identified. While genes from thousands of Gene Ontology categories were affected, transcription factors and other low-copy-number regulators were disproportionately lost. Calibrated protocols are thus required in order to take full advantage of the large dynamic range of microarrays. For an objective optimization of protocols we introduce an approach that maximizes the amount of information obtained per experiment. A comparison of two typical samples is sufficient for this calibration. We show, however, that optimization results are independent of the samples and the specific measures used for calibration. Both simulations and spike-in experiments confirmed an unbiased determination of generally optimal experimental conditions. CONCLUSIONS: Well-calibrated hybridization conditions are thus easily achieved and necessary for the efficient detection of differential expression. They are essential for the sensitive profiling of low-copy-number molecules. This is particularly critical for studies of transcription factor expression, or the inference and study of regulatory networks.
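The calibration idea lends itself to a simple illustration. The hypothetical sketch below scores candidate hybridization temperatures by how much differential expression a two-sample comparison detects and picks the best one; the t-test criterion, the function names, and the data layout are illustrative assumptions standing in for the paper's actual information measure.

```python
# Hypothetical calibration sketch: score each candidate temperature by the
# number of differentially expressed genes a two-sample comparison detects.
import numpy as np
from scipy import stats

def n_differential(sample_a, sample_b, alpha=0.05):
    """Count genes called differentially expressed between two replicate sets.

    sample_a, sample_b: arrays of shape (n_genes, n_replicates).
    """
    _, pvals = stats.ttest_ind(sample_a, sample_b, axis=1)
    return int(np.sum(pvals < alpha))

def calibrate(experiments):
    """experiments: dict mapping temperature -> (sample_a, sample_b) arrays."""
    scores = {t: n_differential(a, b) for t, (a, b) in experiments.items()}
    return max(scores, key=scores.get), scores
```

Counting detected genes is a crude proxy for the information-per-experiment measure the paper optimizes, but the calibration loop has the same shape.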
Predictive monitoring of shake flask cultures with online estimated growth models
Simplicity renders shake flasks ideal for strain selection and substrate optimization in biotechnology. Uncertainty during initial experiments may, however, cause adverse growth conditions and lead to misleading conclusions. Using growth models for online predictions of future biomass (BM) and of the arrival of critical events, such as low dissolved oxygen (DO) levels or the right time to harvest, is hence important for optimizing protocols. Established knowledge that unfavorable metabolites of growing microorganisms interfere with the substrate suggests that growth dynamics and, as a consequence, the growth model parameters may vary in the course of an experiment. Predictive monitoring of shake flask cultures will therefore benefit from estimating growth model parameters in an online and adaptive manner. This paper evaluates a newly developed particle filter (PF) which is specifically tailored to the requirements of biotechnological shake flask experiments. By combining stationary accuracy with fast adaptation to change, the proposed PF estimates time-varying growth model parameters from iteratively measured BM and DO sensor signals in an optimal manner. Inferring time-varying parameters of Gompertz and logistic growth models in this way is, to the best of our knowledge, novel and is assessed here for the first time for predictive monitoring of Escherichia coli (E. coli) shake flask experiments. Assessments that mimic real-time predictions of BM and DO levels under previously untested growth conditions demonstrate the efficacy of the approach. After allowing for an initialization phase in which the PF learns appropriate model parameters, we obtain accurate predictions of future BM and DO levels and of important temporal characteristics such as when to harvest. Statically parameterized growth models that represent the dynamics of one specific setting will in general characterize the dynamics poorly when strain or substrate change. The proposed approach is thus an important innovation for scientists working on strain characterization and substrate optimization, as providing accurate forecasts will improve reproducibility and efficiency in early-stage bioprocess development.
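To make the technique concrete, the sketch below implements a generic bootstrap particle filter that tracks time-varying logistic growth parameters (r, K) from noisy biomass readings. It is a minimal stand-in for the authors' tailored PF: the random-walk dynamics, the noise scales, and the uniform priors are all assumed for illustration.

```python
# Bootstrap particle filter over time-varying logistic growth parameters.
import numpy as np

rng = np.random.default_rng(0)

def logistic_bm(b, r, K, dt):
    """Advance biomass one step via the closed-form logistic solution."""
    return K / (1.0 + (K - b) / b * np.exp(-r * dt))

def particle_filter(bm_obs, dt, n=2000, obs_sd=0.05, walk_sd=(0.01, 0.05)):
    """Track (r, K) from a series of biomass readings (first one positive)."""
    r = rng.uniform(0.1, 1.5, n)           # growth-rate particles
    K = rng.uniform(1.0, 20.0, n)          # carrying-capacity particles
    b = np.full(n, bm_obs[0])              # per-particle biomass state
    estimates = []
    for y in bm_obs[1:]:
        # Random-walk evolution lets the parameters drift between steps.
        r = np.abs(r + rng.normal(0.0, walk_sd[0], n))
        K = np.abs(K + rng.normal(0.0, walk_sd[1], n))
        b = logistic_bm(b, r, K, dt)
        # Weight each particle by the likelihood of the new observation,
        # then resample to concentrate on plausible parameter values.
        w = np.exp(-0.5 * ((y - b) / obs_sd) ** 2) + 1e-300
        idx = rng.choice(n, size=n, p=w / w.sum())
        r, K, b = r[idx], K[idx], b[idx]
        estimates.append((r.mean(), K.mean()))
    return estimates
```

Forecasts of future biomass, and hence of events such as harvest time, follow by propagating the final particle set through logistic_bm without further reweighting.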
An analysis of single amino acid repeats as use case for application specific background models
Background
Sequence analysis aims to identify biologically relevant signals against a backdrop of functionally meaningless variation. Increasingly, it is recognized that the quality of the background model directly affects the performance of analyses. State-of-the-art approaches rely on classical sequence models that are adapted to the studied dataset. Although performing well in the analysis of globular protein domains, these models break down in regions of stronger compositional bias or low complexity. While these regions are typically filtered, there is increasing anecdotal evidence of functional roles. This motivates an exploration of more complex sequence models and application-specific approaches for the investigation of biased regions.
Results
Traditional Markov chains and application-specific regression models are compared using the example of predicting runs of single amino acids, a particularly simple class of biased regions. Cross-validation experiments reveal that the alternative regression models capture the multivariate trends well despite their low dimensionality, in contrast even to higher-order Markov predictors. We show how the significance of unusual observations can be computed for such empirical models. The power of a dedicated model in the detection of biologically interesting signals is then demonstrated in an analysis identifying the unexpected enrichment of contiguous leucine repeats in signal peptides. Considering different reference sets, we show how the question examined actually defines what constitutes the 'background'. Results can thus be highly sensitive to the choice of appropriate model training sets. Conversely, the choice of reference data determines the questions that can be investigated in an analysis.
Conclusions
Using the study of biased regions as a specific example, we have demonstrated that the construction of application-specific background models is both necessary and feasible in a challenging sequence analysis situation.
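To make the notion of a background model concrete, here is a minimal sketch of the simplest baseline touched on above: an i.i.d. composition model for single-amino-acid runs with a Poisson significance estimate. All numbers are illustrative, and the paper's application-specific regression models would replace the closed-form expected_runs() below.

```python
# i.i.d. background model for single-amino-acid runs.
import math

def expected_runs(seq_len, p, k):
    """Expected number of maximal runs of length >= k of a residue with
    background frequency p in an i.i.d. sequence of length seq_len."""
    # A run starts where the preceding residue differs (factor 1 - p)
    # and the next k residues all match (factor p ** k).
    return (seq_len - k + 1) * (1.0 - p) * p ** k

def run_pvalue(observed, expected):
    """Poisson upper-tail p-value for seeing >= observed such runs."""
    return 1.0 - sum(math.exp(-expected) * expected ** i / math.factorial(i)
                     for i in range(observed))

# Example: leucine (p ~ 0.1) runs of length >= 6 in a 500-residue sequence.
lam = expected_runs(500, 0.1, 6)
print(lam, run_pvalue(2, lam))
```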
Bayesian Classifiers are Large Margin Hyperplanes in a Hilbert Space
Bayesian algorithms for Neural Networks are known to produce classifiers which are very resistant to overfitting. It is often claimed that one of the main distinctive features of Bayesian Learning Algorithms is that they do not simply output one hypothesis, but rather an entire probability distribution over a hypothesis set: the Bayes posterior. An alternative perspective is that they output a linear combination of classifiers, whose coefficients are given by Bayes' theorem. One of the concepts used to deal with thresholded convex combinations is the 'margin' of the hyperplane with respect to the training sample, which is correlated with the predictive power of the hypothesis itself. We provide a novel theoretical analysis of such classifiers, based on Data-Dependent VC theory, proving that they can be expected to be large margin hyperplanes in a Hilbert space. We then present experimental evidence that the predictions of our model are correct, i.e. that Bayesian classifiers really find large margin hyperplanes.
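For linear discriminants, thresholding the convex combination of classifiers is equivalent to thresholding the posterior-mean weight vector, so the margin of the Bayes combination is easy to measure. The toy sketch below, with an assumed Gaussian posterior and synthetic separable data, computes exactly that quantity; it illustrates the object of the paper's analysis rather than reproducing its experiments.

```python
# Margin of the Bayes combination of linear classifiers on a toy sample.
import numpy as np

rng = np.random.default_rng(1)

def margin(w, X, y):
    """Normalized margin of hyperplane w on the labelled sample (X, y)."""
    return np.min(y * (X @ w)) / np.linalg.norm(w)

# Toy separable data with labels in {-1, +1}.
X = rng.normal(0.0, 1.0, (50, 5))
w_true = rng.normal(0.0, 1.0, 5)
y = np.sign(X @ w_true)

# Posterior over weights (assumed Gaussian for illustration); the convex
# combination of the linear discriminants acts through their mean.
samples = w_true + 0.1 * rng.normal(0.0, 1.0, (1000, 5))
w_bayes = samples.mean(axis=0)
print("margin of the Bayes combination:", margin(w_bayes, X, y))
```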
Equivalent Error Bars For Neural Network Classifiers Trained By Bayesian Inference
The topic of this paper is the problem of outlier detection for neural networks trained by Bayesian inference. I will show that marginalization is not a good method for obtaining moderated class probabilities in outlying regions. The reason why marginalization fails to indicate outliers is analysed, and an alternative measure, which is a more reliable indicator for outliers, is proposed. A simple artificial classification problem is used to visualize the differences. Finally, both methods are used to classify a real-world problem where outlier detection is mandatory. 1 Introduction: Neural networks are often used in safety-critical applications for regression or classification purposes. Since neural networks are unable to extrapolate into regions not covered by the training data (see [6]), one should not use their predictions in such regions. Consequently, methods for outlier detection have attracted a lot of attention. Outliers may be detected by assigning a confidence measure to network decisions…
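The contrast is easy to reproduce with a toy Bayesian linear classifier: Monte-Carlo marginalization leaves a far-away point confidently classified, while a simple input-space novelty score flags it. The Gaussian weight posterior below is a stand-in for a trained Bayesian network, and the nearest-neighbour distance is a generic substitute for the measure proposed in the paper.

```python
# Moderated probabilities via marginalization vs. a simple novelty score.
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

X_train = rng.normal(0.0, 1.0, (100, 2))             # assumed training inputs
w_samples = rng.normal([2.0, -1.0], 0.3, (2000, 2))  # assumed weight posterior

def moderated_prob(x):
    """Monte-Carlo marginalization: average the predictive probability."""
    return sigmoid(w_samples @ x).mean()

def novelty_score(x):
    """Distance to the nearest training input: large in outlying regions."""
    return np.linalg.norm(X_train - x, axis=1).min()

for x in (np.array([0.5, 0.5]), np.array([25.0, 25.0])):
    print(f"x={x}: moderated p = {moderated_prob(x):.3f}, "
          f"novelty = {novelty_score(x):.1f}")
```

The far point keeps a near-certain moderated probability even though it lies well outside the training data, which is exactly the failure mode the paper analyses.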
Bayesian Time Series Classification
This paper proposes an approach to classifying adjacent segments of a time series as belonging to one of K classes. We use a hierarchical model that consists of a feature extraction stage and a generative classifier built on top of these features. Such two-stage approaches are often used in signal and image processing. The novel part of our work is that we link these stages probabilistically by using a latent feature space. Using one joint model is a Bayesian requirement and has the advantage of fusing information according to its certainty.
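For contrast, here is a minimal sketch of the conventional decoupled pipeline mentioned above: extract features per segment, then fit a generative classifier on top. The band-power features, shared-covariance Gaussians, and equal class priors are illustrative assumptions; the paper's contribution is to couple the two stages through a latent feature space instead of chaining them like this.

```python
# Conventional two-stage pipeline: feature extraction + generative classifier.
import numpy as np

def segment_features(segment):
    """Stage 1 (toy): log-power of the segment in four frequency bands."""
    spectrum = np.abs(np.fft.rfft(segment)) ** 2
    return np.log([band.mean() for band in np.array_split(spectrum, 4)])

def fit_generative(features, labels):
    """Stage 2: class-conditional Gaussians with shared diagonal covariance."""
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    var = features.var(axis=0) + 1e-6
    return classes, means, var

def classify(x, classes, means, var):
    """Bayes rule with equal class priors (shared terms cancel)."""
    logp = [-0.5 * np.sum((x - means[c]) ** 2 / var) for c in classes]
    return classes[int(np.argmax(logp))]
```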
Adaptive Classification by Variational Kalman Filtering
We propose in this paper a probabilistic approach to adaptive inference of generalized nonlinear classification that combines the computational advantage of a parametric solution with the flexibility of sequential sampling techniques. We regard the parameters of the classifier as latent states in a first-order Markov process and propose an algorithm which can be regarded as a variational generalization of standard Kalman filtering. The variational Kalman filter is based on two novel lower bounds that enable us to use a non-degenerate distribution over the adaptation rate. An extensive empirical evaluation demonstrates that the proposed method is capable of inferring competitive classifiers in both stationary and non-stationary environments. Although we focus on classification, the algorithm is easily extended to other generalized nonlinear models.
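A simplified relative of this idea fits in a few lines: the sketch below tracks logistic-regression weights as latent states of a first-order Markov process with an extended Kalman filter. The fixed drift covariance Q is an assumption made here for brevity; the paper's variational scheme instead maintains a non-degenerate distribution over the adaptation rate.

```python
# Extended Kalman filter for online logistic regression with drifting weights.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def ekf_logistic(stream, dim, q=1e-3):
    """Online classification; stream yields (x, y) with y in {0, 1}."""
    w = np.zeros(dim)               # posterior mean over the weights
    P = np.eye(dim)                 # posterior covariance
    Q = q * np.eye(dim)             # assumed random-walk drift covariance
    for x, y in stream:
        P = P + Q                   # predict: the weights may have drifted
        p = sigmoid(w @ x)
        H = p * (1.0 - p) * x       # linearized observation model
        S = H @ P @ H + p * (1.0 - p)   # innovation variance + Bernoulli noise
        K = P @ H / S               # Kalman gain
        w = w + K * (y - p)         # correct with the newly observed label
        P = P - np.outer(K, H) @ P  # covariance update
        yield w.copy()
```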