Principal sequence pattern analysis: A new approach to classifying the evolution of atmospheric systems
A new eigentechnique approach, Principal Sequence Pattern Analysis (PSPA), is introduced for the analysis of spatial pattern sequences, as an extension of the traditional Principal Component Analysis set in the T-Mode. In this setting, the variables are sequences of k spatial fields of a given meteorological variable. PSPA is described and applied to a sample of 256 consecutive daily 1000 hPa geopotential height fields. The results of the application of the technique to 5-day sequences demonstrate the advantages of this procedure in identifying field pattern sequences, thereby allowing the determination of the evolution and development of the systems, together with cyclogenesis and anticyclogenesis processes. In order to complete the study, the more traditional Extended Empirical Orthogonal Function (EEOF) analysis, which is the S-mode equivalent of the PSPA, was applied to the same dataset. For EEOF, it was not possible to identify any real sequences in the data that corresponded to the pattern sequences it yielded. Furthermore, the explained variance distribution in the EEOF was significantly different from that obtained with PSPA. Conversely, the PSPA approach allowed for the identification of sequences corresponding to those observed in the data. Using diagrams of the squared component loadings as a function of time, the study of the times of occurrence of dominant field characteristics was also possible. In other words, PSPA made it possible to determine the periods when the basic flow was dominant and the times when strongly perturbed transient events with a significant meridional component occurred.
Laboratorio de Investigación de Sistemas Ecológicos y Ambientale
Causes that Make a Difference
Biologists studying complex causal systems typically identify some factors as causes and treat other factors as background conditions. For example, when geneticists explain biological phenomena, they often foreground genes and relegate the cellular milieu to the background. But factors in the milieu are as causally necessary as genes for the production of phenotypic traits, even traits at the molecular level such as amino acid sequences. Gene-centered biology has been criticized on the grounds that because there is parity among causes, the “privileging” of genes reflects a reductionist bias, not an ontological difference. The idea that there is an ontological parity among causes is related to a philosophical puzzle identified by John Stuart Mill: what, other than our interests or biases, could possibly justify identifying some causes as the actual or operative ones, and other causes as mere background? The aim of this paper is to solve this conceptual puzzle and to explain why there is not an ontological parity among genes and the other factors. It turns out that solving this puzzle helps answer a seemingly unrelated philosophical question: what kind of causal generality matters in biology?
Quantification of abnormal repetitive behaviour in captive European starlings (Sturnus vulgaris).
Stereotypies are repetitive, unvarying and goalless behaviour patterns that are often considered indicative of poor welfare in captive animals. Quantifying stereotypies can be difficult, particularly during the early stages of their development when behaviour is still flexible. We compared two methods for objectively quantifying the development of route-tracing stereotypies in caged starlings. We used Markov chains and T-pattern analysis (implemented by the software package Theme) to identify patterns in the sequence of locations a bird occupied within its cage. Pattern metrics produced by both methods correlated with the frequency of established measures of stereotypic behaviour and abnormal behaviour patterns counted from video recordings, suggesting that both methods could be useful for identifying stereotypic individuals and quantifying stereotypic behaviour. We discuss the relative benefits and disadvantages of the two approaches.
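As a rough illustration of the Markov-chain idea mentioned in the abstract above, the sketch below estimates first-order transition probabilities from an ordered list of cage locations. It is not the authors' metric: the function name `transition_probabilities` and the location labels are hypothetical, and the only assumption is that a highly stereotyped route shows up as transition probabilities close to 1.

```python
from collections import Counter, defaultdict


def transition_probabilities(locations):
    """Estimate first-order Markov transition probabilities from a sequence."""
    counts = defaultdict(Counter)
    # Count how often each location is immediately followed by each other one.
    for current, following in zip(locations, locations[1:]):
        counts[current][following] += 1
    # Normalise each row of counts into probabilities.
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


# Hypothetical sequence of locations visited by one bird, in order.
route = ["perch", "floor", "feeder", "perch", "floor", "feeder", "perch", "floor"]
probs = transition_probabilities(route)
```

In this made-up route every location has a single successor, so each estimated transition probability is 1; a more flexible bird would spread probability over several successors.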
DNA ANALYSIS USING GRAMMATICAL INFERENCE
An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance to the field of computational biology. The method proposed here uses positive sample grammatical inference and statistical information to infer languages for coding DNA.
An algorithm is proposed for finding an optimal subset of input sequences for the inference of regular grammars by optimizing a relevant accuracy metric. The algorithm does not guarantee finding the optimal subset; however, testing shows improvement in accuracy and performance over the basis algorithm.
Testing shows that inferred languages for components of DNA are consistently accurate. By using the proposed algorithm, languages are inferred for coding DNA with average conditional probability over 80%. This reveals that languages for components of DNA can be inferred and are useful independent of the process that created them. These languages can then be analyzed or used for other tasks in computational biology.
To illustrate potential applications of regular grammars for DNA components, an inferred language for exon sequences is applied as post-processing to Hidden Markov exon prediction to reduce the number of wrong exons detected and improve the specificity of the model significantly.
Selection of sequence motifs and generative Hopfield-Potts models for protein families
Statistical models for families of evolutionary related proteins have
recently gained interest: in particular pairwise Potts models, as those
inferred by the Direct-Coupling Analysis, have been able to extract information
about the three-dimensional structure of folded proteins, and about the effect
of amino-acid substitutions in proteins. These models are typically requested
to reproduce the one- and two-point statistics of the amino-acid usage in a
protein family, i.e., to capture the so-called residue conservation and
covariation statistics of proteins of common evolutionary origin. Pairwise
Potts models are the maximum-entropy models achieving this. While being
successful, these models depend on huge numbers of ad hoc introduced
parameters, which have to be estimated from a finite amount of data and whose
biophysical interpretation remains unclear. Here we propose an approach to
parameter reduction, which is based on selecting collective sequence motifs. It
naturally leads to the formulation of statistical sequence models in terms of
Hopfield-Potts models. These models can be accurately inferred using a mapping
to restricted Boltzmann machines and persistent contrastive divergence. We show
that, when applied to protein data, even 20-40 patterns are sufficient to
obtain statistically close-to-generative models. The Hopfield patterns form
interpretable sequence motifs and may be used to cluster amino-acid
sequences into functional sub-families. However, the distributed collective
nature of these motifs intrinsically limits the ability of Hopfield-Potts
models in predicting contact maps, showing the necessity of developing models
going beyond the Hopfield-Potts models discussed here.
Comment: 26 pages, 16 figures, to app. in PR
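The one- and two-point statistics that the abstract above says pairwise Potts models are fitted to can be sketched as plain frequency counts over an alignment. This is only an illustration under simplifying assumptions: the function name `site_statistics` and the toy alignment are hypothetical, all sequences are assumed to have equal length, and the sequence reweighting and pseudocounts commonly used in practice are left out.

```python
from collections import Counter
from itertools import combinations


def site_statistics(alignment):
    """Return single-site and pairwise symbol frequencies of an alignment."""
    n = len(alignment)
    length = len(alignment[0])
    single = [Counter() for _ in range(length)]
    pair = {(i, j): Counter() for i, j in combinations(range(length), 2)}
    for seq in alignment:
        # One-point counts: which symbol sits at each position.
        for i, a in enumerate(seq):
            single[i][a] += 1
        # Two-point counts: which pair of symbols co-occurs at positions i < j.
        for i, j in pair:
            pair[i, j][seq[i], seq[j]] += 1
    f1 = [{a: c / n for a, c in col.items()} for col in single]
    f2 = {ij: {ab: c / n for ab, c in cnt.items()} for ij, cnt in pair.items()}
    return f1, f2


# Hypothetical toy alignment of four sequences with three positions each.
toy = ["ACD", "ACE", "GCD", "GCE"]
f1, f2 = site_statistics(toy)
```

Here `f1[i]` holds the symbol frequencies at position `i` (conservation) and `f2[(i, j)]` the joint frequencies at positions `i` and `j` (covariation).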
Mining State-Based Models from Proof Corpora
Interactive theorem provers have been used extensively to reason about
various software/hardware systems and mathematical theorems. The key challenge
when using an interactive prover is that finding a suitable sequence of proof
steps that will lead to a successful proof requires a significant amount of
human intervention. This paper presents an automated technique that takes as input
examples of successful proofs and infers an Extended Finite State Machine as
output. This can in turn be used to generate proofs of new conjectures. Our
preliminary experiments show that the inferred models are generally accurate
(contain few false-positive sequences) and that representing existing proofs in
such a way can be very useful when guiding new ones.
Comment: To Appear at Conferences on Intelligent Computer Mathematics 201
Identifying statistical dependence in genomic sequences via mutual information estimates
Questions of understanding and quantifying the representation and amount of
information in organisms have become a central part of biological research, as
they potentially hold the key to fundamental advances. In this paper, we
demonstrate the use of information-theoretic tools for the task of identifying
segments of biomolecules (DNA or RNA) that are statistically correlated. We
develop a precise and reliable methodology, based on the notion of mutual
information, for finding and extracting statistical as well as structural
dependencies. A simple threshold function is defined, and its use in
quantifying the level of significance of dependencies between biological
segments is explored. These tools are used in two specific applications. First,
for the identification of correlations between different parts of the maize
zmSRp32 gene. There, we find significant dependencies between the 5'
untranslated region in zmSRp32 and its alternatively spliced exons. This
observation may indicate the presence of as-yet unknown alternative splicing
mechanisms or structural scaffolds. Second, using data from the FBI's Combined
DNA Index System (CODIS), we demonstrate that our approach is particularly well
suited for the problem of discovering short tandem repeats, an application of
importance in genetic profiling.
Comment: Preliminary version. Final version in EURASIP Journal on
Bioinformatics and Systems Biology. See http://www.hindawi.com/journals/bsb
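To make the notion of mutual information used in the abstract above concrete, here is a minimal sketch of a plug-in estimate between two equally long symbol sequences, pairing symbols position by position. It is not the authors' methodology or threshold function: the function name `mutual_information` and the example strings are hypothetical, and no bias correction or significance test is applied.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired symbol observations."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # Sum over observed pairs of p(x,y) * log2( p(x,y) / (p(x) p(y)) ).
    return sum(
        (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


# Hypothetical segments: identical, then statistically independent.
mi_identical = mutual_information("ACGTACGT", "ACGTACGT")
mi_independent = mutual_information("AACC", "ACAC")
```

Identical segments drawn uniformly from four symbols give the maximum of 2 bits, while the second pair, in which every symbol combination occurs equally often, gives 0 bits.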