194,576 research outputs found

    Principal sequence pattern analysis: A new approach to classifying the evolution of atmospheric systems

    A new eigentechnique, Principal Sequence Pattern Analysis (PSPA), is introduced for the analysis of spatial pattern sequences, as an extension of traditional Principal Component Analysis set in the T-mode. In this setting, the variables are sequences of k spatial fields of a given meteorological variable. PSPA is described and applied to a sample of 256 consecutive daily 1000 hPa geopotential height fields. The results of applying the technique to 5-day sequences demonstrate its advantages in identifying field pattern sequences, thereby allowing the determination of the evolution and development of the systems, together with cyclogenesis and anticyclogenesis processes. To complete the study, the more traditional Extended Empirical Orthogonal Function (EEOF) analysis, which is the S-mode equivalent of PSPA, was applied to the same dataset. For EEOF, it was not possible to identify any real sequences in the data that corresponded to the pattern sequences it yielded. Furthermore, the explained variance distribution in the EEOF was significantly different from that obtained with PSPA. Conversely, PSPA identified sequences corresponding to those observed in the data. Diagrams of the squared component loadings as a function of time also made it possible to study the times of occurrence of dominant field characteristics. In other words, PSPA facilitated the determination of periods when the basic flow was dominant and of times when strongly perturbed transient events with a significant meridional component occurred.
    Laboratorio de Investigación de Sistemas Ecológicos y Ambientale

    Causes that Make a Difference

    Biologists studying complex causal systems typically identify some factors as causes and treat other factors as background conditions. For example, when geneticists explain biological phenomena, they often foreground genes and relegate the cellular milieu to the background. But factors in the milieu are as causally necessary as genes for the production of phenotypic traits, even traits at the molecular level such as amino acid sequences. Gene-centered biology has been criticized on the grounds that because there is parity among causes, the “privileging” of genes reflects a reductionist bias, not an ontological difference. The idea that there is an ontological parity among causes is related to a philosophical puzzle identified by John Stuart Mill: what, other than our interests or biases, could possibly justify identifying some causes as the actual or operative ones, and other causes as mere background? The aim of this paper is to solve this conceptual puzzle and to explain why there is not an ontological parity among genes and the other factors. It turns out that solving this puzzle helps answer a seemingly unrelated philosophical question: what kind of causal generality matters in biology?

    Quantification of abnormal repetitive behaviour in captive European starlings (Sturnus vulgaris).

    Stereotypies are repetitive, unvarying and goalless behaviour patterns that are often considered indicative of poor welfare in captive animals. Quantifying stereotypies can be difficult, particularly during the early stages of their development when behaviour is still flexible. We compared two methods for objectively quantifying the development of route-tracing stereotypies in caged starlings. We used Markov chains and T-pattern analysis (implemented by the software package Theme) to identify patterns in the sequence of locations a bird occupied within its cage. Pattern metrics produced by both methods correlated with the frequency of established measures of stereotypic behaviour and abnormal behaviour patterns counted from video recordings, suggesting that both methods could be useful for identifying stereotypic individuals and quantifying stereotypic behaviour. We discuss the relative benefits and disadvantages of the two approaches.
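
    As a rough illustration of the Markov chain idea mentioned in this abstract, the sketch below estimates first-order transition probabilities from a sequence of cage locations. It is not the authors' method or their pattern metric; the function name `transition_probabilities` and the location labels are hypothetical, chosen only for the example.

```python
from collections import Counter, defaultdict


def transition_probabilities(locations):
    """Estimate first-order Markov transition probabilities from a sequence.

    Returns a dict mapping each state to a dict of {next_state: probability}.
    """
    counts = defaultdict(Counter)
    # Count how often each location is followed by each other location.
    for current, following in zip(locations, locations[1:]):
        counts[current][following] += 1
    # Normalise each row of counts into probabilities.
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical sequence of locations visited by one bird (assumed labels).
route = ["perch", "floor", "feeder", "perch", "floor", "feeder", "perch", "floor"]
probs = transition_probabilities(route)
print(probs)
```

    A perfectly repeated route gives rows concentrated on a single next location, which is one simple way that rigid, route-tracing movement could show up in such a table.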

    DNA ANALYSIS USING GRAMMATICAL INFERENCE

    An accurate language definition capable of distinguishing between coding and non-coding DNA has important applications and analytical significance in the field of computational biology. The method proposed here uses positive-sample grammatical inference and statistical information to infer languages for coding DNA. An algorithm is proposed that searches for an optimal subset of input sequences for the inference of regular grammars by optimizing a relevant accuracy metric. The algorithm does not guarantee finding the optimal subset; however, testing shows improvement in accuracy and performance over the basis algorithm. Testing also shows that the inferred languages for components of DNA are consistently accurate. Using the proposed algorithm, languages are inferred for coding DNA with average conditional probability over 80%. This reveals that languages for components of DNA can be inferred and are useful independent of the process that created them. These languages can then be analyzed or used for other tasks in computational biology. To illustrate potential applications of regular grammars for DNA components, an inferred language for exon sequences is applied as post-processing to Hidden Markov exon prediction to reduce the number of wrong exons detected and improve the specificity of the model significantly.

    Selection of sequence motifs and generative Hopfield-Potts models for protein families

    Statistical models for families of evolutionarily related proteins have recently gained interest: in particular, pairwise Potts models, such as those inferred by Direct-Coupling Analysis, have been able to extract information about the three-dimensional structure of folded proteins and about the effect of amino-acid substitutions in proteins. These models are typically required to reproduce the one- and two-point statistics of the amino-acid usage in a protein family, i.e. to capture the so-called residue conservation and covariation statistics of proteins of common evolutionary origin. Pairwise Potts models are the maximum-entropy models achieving this. While successful, these models depend on huge numbers of ad hoc parameters, which have to be estimated from a finite amount of data and whose biophysical interpretation remains unclear. Here we propose an approach to parameter reduction based on selecting collective sequence motifs. It naturally leads to the formulation of statistical sequence models in terms of Hopfield-Potts models. These models can be accurately inferred using a mapping to restricted Boltzmann machines and persistent contrastive divergence. We show that, when applied to protein data, even 20-40 patterns are sufficient to obtain statistically close-to-generative models. The Hopfield patterns form interpretable sequence motifs and may be used to cluster amino-acid sequences into functional sub-families. However, the distributed collective nature of these motifs intrinsically limits the ability of Hopfield-Potts models to predict contact maps, showing the necessity of developing models that go beyond the Hopfield-Potts models discussed here.
    Comment: 26 pages, 16 figures, to app. in PR

    Mining State-Based Models from Proof Corpora

    Interactive theorem provers have been used extensively to reason about various software/hardware systems and mathematical theorems. The key challenge when using an interactive prover is that finding a suitable sequence of proof steps leading to a successful proof requires a significant amount of human intervention. This paper presents an automated technique that takes as input examples of successful proofs and infers an Extended Finite State Machine as output. This can in turn be used to generate proofs of new conjectures. Our preliminary experiments show that the inferred models are generally accurate (contain few false-positive sequences) and that representing existing proofs in such a way can be very useful when guiding new ones.
    Comment: To Appear at Conferences on Intelligent Computer Mathematics 201

    Identifying statistical dependence in genomic sequences via mutual information estimates

    Questions of understanding and quantifying the representation and amount of information in organisms have become a central part of biological research, as they potentially hold the key to fundamental advances. In this paper, we demonstrate the use of information-theoretic tools for the task of identifying segments of biomolecules (DNA or RNA) that are statistically correlated. We develop a precise and reliable methodology, based on the notion of mutual information, for finding and extracting statistical as well as structural dependencies. A simple threshold function is defined, and its use in quantifying the level of significance of dependencies between biological segments is explored. These tools are used in two specific applications. The first is the identification of correlations between different parts of the maize zmSRp32 gene, where we find significant dependencies between the 5' untranslated region in zmSRp32 and its alternatively spliced exons. This observation may indicate the presence of as-yet unknown alternative splicing mechanisms or structural scaffolds. Second, using data from the FBI's Combined DNA Index System (CODIS), we demonstrate that our approach is particularly well suited to the problem of discovering short tandem repeats, an application of importance in genetic profiling.
    Comment: Preliminary version. Final version in EURASIP Journal on Bioinformatics and Systems Biology. See http://www.hindawi.com/journals/bsb
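
    To make the notion of mutual information in this abstract concrete, the sketch below computes a simple plug-in (empirical frequency) estimate between two aligned symbol sequences. It is an illustrative assumption, not the estimator or the threshold function from the paper; the function name `mutual_information` and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of mutual information, in bits, between paired symbols.

    xs and ys must be non-empty sequences of equal length; position i of xs
    is paired with position i of ys.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    count_x = Counter(xs)         # marginal counts for xs
    count_y = Counter(ys)         # marginal counts for ys
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy nucleotide strings (assumed data, for illustration only).
dependent = mutual_information("ACGTACGT", "ACGTACGT")
independent = mutual_information("AACC", "ACAC")
print(dependent, independent)
```

    Plug-in estimates of this kind are biased upward on short sequences, which is one reason a significance threshold is needed before a dependence between two segments is reported.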