    Bioinformatics of Phosphoproteomics

    Taxonomic evidence applying intelligent information algorithm and the principle of maximum entropy: the case of asteroids families

    Numerical taxonomy aims to group operational taxonomic units (OTUs, or taxa) into clusters by applying numerical methods to what is termed structure analysis. The clusters, which constitute families, are the subject of this series of projects. They emerge from a structural analysis of the OTUs' phenotypic characteristics and express the relationships among OTUs as degrees of similarity, using tools such as (i) the Euclidean distance and (ii) nearest-neighbor techniques. Taxonomic evidence is thus gathered to quantify the similarity of each pair of OTUs (pair-group method) from the basic data matrix, which introduces the concept of the spectrum of an OTU, based on the states of its characters. A new taxonomic criterion is thereby formulated, and a new approach to computational taxonomy is presented. The approach has already been employed in data mining, where machine learning techniques are applied, in particular the C4.5 algorithm created by Quinlan, and with the degree of efficiency that algorithms of the TDIDT family achieve when generating valid models of the data in classification problems, using the gain of entropy through the maximum entropy principle. Facultad de Ciencias Astronómicas y Geofísicas; Facultad de Ciencias Exacta
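The pairwise similarity step mentioned in this abstract (Euclidean distance between OTUs, followed by a nearest-neighbor lookup) can be sketched as below. This is only an illustrative sketch, not the authors' method: the function names and the small `basic_data_matrix` are hypothetical, and the abstract does not say how characters are scaled or coded.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two OTUs given as equal-length character vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(rows):
    """Symmetric matrix of pairwise distances, one row and column per OTU."""
    n = len(rows)
    return [[euclidean(rows[i], rows[j]) for j in range(n)] for i in range(n)]

def nearest_neighbors(dist):
    """For each OTU, the index of the closest other OTU."""
    result = []
    for i, row in enumerate(dist):
        candidates = [(d, j) for j, d in enumerate(row) if j != i]
        result.append(min(candidates)[1])
    return result

# Hypothetical basic data matrix: one row per OTU, one column per character.
basic_data_matrix = [
    [0.0, 0.0],
    [3.0, 4.0],
    [3.0, 5.0],
]

dist = distance_matrix(basic_data_matrix)
neighbors = nearest_neighbors(dist)
print(neighbors)
```

In this toy matrix the second and third OTUs are each other's nearest neighbor, so a pair-group procedure would merge them first.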

    Machine Learning for Fluid Mechanics

    The field of fluid mechanics is rapidly advancing, driven by unprecedented volumes of data from field measurements, experiments and large-scale simulations at multiple spatiotemporal scales. Machine learning offers a wealth of techniques to extract information from data that could be translated into knowledge about the underlying fluid mechanics. Moreover, machine learning algorithms can augment domain knowledge and automate tasks related to flow control and optimization. This article presents an overview of past history, current developments, and emerging opportunities of machine learning for fluid mechanics. It outlines fundamental machine learning methodologies and discusses their uses for understanding, modeling, optimizing, and controlling fluid flows. The strengths and limitations of these methods are addressed from the perspective of scientific inquiry that considers data as an inherent part of modeling, experimentation, and simulation. Machine learning provides a powerful information processing framework that can enrich, and possibly even transform, current lines of fluid mechanics research and industrial applications.
    Comment: To appear in the Annual Reviews of Fluid Mechanics, 202

    BioSilicoSystems - A Multipronged Approach Towards Analysis and Representation of Biological Data (PhD Thesis)

    The rising field of integrative bioinformatics provides vital methods to integrate, manage, and analyze diverse data, and allows new and deeper insights into intricate biological systems. The difficulty lies not only in facilitating the study of heterogeneous data within the biological context but also, more fundamentally, in how to represent the available knowledge and make it accessible. Moreover, adding valuable information and functions that encourage the user to discover the interesting relations hidden within the data is, in itself, a great challenge. The cumulative information can also provide greater biological insight than is possible with individual information sources. Furthermore, the rapidly growing number of databases and data types poses the challenge of integrating heterogeneous data types, especially in biology. This rapid increase in the volume and number of data resources drives the need for polymorphic views of the same data, which often overlap across multiple resources.

In this thesis, a multi-pronged approach is proposed that brings together various methods for the analysis and representation of the diverse biological data present in different data sources. It is an effort to explain and emphasize the different concepts developed for the analysis of molecular data, and to explain their biological significance. The hypotheses proposed are placed in the context of other results and findings published in the past. The approach also demonstrates different ways to integrate molecular data from various sources, along with the need for a comprehensive understanding and clear presentation of each concept or algorithm and its results, using simple means and methods. The multifarious approach proposed in this work comprises different tools or methods spanning significant areas of bioinformatics research, such as data integration, data visualization, biological network construction and reconstruction, and alignment of biological pathways. Each tool takes a unique approach to using molecular data in a different area of biological research and is built on the kernel of the thesis. Furthermore, these methods are combined with graphical representations that keep things simple and comprehensible and make the underlying biological complexity easier to understand. Moreover, the human eye is often accustomed to, and more comfortable with, the visual representation of the facts

    Predicting behavioral deterioration using machine learning (Prédiction de la détérioration du comportement à l’aide de l’apprentissage automatique)

    Social media platforms bring together individuals with diverse convictions and beliefs to interact in a friendly and civil manner. Some people adopt reprehensible behaviors that disturb the peace and negatively affect the equanimity of other users. Some instances of misconduct may initially have small statistical effects, but their persistent accumulation could lead to major and devastating consequences. The persistent accumulation of bad behavior can be a valid predictor of risk factors for behavioral deterioration. The problem of behavioral deterioration has not been widely studied in the context of social media. Early detection of behavioral deterioration can be crucial to preventing individuals' bad behavior from escalating. This thesis addresses the problem of behavioral deterioration in the context of social media. We propose new machine learning-based methods that (1) explore behavioral sequences and their temporal patterns to facilitate the understanding of the behaviors individuals exhibit and (2) predict behavioral deterioration from consecutive combinations of sequential patterns corresponding to inappropriate behaviors. We conduct extensive experiments using real-world datasets and demonstrate the ability of our models to predict behavioral deterioration with a high degree of accuracy, i.e., F-1 scores above 0.8. In addition, we examine the trajectory of behavioral deterioration to uncover the emotional states that individuals progressively exhibit and to assess whether these emotional states lead to behavioral deterioration over time.
Our results suggest that anger could be a potential emotional state that contributes substantially to behavioral deterioration
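For readers unfamiliar with the F-1 score cited in this abstract, it is the harmonic mean of precision and recall. The sketch below computes it from confusion counts. The function name and the counts are hypothetical and are not taken from the thesis; they only illustrate how a score above 0.8 can arise.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, computed from confusion counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0  # no positive predictions or no positive cases
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 9 deteriorating users caught, 1 false alarm, 2 missed.
score = f1_score(true_positives=9, false_positives=1, false_negatives=2)
print(round(score, 3))
```

With these counts, precision is 9/10 and recall is 9/11, so the score equals 18/21.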

    Bayesian nonparametric clusterings in relational and high-dimensional settings with applications in bioinformatics.

    Recent advances in high throughput methodologies offer researchers the ability to understand complex systems via high dimensional and multi-relational data. One example is the realm of molecular biology where disparate data (such as gene sequence, gene expression, and interaction information) are available for various snapshots of biological systems. This type of high dimensional and multirelational data allows for unprecedented detailed analysis, but also presents challenges in accounting for all the variability. High dimensional data often has a multitude of underlying relationships, each represented by a separate clustering structure, where the number of structures is typically unknown a priori. To address the challenges faced by traditional clustering methods on high dimensional and multirelational data, we developed three feature selection and cross-clustering methods: 1) infinite relational model with feature selection (FIRM) which incorporates the rich information of multirelational data; 2) Bayesian Hierarchical Cross-Clustering (BHCC), a deterministic approximation to Cross Dirichlet Process mixture (CDPM) and to cross-clustering; and 3) randomized approximation (RBHCC), based on a truncated hierarchy. An extension of BHCC, Bayesian Congruence Measuring (BCM), is proposed to measure incongruence between genes and to identify sets of congruent loci with identical evolutionary histories. We adapt our BHCC algorithm to the inference of BCM, where the intended structure of each view (congruent loci) represents consistent evolutionary processes. We consider an application of FIRM on categorizing mRNA and microRNA. The model uses latent structures to encode the expression pattern and the gene ontology annotations. 
We also apply FIRM to recover the categories of ligands and proteins, and to predict unknown drug-target interactions, where latent categorization structure encodes drug-target interaction, chemical compound similarity, and amino acid sequence similarity. BHCC and RBHCC are shown to have improved predictive performance (both in terms of cluster membership and missing value prediction) compared to traditional clustering methods. Our results suggest that these novel approaches to integrating multi-relational information have a promising future in the biological sciences where incorporating data related to varying features is often regarded as a daunting task