120 research outputs found

    Adaptive matrix metrics for molecular descriptor assessment in QSPR classification

    QSPR methods are a useful approach in the drug discovery process, since they make it possible to predict biological or physicochemical properties of a candidate drug in advance. To this end, the QSPR method must be as accurate as possible in order to provide reliable predictions. Moreover, the selection of molecular descriptors is an important task for creating QSPR prediction models of low complexity that nevertheless provide accurate predictions. In this work, a matrix-based method is used to transform the original data space of chemical compounds into an alternative space in which compounds with different target properties can be better separated. To use this approach, QSPR is treated as a classification problem. The advantage of using adaptive matrix metrics is twofold: they can be used to identify important molecular descriptors and, at the same time, they improve the classification accuracy. A recently proposed method making use of this concept is extended to multi-class data. The new method is related to linear discriminant analysis and shows better results, albeit at higher computational cost. An application relating chemical descriptors to hydrophobicity shows promising results.
    Affiliation: Soto, Axel Juan. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina
    Affiliation: Strickert, Marc. Leibniz Institute of Plant Genetics and Crop Plant Research; Germany
    Affiliation: Vazquez, Gustavo Esteban. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina

    Correlation-maximizing surrogate gene space for visual mining of gene expression patterns in developing barley endosperm tissue

    Background: Micro- and macroarray technologies help acquire thousands of gene expression patterns covering important biological processes during plant ontogeny. In particular, faithful visualization methods are beneficial for revealing interesting gene expression patterns and functional relationships of coexpressed genes. Such screening helps to gain deeper insights into regulatory behavior and cellular responses, as will be discussed for expression data of developing barley endosperm tissue. For that purpose, high-throughput multidimensional scaling (HiT-MDS), a recent method for similarity-preserving data embedding, is substantially refined and used for (a) assessing the quality and reliability of centroid gene expression patterns, and (b) deriving functional relationships of coexpressed genes of endosperm tissue during barley grain development (0–26 days after flowering). Results: Temporal expression profiles of 4824 genes at 14 time points are faithfully embedded into two-dimensional displays. In these displays, coexpressed genes with similar profile shapes are grouped closely together by a correlation-based similarity measure. As a main result, by using a power transformation of correlation terms, a characteristic cloud of points with a bipolar sandglass shape is obtained that is inherently connected to expression patterns of the pre-storage, intermediate and storage phases of endosperm development. Conclusion: The new HiT-MDS-2 method helps to create global views of expression patterns and to validate centroids obtained from clustering programs. Furthermore, functional gene annotation for developing barley endosperm tissue is successfully mapped to the visualization, making it easy to locate major centroids of enriched functional categories.

    DIY meteorology: Use of citizen science to monitor snow dynamics in a data-sparse city

    Cities are under pressure to operate their services effectively and to project costs of operations across various timeframes. In high-latitude and high-altitude urban centers, snow management is one of the larger unknowns and has both operational and budgetary limitations. Snowfall and snow depth observations within urban environments are important for planning snow clearing and preparing for the effects of spring runoff on cities' drainage systems. In-house research functions are expensive, but one way to overcome that expense and still produce effective data is through citizen science. In this paper, we examine the potential to use citizen science for snowfall data collection in urban environments. A group of volunteers measured daily snowfall and snow depth at an urban site in Saskatoon (Canada) during two winters. Reliability was assessed with a statistical consistency analysis and a comparison with other data sets collected around Saskatoon. We found that citizen-science-derived data were more reliable and relevant for many urban management stakeholders. Feedback from the participants demonstrated reflexivity about social learning and a renewed sense of community built around generating reliable and useful data. We conclude that citizen science holds great potential to improve data provision for effective and sustainable city planning and greater social learning benefits overall.

    Genotyping by sequencing and a newly developed mRNA-GBS approach to link population genetic and transcriptome analyses reveal pattern differences between sites and treatments in red clover (Trifolium pratense L.)

    Red clover (Trifolium pratense L.), an important forage crop worldwide, is widely cultivated as cattle feed and for soil improvement. Wild populations and landraces have great natural diversity that could be used to improve cultivated red clover. However, to date, there is still insufficient knowledge about the natural genetic and phenotypic diversity of the species. Here, we developed a low-cost, complexity-reduced mRNA analysis (mRNA-GBS) and compared the results with population genetic (GBS) and previously published mRNA-Seq data, to assess whether intraspecific variation within and between populations and transcriptome responses can be analyzed simultaneously. The mRNA-GBS approach was successful. SNP analyses from the mRNA-GBS approach revealed patterns comparable to the GBS results, but due to site-specific multifactorial influences of environmental responses as well as conceptual and methodological limitations of mRNA-GBS, it was not possible to link transcriptome analyses with reduced complexity and sequencing depth to previously published greenhouse and field expression studies. Nevertheless, the use of short sequences upstream of the poly(A) tail of mRNA to reduce complexity is a promising approach that combines population genetics and expression profiling to analyze many individuals with trait differences simultaneously and cost-effectively, even in non-model species. Our study design across different regions in Germany was also challenging. The use of reduced-complexity differential expression analyses most likely overlays site-specific patterns due to highly complex plant responses under natural conditions.

    Utilizing gene pair orientations for HMM-based analysis of promoter array ChIP-chip data

    Motivation: Array-based analysis of chromatin immunoprecipitation (ChIP-chip) data is a powerful technique for identifying DNA target regions of individual transcription factors. The identification of these target regions from comprehensive promoter array ChIP-chip data is challenging. Here, three approaches for the identification of transcription factor target genes from promoter array ChIP-chip data are presented. We compare (i) a standard log-fold-change analysis (LFC); (ii) a basic method based on a Hidden Markov Model (HMM); and (iii) a new extension of the HMM approach to an HMM with scaled transition matrices (SHMM) that incorporates information about the relative orientation of adjacent gene pairs on DNA.

    Unifying generative and discriminative learning principles

    Background: The recognition of functional binding sites in genomic DNA remains one of the fundamental challenges of genome research. During the last decades, a plethora of different and well-adapted models has been developed, but only little attention has been paid to the development of different and similarly well-adapted learning principles. Only recently was it noticed that discriminative learning principles can be superior to generative ones in diverse bioinformatics applications, too. Results: Here, we propose a generalization of generative and discriminative learning principles containing the maximum likelihood, maximum a posteriori, maximum conditional likelihood, maximum supervised posterior, generative-discriminative trade-off, and penalized generative-discriminative trade-off learning principles as special cases, and we illustrate its efficacy for the recognition of vertebrate transcription factor binding sites. Conclusions: We find that the proposed learning principle helps to improve the recognition of transcription factor binding sites, enabling better computational approaches for extracting as much information as possible from valuable wet-lab data. We make all implementations available in the open-source library Jstacs so that this learning principle can be easily applied to other classification problems in the field of genome and epigenome analysis.