11,628 research outputs found

    Automation on the generation of genome scale metabolic models

    Full text link
    Background: Nowadays, the reconstruction of genome scale metabolic models is a non-automatized and interactive process based on decision taking. This lengthy process usually requires a full year of one person's work in order to satisfactory collect, analyze and validate the list of all metabolic reactions present in a specific organism. In order to write this list, one manually has to go through a huge amount of genomic, metabolomic and physiological information. Currently, there is no optimal algorithm that allows one to automatically go through all this information and generate the models taking into account probabilistic criteria of unicity and completeness that a biologist would consider. Results: This work presents the automation of a methodology for the reconstruction of genome scale metabolic models for any organism. The methodology that follows is the automatized version of the steps implemented manually for the reconstruction of the genome scale metabolic model of a photosynthetic organism, {\it Synechocystis sp. PCC6803}. The steps for the reconstruction are implemented in a computational platform (COPABI) that generates the models from the probabilistic algorithms that have been developed. Conclusions: For validation of the developed algorithm robustness, the metabolic models of several organisms generated by the platform have been studied together with published models that have been manually curated. Network properties of the models like connectivity and average shortest mean path of the different models have been compared and analyzed.Comment: 24 pages, 2 figures, 2 table

    Automated Classification of Stellar Spectra. II: Two-Dimensional Classification with Neural Networks and Principal Components Analysis

    Get PDF
    We investigate the application of neural networks to the automation of MK spectral classification. The data set for this project consists of a set of over 5000 optical (3800-5200 AA) spectra obtained from objective prism plates from the Michigan Spectral Survey. These spectra, along with their two-dimensional MK classifications listed in the Michigan Henry Draper Catalogue, were used to develop supervised neural network classifiers. We show that neural networks can give accurate spectral type classifications (sig_68 = 0.82 subtypes, sig_rms = 1.09 subtypes) across the full range of spectral types present in the data set (B2-M7). We show also that the networks yield correct luminosity classes for over 95% of both dwarfs and giants with a high degree of confidence. Stellar spectra generally contain a large amount of redundant information. We investigate the application of Principal Components Analysis (PCA) to the optimal compression of spectra. We show that PCA can compress the spectra by a factor of over 30 while retaining essentially all of the useful information in the data set. Furthermore, it is shown that this compression optimally removes noise and can be used to identify unusual spectra.Comment: To appear in MNRAS. 15 pages, 17 figures, 7 tables. 2 large figures (nos. 4 and 15) are supplied as separate GIF files. The complete paper can be obtained as a single gziped PS file from http://wol.ra.phy.cam.ac.uk/calj/p1.htm

    The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch

    Get PDF
    Recent and forthcoming advances in instrumentation, and giant new surveys, are creating astronomical data sets that are not amenable to the methods of analysis familiar to astronomers. Traditional methods are often inadequate not merely because of the size in bytes of the data sets, but also because of the complexity of modern data sets. Mathematical limitations of familiar algorithms and techniques in dealing with such data sets create a critical need for new paradigms for the representation, analysis and scientific visualization (as opposed to illustrative visualization) of heterogeneous, multiresolution data across application domains. Some of the problems presented by the new data sets have been addressed by other disciplines such as applied mathematics, statistics and machine learning and have been utilized by other sciences such as space-based geosciences. Unfortunately, valuable results pertaining to these problems are mostly to be found only in publications outside of astronomy. Here we offer brief overviews of a number of concepts, techniques and developments, some "old" and some new. These are generally unknown to most of the astronomical community, but are vital to the analysis and visualization of complex datasets and images. In order for astronomers to take advantage of the richness and complexity of the new era of data, and to be able to identify, adopt, and apply new solutions, the astronomical community needs a certain degree of awareness and understanding of the new concepts. One of the goals of this paper is to help bridge the gap between applied mathematics, artificial intelligence and computer science on the one side and astronomy on the other.Comment: 24 pages, 8 Figures, 1 Table. Accepted for publication: "Advances in Astronomy, special issue "Robotic Astronomy

    Homogenous Ensemble Phonotactic Language Recognition Based on SVM Supervector Reconstruction

    Get PDF
    Currently, acoustic spoken language recognition (SLR) and phonotactic SLR systems are widely used language recognition systems. To achieve better performance, researchers combine multiple subsystems with the results often much better than a single SLR system. Phonotactic SLR subsystems may vary in the acoustic features vectors or include multiple language-specific phone recognizers and different acoustic models. These methods achieve good performance but usually compute at high computational cost. In this paper, a new diversification for phonotactic language recognition systems is proposed using vector space models by support vector machine (SVM) supervector reconstruction (SSR). In this architecture, the subsystems share the same feature extraction, decoding, and N-gram counting preprocessing steps, but model in a different vector space by using the SSR algorithm without significant additional computation. We term this a homogeneous ensemble phonotactic language recognition (HEPLR) system. The system integrates three different SVM supervector reconstruction algorithms, including relative SVM supervector reconstruction, functional SVM supervector reconstruction, and perturbing SVM supervector reconstruction. All of the algorithms are incorporated using a linear discriminant analysis-maximum mutual information (LDA-MMI) backend for improving language recognition evaluation (LRE) accuracy. Evaluated on the National Institute of Standards and Technology (NIST) LRE 2009 task, the proposed HEPLR system achieves better performance than a baseline phone recognition-vector space modeling (PR-VSM) system with minimal extra computational cost. The performance of the HEPLR system yields 1.39%, 3.63%, and 14.79% equal error rate (EER), representing 6.06%, 10.15%, and 10.53% relative improvements over the baseline system, respectively, for the 30-, 10-, and 3-s test conditions

    Revisiting the Training of Logic Models of Protein Signaling Networks with a Formal Approach based on Answer Set Programming

    Get PDF
    A fundamental question in systems biology is the construction and training to data of mathematical models. Logic formalisms have become very popular to model signaling networks because their simplicity allows us to model large systems encompassing hundreds of proteins. An approach to train (Boolean) logic models to high-throughput phospho-proteomics data was recently introduced and solved using optimization heuristics based on stochastic methods. Here we demonstrate how this problem can be solved using Answer Set Programming (ASP), a declarative problem solving paradigm, in which a problem is encoded as a logical program such that its answer sets represent solutions to the problem. ASP has significant improvements over heuristic methods in terms of efficiency and scalability, it guarantees global optimality of solutions as well as provides a complete set of solutions. We illustrate the application of ASP with in silico cases based on realistic networks and data

    Genome-scale architecture of small molecule regulatory networks and the fundamental trade-off between regulation and enzymatic activity

    Get PDF
    Metabolic flux is in part regulated by endogenous small molecules that modulate the catalytic activity of an enzyme, e.g., allosteric inhibition. In contrast to transcriptional regulation of enzymes, technical limitations have hindered the production of a genome-scale atlas of small molecule-enzyme regulatory interactions. Here, we develop a framework leveraging the vast, but fragmented, biochemical literature to reconstruct and analyze the small molecule regulatory network (SMRN) of the model organism Escherichia coli, including the primary metabolite regulators and enzyme targets. Using metabolic control analysis, we prove a fundamental trade-off between regulation and enzymatic activity, and we combine it with metabolomic measurements and the SMRN to make inferences on the sensitivity of enzymes to their regulators. Generalizing the analysis to other organisms, we identify highly conserved regulatory interactions across evolutionarily divergent species, further emphasizing a critical role for small molecule interactions in the maintenance of metabolic homeostasis.P30 CA008748 - NCI NIH HHS; R01 GM121950 - NIGMS NIH HH
    corecore