1,417 research outputs found

    Transcript assembly and abundance estimation with high-throughput RNA sequencing

    We present algorithms and statistical methods for the reconstruction and abundance estimation of transcript sequences from high-throughput RNA sequencing ("RNA-Seq"). We evaluate these approaches through large-scale experiments on a well-studied model of muscle development. We begin with an overview of sequencing assays and outline why the short read alignment problem is fundamental to the analysis of these assays. We then describe two approaches to the contiguous alignment problem, one of which uses massively parallel graphics hardware to accelerate alignment, and one of which exploits an indexing scheme based on the Burrows-Wheeler transform. We then turn to the spliced alignment problem, which is fundamental to RNA-Seq, and present an algorithm, TopHat. TopHat is the first algorithm that can align the reads from an entire RNA-Seq experiment to a large genome without the aid of reference gene models. In the second part of the thesis, we present the first comparative RNA-Seq assembly algorithm, Cufflinks, which is adapted from a constructive proof of Dilworth's Theorem, a classic result in combinatorics. We evaluate Cufflinks by assembling the transcriptome from a time course RNA-Seq experiment of developing skeletal muscle cells. The assembly contains 13,689 known transcripts and 3,724 novel ones. Of the novel transcripts, 62% were strongly supported by earlier sequencing experiments or by homologous transcripts in other organisms. We further validated interesting genes with isoform-specific RT-PCR. We then present a statistical model for RNA-Seq, included in Cufflinks, with which we estimate abundances of transcripts from RNA-Seq data. Simulation studies demonstrate that the model is highly accurate. We apply this model to the muscle data, and track the abundances of individual isoforms over development.
Finally, we present significance tests for changes in relative and absolute abundances between time points, which we employ to uncover differential expression and differential regulation. By testing for relative abundance changes within and between transcripts sharing a transcription start site, we find significant shifts in the rates of alternative splicing and promoter preference in hundreds of genes, including those believed to regulate muscle development.
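
The abstract above mentions an indexing scheme based on the Burrows-Wheeler transform without giving details. As a rough illustration only, the following sketch builds the transform naively and counts exact pattern occurrences with backward search. The function names `bwt` and `count_occurrences`, the `$` terminator, and the quadratic-time construction are assumptions made for this example; this is not the thesis's aligner, which would need compact rank structures and support for mismatches.

```python
def bwt(text, terminator="$"):
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column."""
    if terminator in text:
        raise ValueError("text must not contain the terminator character")
    s = text + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_string, pattern):
    """Count exact matches of pattern in the original text via backward search."""
    # first_index[c] = position of the first row starting with c in the sorted rotations
    first_index = {}
    for i, c in enumerate(sorted(bwt_string)):
        first_index.setdefault(c, i)

    def occ(c, i):
        # Number of occurrences of c in bwt_string[:i] (a real index precomputes this)
        return bwt_string.count(c, 0, i)

    lo, hi = 0, len(bwt_string)  # half-open range of matching rows
    for c in reversed(pattern):
        if c not in first_index:
            return 0
        lo = first_index[c] + occ(c, lo)
        hi = first_index[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("banana")
print(transformed)                            # annb$aa
print(count_occurrences(transformed, "ana"))  # 2
```

The search touches the pattern once per character and never scans the text itself, which is the property that makes this family of indexes attractive for aligning many short reads to one large reference.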

    Simulation of networks of spiking neurons: A review of tools and strategies

    We review different aspects of the simulation of spiking neural networks. We start by reviewing the different types of simulation strategies and algorithms that are currently implemented. We next review the precision of those simulation strategies, in particular in cases where plasticity depends on the exact timing of the spikes. We give an overview of the simulators and simulation environments presently available (restricted to those that are freely available, open source and documented). For each simulation tool, its advantages and pitfalls are reviewed, with the aim of allowing the reader to identify which simulator is appropriate for a given task. Finally, we provide a series of benchmark simulations of different types of networks of spiking neurons, including Hodgkin-Huxley type and integrate-and-fire models, interacting with current-based or conductance-based synapses, using clock-driven or event-driven integration strategies. The same set of models is implemented on the different simulators, and the code is made available. The ultimate goal of this review is to provide a resource to facilitate identifying the appropriate integration strategy and simulation tool to use for a given modeling problem related to spiking neural networks. Comment: 49 pages, 24 figures, 1 table; review article, Journal of Computational Neuroscience, in press (2007)
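
To make the term "clock-driven" from the abstract above concrete, here is a minimal sketch of a single leaky integrate-and-fire neuron advanced with a fixed time step (forward Euler). The function name `simulate_lif` and every parameter value are illustrative assumptions, not taken from the review or from any of the simulators it covers.

```python
def simulate_lif(input_current, duration_ms=100.0, dt_ms=0.1, tau_ms=10.0,
                 v_rest=-65.0, v_threshold=-50.0, v_reset=-65.0, resistance=10.0):
    """Clock-driven simulation of one leaky integrate-and-fire neuron.

    The membrane potential is updated at every tick of a fixed clock, so a
    spike can only be detected on the time grid (resolution dt_ms).
    Returns the list of spike times in milliseconds.
    """
    steps = int(round(duration_ms / dt_ms))
    v = v_rest
    spike_times = []
    for step in range(steps):
        # Forward Euler step of tau * dv/dt = -(v - v_rest) + R * I
        dv = (-(v - v_rest) + resistance * input_current) / tau_ms
        v += dt_ms * dv
        if v >= v_threshold:
            spike_times.append((step + 1) * dt_ms)
            v = v_reset  # reset after the spike; no refractory period here
    return spike_times


# With these assumed parameters the steady-state potential is v_rest + R * I,
# so a current of 2.0 drives the neuron above threshold and 1.0 does not.
print(len(simulate_lif(2.0)), "spikes with suprathreshold input")
print(len(simulate_lif(1.0)), "spikes with subthreshold input")
```

An event-driven strategy would instead compute the next threshold crossing analytically and jump straight to it, which avoids the grid-induced timing error at the cost of requiring a model whose dynamics can be solved between events.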

    Machine Learning in the prediction of Colorectal Cancer based on metabolic alterations

    In today's world, the amount of information available in various sectors is increasing. That is the case in healthcare, where the collection and processing of biomedical data seek to improve decision-making about the treatment to be applied to a patient, using Machine Learning-based tools. Machine Learning is an area of Artificial Intelligence in which applying algorithms to a dataset makes it possible to predict results or even discover relationships that would be unnoticeable at first glance. This project's main objective is to study several Machine Learning algorithms and techniques in order to identify whether the acylcarnitine profile may constitute a new biochemical marker for the prediction and prognosis of Colorectal Cancer. In the course of the work, different algorithms and data preprocessing techniques were tested. Three different experiments were carried out to validate the predictions of the models built for different scenarios, namely: predicting whether the patient has Colorectal Cancer, predicting which disease the patient has (Colorectal Cancer and other metabolic diseases) and predicting whether the patient has any disease at all. In a first analysis, the developed models showed good results in Colorectal Cancer screening. The best results were obtained by the Random Forest and Gradient Boosting algorithms, together with data balancing and feature selection techniques, namely Random Oversampling, Synthetic Oversampling and Recursive Feature Selection.
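
Random Oversampling, one of the balancing techniques named in the abstract above, can be sketched in a few lines: minority-class samples are duplicated at random until every class is as large as the largest one. The helper name `random_oversample` and the toy data are hypothetical and use only the standard library; the abstract does not say which implementation the authors used.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Balance classes by resampling minority-class items with replacement."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for y, items in by_class.items():
        # Draw extra copies at random until this class reaches the target size
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for x in items + extra:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels


# Toy data (made up): 6 "healthy" feature vectors versus 2 "cancer" ones.
features = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
classes = ["healthy"] * 6 + ["cancer"] * 2
balanced_x, balanced_y = random_oversample(features, classes)
print(Counter(balanced_y))
```

Oversampling is normally applied to the training split only, after the train/test split has been made; duplicating samples beforehand lets copies of the same record land on both sides and inflates the measured accuracy.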

    Coevolutionary optimization of fuzzy logic intelligence for strategic decision support

    ©2005 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
    We present a description and initial results of a computer code that coevolves fuzzy logic rules to play a two-sided zero-sum competitive game. It is based on the TEMPO Military Planning Game that has been used to teach resource allocation to over 20 000 students over the past 40 years. No feasible algorithm for optimal play is known. The coevolved rules, when pitted against human players, usually win the first few competitions. For reasons not yet understood, the evolved rules (found in a symmetrical competition) place little value on information concerning the play of the opponent.
    Rodney W. Johnson, Michael E. Melich, Zbigniew Michalewicz, and Martin Schmid

    Improving Compute & Data Efficiency of Flexible Architectures
