70 research outputs found

    Application of machine learning in systems biology

    Biological systems are composed of a large number of molecular components. Understanding their behavior as a result of the interactions between the individual components is one of the aims of systems biology. Computational modelling is a powerful tool commonly used in systems biology, which relies on mathematical models that capture the properties and interactions between molecular components to simulate the behavior of the whole system. However, in many biological systems, it becomes challenging to build reliable mathematical models due to the complexity and the poor understanding of the underlying mechanisms. With the breakthrough in big data technologies in biology, data-driven machine learning (ML) approaches offer a promising complement to traditional theory-based models in systems biology. Firstly, ML can be used to model the systems in which the relationships between the components and the system are too complex to be modelled with theory-based models. Two such examples of using ML to resolve the genotype-phenotype relationships are presented in this thesis: (i) predicting yeast phenotypes using genomic features and (ii) predicting the thermal niche of microorganisms based on the proteome features. Secondly, ML naturally complements theory-based models. By applying ML, I improved the performance of the genome-scale metabolic model in describing yeast thermotolerance. In this application, ML was used to estimate the thermal parameters by using a Bayesian statistical learning approach that trains regression models and performs uncertainty quantification and reduction. The predicted bottleneck genes were further validated by experiments in improving yeast thermotolerance. In such applications, regression models are frequently used, and their performance relies on many factors, including but not limited to feature engineering and quality of response values. 
Manually engineering sufficient relevant features is particularly challenging in biology due to the lack of knowledge in certain areas. With the increasing volume of big data, deep transfer learning enables us to learn a statistical summary of the samples from a big dataset, which can be used as input to train other ML models. In the present thesis, I applied this approach to first learn a deep representation of enzyme thermal adaptation and then use it for the development of regression models for predicting enzyme optimal temperatures and protein melting temperatures. It was demonstrated that the transfer learning-based regression models outperform the classical ones trained on rationally engineered features in both cases. On the other hand, noisy response values are very common in biological datasets due to the variation in experimental measurements, and they fundamentally restrict the performance attainable with regression models. I therefore addressed this challenge by deriving a theoretical upper bound for the coefficient of determination (R²) for regression models. This theoretical upper bound depends on the noise associated with the response variable and on the variance of that variable in a given dataset. It can thus be used to test whether the maximal performance has been reached on a particular dataset, or whether further model improvement is possible.
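The abstract does not state the bound itself. As a minimal sketch, assume the observed response equals the true response plus independent additive noise. A perfect model then still leaves the noise variance unexplained, so the expected R² cannot exceed 1 minus the ratio of noise variance to observed response variance. The function name, the example temperatures and the noise level below are hypothetical and are not taken from the thesis.

```python
from statistics import pvariance


def r2_upper_bound(noise_variance: float, response_variance: float) -> float:
    """Best-case expected R^2 under additive, independent measurement noise.

    Assumption (not from the abstract): y_observed = y_true + noise, so even a
    perfect predictor of y_true leaves noise_variance unexplained.
    """
    if response_variance <= 0:
        raise ValueError("response_variance must be positive")
    if noise_variance < 0:
        raise ValueError("noise_variance must be non-negative")
    return 1.0 - noise_variance / response_variance


# Hypothetical measured responses (e.g. temperatures in degrees Celsius).
measured = [30.0, 37.0, 45.0, 55.0, 70.0]
# Hypothetical standard deviation of the experimental measurement noise.
noise_sd = 3.0

bound = r2_upper_bound(noise_sd ** 2, pvariance(measured))
print(f"Estimated upper bound on R^2: {bound:.3f}")
```

A model whose R² on held-out data is already close to this value leaves little room for improvement on that dataset; a large gap suggests that better features or models could still help.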

    Gaussian Process in Computational Biology: Covariance Functions for Transcriptomics

    In the field of machine learning, Gaussian process models are a widely used family of stochastic processes for modelling data observed over time, space or both. Gaussian process models are nonparametric, meaning that the models are developed on an infinite-dimensional parameter space. The parameter space is then typically learnt as the set of all possible solutions for a given learning problem. Gaussian process distributions are distributions over functions. The covariance function determines the properties of function samples drawn from the process. Once the decision to model with a Gaussian process has been made, the choice of the covariance function is a central step in modelling. In molecular biology and genetics, a transcription factor is a protein that binds to specific DNA sequences and controls the flow of genetic information from DNA to mRNA. To develop models of cellular processes, quantitative estimation of the regulatory relationship between transcription factors and genes is a basic requirement. Quantitative estimation is complex for several reasons. Many transcription factors' activities and their own transcription levels are post-transcriptionally modified; very often the expression levels of the transcription factors are low and noisy. It is therefore useful to infer the activity of the transcription factors from the expression levels of their target genes. Here we developed a Gaussian process based nonparametric regression model to infer the exact transcription factor activities from a combination of mRNA expression levels and DNA-protein binding measurements. Clustering of gene expression time series gives insight into which genes may be coregulated, allowing us to discern the activity of pathways in a given microarray experiment. Of particular interest is how a given group of genes varies with different conditions or genetic backgrounds. 
In this thesis, we developed a new clustering method that allows each cluster to be parametrized according to the behaviour of the genes across conditions, whether they are correlated or anti-correlated. By specifying the correlation between such genes, we gain more information within the cluster about how the genes interrelate. Our study shows the effectiveness of sharing information between replicates and different model conditions while modelling gene expression time series.
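To illustrate the role the abstract gives to the covariance function, the sketch below builds a covariance matrix over a few time points. It uses the squared-exponential kernel as a common textbook choice; this is an assumption for illustration and not necessarily one of the covariance functions developed in the thesis. The time points and hyperparameter values are hypothetical.

```python
import math


def squared_exponential(t1, t2, variance=1.0, lengthscale=1.0):
    """Covariance between function values at times t1 and t2.

    A larger lengthscale makes sampled functions vary more slowly over time;
    variance sets their overall amplitude.
    """
    return variance * math.exp(-((t1 - t2) ** 2) / (2.0 * lengthscale ** 2))


def covariance_matrix(times, variance=1.0, lengthscale=1.0):
    """Covariance matrix of the process evaluated at the given time points."""
    return [
        [squared_exponential(a, b, variance, lengthscale) for b in times]
        for a in times
    ]


# Hypothetical sampling times of a gene expression time series (hours).
times = [0.0, 1.0, 2.0, 4.0]
K = covariance_matrix(times, variance=2.0, lengthscale=1.5)

for row in K:
    print(" ".join(f"{value:6.3f}" for value in row))
```

Nearby time points receive high covariance and distant ones low covariance, which is how this kernel encodes the assumption that expression levels change smoothly over time.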

    Neurofly 2008 abstracts : the 12th European Drosophila neurobiology conference 6-10 September 2008 Wuerzburg, Germany

    This volume consists of a collection of conference abstracts.

    Image analysis platforms for exploring genetic and neuronal mechanisms regulating animal behavior

    An important aim of neuroscience is to understand how gene interactions and neuronal networks regulate animal behavior. The larvae of the marine annelid Platynereis dumerilii provide a convenient system for such integrative studies. These larvae exhibit a wide range of behaviors, including phototaxis, chemotaxis and gravitaxis, and at the same time have a relatively simple nervous system organization. Due to its small size and transparent body, the Platynereis larva is compatible with whole-body light microscopic imaging following tissue staining protocols. It is also suitable for serial electron microscopic imaging and subsequent neuronal connectome reconstruction. Despite advances in imaging techniques, automated computational tools for large data analysis are not well-established in Platynereis. In the current work, I developed image analysis software for exploring genetic and nervous system mechanisms modulating Platynereis behavior. Exploring gene expression patterns: Current labeling and imaging techniques restrict the number of gene expression patterns that can be labelled and visualized in a single specimen, which hinders the study of behaviors driven by multi-molecular interactions. To address this problem, I employed image registration to generate a gene expression atlas that integrates gene expression information from multiple specimens in a common reference space. The gene expression atlas was used to investigate mechanisms regulating larval locomotion, settlement and phototaxis in Platynereis. The atlas can assist in the identification of inter-individual and inter-species variations in gene expression. To provide a representation convenient for exploring gene expression patterns, I created a model of the atlas using 3D graphics software, which enabled convenient data visualization and efficient data storage and sharing. 
Exploring neuronal networks regulating behavior: Neuronal circuitry can be reconstructed from the images obtained from electron microscopy, which resolves very fine structures such as neuron morphology or synapses. The amount of data resulting from electron microscopy and the complexity of neuronal networks represent a significant challenge for manual analysis. To solve this problem, I developed the NeuroDetective software, which models a neuronal circuitry and analyzes the information flow within it. The software combines the advantages of 3D visualization and graph analysis software by integrating neuron morphology and spatial distribution together with synaptic connectivity. NeuroDetective allowed studying the neuronal circuitry responsible for phototaxis in Platynereis larvae, revealing the connections and the neurons important for the network functionality. NeuroDetective facilitated the establishment of a relationship between the function and the structure of the neuronal circuitry in Platynereis phototaxis. Integrating gene expression patterns with neuronal connectivity: Neuronal circuitry and its associated modulating biomolecules, such as neurotransmitters and neuropeptides, are thought to be the main factors regulating animal behavior. It was therefore important to integrate both genetic and neuronal information in order to fully understand how biomolecules, in conjunction with neuronal anatomy, elicit certain animal behavior. To resolve the difference in specimen preparation for gene expression versus electron microscopy, I developed an image registration procedure to match the signals from these two different datasets. This method enabled the integration of the spatial distribution of specific modulators into the analysis of neuronal networks, leading to an improved understanding of the genetic and neuronal mechanisms modulating behavior in Platynereis.

    On the role of molecular mechanisms and unequal cleavage during neurogenesis in the C. elegans C lineage

    Required for neurogenesis is a family of evolutionarily conserved bHLH transcription factors known as proneural genes. However, regulation of their initial expression remains a poorly understood aspect of neurodevelopment in any model, particularly Caenorhabditis elegans. A key mechanism by which cells acquire different fates is asymmetric division and in neuronal lineages these often generate unequally sized daughters. Whether this unequal size directly affects cell fate regulation is often unknown. Indeed, the question of how control of cell size intersects with fate decisions is poorly understood in biology more generally. Taking advantage of the single-cell resolution provided by the invariant cell lineage of C. elegans, I interrogate these two fundamental biological questions in the C lineage. Expression of the proneural gene hlh-14/Ascl1 in a single branch of the lineage is required for neurogenesis of the DVC and PVR neurons and is immediately preceded by unequal cleavages. Addressing both molecular and cellular regulators I perform a 4D-lineage based genetic screen for upstream regulators of hlh-14/Ascl1 and address the effect of unequal cleavage and daughter cell size. I find that a regulator of other neuronal lineage cleavages, PIG-1/MELK, is also required in the C lineage, yet equalisation does not affect the initiation of hlh-14/Ascl1 expression. Conversely, I demonstrate that unequal cleavage and acquisition of neuronal fate in separate successive divisions are controlled by the same key regulators. The first by an upstream regulator of hlh-14, the Mediator complex kinase module let-19/Mdt-13 and the second by hlh-14 itself. Taken together the results described in this thesis suggest that rather than acting to correctly segregate initial proneural gene expression, unequal cleavages are instead co-regulated by the same factors regulating neuronal fate acquisition. 
This co-regulation at successive divisions thus coordinates two separable aspects of fate: acquisition of neuronal identity and correct post-mitotic embryonic cell size.

    Design, development and evaluation of deep learning-based algorithms for automating lifespan experiments with C. elegans

    In recent years, C. elegans nematodes grown in Petri dishes have been used in many investigations related to aging. The development of new tools to automate lifespan experiments allows more assays to be carried out in less time and avoids human error, yielding more accurate results. The objective of this master's thesis (TFM) is to design and develop methods to address this problem using deep learning techniques. The results will then be compared with those obtained using traditional computer vision techniques. Initially, the work will focus on the supervised creation and editing of a set of well-labeled images. Subsequently, different neural network architectures will be designed and each one will be optimized over the hyperparameter space using Python and PyTorch. Finally, the proposed architectures will be evaluated, using both accuracy and computation time as optimization criteria.
García Garví, A. (2020). Diseño, desarrollo y evaluación de algoritmos basados en aprendizaje profundo para automatización de experimentos Lifespan con C. elegans. http://hdl.handle.net/10251/151938 TFG

    Robot-Assisted Full Automation Interface: Touch-Response On Zebrafish Larvae

    Women in Science 2013

    “Women in Science” summarizes research done by Smith College’s Summer Research Fellowship (SURF) Program participants. Ever since its 1967 start, SURF has been a cornerstone of Smith’s science education. In 2013, 167 students participated in SURF, supervised by 57 faculty mentor-advisors drawn from the Clark Science Center’s fourteen science, mathematics, and engineering departments and programs, and associated centers and units. At summer’s end, SURF participants were asked to summarize their research experiences for this publication.

    Women in Science 2017

    Ever since its 1967 start, SURF has been a cornerstone of Smith’s science education. Women in Science 2017 summarizes research done by Smith College’s SURF Program participants during the summer of 2017. 151 students participated in SURF (144 hosted on campus and nearby field sites), supervised by 58 faculty mentor-advisors drawn from the Clark Science Center and connected to its eighteen science, mathematics, and engineering departments and programs and associated centers and units. At summer’s end, SURF participants summarized their research experiences for this publication.