1,072 research outputs found

    A Mesh Generation and Machine Learning Framework for Drosophila Gene Expression Pattern Image Analysis

    Background: Multicellular organisms consist of cells of many different types that are established during development. Each type of cell is characterized by the unique combination of expressed gene products as a result of spatiotemporal gene regulation. Currently, a fundamental challenge in regulatory biology is to elucidate the gene expression controls that generate the complex body plans during development. Recent advances in high-throughput biotechnologies have generated spatiotemporal expression patterns for thousands of genes in the model organism fruit fly Drosophila melanogaster. Existing qualitative methods enhanced by a quantitative analysis based on computational tools we present in this paper would provide promising ways for addressing key scientific questions. Results: We develop a set of computational methods and open source tools for identifying co-expressed embryonic domains and the associated genes simultaneously. To map the expression patterns of many genes into the same coordinate space and account for the embryonic shape variations, we develop a mesh generation method to deform a meshed generic ellipse to each individual embryo. We then develop a co-clustering formulation to cluster the genes and the mesh elements, thereby identifying co-expressed embryonic domains and the associated genes simultaneously. Experimental results indicate that the gene and mesh co-clusters can be correlated to key developmental events during the stages of embryogenesis we study. The open source software tool has been made available at http://compbio.cs.odu.edu/fly/. Conclusions: Our mesh generation and machine learning methods and tools improve upon the flexibility, ease-of-use and accuracy of existing methods
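
The co-clustering idea in the abstract above (grouping genes and mesh elements at the same time) can be illustrated with a small sketch. This is not the authors' formulation: it swaps in a generic spectral bipartition of a gene-by-mesh-element matrix, and the function name `spectral_cocluster` and the toy matrix `expr` are hypothetical.

```python
import math

def spectral_cocluster(A, iters=200):
    """Split rows (genes) and columns (mesh elements) of a positive matrix A
    into two co-clusters using the second singular vector of the
    degree-normalised matrix. Illustrative only, not the paper's method."""
    n, m = len(A), len(A[0])
    r = [sum(row) for row in A]                              # row sums
    c = [sum(A[i][j] for i in range(n)) for j in range(m)]   # column sums
    # Normalised matrix An = D1^{-1/2} A D2^{-1/2}
    An = [[A[i][j] / math.sqrt(r[i] * c[j]) for j in range(m)] for i in range(n)]
    # The leading left singular vector of An is known in closed form.
    total = sum(r)
    u1 = [math.sqrt(x / total) for x in r]
    # Power iteration on An An^T with u1 projected out gives the second one.
    u = [float(i + 1) for i in range(n)]                     # deterministic start
    for _ in range(iters):
        d = sum(a * b for a, b in zip(u, u1))
        u = [a - d * b for a, b in zip(u, u1)]
        v = [sum(An[i][j] * u[i] for i in range(n)) for j in range(m)]
        u = [sum(An[i][j] * v[j] for j in range(m)) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in u))
        u = [a / norm for a in u]
    v = [sum(An[i][j] * u[i] for i in range(n)) for j in range(m)]
    # The sign of the rescaled singular vectors assigns each row/column a side.
    row_labels = [int(u[i] / math.sqrt(r[i]) > 0) for i in range(n)]
    col_labels = [int(v[j] / math.sqrt(c[j]) > 0) for j in range(m)]
    return row_labels, col_labels

# Toy data (hypothetical): genes 0-2 are expressed in mesh elements 0-3,
# genes 3-5 in mesh elements 4-7, on top of a small background level.
expr = [[5.1 if (i < 3) == (j < 4) else 0.1 for j in range(8)] for i in range(6)]
gene_labels, mesh_labels = spectral_cocluster(expr)
print(gene_labels, mesh_labels)
```

Genes and mesh elements that receive the same label form one co-expressed domain together with its associated genes. A rank-one matrix has no second singular direction, so this sketch assumes the data contain at least two blocks.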

    A Computational Framework for Learning from Complex Data: Formulations, Algorithms, and Applications

    Many real-world processes are dynamically changing over time. As a consequence, the observed complex data generated by these processes also evolve smoothly. For example, in computational biology, the expression data matrices are evolving, since gene expression controls are deployed sequentially during development in many biological processes. Investigations into the spatial and temporal gene expression dynamics are essential for understanding the regulatory biology governing development. In this dissertation, I mainly focus on two types of complex data: genome-wide spatial gene expression patterns in the model organism fruit fly and Allen Brain Atlas mouse brain data. I provide a framework to explore spatiotemporal regulation of gene expression during development. I develop an evolutionary co-clustering formulation to identify co-expressed domains and the associated genes simultaneously over different temporal stages using a mesh-generation pipeline. I also propose to employ deep convolutional neural networks as a multi-layer feature extractor to generate generic representations for gene expression pattern in situ hybridization (ISH) images. Furthermore, I employ the multi-task learning method to fine-tune the pre-trained models with labeled ISH images. My proposed computational methods are evaluated using synthetic data sets and real biological data sets, including the gene expression data from the fruit fly BDGP data sets and the Allen Developing Mouse Brain Atlas, in comparison with existing baseline methods. Experimental results indicate that the proposed representations, formulations, and methods are efficient and effective in annotating and analyzing the large-scale biological data sets.

    A mesh generation and machine learning framework for Drosophila gene expression pattern image analysis

    Background: Multicellular organisms consist of cells of many different types that are established during development. Each type of cell is characterized by the unique combination of expressed gene products as a result of spatiotemporal gene regulation. Currently, a fundamental challenge in regulatory biology is to elucidate the gene expression controls that generate the complex body plans during development. Recent advances in high-throughput biotechnologies have generated spatiotemporal expression patterns for thousands of genes in the model organism fruit fly Drosophila melanogaster. Existing qualitative methods enhanced by a quantitative analysis based on computational tools we present in this paper would provide promising ways for addressing key scientific questions. Results: We develop a set of computational methods and open source tools for identifying co-expressed embryonic domains and the associated genes simultaneously. To map the expression patterns of many genes into the same coordinate space and account for the embryonic shape variations, we develop a mesh generation method to deform a meshed generic ellipse to each individual embryo. We then develop a co-clustering formulation to cluster the genes and the mesh elements, thereby identifying co-expressed embryonic domains and the associated genes simultaneously. Experimental results indicate that the gene and mesh co-clusters can be correlated to key developmental events during the stages of embryogenesis we study. The open source software tool has been made available at http://compbio.cs.odu.edu/fly/. Conclusions: Our mesh generation and machine learning methods and tools improve upon the flexibility, ease-of-use and accuracy of existing methods. The electronic version of this article is the complete one and can be found online at: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-37

    Bioimage informatics in the context of Drosophila research

    Modern biological research relies heavily on microscopic imaging. The advanced genetic toolkit of Drosophila makes it possible to label molecular and cellular components with an unprecedented level of specificity, necessitating the application of the most sophisticated imaging technologies. Imaging in Drosophila spans all scales from single molecules to entire populations of adult organisms, from electron microscopy to live imaging of developmental processes. As the imaging approaches become more complex and ambitious, there is an increasing need for quantitative, computer-mediated image processing and analysis to make sense of the imagery. Bioimage informatics is an emerging research field that covers all aspects of biological image analysis from data handling, through processing, to quantitative measurements, analysis and data presentation. Some of the most advanced, large scale projects, combining cutting edge imaging with complex bioimage informatics pipelines, are realized in the Drosophila research community. In this review, we discuss the current research in biological image analysis specifically relevant to the type of systems level image datasets that are uniquely available for the Drosophila model system. We focus on how state-of-the-art computer vision algorithms are impacting the ability of Drosophila researchers to analyze biological systems in space and time. We pay particular attention to how these algorithmic advances from computer science are made usable to practicing biologists through open source platforms and how biologists can themselves participate in their further development.

    Towards Smarter Fluorescence Microscopy: Enabling Adaptive Acquisition Strategies With Optimized Photon Budget

    Fluorescence microscopy is an invaluable technique for studying the intricate process of organism development. The acquisition process, however, is associated with the fundamental trade-off between the quality and reliability of the acquired data. On one hand, the goal of capturing the development in its entirety, often times across multiple spatial and temporal scales, requires extended acquisition periods. On the other hand, high doses of light required for such experiments are harmful for living samples and can introduce non-physiological artifacts in the normal course of development. Conventionally, a single set of acquisition parameters is chosen in the beginning of the acquisition and constitutes the experimenter’s best guess of the overall optimal configuration within the aforementioned trade-off. In the paradigm of adaptive microscopy, in turn, one aims at achieving more efficient photon budget distribution by dynamically adjusting the acquisition parameters to the changing properties of the sample. In this thesis, I explore the principles of adaptive microscopy and propose a range of improvements for two real imaging scenarios. Chapter 2 summarizes the design and implementation of an adaptive pipeline for efficient observation of the asymmetrically dividing neurogenic progenitors in Zebrafish retina. In the described approach the fast and expensive acquisition mode is automatically activated only when the mitotic cells are present in the field of view. The method illustrates the benefits of the adaptive acquisition in the common scenario of the individual events of interest being sparsely distributed throughout the duration of the acquisition. Chapter 3 focuses on computational aspects of segmentation-based adaptive schemes for efficient acquisition of the developing Drosophila pupal wing. Fast sample segmentation is shown to provide a valuable output for the accurate evaluation of the sample morphology and dynamics in real time. 
This knowledge proves instrumental for adjusting the acquisition parameters to the current properties of the sample and reducing the required photon budget with minimal effects to the quality of the acquired data. Chapter 4 addresses the generation of synthetic training data for learning-based methods in bioimage analysis, making them more practical and accessible for smart microscopy pipelines. State-of-the-art deep learning models trained exclusively on the generated synthetic data are shown to yield powerful predictions when applied to the real microscopy images. In the end, in-depth evaluation of the segmentation quality of both real and synthetic data-based models illustrates the important practical aspects of the approach and outlines the directions for further research
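
The event-triggered scheme described for Chapter 2 (switching to the expensive acquisition mode only while events of interest are in view) can be sketched as a simple controller. Everything here is a hypothetical illustration rather than the thesis implementation: the per-frame `event_score`, the threshold, and the relative photon costs of the two modes are assumed values.

```python
from typing import List, Sequence, Tuple

# Assumed relative photon cost per time point for each acquisition mode.
MODE_COST = {"slow": 1, "fast": 10}

def plan_acquisition(event_scores: Sequence[float],
                     threshold: float = 0.5) -> Tuple[List[str], int]:
    """Pick the mode for each time point from a detector score and
    return the chosen modes together with the total photon cost."""
    modes = []
    for score in event_scores:
        # Use the expensive mode only while an event is likely in view.
        modes.append("fast" if score >= threshold else "slow")
    total_cost = sum(MODE_COST[m] for m in modes)
    return modes, total_cost

# Hypothetical detector output: an event is visible at time points 2 and 3.
scores = [0.1, 0.2, 0.9, 0.8, 0.1]
modes, adaptive_cost = plan_acquisition(scores)
always_fast_cost = MODE_COST["fast"] * len(scores)
print(modes, adaptive_cost, always_fast_cost)
```

With sparse events, most time points fall back to the cheap mode, which is where the saving in photon budget comes from in this toy setting.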

    A comparison of machine learning techniques for detection of drug target articles

    Important progress in treating diseases has been possible thanks to the identification of drug targets. Drug targets are the molecular structures whose abnormal activity, associated with a disease, can be modified by drugs, improving the health of patients. The pharmaceutical industry needs to give priority to their identification and validation in order to reduce the long and costly drug development times. In the last two decades, our knowledge about drugs, their mechanisms of action and drug targets has rapidly increased. Nevertheless, most of this knowledge is hidden in millions of medical articles and textbooks. Extracting knowledge from this large amount of unstructured information is a laborious job, even for human experts. Identifying drug target articles, a crucial first step toward the automatic extraction of information from texts, is the aim of this paper. Several machine learning techniques were compared in order to obtain a satisfactory classifier for detecting drug target articles using semantic information from biomedical resources such as the Unified Medical Language System. The best result was achieved by a Fuzzy Lattice Reasoning classifier, which reaches a ROC area of 98%. This research is supported by projects TIN2007-67407-C03-01, S-0505/TIC-0267 and MICINN project TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 (Plan I+D+i), as well as by the Juan de la Cierva program of the MICINN of Spain.
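
For readers unfamiliar with the ROC area reported above, the following minimal sketch shows how that measure can be computed from classifier scores. The scores and labels are made up for illustration and have no connection to the paper's data; `roc_auc` is a hypothetical helper, not part of any library.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve: the probability that a randomly chosen
    positive example is scored above a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up scores for six articles; label 1 marks a drug target article.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(roc_auc(scores, labels))
```

A value of 1.0 means every drug target article is ranked above every other article, while 0.5 corresponds to random ranking.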

    Unsupervised behavioral classification with 3D pose data from tethered Drosophila melanogaster

    Integrated master's thesis in Biomedical Engineering and Biophysics (Medical Biophysics and Systems Physiology), Universidade de Lisboa, Faculdade de Ciências, 2020. Animal behavior is guided by genetically encoded instructions, with contributions from the surrounding environment and prior experience. It can be regarded as the ultimate output of neuronal activity, so the study of animal behavior is a means of understanding the mechanisms underlying the workings of the animal brain. Uncovering the correspondence between brain and behavior requires tools that can measure behavior precisely, meaningfully and coherently. The scientific field that studies animal behavior is called ethology. At the beginning of the 20th century, ethologists categorized animal behaviors by relying on their own intuition and experience. Consequently, their assessments were subjective and omitted behaviors that the ethologists had not considered a priori. With the emergence of new techniques for capturing and analyzing behavior, ethologists moved to more objective, quantitative paradigms for measuring behavior. These analytical tools fostered the construction of behavioral datasets which, in turn, promoted the development of software for quantifying behavior: trajectory tracking, action classification, and large-scale analysis of behavioral patterns are the most prominent examples. This work falls into the second category (action classification). Action classifiers are either supervised or unsupervised. The first category comprises classifiers trained to recognize specific patterns defined by a human expert. 
This category of classifiers is limited by: 1) the need for an exhausting process of annotating frames to train the classifier; 2) subjectivity with respect to the expert who labels those frames; 3) low dimensionality, in that classification reduces complex behaviors to a single label; 4) erroneous assumptions; 5) human bias toward the observed behaviors. Unsupervised classifiers, in turn, consistently follow one formula: 1) computer vision is used to extract the animal's postural features; 2) the data are preprocessed, which includes a vital module that builds a posture-dynamics representation of the animal's actions so as to capture the dynamic elements of behavior; 3) an optional dimensionality reduction module follows, in case the user wishes to visualize the data directly in a low-dimensional space; 4) a label is assigned to each data element by an algorithm that operates either directly in the high-dimensional space or in the low-dimensional space resulting from the previous step. The aim of this work is to achieve an objective and reproducible unsupervised classification of frames of Drosophila melanogaster tethered over a ball floating on air, while trying to minimize the number of intuitions required and, if possible, to remove the influence of each individual's morphology (thereby ensuring a generalized classification of these insects' behaviors). To achieve such a classification, this study uses a recently developed tool that records the three-dimensional pose of tethered Drosophila, DeepFly3D, to build a dataset with the x-, y- and z-coordinates, over time, of the landmark positions of a set of three Drosophila melanogaster genotypes (lines aDN>CsChrimson, MDN-GAL4/+ and aDNGAL4/+). 
This is followed by a novel normalization step that computes the angles between adjacent landmarks, such as the flies' joints, antennae and dorsal stripes, using trigonometric relations and the definition of the flies' anatomical planes; it aims to reduce the weight, for the classifier, of morphological differences between flies and of their orientation relative to the DeepFly3D cameras. The normalization module is followed by a frequency analysis module, focused on extracting the relevant frequencies in the time series of the computed angles, as well as their relative weights. The final product of preprocessing is a matrix holding the norm of those weights: the expression matrix of the posture-dynamics space. The dimensionality reduction and cluster assignment modules follow (items 3 and 4 of the preceding paragraph). For these, six possible algorithm configurations are proposed and subjected to a comparative analysis to determine which is best suited to classifying this type of data. The dimensionality reduction algorithms tested here are t-SNE (t-distributed Stochastic Neighbor Embedding) and PCA (Principal Component Analysis), while the clustering algorithms compared are Watershed, GMM posterior probability assignment, and HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise). 
Each candidate pipeline is finally evaluated by watching the videos included in the clusters it produces and, given the vast number of these videos and the possibility of subjective validation across different observers, with the help of metrics that express broad criteria of cluster quality: 1) fly uncompactness, which evaluates the effectiveness of the normalization module based on the fly's reference angles; 2) homogeneity, which seeks to ensure that the clusters do not reflect the identity or genotype of the flies; 3) cluster entropy, which gauges the predictability of transitions between clusters; 4) mean dwell time, which weighs the time an individual takes on average to perform an action. Two additional auxiliary criteria are also considered: the number of parameters estimated by the user (the larger it is, the more limited the pipeline's reproducibility) and the algorithm's run time (which should likewise be minimized). Although some subjectivity remains regarding what the user considers a “good” cluster, including the metrics brings this approach closer to an ideal scenario of complete independence between devising a definition of behavior and validating the results that follow from its conjectures. The performance of the candidate pipelines diverged widely: the spaces resulting from the dimensionality reduction operations proved heterogeneous and anisotropic, with sequences of points taking worm-like shapes instead of the anticipated conglomerate of disconnected points. These worm-like trajectories limit the performance of clustering algorithms that operate in low-dimensional (here, two-dimensional) spaces. The absence of an intermediate step that samples the posture-dynamics space explains the origin of these worm-like trajectories. 
Nevertheless, the pipelines that perform dimensionality reduction produced better results than the pipeline that clusters with HDBSCAN directly on the expression matrix of the posture-dynamics space. The most successful combination of dimensionality reduction and clustering modules came from the PCA30-t-SNE2-GMM pipeline. Although they are not fully consistent, the clusters resulting from this pipeline each include one behavior that stands out over the others that were (wrongly) placed in the same cluster. The shortcomings of these clusters mainly involve the occasional merging of two distinct behaviors into one cluster, or the unwelcome presence of behavior sequences in which the fly is motionless (probably the result of small detection errors produced by DeepFly3D). In addition, the PCA30-t-SNE2-GMM pipeline was able to recognize differences in the behavioral phenotype of the flies, validated by their genetic lines. Although the results show visible improvements over those produced by similar approaches, mainly at the level of cluster videos, since only one of those approaches includes cluster success metrics, some aspects of this approach require correction: adding a sampling step, followed by a new algorithm capable of performing consistent dimensionality reduction so as to bring all points into the same embedded space, is possibly the change most likely to add value to this approach. Future approaches should not neglect the contribution of multiple behavioral representations that may validate one another, replacing the need for user-defined success metrics. One of the preeminent challenges of Behavioral Neuroscience is the understanding of how the brain works and how it ultimately commands an animal’s behavior. 
Solving this brain-behavior linkage requires, on one end, precise, meaningful and coherent techniques for measuring behavior. Rapid technical developments in tools for collecting and analyzing behavioral data, paired with the immaturity of current approaches, motivate an ongoing search for systematic, unbiased behavioral classification techniques. To accomplish such a classification, this study employs a state-of-the-art tool for tracking 3D pose of tethered Drosophila, DeepFly3D, to collect a dataset of x-, y- and z- landmark positions over time, from tethered Drosophila melanogaster moving over an air-suspended ball. This is succeeded by unprecedented normalization across individual flies by computing the angles between adjoining landmarks, followed by standard wavelet analysis. Subsequently, six unsupervised behavior classification techniques are compared, four of which follow proven formulas, while the remaining two are experimental. Lastly, their performances are evaluated via meaningful metric scores along with cluster video assessment, so as to ensure a fully unbiased cycle, from the conjecturing of a definition of behavior to the corroboration of the results that stem from its assumptions. Performances from different techniques varied significantly. Techniques that perform clustering in embedded low- (two-) dimensional spaces struggled with their heterogeneous and anisotropic nature. High-dimensional clustering techniques revealed that these properties emerged from the original high-dimensional posture-dynamics spaces. Nonetheless, high and low-dimensional spaces disagree on the arrangement of their elements, with embedded data points showing hierarchical organization, which was lacking prior to their embedding. Low-dimensional clustering techniques were globally a better match against these spatial features and yielded more suitable results. 
Their candidate embedding algorithms alone were capable of revealing dissimilarities in preferred behaviors among contrasting genotypes of Drosophila. Lastly, the top-ranking classification technique produced satisfactory behavioral cluster videos (despite the irregular allocation of rest labels) in a consistent and repeatable manner, while requiring a marginal number of hand tuned parameters
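
The normalization step mentioned in this abstract, computing angles between adjoining landmarks, can be illustrated with a short sketch. It is a generic three-point angle computation under assumed inputs, not the thesis code: `joint_angle` and the example coordinates are hypothetical, and the thesis additionally relies on anatomical planes that are not modelled here.

```python
import math

def joint_angle(a, b, c):
    """Angle in radians at landmark b between segments b->a and b->c,
    for 3D points given as (x, y, z) tuples."""
    u = [ai - bi for ai, bi in zip(a, b)]
    v = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_theta)

# Two hypothetical flies holding the same pose: the second is larger and
# sits elsewhere in the camera frame, yet the joint angle is unchanged.
small_fly = joint_angle((1, 0, 0), (0, 0, 0), (0, 1, 0))
large_fly = joint_angle((12, 5, 5), (10, 5, 5), (10, 8, 5))
print(small_fly, large_fly)
```

Because an angle depends only on the directions of the two segments, it is unaffected by limb length or by where the fly sits in the frame, which is the property that makes angles useful for comparing individuals.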

    Image analysis platforms for exploring genetic and neuronal mechanisms regulating animal behavior

    An important aim of neuroscience is to understand how gene interactions and neuronal networks regulate animal behavior. The larvae of the marine annelid Platynereis dumerilii provide a convenient system for such integrative studies. These larvae exhibit a wide range of behaviors, including phototaxis, chemotaxis and gravitaxis, and at the same time exhibit relatively simple nervous system organization. Due to its small size and transparent body, the Platynereis larva is compatible with whole-body light microscopic imaging following tissue staining protocols. It is also suitable for serial electron microscopic imaging and subsequent neuronal connectome reconstruction. Despite advances in imaging techniques, automated computational tools for large data analysis are not well-established in Platynereis. In the current work, I developed image analysis software for exploring genetic and nervous system mechanisms modulating Platynereis behavior. Exploring gene expression patterns: Current labeling and imaging techniques restrict the number of gene expression patterns that can be labelled and visualized in a single specimen, which hinders the study of behaviors driven by multi-molecular interactions. To address this problem, I employed image registration to generate a gene expression atlas that integrates gene expression information from multiple specimens in a common reference space. The gene expression atlas was used to investigate mechanisms regulating larval locomotion, settlement and phototaxis in Platynereis. The atlas can assist in the identification of inter-individual and inter-species variations in gene expression. To provide a representation convenient for exploring gene expression patterns, I created a model of the atlas using 3D graphics software, which enabled convenient data visualization and efficient data storage and sharing. 
Exploring neuronal networks regulating behavior: Neuronal circuitry can be reconstructed from the images obtained from electron microscopy, which resolves very fine structures such as neuron morphology or synapses. The amount of data resulting from electron microscopy and the complexity of neuronal networks represent a significant challenge for manual analysis. To solve this problem, I developed the NeuroDetective software, which models a neuronal circuitry and analyzes the information flow within it. The software combines the advantages of 3D visualization and graph analysis software by integrating neuron morphology and spatial distribution together with synaptic connectivity. NeuroDetective allowed studying the neuronal circuitry responsible for phototaxis in Platynereis larvae, revealing the connections and the neurons important for the network functionality. NeuroDetective facilitated the establishment of a relationship between the function and the structure of the neuronal circuitry in Platynereis phototaxis. Integrating gene expression patterns with neuronal connectivity: Neuronal circuitry and its associated modulating biomolecules, such as neurotransmitters and neuropeptides, are thought to be the main factors regulating animal behavior. Therefore it was important to integrate both genetic and neuronal information in order to fully understand how biomolecules in conjunction with neuronal anatomy elicit certain animal behavior. To resolve the difference in specimen preparation for gene expression versus electron microscopy preparations, I developed an image registration procedure to match the signals from these two different datasets. This method enabled the integration of the spatial distribution of specific modulators into the analysis of neuronal networks, leading to an improved understanding of the genetic and neuronal mechanisms modulating behavior in Platynereis.