1,879 research outputs found

    GeneTrees: a phylogenomics resource for prokaryotes

    The GeneTrees phylogenomics system pursues comparative genomic analyses from the perspective of gene phylogenies for individual genes. The GeneTrees project has the goal of providing detailed evolutionary models for all protein-coding gene components of the fully sequenced genomes. Currently, a database of alignments and trees for all protein sequences for 325 fully sequenced and annotated prokaryote genomes is available. The prokaryote database contains 890 000 protein sequences organized into over 100 000 alignments, each described by a phylogenetic tree. An original homology group discovery tool assembles sets of related proteins from all versus all pairwise alignments. Multiple alignments for each homology group are stored and subjected to phylogenetic tree inference. A graphical web interface provides visual exploration of the GeneTrees database. Homology groups can be queried by sequence identifiers or annotation terms. Genomes can be browsed visually on a gene map of each chromosome or plasmid. Phylogenetic trees with support values are displayed in conjunction with the associated sequence alignment. A variety of classes of information can be selected to label the tree tips to aid in visual evaluation of annotation and gene function. This web interface is available at
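The abstract above says homology groups are assembled from all-versus-all pairwise alignments but does not describe the grouping algorithm. As a rough illustration only, the sketch below uses single-linkage clustering with a union-find structure, which is one common way to turn pairwise hits into groups; it is not claimed to be the GeneTrees tool's method. The function name `homology_groups`, the `(id_a, id_b, score)` hit format, and the `min_score` threshold are all assumptions made for this example.

```python
def homology_groups(hits, min_score=50.0):
    """Single-linkage grouping of sequence IDs from pairwise alignment hits.

    hits: iterable of (id_a, id_b, score) tuples (hypothetical format).
    Two sequences end up in the same group if a chain of hits with
    score >= min_score connects them.
    """
    parent = {}

    def find(x):
        # Register unseen IDs, then walk to the root with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in hits:
        root_a, root_b = find(a), find(b)
        # Merge the two groups only when the hit is strong enough.
        if score >= min_score and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for seq in list(parent):
        groups.setdefault(find(seq), set()).add(seq)
    # Sorted output keeps the result deterministic.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


example_hits = [("A", "B", 120.0), ("B", "C", 80.0), ("D", "E", 30.0)]
print(homology_groups(example_hits))
```

Single linkage is simple but can chain unrelated families together through multi-domain proteins, so a real pipeline would likely add coverage or domain-aware criteria.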

    Automated simultaneous analysis phylogenetics (ASAP): an enabling tool for phylogenomics

    © 2008 Sarkar et al. This is an Open Access article distributed under the terms of the Creative Commons Attribution License 2.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The definitive version was published in BMC Bioinformatics 9 (2008): 103, doi:10.1186/1471-2105-9-103. The availability of sequences from whole genomes to reconstruct the tree of life has the potential to enable the development of phylogenomic hypotheses in ways that have not previously been possible. A significant bottleneck in the analysis of genomic-scale views of the tree of life is the time required for manual curation of genomic data into multi-gene phylogenetic matrices. To keep pace with the exponentially growing volume of molecular data in the genomic era, we have developed an automated technique, ASAP (Automated Simultaneous Analysis Phylogenetics), to assemble these multigene/multispecies matrices and to evaluate the significance of individual genes within the context of a given phylogenetic hypothesis. Applications of ASAP may enable scientists to re-evaluate species relationships and to develop new phylogenomic hypotheses based on genome-scale data. This work is funded in part by NSF DBI-0421604 to GC and RD. INS is supported in part by the Ellison Medical Foundation.

    Chloroplast genome sequencing analysis of Heterosigma akashiwo CCMP452 (West Atlantic) and NIES293 (West Pacific) strains

    Background: Heterokont algae form a monophyletic group within the stramenopile branch of the tree of life. These organisms display wide morphological diversity, ranging from minute unicells to massive, bladed forms. Surprisingly, chloroplast genome sequences are available only for diatoms, representing two (Coscinodiscophyceae and Bacillariophyceae) of approximately 18 classes of algae that comprise this taxonomic cluster. A universal challenge to chloroplast genome sequencing studies is the retrieval of highly purified DNA in quantities sufficient for analytical processing. To circumvent this problem, we have developed a simplified method for sequencing chloroplast genomes, using fosmids selected from a total cellular DNA library. The technique has been used to sequence chloroplast DNA of two Heterosigma akashiwo strains. This raphidophyte has served as a model system for studies of stramenopile chloroplast biogenesis and evolution. Results: H. akashiwo strain CCMP452 (West Atlantic) chloroplast DNA is 160,149 bp in size with a 21,822-bp inverted repeat, whereas NIES293 (West Pacific) chloroplast DNA is 159,370 bp in size and has an inverted repeat of 21,665 bp. The fosmid cloning technique reveals that both strains contain an isomeric chloroplast DNA population resulting from an inversion of their single copy domains. Both strains contain multiple small inverted and tandem repeats, non-randomly distributed within the genomes. Although both CCMP452 and NIES293 chloroplast DNAs contain 197 genes, multiple nucleotide polymorphisms are present in both coding and intergenic regions. Several protein-coding genes contain large, in-frame inserts relative to orthologous genes in other plastids. These inserts are maintained in mRNA products. Two genes of interest in H. akashiwo, not previously reported in any chloroplast genome, include tyrC, a tyrosine recombinase, which we hypothesize may be a result of a lateral gene transfer event, and an unidentified 456 amino acid protein, which we hypothesize serves as a G-protein-coupled receptor. The H. akashiwo chloroplast genomes share little synteny with other algal chloroplast genomes sequenced to date. Conclusion: The fosmid cloning technique eliminates chloroplast isolation, does not require chloroplast DNA purification, and reduces sequencing processing time. Application of this method has provided new insights into chloroplast genome architecture, gene content and evolution within the stramenopile cluster.

    Geometry Processing of Conventionally Produced Mouse Brain Slice Images

    Brain mapping research in most neuroanatomical laboratories relies on conventional processing techniques, which often introduce histological artifacts such as tissue tears and tissue loss. In this paper we present techniques and algorithms for automatic registration and 3D reconstruction of conventionally produced mouse brain slices in a standardized atlas space. This is achieved first by constructing a virtual 3D mouse brain model from annotated slices of the Allen Reference Atlas (ARA). Virtual re-slicing of the reconstructed model generates ARA-based slice images corresponding to the microscopic images of histological brain sections. These image pairs are aligned using a geometric approach through contour images. Histological artifacts in the microscopic images are detected and removed using Constrained Delaunay Triangulation before performing global alignment. Finally, non-linear registration is performed by solving Laplace's equation with Dirichlet boundary conditions. Our methods provide significant improvements over previously reported registration techniques for the tested slices in 3D space, especially on slices with significant histological artifacts. Further, as an application we count the number of neurons in various anatomical regions using a dataset of 51 microscopic slices from a single mouse brain. This work represents a significant contribution to this subfield of neuroscience as it provides tools to neuroanatomists for analyzing and processing histological data. Comment: 14 pages, 11 figures
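The abstract above mentions solving Laplace's equation with Dirichlet boundary conditions but gives no numerical details. The sketch below only illustrates what that phrase means on a small regular grid, using plain Jacobi relaxation; it is not the authors' registration method. The function name `solve_laplace`, the `fixed` mask, and the iteration count are assumptions made for this example.

```python
def solve_laplace(grid, fixed, iterations=500):
    """Jacobi relaxation for Laplace's equation on a 2D grid.

    grid:  list of lists with initial values; cells marked in `fixed`
           hold the prescribed (Dirichlet) values.
    fixed: same-shaped list of booleans; True means the value is held.
    The outermost rows and columns are always treated as boundary.
    """
    rows, cols = len(grid), len(grid[0])
    u = [row[:] for row in grid]
    for _ in range(iterations):
        new = [row[:] for row in u]
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                if not fixed[i][j]:
                    # A harmonic function equals the mean of its 4 neighbours.
                    new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                        + u[i][j - 1] + u[i][j + 1])
        u = new
    return u


# Boundary values ramp linearly from 0 (left) to 1 (right); interior starts at 0.
n = 5
grid = [[j / (n - 1) if i in (0, n - 1) or j in (0, n - 1) else 0.0
         for j in range(n)] for i in range(n)]
fixed = [[i in (0, n - 1) or j in (0, n - 1) for j in range(n)] for i in range(n)]
solution = solve_laplace(grid, fixed)
print(round(solution[2][2], 6))
```

In a registration setting the boundary values would typically be displacements known on matched contours, with the relaxed interior giving a smooth deformation between them; Jacobi iteration is chosen here for readability, not speed.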

    PhyloPattern: regular expressions to identify complex patterns in phylogenetic trees

    Background: To effectively apply evolutionary concepts in genome-scale studies, large numbers of phylogenetic trees have to be automatically analysed, at a level approaching human expertise. Complex architectures must be recognized within the trees, so that associated information can be extracted. Results: Here, we present a new software library, PhyloPattern, for automating tree manipulations and analysis. PhyloPattern includes three main modules, which address essential tasks in high-throughput phylogenetic tree analysis: node annotation, pattern matching, and tree comparison. PhyloPattern thus allows the programmer to focus on: i) the use of predefined or user-defined annotation functions to perform immediate or deferred evaluation of node properties, ii) the search for user-defined patterns in large phylogenetic trees, iii) the pairwise comparison of trees by dynamically generating patterns from one tree and applying them to the other. Conclusion: PhyloPattern greatly simplifies and accelerates the work of the computer scientist in the evolutionary biology field. The library has been used to automatically identify phylogenetic evidence for domain shuffling or gene loss events in the evolutionary histories of protein sequences. However, any workflow that relies on phylogenetic tree analysis could be automated with PhyloPattern.

    Strategies for Reliable Exploitation of Evolutionary Concepts in High Throughput Biology

    The recent availability of the complete genome sequences of a large number of model organisms, together with the immense amount of data being produced by the new high-throughput technologies, means that we can now begin comparative analyses to understand the mechanisms involved in the evolution of the genome and their consequences in the study of biological systems. Phylogenetic approaches provide a unique conceptual framework for performing comparative analyses of all this data, for propagating information between different systems and for predicting or inferring new knowledge. As a result, phylogeny-based inference systems are now playing an increasingly important role in most areas of high-throughput genomics, including studies of promoters (phylogenetic footprinting), interactomes (based on the presence and degree of conservation of interacting proteins), and in comparisons of transcriptomes or proteomes (phylogenetic proximity and co-regulation/co-expression). Here we review the recent developments aimed at making automatic, reliable phylogeny-based inference feasible in large-scale projects. We also discuss how evolutionary concepts and phylogeny-based inference strategies are now being exploited in order to understand the evolution and function of biological systems. Such advances will be fundamental for the success of the emerging disciplines of systems biology and synthetic biology, and will have wide-reaching effects in applied fields such as biotechnology, medicine and pharmacology.

    Horizontal gene transfer and the evolution of transcriptional regulation in Escherichia coli

    Most Escherichia coli transcription factors have paralogs, but these usually arose by horizontal gene transfer rather than by duplication within the E. coli lineage, as previously believed.

    How to describe a cell: a path to automated versatile characterization of cells in imaging data

    A cell is the basic functional unit of life. Most multicellular organisms, including animals, are composed of a variety of different cell types that fulfil distinct roles. Within an organism, all cells share the same genome; however, their diverse genetic programs lead them to acquire different molecular and anatomical characteristics. Describing these characteristics is essential for understanding how cellular diversity emerged and how it contributes to the organism's function. Probing cellular appearance by microscopy methods is the original way of describing cell types and the main approach to characterise cellular morphology and position in the organism. Present cutting-edge microscopy techniques generate immense amounts of data, requiring efficient, automated, unbiased methods of analysis. Not only can such methods accelerate the process of scientific discovery, they should also facilitate large-scale, systematic, reproducible analysis. The necessity of processing big datasets has led to the development of intricate image analysis pipelines; however, they are mostly tailored to a particular dataset and a specific research question. In this thesis I aimed to address the problem of creating more general, fully automated ways of describing cells in different imaging modalities, with a specific focus on deep neural networks as a promising solution for extracting rich general-purpose features from the analysed data. I further target the problem of integrating multiple data modalities to generate a detailed description of cells on the whole-organism level. First, on two examples of cell analysis projects, I show how using automated image analysis pipelines, and neural networks in particular, can assist in characterising cells in microscopy data. In the first project I analyse a movie of Drosophila embryo development to elucidate the difference in myosin patterns between two populations of cells with different shape fate. In the second project I develop a pipeline for automatic cell classification in a new imaging modality to show that the quality of the data is sufficient to tell apart cell types in a volume of mouse brain cortex. Next, I present an extensive collaborative effort aimed at generating a whole-body multimodal cell atlas of a three-segmented Platynereis dumerilii worm, combining high-resolution morphology and gene expression. To generate a multi-sided description of cells in the atlas I create a pipeline for assigning coherent denoised gene expression profiles, obtained from spatial gene expression maps, to cells segmented in the EM volume. Finally, as the main project of this thesis, I focus on extracting comprehensive unbiased cell morphology features from an EM volume of Platynereis dumerilii. I design a fully unsupervised neural network pipeline for extracting rich morphological representations that enable grouping cells into morphological cell classes with characteristic gene expression. I further show how such descriptors could be used to explore the morphological diversity of cells, tissues and organs in the dataset.

    Clinical Document Classification Using Labeled and Unlabeled Data Across Hospitals

    Reviewing radiology reports in emergency departments is an essential but laborious task. Timely follow-up of patients with abnormal cases in their radiology reports may dramatically affect the patient's outcome, especially if they have been discharged with a different initial diagnosis. Machine learning approaches have been devised to expedite the process and detect the cases that demand instant follow-up. However, these approaches require a large amount of labeled data to train reliable predictive models. Preparing such a large dataset, which needs to be manually annotated by health professionals, is costly and time-consuming. This paper investigates a semi-supervised learning framework for radiology report classification across three hospitals. The main goal is to leverage clinical unlabeled data in order to augment the learning process where limited labeled data is available. To further improve the classification performance, we also integrate a transfer learning technique into the semi-supervised learning pipeline. Our experimental findings show that (1) convolutional neural networks (CNNs), while being independent of any problem-specific feature engineering, achieve significantly higher effectiveness compared to conventional supervised learning approaches, (2) leveraging unlabeled data in training a CNN-based classifier reduces the dependency on labeled data by more than 50% to reach the same performance as a fully supervised CNN, and (3) transferring the knowledge gained from available labeled data in an external source hospital significantly improves the performance of a semi-supervised CNN model over its fully supervised counterpart in a target hospital.

    A simple, fast, and accurate method of phylogenomic inference

    An automated pipeline for phylogenomic analysis (AMPHORA) is presented that overcomes existing limits to large-scale protein phylogenetic inference.