63 research outputs found
Emerging semantics to link phenotype and environment
abstract: Understanding the interplay between environmental conditions and phenotypes is a fundamental goal of biology. Unfortunately, data that include observations on phenotype and environment are highly heterogeneous and thus difficult to find and integrate. One approach that is likely to improve the status quo involves the use of ontologies to standardize and link data about phenotypes and environments. Specifying and linking data through ontologies will allow researchers to increase the scope and flexibility of large-scale analyses aided by modern computing methods. Investments in this area would advance diverse fields such as ecology, phylogenetics, and conservation biology. While several biological ontologies are well-developed, using them to link phenotypes and environments is rare because of gaps in ontological coverage and limits to interoperability among ontologies and disciplines. In this manuscript, we present (1) use cases from diverse disciplines to illustrate questions that could be answered more efficiently using a robust linkage between phenotypes and environments, (2) two proof-of-concept analyses that show the value of linking phenotypes to environments in fishes and amphibians, and (3) two proposed example data models for linking phenotypes and environments using the Extensible Observation Ontology (OBOE) and the Biological Collections Ontology (BCO); these provide a starting point for the development of a data model linking phenotypes and environments. The final version of this article, as published in PeerJ, can be viewed online at: https://peerj.com/articles/1470
Genome sequence of adherent-invasive Escherichia coli and comparative genomic analysis with other E. coli pathotypes
BACKGROUND: Adherent and invasive Escherichia coli (AIEC) are commonly found in ileal lesions of Crohn's Disease (CD) patients, where they adhere to intestinal epithelial cells and invade into and survive in epithelial cells and macrophages, thereby gaining access to a typically restricted host niche. Colonization leads to strong inflammatory responses in the gut suggesting that AIEC could play a role in CD immunopathology. Despite extensive investigation, the genetic determinants accounting for the AIEC phenotype remain poorly defined. To address this, we present the complete genome sequence of an AIEC, revealing the genetic blueprint for this disease-associated E. coli pathotype. RESULTS: We sequenced the complete genome of E. coli NRG857c (O83:H1), a clinical isolate of AIEC from the ileum of a Crohn's Disease patient. Our sequence data confirmed a phylogenetic linkage between AIEC and extraintestinal pathogenic E. coli causing urinary tract infections and neonatal meningitis. The comparison of the NRG857c AIEC genome with other pathogenic and commensal E. coli allowed for the identification of unique genetic features of the AIEC pathotype, including 41 genomic islands, and unique genes that are found only in strains exhibiting the adherent and invasive phenotype. CONCLUSIONS: Up to now, the virulence-like features associated with AIEC are detectable only phenotypically. AIEC genome sequence data will facilitate the identification of genetic determinants implicated in invasion and intracellular growth, as well as enable functional genomic studies of AIEC gene expression during health and disease.
Systems biology approaches to a rational drug discovery paradigm
The published manuscript is available at EurekaSelect via http://www.eurekaselect.com/openurl/content.php?genre=article&doi=10.2174/1568026615666150826114524. Prathipati P., Mizuguchi K. Systems biology approaches to a rational drug discovery paradigm. Current Topics in Medicinal Chemistry, 16, 9, 1009. https://doi.org/10.2174/1568026615666150826114524
Genome-wide expression profiling and bioinformatics analysis of diurnally regulated genes in the mouse prefrontal cortex
BACKGROUND: The prefrontal cortex is important in regulating sleep and mood. Diurnally regulated genes in the prefrontal cortex may be controlled by the circadian system, by sleep:wake states, or by cellular metabolism or environmental responses. Bioinformatics analysis of these genes will provide insights into a wide range of pathways that are involved in the pathophysiology of sleep disorders and psychiatric disorders with sleep disturbances. RESULTS: We examined gene expression in the mouse prefrontal cortex at four time points during a 24-hour (12-hour light:12-hour dark) cycle using microarrays, and identified 3,890 transcripts corresponding to 2,927 genes with diurnally regulated expression patterns. We show that 16% of the genes identified in our study are orthologs of identified clock, clock-controlled or sleep/wakefulness-induced genes in the mouse liver and suprachiasmatic nucleus, rat cortex and cerebellum, or Drosophila head. The diurnal expression patterns were confirmed for 16 out of 18 genes in an independent set of RNA samples. The diurnal genes fall into eight temporal categories with distinct functional attributes, as assessed by Gene Ontology classification and analysis of enriched transcription factor binding sites. CONCLUSION: Our analysis demonstrates that approximately 10% of transcripts have diurnally regulated expression patterns in the mouse prefrontal cortex. Functional annotation of these genes will be important for the selection of candidate genes for behavioral mutants in the mouse and for genetic studies of disorders associated with anomalies in the sleep:wake cycle and circadian rhythm.
Next-generation information systems for genomics
NIH Grant no. HG00739
The advent of next-generation sequencing technologies is transforming
biology by enabling individual researchers to sequence the
genomes of individual organisms or cells on a massive scale. In order
to realize the translational potential of this technology we will need
advanced information systems to integrate and interpret this deluge
of data. These systems must be capable of extracting the location and
function of genes and biological features from genomic data, requiring
the coordinated parallel execution of multiple bioinformatics analyses
and intelligent synthesis of the results. The resulting databases must
be structured to allow complex biological knowledge to be recorded
in a computable way, which requires the development of logic-based
knowledge structures called ontologies. To visualise and manipulate
the results, new graphical interfaces and knowledge acquisition tools
are required. Finally, to help understand complex disease processes,
these information systems must be equipped with the capability to
integrate and make inferences over multiple data sets derived from
numerous sources.
RESULTS:
Here I describe research, design and implementation of some of
the components of such a next-generation information system. I first
describe the automated pipeline system used for the annotation of
the Drosophila genome, and the application of this system in genomic
research. This was succeeded by the development of a flexible graph-oriented
database system called Chado, which relies on the use of
ontologies for structuring data and knowledge. I also describe research
to develop, restructure and enhance a number of biological
ontologies, adding a layer of logical semantics that increases the computability
of these key knowledge sources. The resulting database and
ontology collection can be accessed through a suite of tools. Finally,
I describe how genome analysis, ontology-based
database representation and powerful tools can be combined in order
to make inferences about genotype-phenotype relationships within and
across species.
CONCLUSION:
The large volumes of complex data generated by high-throughput
genomic and systems biology technology threaten to overwhelm us,
unless we can devise better computing tools to assist us with its analysis.
Ontologies are key technologies, but many existing ontologies are
not interoperable or lack features that make them computable. Here
I have shown how concerted ontology, tool and database development
can be applied to make inferences of value to translational research.
The evolutionary role of human-specific genomic events
In the short evolutionary time since the human-chimpanzee divergence, approximately 6.6 million years ago, humans have acquired a range of traits that are unique among primates. These include a tripling of brain size, enhanced cognitive abilities, complex culture, a descended larynx structure that enables spoken language, longevity, specific diseases, inferior olfaction, and (in some human populations) adult lactase persistence. These traits were likely to have evolved through various genomic mechanisms, among them gene duplications and gene-culture co-evolution. Several studies have estimated the dates for some of these human lineage genomic events. However, no study to date has performed a genome-wide estimate of the dates of all human gene duplications. Moreover, as many of these traits were likely to have evolved via gene-culture co-evolutionary mechanisms, investigating the evolution of one of these human-specific traits – lactase persistence – provides a model example for in-depth future investigations of specific human phenotypes.
In this study I have investigated an important class of human-specific genomic events – gene duplications (otherwise known as human inparalogues). I have developed a new bioinformatics approach for detecting human lineage-specific inparalogues and the duplication dates for those genes. I show that human-specific inparalogues are non-randomly distributed among biological function classes, and their duplication event dates are non-randomly distributed on a timeline between the date of the human-chimpanzee split and the present. I have also investigated the evolution of the human-specific polymorphic trait – lactase persistence. I have performed a worldwide correlation analysis comparing frequency data on all currently known lactase persistence-associated alleles and the distribution of the lactase persistence phenotype in different human populations. I have also performed a gene-culture co-evolution analysis, employing spatially explicit simulation and Approximate Bayesian Computation to condition simulations on genetic and archaeological data, in order to make inferences on the evolution of lactase persistence and dairying in Europe.
Systems Analytics and Integration of Big Omics Data
A “genotype” is essentially an organism's full hereditary information which is obtained from its parents. A “phenotype” is an organism's actual observed physical and behavioral properties. These may include traits such as morphology, size, height, eye color, metabolism, etc. One of the pressing challenges in computational and systems biology is genotype-to-phenotype prediction. This is challenging given the amount of data generated by modern Omics technologies. This “Big Data” is so large and complex that traditional data processing applications are not up to the task. Challenges arise in collection, analysis, mining, sharing, transfer, visualization, archiving, and integration of these data. In this Special Issue, there is a focus on the systems-level analysis of Omics data, recent developments in gene ontology annotation, and advances in biological pathways and network biology. The integration of Omics data with clinical and biomedical data using machine learning is explored. This Special Issue covers new methodologies in the context of gene–environment interactions, tissue-specific gene expression, and how external factors or host genetics impact the microbiome.
- …