52 research outputs found

    Widespread Genotype-Phenotype Correlations in Intellectual Disability

    Background: Linking genotype to phenotype is a major aim of genetics research, yet the underlying biochemical mechanisms of many complex conditions remain elusive. Recent research provides evidence that relevant gene-phenotype associations are discoverable in the study of intellectual disability (ID). Here we expand on that work, identifying distinctive gene interaction modules with unique enrichment patterns reflective of associated clinical features in ID. Methods: Two hundred twelve forms of monogenic ID were curated according to comorbidities with autism and epilepsy. These groups were further subdivided according to secondary clinical manifestations of complex vs. simple facial dysmorphia and neurodegenerative-like features due to their clinical prominence, modest symptom overlap, and probable etiological divergence. An aggregate gene interaction ID network for these phenotype subgroups was discovered via a public database of known gene interactions: protein-protein, genetic, and mRNA coexpression. Additional annotation resources (Gene Ontology, Human Phenotype Ontology, TRANSFAC/JASPAR, and KEGG/WikiPathways) were utilized to assess functional and phenotypic enrichment patterns within subgroups. Results: Phenotypic analysis revealed high rates of complex facial dysmorphia in ID with comorbid autism. In contrast, neurodegenerative-like features were overrepresented in ID with epilepsy. Network analysis subsequently showed that gene groups divided according to clinical features of interest resulted in distinctive interaction clusters, with unique functional enrichments according to gene set. Conclusions: These data suggest that specific comorbid and secondary clinical features in ID are predictive of underlying genotype. In summary, ID forms unique clusters, which are composed of individual conditions with remarkable genotypic and phenotypic overlap.

    Hydra -- A Federated Data Repository over NDN

    Today's big data science communities manage their data publication and replication at the application layer. These communities utilize myriad mechanisms to publish, discover, and retrieve datasets; the result is an ecosystem of either centralized repositories or collections of ad-hoc data repositories. Publishing datasets to centralized repositories can be process-intensive, and those repositories do not accept all datasets. The ad-hoc repositories are difficult to find and utilize due to differences in data names, metadata standards, and access methods. To address the problem of scientific data publication and storage, we have designed Hydra, a secure, distributed, and decentralized data repository made of a loose federation of storage servers (nodes) provided by user communities. Hydra runs over Named Data Networking (NDN) and utilizes the State Vector Sync (SVS) protocol that lets individual nodes maintain a "global view" of the system. Hydra provides a scalable and resilient data retrieval service, with data distribution scalability achieved via NDN's built-in data anycast and in-network caching, and resiliency against individual server failures achieved through automated failure detection and maintaining a specific degree of replication. Hydra utilizes "Favor", a locally calculated numerical value, to decide which nodes will replicate a file. Finally, Hydra utilizes data-centric security for data publication and node authentication. Hydra uses a Network Operation Center (NOC) to bootstrap trust in Hydra nodes and data publishers. The NOC distributes user and node certificates and performs the proof-of-possession challenges. This technical report serves as the reference for Hydra. It outlines the design decisions, the rationale behind them, the functional modules, and the protocol specifications.

    Functional genomics of drought stress response in rice: transcript mapping of annotated unigenes of an indica rice (Oryza sativa L. cv. Nagina 22)

    Rice, a widely cultivated cereal across diverse agroecological systems, is prone to high yield losses due to recurring droughts. In India, drought is a major constraint on rice production and accounts for as much as 15% of yield losses in some years. Conventional plant breeding techniques, though cumbersome and time-consuming, have been immensely helpful in releasing drought-tolerant varieties. However, this is not adequate to cope with the future demand for rice, as drought seems to be spreading to more regions and seasons across the country. Understanding the genes that govern rice plant architecture and response to drought stress is urgently needed to enhance breeding of rice with improved drought tolerance. To identify genes associated with drought stress response and their temporal and spatial regulation, we took a genomic approach. By generating a large set of expressed sequence tags (ESTs) from cDNA libraries of drought-stressed seedlings and transcript profiling, we identified 589 genes presumed to be involved in drought stress. The 5814 ESTs were assembled into 2094 contigs and localized onto chromosome arms. We present here the physical map of the 2094 unigene set along with 589 annotated putative stress-responsive genes of rice. Further, using ESTs, a few drought quantitative trait loci (QTLs) have been dissected and putative candidate genes identified. This will be useful to rice researchers as a ready reference source for breeding through developing candidate gene markers, molecular dissection of QTLs associated with drought stress, and map-based cloning.

    Assessing pooled BAC and whole genome shotgun strategies for assembly of complex genomes

    Background: We investigate whether pooling BAC clones and sequencing the pools can provide more accurate assembly of genome sequences than the "whole genome shotgun" (WGS) approach. Furthermore, we quantify this accuracy increase. We compare the pooled BAC and WGS approaches using in silico simulations. Standard measures of assembly quality focus on assembly size and fragmentation, which are desirable for large whole genome assemblies. We propose additional measures enabling easy and visual comparison of assembly quality, such as rearrangements and redundant sequence content, relative to the known target sequence. Results: The best assembly quality scores were obtained using 454 coverage of 15× linear and 5× paired (3 kb insert size) reads (15L-5P) on Arabidopsis. This regime gave similarly good results on four additional plant genomes of very different GC and repeat contents. BAC pooling improved assembly scores over WGS assembly, with coverage and redundancy scores improving the most. Conclusions: BAC pooling works better than WGS; however, both require a physical map to order the scaffolds. Pool sizes up to 12 Mbp work well, suggesting this pooling density to be effective in medium-scale re-sequencing applications such as targeted sequencing of QTL intervals for candidate gene discovery. Assuming the current Roche/454 Titanium sequencing limitations, a 12 Mbp region could be re-sequenced with a full plate of linear reads and a half plate of paired-end reads, yielding 15L-5P coverage after read pre-processing. Our simulation suggests that massively over-sequencing may not improve accuracy. Our scoring measures can be used generally to evaluate and compare results of simulated genome assemblies.

    Sequencing papaya X and Yh chromosomes reveals molecular basis of incipient sex chromosome evolution

    Sex determination in papaya is controlled by a recently evolved XY chromosome pair, with two slightly different Y chromosomes controlling the development of males (Y) and hermaphrodites (Y(h)). To study the events of early sex chromosome evolution, we sequenced the hermaphrodite-specific region of the Y(h) chromosome (HSY) and its X counterpart, yielding an 8.1-megabase (Mb) HSY pseudomolecule, and a 3.5-Mb sequence for the corresponding X region. The HSY is larger than the X region, mostly due to retrotransposon insertions. The papaya HSY differs from the X region by two large-scale inversions, the first of which likely caused the recombination suppression between the X and Y(h) chromosomes, followed by numerous additional chromosomal rearrangements. Altogether, including the X and/or HSY regions, 124 transcription units were annotated, including 50 functional pairs present in both the X and HSY. Ten HSY genes had functional homologs elsewhere in the papaya autosomal regions, suggesting movement of genes onto the HSY, whereas the X region had none. Sequence divergence between 70 transcripts shared by the X and HSY revealed two evolutionary strata in the X chromosome, corresponding to the two inversions on the HSY, the older of which evolved about 7.0 million years ago. Gene content differences between the HSY and X are greatest in the older stratum, whereas the gene content and order of the collinear regions are identical. Our findings support theoretical models of early sex chromosome evolution

    Modes of Gene Duplication Contribute Differently to Genetic Novelty and Redundancy, but Show Parallels across Divergent Angiosperms

    BACKGROUND: Both single gene and whole genome duplications (WGD) have recurred in angiosperm evolution. However, the evolutionary effects of different modes of gene duplication, especially regarding their contributions to genetic novelty or redundancy, have been inadequately explored. RESULTS: In Arabidopsis thaliana and Oryza sativa (rice), species that deeply sample botanical diversity and for which expression data are available from a wide range of tissues and physiological conditions, we have compared expression divergence between genes duplicated by six different mechanisms (WGD, tandem, proximal, DNA based transposed, retrotransposed and dispersed), and between positional orthologs. Both neo-functionalization and genetic redundancy appear to contribute to retention of duplicate genes. Genes resulting from WGD and tandem duplications diverge slowest in both coding sequences and gene expression, and contribute most to genetic redundancy, while other duplication modes contribute more to evolutionary novelty. WGD duplicates may more frequently be retained due to dosage amplification, while inferred transposon mediated gene duplications tend to reduce gene expression levels. The extent of expression divergence between duplicates is discernibly related to duplication modes, different WGD events, amino acid divergence, and putatively neutral divergence (time), but the contribution of each factor is heterogeneous among duplication modes. Gene loss may retard inter-species expression divergence. Members of different gene families may have non-random patterns of origin that are similar in Arabidopsis and rice, suggesting the action of pan-taxon principles of molecular evolution. CONCLUSION: Gene duplication modes differ in contribution to genetic novelty and redundancy, but show some parallels in taxa separated by hundreds of millions of years of evolution

    21st Century (Q1-Q2) Bio-Computing: Systems, Algorithms, Data, Science

    Dr. Feltus is Professor of Genetics and Biochemistry at Clemson and works on research in bioinformatics, high-performance computing, cyberinfrastructure, network biology, genome assembly, systems genetics, paleogenomics, and bioenergy feedstock genetics. Feltus is also CEO of Allele Systems LLC, Core Faculty in the CU-MUSC Biomedical Data Science and Informatics (BDSI) program, member of the Center for Human Genetics, and serves on the Internet2 Board of Trustees. For more information on Dr. Feltus and his visit see the press release