46 research outputs found
Simultaneous genotyping and species identification using hybridization pattern recognition analysis of generic Mycobacterium DNA arrays
High-density oligonucleotide arrays can be used to rapidly examine large amounts of DNA sequence in a high-throughput manner. An array designed to determine the specific nucleotide sequence of 705 bp of the rpoB gene of Mycobacterium tuberculosis accurately detected rifampin resistance associated with mutations in 44 clinical isolates of M. tuberculosis. The nucleotide sequence diversity in 121 Mycobacterial isolates (comprising 10 species) was examined by both conventional dideoxynucleotide sequencing of the rpoB and 16S genes and by analysis of the rpoB oligonucleotide array hybridization patterns. Species identification for each of the isolates was similar irrespective of whether 16S sequence, rpoB sequence, or the pattern of rpoB hybridization was used. However, for several species, the number of alleles in the 16S and rpoB gene sequences provided discordant estimates of the genetic diversity within a species. In addition to confirming the array's intended utility for sequencing the region of M. tuberculosis that confers rifampin resistance, this work demonstrates that this array can identify the species of nontuberculous Mycobacteria. This demonstrates the general point that DNA microarrays that sequence important genomic regions (such as drug resistance or pathogenicity islands) can simultaneously identify species and provide some insight into the organism's population structure.
Transcriptional landscape of the human and fly genomes: Nonlinear and multifunctional modular model of transcriptomes
Regions of the genome not coding for proteins or not involved in cis-acting regulatory activities are frequently viewed as lacking in functional value. However, a number of recent large-scale studies have revealed significant regulated transcription of unannotated portions of a variety of plant and animal genomes, allowing a new appreciation of the widespread transcription of large portions of the genome. High-resolution mapping of the sites of transcription of the human and fly genomes has provided an alternative picture of the extent and organization of transcription and has offered insights for biological functions of some of the newly identified unannotated transcripts. Considerable portions of the unannotated transcription observed are developmental or cell-type-specific parts of protein-coding transcripts, often serving as novel, alternative 5′ transcriptional start sites. These distal 5′ portions are often situated at significant distances from the annotated gene and alternatively join with or ignore portions of other intervening genes to comprise novel unannotated protein-coding transcripts. These data support an interlaced model of the genome in which many regions serve multifunctional purposes and are highly modular in their utilization. This model illustrates the underappreciated organizational complexity of the genome and one of the functional roles of transcription from unannotated portions of the genome. Copyright 2006, Cold Spring Harbor Laboratory Press.
Exploiting Large Neuroimaging Datasets to Create Connectome-Constrained Approaches for more Robust, Efficient, and Adaptable Artificial Intelligence
Despite the progress in deep learning networks, efficient learning at the
edge (enabling adaptable, low-complexity machine learning solutions) remains a
critical need for defense and commercial applications. We envision a pipeline
to utilize large neuroimaging datasets, including maps of the brain which
capture neuron and synapse connectivity, to improve machine learning
approaches. We have pursued different approaches within this pipeline
structure. First, as a demonstration of data-driven discovery, the team has
developed a technique for discovery of repeated subcircuits, or motifs. These
were incorporated into a neural architecture search approach to evolve network
architectures. Second, we have conducted analysis of the heading direction
circuit in the fruit fly, which performs fusion of visual and angular velocity
features, to explore augmenting existing computational models with new insight.
Our team discovered a novel pattern of connectivity, implemented a new model,
and demonstrated sensor fusion on a robotic platform. Third, the team analyzed
circuitry for memory formation in the fruit fly connectome, enabling the design
of a novel generative replay approach. Finally, the team has begun analysis of
connectivity in mammalian cortex to explore potential improvements to
transformer networks. These constraints increased network robustness on the
most challenging examples in the CIFAR-10-C computer vision robustness
benchmark task, while reducing learnable attention parameters by over an order
of magnitude. Taken together, these results demonstrate multiple potential
approaches to utilize insight from neural systems for developing robust and
efficient machine learning techniques. Comment: 11 pages, 4 figures
Management, Analyses, and Distribution of the MaizeCODE Data on the Cloud
MaizeCODE is a project aimed at identifying and analyzing functional elements in the maize genome. In its initial phase, MaizeCODE assayed up to five tissues from four maize strains (B73, NC350, W22, TIL11) by RNA-Seq, ChIP-Seq, RAMPAGE, and small RNA sequencing. To facilitate reproducible science and provide both human and machine access to the MaizeCODE data, we enhanced SciApps, a cloud-based portal, for analysis and distribution of both raw data and analysis results. Based on the SciApps workflow platform, we generated new components to support the complete cycle of MaizeCODE data management. These include publicly accessible scientific workflows for the reproducible and shareable analysis of various functional data, a RESTful API for batch processing and distribution of data and metadata, a searchable data page that lists each MaizeCODE experiment as a reproducible workflow, and integrated JBrowse genome browser tracks linked with workflows and metadata. The SciApps portal is a flexible platform that allows the integration of new analysis tools, workflows, and genomic data from multiple projects. Through metadata and a ready-to-compute cloud-based platform, the portal experience improves access to the MaizeCODE data and facilitates its analysis.
A User's Guide to the Encyclopedia of DNA Elements (ENCODE)
The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to
interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE
Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional
elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with
their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have
been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made
available through a freely accessible database. Here we provide an overview of the project and the resources it is generating
and illustrate the application of ENCODE data to interpret the human genome. National Human Genome Research Institute (U.S.); National Institutes of Health (U.S.)
Author Correction: Expanded encyclopaedias of DNA elements in the human and mouse genomes
Online Correction for: https://doi.org/10.1038/s41586-020-2493-4 | Erratum for https://bura.brunel.ac.uk/handle/2438/21299. In the version of this article initially published, two members of the ENCODE Project Consortium were missing from the author list. Rizi Ai (Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA, USA) and Shantao Li (Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA) are now included in the author list. These errors have been corrected in the online version of the article: 'Expanded encyclopaedias of DNA elements in the human and mouse genomes'. https://www.nature.com/articles/s41586-021-04226-3
Evidence for Transcript Networks Composed of Chimeric RNAs in Human Cells
The classic organization of a gene structure has followed the Jacob and Monod bacterial gene model proposed more than 50 years ago. Since then, empirical determinations of the complexity of the transcriptomes found from yeast to human have blurred the definition and physical boundaries of genes. Using multiple analysis approaches, we have characterized individual gene boundaries mapping on human chromosomes 21 and 22. Analyses of the locations of the 5′ and 3′ transcriptional termini of 492 protein-coding genes revealed that for 85% of these genes the boundaries extend beyond the current annotated termini, most often connecting with exons of transcripts from other well-annotated genes. The biological and evolutionary importance of these chimeric transcripts is underscored by (1) the non-random interconnections of genes involved, (2) the greater phylogenetic depth of the genes involved in many chimeric interactions, (3) the coordination of the expression of connected genes and (4) the close in vivo and three-dimensional proximity of the genomic regions being transcribed and contributing to parts of the chimeric RNAs. The non-random nature of the connection of the genes involved suggests that chimeric transcripts should not be studied in isolation, but together, as an RNA network.
Perspectives on ENCODE
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-2449-8. © 2020, The Author(s). The Encyclopedia of DNA Elements (ENCODE) Project launched in 2003 with the long-term goal of developing a comprehensive map of functional elements in the human genome. These included genes, biochemical regions associated with gene regulation (for example, transcription factor binding sites, open chromatin, and histone marks) and transcript isoforms. The marks serve as sites for candidate cis-regulatory elements (cCREs) that may serve functional roles in regulating gene expression. The project has been extended to model organisms, particularly the mouse. In the third phase of ENCODE, nearly a million and more than 300,000 cCRE annotations have been generated for human and mouse, respectively, and these have provided a valuable resource for the scientific community. NIH grants: U01HG007019, U01HG007033, U01HG007036, U01HG007037, U41HG006992, U41HG006993, U41HG006994, U41HG006995, U41HG006996, U41HG006997, U41HG006998, U41HG006999, U41HG007000, U41HG007001, U41HG007002, U41HG007003, U41HG007234, U54HG006991, U54HG006997, U54HG006998, U54HG007004, U54HG007005, U54HG007010 and UM1HG009442
Comparative analysis of the transcriptome across distant species
The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters
A user's guide to the Encyclopedia of DNA elements (ENCODE)
The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome