Simultaneous genotyping and species identification using hybridization pattern recognition analysis of generic Mycobacterium DNA arrays
High-density oligonucleotide arrays can be used to examine large amounts of DNA sequence rapidly and with high throughput. An array designed to determine the nucleotide sequence of 705 bp of the rpoB gene of Mycobacterium tuberculosis accurately detected rifampin-resistance-associated mutations in 44 clinical isolates of M. tuberculosis. The nucleotide sequence diversity of 121 mycobacterial isolates (comprising 10 species) was examined both by conventional dideoxynucleotide sequencing of the rpoB and 16S genes and by analysis of the rpoB oligonucleotide array hybridization patterns. Species identification for each isolate was similar irrespective of whether the 16S sequence, the rpoB sequence, or the rpoB hybridization pattern was used. However, for several species, the numbers of alleles in the 16S and rpoB gene sequences gave discordant estimates of the genetic diversity within the species. In addition to confirming the array's intended utility for sequencing the region of M. tuberculosis that confers rifampin resistance, this work demonstrates that the array can identify the species of nontuberculous mycobacteria. This illustrates the general point that DNA microarrays that sequence important genomic regions (such as drug-resistance loci or pathogenicity islands) can simultaneously identify species and provide some insight into an organism's population structure.
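The abstract does not name the pattern-recognition algorithm used to assign species from rpoB hybridization patterns. A minimal sketch, assuming each isolate's pattern is reduced to hypothetical binary per-probe calls and that an isolate is assigned to the reference species with the nearest pattern under Hamming distance, could look like this. The probe calls and reference patterns are invented for illustration; only the species names are real.

```python
from typing import Dict, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of probe positions whose hybridization calls differ."""
    if len(a) != len(b):
        raise ValueError("patterns must cover the same probes")
    return sum(x != y for x, y in zip(a, b))


def identify_species(pattern: Sequence[int],
                     references: Dict[str, Sequence[int]]) -> str:
    """Return the reference species with the closest pattern (ties: first in dict order)."""
    return min(references, key=lambda sp: hamming(pattern, references[sp]))


# Hypothetical 8-probe binary calls (1 = probe hybridized as a perfect match).
REFERENCE_PATTERNS = {
    "M. tuberculosis": [1, 1, 1, 1, 1, 1, 1, 1],
    "M. avium":        [1, 0, 1, 0, 1, 1, 0, 1],
    "M. kansasii":     [0, 1, 1, 1, 0, 1, 1, 0],
}

isolate = [1, 0, 1, 0, 1, 1, 1, 1]
print(identify_species(isolate, REFERENCE_PATTERNS))  # prints M. avium
```

Hamming distance on binary calls is the simplest choice; with continuous probe intensities, a correlation-based similarity would be a natural substitute.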
Transcriptional landscape of the human and fly genomes: Nonlinear and multifunctional modular model of transcriptomes
Regions of the genome that neither code for proteins nor participate in cis-acting regulatory activities are frequently viewed as lacking functional value. However, a number of recent large-scale studies have revealed significant regulated transcription of unannotated portions of a variety of plant and animal genomes, bringing a new appreciation of how widely the genome is transcribed. High-resolution mapping of the sites of transcription in the human and fly genomes has provided an alternative picture of the extent and organization of transcription and has offered insights into the biological functions of some of the newly identified unannotated transcripts. Considerable portions of the unannotated transcription observed are developmental or cell-type-specific parts of protein-coding transcripts, often serving as novel, alternative 5′ transcriptional start sites. These distal 5′ portions are often situated at significant distances from the annotated gene and alternatively join with or skip portions of other intervening genes to form novel unannotated protein-coding transcripts. These data support an interlaced model of the genome in which many regions serve multiple functions and are highly modular in their use. This model illustrates the underappreciated organizational complexity of the genome and one of the functional roles of transcription from its unannotated portions. Copyright 2006, Cold Spring Harbor Laboratory Press
From Generalization to Precision: Exploring SAM for Tool Segmentation in Surgical Environments
Purpose: Accurate tool segmentation is essential in computer-aided
procedures. However, this task is challenging due to the presence of artifacts
and the limited training data available in medical scenarios. Methods that
generalize to unseen data represent an interesting avenue, where zero-shot
segmentation offers a way to cope with data limitations. Initial exploratory
work with the Segment Anything Model (SAM) shows that bounding-box-based
prompting yields notable zero-shot generalization. However, point-based
prompting leads to degraded performance that deteriorates further under image
corruption. We argue that SAM drastically over-segments images with high
corruption levels, resulting in degraded performance when only a single
segmentation mask is considered, whereas combining the masks that overlap
the object of interest yields an accurate prediction. Methods: We use SAM to
generate over-segmented predictions for endoscopic frames. Then, we use the
ground-truth tool mask to analyze the results of SAM both when the best single
mask is selected as the prediction and when all individual masks overlapping
the object of interest are combined to obtain the final predicted mask. We
analyze the Endovis18 and Endovis17 instrument segmentation datasets using
synthetic corruptions of various strengths, as well as an in-house dataset
featuring counterfactually created real-world corruptions. Results: Combining
the over-segmented masks improves the IoU. Furthermore, selecting the best
single segmentation yields a competitive IoU score for clean images.
Conclusions: Combined SAM predictions show improved results and robustness up
to a certain corruption level. However, appropriate prompting strategies are
fundamental for implementing these models in the medical domain.
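The two evaluation modes described above (best single mask versus the combination of masks overlapping the tool) can be sketched with NumPy boolean masks. This is a minimal illustration, not the authors' code: the abstract does not define what "overlapping the object of interest" means, so the `min_fraction_inside` threshold below is a hypothetical criterion, and the 4×4 masks are invented toy data.

```python
import numpy as np


def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two boolean masks (1.0 if both are empty)."""
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def best_single_mask(masks, gt):
    """Oracle selection: the individual mask with the highest IoU against gt."""
    return max(masks, key=lambda m: iou(m, gt))


def combine_overlapping(masks, gt, min_fraction_inside=0.5):
    """Union of all masks whose area lies mostly inside gt.

    min_fraction_inside is a hypothetical stand-in for "overlapping the
    object of interest"; the source does not specify the criterion.
    """
    combined = np.zeros_like(gt, dtype=bool)
    for m in masks:
        area = m.sum()
        if area and np.logical_and(m, gt).sum() / area >= min_fraction_inside:
            combined |= m
    return combined


# Toy over-segmentation: the tool is split into two SAM segments.
gt = np.zeros((4, 4), dtype=bool)
gt[:, :2] = True                          # tool occupies the two left columns
a = np.zeros_like(gt); a[:2, :2] = True   # upper half of the tool
b = np.zeros_like(gt); b[2:, :2] = True   # lower half of the tool
c = np.zeros_like(gt); c[:, 2:] = True    # background segment
masks = [a, b, c]

print(iou(best_single_mask(masks, gt), gt))     # prints 0.5
print(iou(combine_overlapping(masks, gt), gt))  # prints 1.0
```

Both modes use the ground-truth mask, mirroring the analysis setting in the abstract; at inference time without ground truth, a different selection rule would be needed.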
Prominent use of distal 5′ transcription start sites and discovery of a large number of additional exons in ENCODE regions
This report presents a systematic empirical annotation of transcript products from 399 annotated protein-coding loci across the 1% of the human genome targeted by the Encyclopedia of DNA Elements (ENCODE) pilot project, using a combination of 5′ rapid amplification of cDNA ends (RACE) and high-density resolution tiling arrays. We identified previously unannotated and often tissue- or cell-line-specific transcribed fragments (RACEfrags), both 5′ distal to the annotated 5′ terminus and internal to the annotated gene bounds, for the vast majority (81.5%) of the tested genes. Half of the distal RACEfrags span large segments of genomic sequence away from the main portion of the coding transcript and often overlap with the upstream annotated gene(s). Notably, at least 20% of the resulting novel transcripts have changes in their open reading frames (ORFs), most of them fusing the ORFs of adjacent transcripts. A significant fraction of distal RACEfrags show expression levels comparable to those of known exons of the same locus, suggesting that they are not merely components of rare splice forms. These results have significant implications for (1) our current understanding of the architecture of protein-coding genes; (2) our views on the locations of regulatory regions in the genome; and (3) the interpretation of sequence polymorphisms mapping to regions hitherto considered "noncoding," ultimately bearing on the identification of disease-related sequence alterations. ©2007 by Cold Spring Harbor Laboratory Press
Exploiting Large Neuroimaging Datasets to Create Connectome-Constrained Approaches for more Robust, Efficient, and Adaptable Artificial Intelligence
Despite the progress in deep learning networks, efficient learning at the
edge (enabling adaptable, low-complexity machine learning solutions) remains a
critical need for defense and commercial applications. We envision a pipeline
to utilize large neuroimaging datasets, including maps of the brain which
capture neuron and synapse connectivity, to improve machine learning
approaches. We have pursued different approaches within this pipeline
structure. First, as a demonstration of data-driven discovery, the team has
developed a technique for discovery of repeated subcircuits, or motifs. These
were incorporated into a neural architecture search approach to evolve network
architectures. Second, we have conducted analysis of the heading direction
circuit in the fruit fly, which performs fusion of visual and angular velocity
features, to explore augmenting existing computational models with new insight.
Our team discovered a novel pattern of connectivity, implemented a new model,
and demonstrated sensor fusion on a robotic platform. Third, the team analyzed
circuitry for memory formation in the fruit fly connectome, enabling the design
of a novel generative replay approach. Finally, the team has begun analysis of
connectivity in mammalian cortex to explore potential improvements to
transformer networks. Constraints derived from this cortical connectivity
increased network robustness on the most challenging examples in the
CIFAR-10-C computer vision robustness benchmark, while reducing learnable
attention parameters by over an order of magnitude. Taken together, these
results demonstrate multiple potential approaches to using insight from neural
systems to develop robust and efficient machine learning techniques.
Comment: 11 pages, 4 figures
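The abstract does not describe the team's motif-discovery technique. As a generic illustration of what a "repeated subcircuit" is, the sketch below counts one classic three-neuron motif, the feed-forward loop (a→b, b→c, a→c), in a directed connectivity graph given as a hypothetical edge list. Real motif discovery typically compares such counts against randomized networks to judge over-representation; that step is omitted here.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def count_feed_forward_loops(edges: Iterable[Tuple[str, str]]) -> int:
    """Count (a, b, c) with edges a->b, b->c and a->c, all three nodes distinct."""
    out = defaultdict(set)
    for src, dst in edges:
        if src != dst:          # ignore self-connections
            out[src].add(dst)
    count = 0
    for a in list(out):
        for b in out[a]:
            # .get avoids inserting new keys into the defaultdict mid-iteration
            for c in out.get(b, ()):
                if c != a and c in out[a]:
                    count += 1
    return count


# Hypothetical five-synapse circuit containing two feed-forward loops:
# (n1, n2, n3) and (n2, n3, n4).
edges = [("n1", "n2"), ("n2", "n3"), ("n1", "n3"), ("n3", "n4"), ("n2", "n4")]
print(count_feed_forward_loops(edges))  # prints 2
```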
Management, Analyses, and Distribution of the MaizeCODE Data on the Cloud
MaizeCODE is a project aimed at identifying and analyzing functional elements in the maize genome. In its initial phase, MaizeCODE assayed up to five tissues from four maize strains (B73, NC350, W22, TIL11) by RNA-Seq, ChIP-Seq, RAMPAGE, and small RNA sequencing. To facilitate reproducible science and provide both human and machine access to the MaizeCODE data, we enhanced SciApps, a cloud-based portal, for the analysis and distribution of both raw data and analysis results. Based on the SciApps workflow platform, we built new components to support the complete cycle of MaizeCODE data management. These include publicly accessible scientific workflows for the reproducible and shareable analysis of various functional data, a RESTful API for batch processing and distribution of data and metadata, a searchable data page that lists each MaizeCODE experiment as a reproducible workflow, and integrated JBrowse genome browser tracks linked with workflows and metadata. The SciApps portal is a flexible platform that allows the integration of new analysis tools, workflows, and genomic data from multiple projects. By combining metadata with a ready-to-compute cloud-based platform, the portal improves access to the MaizeCODE data and facilitates its analysis.
Author Correction: Expanded encyclopaedias of DNA elements in the human and mouse genomes
Online Correction for: https://doi.org/10.1038/s41586-020-2493-4 | Erratum for https://bura.brunel.ac.uk/handle/2438/21299
In the version of this article initially published, two members of the ENCODE Project Consortium were missing from the author list. Rizi Ai (Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, CA, USA) and Shantao Li (Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA) are now included in the author list. These errors have been corrected in the online version of the article: 'Expanded encyclopaedias of DNA elements in the human and mouse genomes'. https://www.nature.com/articles/s41586-021-04226-3
Landscape of transcription in human cells
Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cell. A complete catalogue of these RNAs is not yet available, and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes, and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.
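The headline result above is a genome-coverage fraction. As a minimal sketch of how such a fraction can be computed (not necessarily how the study computed it), assume transcribed regions are given as hypothetical half-open `(start, end)` intervals on a single sequence; overlapping intervals are merged before summing so that shared bases are counted once.

```python
from typing import List, Tuple


def covered_fraction(intervals: List[Tuple[int, int]], genome_length: int) -> float:
    """Fraction of bases covered by at least one half-open [start, end) interval."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the running block: bank it and start a new block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the running block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


# Hypothetical transcribed fragments on a 100-bp toy sequence.
fragments = [(0, 30), (20, 50), (75, 100)]
print(covered_fraction(fragments, 100))  # prints 0.75
```

Summing interval lengths without merging would count bases 20 to 29 twice and overstate coverage as 0.85.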
Perspectives on ENCODE
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-2449-8. © 2020, The Author(s). The Encyclopedia of DNA Elements (ENCODE) Project launched in 2003 with the long-term goal of developing a comprehensive map of functional elements in the human genome. These included genes, biochemical regions associated with gene regulation (for example, transcription factor binding sites, open chromatin, and histone marks) and transcript isoforms. These marks serve as sites for candidate cis-regulatory elements (cCREs) that may play functional roles in regulating gene expression¹. The project has been extended to model organisms, particularly the mouse. In the third phase of ENCODE, nearly a million and more than 300,000 cCRE annotations have been generated for human and mouse, respectively, and these have provided a valuable resource for the scientific community. NIH grants: U01HG007019, U01HG007033, U01HG007036, U01HG007037, U41HG006992, U41HG006993, U41HG006994, U41HG006995, U41HG006996, U41HG006997, U41HG006998, U41HG006999, U41HG007000, U41HG007001, U41HG007002, U41HG007003, U41HG007234, U54HG006991, U54HG006997, U54HG006998, U54HG007004, U54HG007005, U54HG007010 and UM1HG009442
A user's guide to the Encyclopedia of DNA elements (ENCODE)
The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating, and illustrate the application of ENCODE data to interpreting the human genome.
