    nuID: a universal naming scheme of oligonucleotides for Illumina, Affymetrix, and other microarrays

    Background: Oligonucleotide probes that are sequence identical may have different identifiers between manufacturers and even between different versions of the same company's microarray; and sometimes the same identifier is reused and represents a completely different oligonucleotide, resulting in ambiguity and potentially mis-identification of the genes hybridizing to that probe. Results: We have devised a unique, non-degenerate encoding scheme that can be used as a universal representation to identify an oligonucleotide across manufacturers. We have named the encoded representation 'nuID', for nucleotide universal identifier. Inspired by the fact that the raw sequence of the oligonucleotide is the true definition of identity for a probe, the encoding algorithm uniquely and non-degenerately transforms the sequence itself into a compact identifier (a lossless compression). In addition, we added a redundancy check (checksum) to validate the integrity of the identifier. These two steps, encoding plus checksum, result in an nuID, which is a unique, non-degenerate, permanent, robust and efficient representation of the probe sequence. For commercial applications that require the sequence identity to be confidential, we have an encryption schema for nuID. We demonstrate the utility of nuIDs for the annotation of Illumina microarrays, and we believe it has universal applicability as a source-independent naming convention for oligomers. Reviewers: This article was reviewed by Itai Yanai, Rong Chen (nominated by Mark Gerstein), and Gregory Schuler (nominated by David Lipman).
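
    The two steps named in the abstract above (lossless encoding plus a checksum) can be illustrated with a minimal sketch. The 64-symbol alphabet, the padding rule, the checksum formula, and the names `ALPHABET`, `encode`, and `decode` are assumptions made for illustration; they are not taken from the abstract and should not be read as the published nuID specification.

```python
# Illustrative sketch only: the alphabet, padding rule, and checksum below are
# assumptions for demonstration, not the published nuID specification.
import string

# 64 symbols, so one character can hold three bases at 2 bits per base.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "_."
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}
INT_TO_BASE = "ACGT"


def encode(seq: str) -> str:
    """Pack each group of 3 bases into one character, prefixed by a check character."""
    seq = seq.upper()
    pad = (-len(seq)) % 3              # bases needed to reach a multiple of 3
    padded = seq + "A" * pad
    codes = []
    for i in range(0, len(padded), 3):
        a, b, c = (BASE_TO_INT[ch] for ch in padded[i:i + 3])
        codes.append(a * 16 + b * 4 + c)
    # The check character stores a checksum (0..20) and the pad length (0..2);
    # the largest value is 20 * 3 + 2 = 62, which fits in the 64-symbol alphabet.
    check = (sum(codes) % 21) * 3 + pad
    return ALPHABET[check] + "".join(ALPHABET[c] for c in codes)


def decode(ident: str) -> str:
    """Recover the original sequence, raising ValueError if the checksum fails."""
    check = ALPHABET.index(ident[0])
    codes = [ALPHABET.index(ch) for ch in ident[1:]]
    checksum, pad = divmod(check, 3)
    if sum(codes) % 21 != checksum:
        raise ValueError("checksum mismatch: identifier is corrupted")
    bases = "".join(
        INT_TO_BASE[c // 16] + INT_TO_BASE[(c // 4) % 4] + INT_TO_BASE[c % 4]
        for c in codes
    )
    return bases[: len(bases) - pad]   # drop the padding bases
```

    Under these assumptions a 50-base probe becomes an 18-character identifier, and `decode(encode(seq))` returns the original sequence, which is what "lossless" means here. A simple modular checksum like this one catches many single-character corruptions but not all of them.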

    A Comprehensive Infrastructure for Big Data in Cancer Research: Accelerating Cancer Research and Precision Medicine

    Advancements in next-generation sequencing and other -omics technologies are accelerating the detailed molecular characterization of individual patient tumors, and driving the evolution of precision medicine. Cancer is no longer considered a single disease, but rather, a diverse array of diseases wherein each patient has a unique collection of germline variants and somatic mutations. Molecular profiling of patient-derived samples has led to a data explosion that could help us understand the contributions of environment and germline to risk, therapeutic response, and outcome. To maximize the value of these data, an interdisciplinary approach is paramount. The National Cancer Institute (NCI) has initiated multiple projects to characterize tumor samples using multi-omic approaches. These projects harness the expertise of clinicians, biologists, computer scientists, and software engineers to investigate cancer biology and therapeutic response in multidisciplinary teams. Petabytes of cancer genomic, transcriptomic, epigenomic, proteomic, and imaging data have been generated by these projects. To address the data analysis challenges associated with these large datasets, the NCI has sponsored the development of the Genomic Data Commons (GDC) and three Cloud Resources. The GDC ensures data and metadata quality, ingests and harmonizes genomic data, and securely redistributes the data. During their pilot phase, the Cloud Resources tested multiple cloud-based approaches for enhancing data access, collaboration, computational scalability, resource democratization, and reproducibility. These NCI-led efforts are continuously being refined to better support open data practices and precision oncology, and to serve as building blocks of the NCI Cancer Research Data Commons.

    A collection of bioconductor methods to visualize gene-list annotations

    Background: Gene-list annotations are critical for researchers to explore the complex relationships between genes and functionalities. Currently, the annotations of a gene list are usually summarized by a table or a barplot. As such, potentially biologically important complexities such as one gene belonging to multiple annotation categories are difficult to extract. We have devised explicit and efficient visualization methods that provide intuitive methods for interrogating the intrinsic connections between biological categories and genes. Findings: We have constructed a data model and now present two novel methods in a Bioconductor package, "GeneAnswers", to simultaneously visualize genes, concepts (a.k.a. annotation categories), and concept-gene connections (a.k.a. annotations): the "Concept-and-Gene Network" and the "Concept-and-Gene Cross Tabulation". These methods have been tested and validated with microarray-derived gene lists. Conclusions: These new visualization methods can effectively present annotations using Gene Ontology, Disease Ontology, or any other user-defined gene annotations that have been pre-associated with an organism's genome by human curation, automated pipelines, or a combination of the two. The gene-annotation data model and associated methods are available in the Bioconductor package called "GeneAnswers" described in this publication.

    dictyBase, the model organism database for Dictyostelium discoideum

    dictyBase is the model organism database (MOD) for the social amoeba Dictyostelium discoideum. The unique biology and phylogenetic position of Dictyostelium offer a great opportunity to gain knowledge of processes not characterized in other organisms. The recent completion of the 34 MB genome sequence, together with the sizable scientific literature using Dictyostelium as a research organism, provided the necessary tools to create a well-annotated genome. dictyBase has leveraged software developed by the Saccharomyces Genome Database and the Generic Model Organism Database project. This has reduced the time required to develop a full-featured MOD and greatly facilitated our ability to focus on annotation and providing new functionality. We hope that manual curation of the Dictyostelium genome will facilitate the annotation of other genomes.

    Xanthusbase: adapting wikipedia principles to a model organism database

    xanthusBase is the official model organism database (MOD) for the social bacterium Myxococcus xanthus. In many respects, M. xanthus represents the pioneer model organism (MO) for studying the genetic, biochemical, and mechanistic basis of prokaryotic multicellularity, a topic that has garnered considerable attention due to the significance of biofilms in both basic and applied microbiology research. To facilitate its utility, the design of xanthusBase incorporates open-source software, leveraging the cumulative experience made available through the Generic Model Organism Database (GMOD) project, MediaWiki, and dictyBase, to create a MOD that is both highly useful and easily navigable. In addition, we have incorporated a unique Wikipedia-style curation model which exploits the internet's inherent interactivity, thus enabling M. xanthus and other myxobacterial researchers to contribute directly toward the ongoing genome annotation.

    Annotating the human genome with Disease Ontology

    Background: The human genome has been extensively annotated with Gene Ontology for biological functions, but minimally computationally annotated for diseases. Results: We used the Unified Medical Language System (UMLS) MetaMap Transfer tool (MMTx) to discover gene-disease relationships from the GeneRIF database. We utilized a comprehensive subset of UMLS, which is disease-focused and structured as a directed acyclic graph (the Disease Ontology), to filter and interpret results from MMTx. The results were validated against the Homayouni gene collection using recall and precision measurements. We compared our results with the widely used Online Mendelian Inheritance in Man (OMIM) annotations. Conclusion: The validation data set suggests a 91% recall rate and 97% precision rate of disease annotation using GeneRIF, in contrast with a 22% recall and 98% precision using OMIM. Our thesaurus-based approach allows for comparisons to be made between disease-containing databases and allows for increased accuracy in disease identification through synonym matching. The much higher recall rate of our approach demonstrates that annotating the human genome with Disease Ontology and GeneRIF for diseases dramatically increases the coverage of the disease annotation of the human genome.
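
    The recall and precision figures quoted in the abstract above follow the standard set-based definitions, sketched below. The function name `precision_recall` and the example gene-disease pairs are hypothetical placeholders, not data or code from the study.

```python
# Sketch of the standard precision/recall definitions; all names and example
# pairs are hypothetical and not taken from the study.

def precision_recall(predicted: set, reference: set) -> tuple[float, float]:
    """Return (precision, recall) of predicted annotations against a reference set."""
    true_positives = len(predicted & reference)
    # Precision: fraction of predicted annotations that appear in the reference.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of reference annotations that were predicted.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical gene-disease pairs, for illustration only.
predicted = {("GENE1", "disease A"), ("GENE2", "disease B"), ("GENE3", "disease C")}
reference = {("GENE1", "disease A"), ("GENE2", "disease B"), ("GENE4", "disease D")}
precision, recall = precision_recall(predicted, reference)
```

    A method can score high precision with low recall, as the OMIM figures in the abstract suggest, when nearly everything it reports is correct but it reports only a small share of the reference annotations.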

    dictyBase—a Dictyostelium bioinformatics resource update

    dictyBase (http://dictybase.org) is the model organism database for Dictyostelium discoideum. It houses the complete genome sequence, ESTs and the entire body of literature relevant to Dictyostelium. This information is curated to provide accurate gene models and functional annotations, with the goal of fully annotating the genome. This dictyBase update describes the annotations and features implemented since 2006, including improved strain and phenotype representation, integration of predicted transcriptional regulatory elements, protein domain information, biochemical pathways, improved searching and a wiki tool that allows members of the research community to provide annotations

    Access to COVID-19 testing by individuals with housing insecurity during the early days of the COVID-19 pandemic in the United States: a scoping review

    Introduction: The COVID-19 pandemic focused attention on healthcare disparities and inequities faced by individuals within marginalized and structurally disadvantaged groups in the United States. These individuals bore the heaviest burden across this pandemic as they faced increased risk of infection and difficulty in accessing testing and medical care. Individuals experiencing housing insecurity are a particularly vulnerable population given the additional barriers they face. In this scoping review, we identify some of the barriers this high-risk group experienced during the early days of the pandemic and assess novel solutions to overcome these barriers. Methods: A scoping review was performed following PRISMA-ScR guidelines, looking for studies focusing on COVID-19 testing among individuals experiencing housing insecurity. Barriers and solutions to those barriers were identified as applicable and summarized using qualitative methods, highlighting approaches that proved effective in facilitating testing access and delivery. Results: Ultimately, 42 studies were included in the scoping review, with 143 barriers grouped into four categories: lack of cultural understanding, systemic racism, and stigma; medical care cost, insurance, and logistics; immigration policies, language, and fear of deportation; and other. Of these 42 studies, 30 also suggested solutions to address the barriers. Conclusion: Few studies have analyzed COVID-19 testing barriers among those experiencing housing insecurity, and even fewer have examined solutions to those barriers. Expanding resources and supporting investigators within this space is necessary to ensure equitable healthcare delivery.

    RADx-UP Testing Core: Access to COVID-19 Diagnostics in Community-Engaged Research with Underserved Populations

    Research on the COVID-19 pandemic revealed a disproportionate burden of COVID-19 infection and death among underserved populations and exposed low rates of SARS-CoV-2 testing in these communities. A landmark National Institutes of Health (NIH) funding initiative, the Rapid Acceleration of Diagnostics-Underserved Populations (RADx-UP) program, was developed to address the research gap in understanding the adoption of COVID-19 testing in underserved populations. This program is the single largest investment in health disparities and community-engaged research in the history of the NIH. The RADx-UP Testing Core (TC) provides community-based investigators with essential scientific expertise and guidance on COVID-19 diagnostics. This commentary describes the first 2 years of the TC's experience, highlighting the challenges faced and insights gained to safely and effectively deploy large-scale diagnostics for community-initiated research in underserved populations during a pandemic. The success of RADx-UP shows that community-based research to increase access and uptake of testing among underserved populations can be accomplished during a pandemic with tools, resources, and multidisciplinary expertise provided by a centralized testing-specific coordinating center. We developed adaptive tools to support individual testing strategies and frameworks for these diverse studies and ensured continuous monitoring of testing strategies and use of study data. In a rapidly evolving setting of tremendous uncertainty, the TC provided essential and real-time technical expertise to support safe, effective, and adaptive testing. The lessons learned go beyond this pandemic and can serve as a framework for rapid deployment of testing in response to future crises, especially when populations are affected inequitably