70 research outputs found
A literature search tool for intelligent extraction of disease-associated genes
Objective: To extract disorder-associated genes from the scientific literature in PubMed with greater sensitivity for literature-based support than existing methods. Methods: We developed a PubMed query to retrieve disorder-related, original research articles. Then we applied a rule-based text-mining algorithm with keyword matching to extract target disorders, genes with significant results, and the type of study described by the article. Results: We compared our resulting candidate disorder genes and supporting references with existing databases. We demonstrated that our candidate gene set covers nearly all genes in manually curated databases, and that the references supporting the disorder–gene link are more extensive and accurate than other general purpose gene-to-disorder association databases. Conclusions: We implemented a novel publication search tool to find target articles, specifically focused on links between disorders and genotypes. Through comparison against gold-standard manually updated gene–disorder databases and comparison with automated databases of similar functionality we show that our tool can search through the entirety of PubMed to extract the main gene findings for human diseases rapidly and accurately
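The rule-based keyword matching described in this abstract might look roughly like the sketch below. It is only an illustration: the vocabularies, the significance cues, and the function name `extract_links` are hypothetical and are not taken from the published tool, which also classifies study type and works on articles retrieved through a PubMed query.

```python
import re

# Hypothetical vocabularies; a real tool would load curated gene and disorder lists.
GENE_SYMBOLS = {"SHANK3", "MECP2", "APOE"}
DISORDER_TERMS = {"autism", "alzheimer disease"}
# Hypothetical cues suggesting that a sentence reports a significant finding.
SIGNIFICANCE_CUES = ("significant", "associated with", "p <")


def extract_links(abstract: str) -> set[tuple[str, str]]:
    """Return (disorder, gene) pairs co-mentioned in a sentence with a significance cue."""
    links = set()
    # Split into sentences on whitespace that follows terminal punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", abstract):
        lowered = sentence.lower()
        if not any(cue in lowered for cue in SIGNIFICANCE_CUES):
            continue
        # Candidate gene symbols: upper-case alphanumeric tokens found in the vocabulary.
        tokens = re.findall(r"\b[A-Z][A-Z0-9]+\b", sentence)
        genes = {tok for tok in tokens if tok in GENE_SYMBOLS}
        disorders = {term for term in DISORDER_TERMS if term in lowered}
        links.update((d, g) for d in disorders for g in genes)
    return links
```

Requiring the gene, the disorder, and the cue to share a sentence is one simple way to favor precision; loosening that rule to whole abstracts would raise sensitivity at the cost of more false links.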
Detecting biological network organization and functional gene orthologs
SUMMARY: We developed a package, TripletSearch, to compute relationships within triplets of genes based on Roundup, an orthologous gene database containing >1500 genomes. These relationships, derived from the coevolution of genes, provide valuable information for detecting biological network organization from the local to the system level, inferring protein functions, and identifying functional orthologs. To run the computation, users need to provide the GI IDs of the genes of interest
Cost-Effective Cloud Computing: A Case Study Using the Comparative Genomics Tool, Roundup
Background: Comparative genomics resources, such as ortholog detection tools and repositories, are rapidly increasing in scale and complexity. Cloud computing is an emerging technological paradigm that enables researchers to dynamically build a dedicated virtual cluster and may represent a valuable alternative for large computational tools in bioinformatics. In the present manuscript, we optimize the computation of a large-scale comparative genomics resource, Roundup, using cloud computing, describe the proper operating principles required to achieve computational efficiency on the cloud, and detail important procedures for improving cost-effectiveness to ensure maximal computation at minimal costs. Methods: Utilizing the comparative genomics tool, Roundup, as a case study, we computed orthologs among 902 fully sequenced genomes on Amazon's Elastic Compute Cloud. For managing the ortholog processes, we designed a strategy to deploy the web service, Elastic MapReduce, and maximize the use of the cloud while simultaneously minimizing costs. Specifically, we created a model to estimate cloud runtime based on the size and complexity of the genomes being compared that determines in advance the optimal order of the jobs to be submitted. Results: We computed orthologous relationships for 245,323 genome-to-genome comparisons on Amazon's computing cloud, a computation that required just over 200 hours and cost $8,000 USD, at least 40% less than expected under a strategy in which genome comparisons were submitted to the cloud randomly with respect to runtime. Our cost savings projections were based on a model that not only demonstrates the optimal strategy for deploying RSD to the cloud, but also finds the optimal cluster size to minimize waste and maximize usage. Our cost-reduction model is readily adaptable for other comparative genomics tools and potentially of significant benefit to labs seeking to take advantage of the cloud as an alternative to local computing infrastructure
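Why submission order matters can be shown with a small sketch. It assumes a hypothetical runtime model (proportional to the product of the two genomes' gene counts) and a greedy longest-job-first rule; the abstract does not give the paper's actual model, so `estimate_runtime`, `makespan`, and `schedule_longest_first` are illustrative names only.

```python
import heapq


def estimate_runtime(genes_a: int, genes_b: int) -> float:
    """Hypothetical runtime model: proportional to the product of gene counts."""
    return genes_a * genes_b / 1e6


def makespan(runtimes: list[float], n_nodes: int) -> float:
    """Give each job, in the listed order, to the least-loaded node; return the finish time."""
    loads = [0.0] * n_nodes
    heapq.heapify(loads)
    for runtime in runtimes:
        # Pop the node that frees up first and add the job to it.
        heapq.heappush(loads, heapq.heappop(loads) + runtime)
    return max(loads)


def schedule_longest_first(runtimes: list[float], n_nodes: int) -> float:
    """Finish time when jobs are submitted from longest to shortest estimate."""
    return makespan(sorted(runtimes, reverse=True), n_nodes)
```

With four short jobs and one long one on two nodes, submitting the long job last leaves one node idle while the other finishes it; submitting it first lets the short jobs fill the gap. Since a cluster is paid for until its last node finishes, a shorter finish time translates into lower cost.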
Roundup 2.0: Enabling Comparative Genomics for over 1800 Genomes
Summary: Roundup is an online database of gene orthologs for over 1800 genomes, including 226 Eukaryota, 1447 Bacteria, 113 Archaea, and 21 Viruses. Orthologs are inferred using the Reciprocal Smallest Distance algorithm. Users may query Roundup for single-linkage clusters of orthologous genes based on any group of genomes. Annotated query results may be viewed in a variety of ways including as clusters of orthologs and as phylogenetic profiles. Genomic results may be downloaded in formats suitable for functional as well as phylogenetic analysis, including the recent OrthoXML standard. In addition, gene IDs can be retrieved using FASTA sequence search. All orthology results and source code are freely available
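Single-linkage clustering of orthologs can be sketched with a union-find structure: any two genes joined by a chain of pairwise ortholog calls land in the same cluster. The gene identifiers and the function name below are hypothetical, and the sketch takes the pairwise calls as given rather than computing them with the Reciprocal Smallest Distance algorithm.

```python
from collections import defaultdict


def single_linkage_clusters(pairs):
    """Group genes into clusters connected by at least one chain of ortholog pairs."""
    parent = {}

    def find(gene):
        # Register unseen genes as their own root, then walk up with path halving.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b in pairs:
        # Merge the two clusters that contain a and b.
        parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for gene in list(parent):
        clusters[find(gene)].add(gene)
    # Sort for a deterministic result.
    return sorted(sorted(members) for members in clusters.values())
```

For example, the pairs `("hsa:A", "mmu:A")` and `("mmu:A", "dre:A")` yield one three-gene cluster even though `hsa:A` and `dre:A` were never directly paired, which is the defining property of single linkage.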
The Potential of Accelerating Early Detection of Autism through Content Analysis of YouTube Videos
Autism is on the rise, with 1 in 88 children receiving a diagnosis in the United States, yet the process for diagnosis remains cumbersome and time consuming. Research has shown that home videos of children can help increase the accuracy of diagnosis. However, the use of videos in the diagnostic process is uncommon. In the present study, we assessed the feasibility of applying a gold-standard diagnostic instrument to brief and unstructured home videos and tested whether video analysis can enable more rapid detection of the core features of autism outside of clinical environments. We collected 100 public videos from YouTube of children ages 1–15 with either a self-reported diagnosis of an ASD (N = 45) or not (N = 55). Four non-clinical raters independently scored all videos using one of the most widely adopted tools for behavioral diagnosis of autism, the Autism Diagnostic Observation Schedule-Generic (ADOS). The classification accuracy was 96.8%, with 94.1% sensitivity and 100% specificity, the inter-rater correlation for the behavioral domains on the ADOS was 0.88, and the diagnoses matched a trained clinician in all but 3 of 22 randomly selected video cases. Despite the diversity of videos and non-clinical raters, our results indicate that it is possible to achieve high classification accuracy, sensitivity, and specificity as well as clinically acceptable inter-rater reliability with non-clinical personnel. Our results also demonstrate the potential for video-based detection of autism in short, unstructured home videos and further suggest that at least a percentage of the effort associated with detection and monitoring of autism may be mobilized and moved outside of traditional clinical environments
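For readers less familiar with the three reported measures, the sketch below shows how accuracy, sensitivity, and specificity follow from a confusion matrix. The function name is hypothetical, and any counts passed to it here are illustrative only; they are not the study's data.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute accuracy, sensitivity, and specificity from confusion-matrix counts."""
    return {
        # Share of all cases classified correctly.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Share of true positive cases that were detected.
        "sensitivity": tp / (tp + fn),
        # Share of true negative cases that were correctly ruled out.
        "specificity": tn / (tn + fp),
    }
```

A specificity of 100% means no false positives, while a sensitivity below 100% means some true cases were missed.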
Genotator: A disease-agnostic tool for genetic annotation of disease
Background: Disease-specific genetic information has been increasing at rapid rates as a consequence of recent improvements and massive cost reductions in sequencing technologies. Numerous systems designed to capture and organize this mounting sea of genetic data have emerged, but these resources differ dramatically in their disease coverage and genetic depth. With few exceptions, researchers must manually search a variety of sites to assemble a complete set of genetic evidence for a particular disease of interest, a process that is both time-consuming and error-prone. Methods: We designed a real-time aggregation tool that provides both comprehensive coverage and reliable gene-to-disease rankings for any disease. Our tool, called Genotator, automatically integrates data from 11 externally accessible clinical genetics resources and uses these data in a straightforward formula to rank genes in order of disease relevance. We tested the accuracy of coverage of Genotator in three separate diseases for which there exist specialty curated databases: Autism Spectrum Disorder, Parkinson's Disease, and Alzheimer Disease. Genotator is freely available at http://genotator.hms.harvard.edu. Results: Genotator demonstrated that most of the 11 selected databases contain unique information about the genetic composition of disease, with 2514 genes found in only one of the 11 databases. These findings confirm that the integration of these databases provides a more complete picture than would be possible from any one database alone. Genotator successfully identified at least 75% of the top ranked genes for all three of our use cases, including a 90% concordance with the top 40 ranked candidates for Alzheimer Disease. Conclusions: As a meta-query engine, Genotator provides high coverage of both historical genetic research and recent advances in the genetic understanding of specific diseases. As such, Genotator provides a real-time aggregation of ranked data that remains current with the pace of research in the disease fields. Genotator's algorithm appropriately transforms query terms to match the input requirements of each targeted database and accurately resolves named synonyms to ensure full coverage of the genetic results with official nomenclature. Genotator generates an Excel-style output that is consistent across disease queries and readily importable to other applications.
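The aggregation idea can be sketched as follows. The abstract does not state Genotator's ranking formula, so this sketch assumes a simple stand-in: score each gene by the number of sources that report it. The source names, gene sets, and function names are hypothetical.

```python
from collections import Counter


def count_sources(source_hits: dict[str, set[str]]) -> Counter:
    """Count how many sources report each gene."""
    return Counter(gene for genes in source_hits.values() for gene in genes)


def rank_genes(source_hits: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank genes by number of supporting sources (assumed score), ties broken by name."""
    counts = count_sources(source_hits)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def single_source_genes(source_hits: dict[str, set[str]]) -> set[str]:
    """Genes reported by exactly one source, i.e. information unique to that source."""
    return {gene for gene, n in count_sources(source_hits).items() if n == 1}
```

The second function mirrors the kind of tally behind the statement that many genes appear in only one database, which is the argument for integrating sources rather than relying on any single one.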
Transcriptomic analysis across nasal, temporal, and macular regions of human neural retina and RPE/choroid by RNA-Seq
Proper spatial differentiation of retinal cell types is necessary for normal human vision. Many retinal diseases, such as Best disease and male germ cell associated kinase (MAK)-associated retinitis pigmentosa, preferentially affect distinct topographic regions of the retina. While much is known about the distribution of cell types in the retina, the distribution of molecular components across the posterior pole of the eye has not been well-studied. To investigate regional differences in molecular composition of ocular tissues, we assessed differential gene expression across the temporal, macular, and nasal retina and retinal pigment epithelium (RPE)/choroid of human eyes using RNA-Seq. RNA from temporal, macular, and nasal retina and RPE/choroid from four human donor eyes was extracted, poly-A selected, fragmented, and sequenced as 100 bp read pairs. Digital read files were mapped to the human genome and analyzed for differential expression using the Tuxedo software suite. Retina and RPE/choroid samples were clearly distinguishable at the transcriptome level. Numerous transcription factors were differentially expressed between regions of the retina and RPE/choroid. Photoreceptor-specific genes were enriched in the peripheral samples, while ganglion cell and amacrine cell genes were enriched in the macula. Within the RPE/choroid, RPE-specific genes were upregulated at the periphery while endothelium-associated genes were upregulated in the macula. Consistent with previous studies, BEST1 expression was lower in macular than extramacular regions. The MAK gene was expressed at lower levels in macula than in extramacular regions, but did not exhibit a significant difference between nasal and temporal retina. The regional molecular distinction is greatest between macula and periphery and decreases between different peripheral regions within a tissue. 
Datasets such as these can be used to prioritize candidate genes for possible involvement in retinal diseases with regional phenotypes
Protein Architecture of the Human Kinetochore Microtubule Attachment Site
Interactions between centromeric chromatin and spindle microtubules, mediated by kinetochores, drive chromosome segregation. We have developed a two-color fluorescence light microscopy method that measures average label separation, Delta, at < 5 nm accuracy, to elucidate the protein architecture of human metaphase kinetochores. Delta analysis, when correlated with tension states of spindle-attached sister kinetochore pairs, provided information on mechanical properties of protein linkages within kinetochores. Treatment with taxol (which suppresses microtubule dynamics, eliminates tension at kinetochores, and activates the spindle checkpoint) resulted in specific large-scale changes in kinetochore architecture. Cumulatively, Delta analysis revealed compliant linkages close to the centromeric chromatin, suggested a model for how the KMN (KNL1/Mis12 complex/Ndc80 complex) network provides microtubule attachment and generates pulling forces from depolymerization, and revealed architectural changes induced by taxol treatment. The methods described here should also be applicable to other intermediate-scale biological machines in cells
The Herpes Simplex Virus-1 Transactivator Infected Cell Protein-4 Drives VEGF-A Dependent Neovascularization
Herpes simplex virus-1 (HSV-1) causes lifelong infection affecting between 50 and 90% of the global population. In addition to causing dermal lesions, HSV-1 is a leading cause of blindness resulting from recurrent corneal infection. Corneal disease is characterized by loss of corneal immunologic privilege and extensive neovascularization driven by vascular endothelial growth factor-A (VEGF-A). In the current study, we identify HSV-1 infected cells as the dominant source of VEGF-A during acute infection, and VEGF-A transcription did not require TLR signaling or MAP kinase activation. Rather than being an innate response to the pathogen, VEGF-A transcription was directly activated by the HSV-1 encoded immediate early transcription factor, ICP4. ICP4 bound the proximal human VEGF-A promoter and was sufficient to promote transcription. Transcriptional activation also required cis GC-box elements common to the VEGF-A promoter and HSV-1 early genes. Our results suggest that the neovascularization characteristic of ocular HSV-1 disease is a direct result of HSV-1's major transcriptional regulator, ICP4, and similarities between the VEGF-A promoter and those of HSV-1 early genes
An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge
There is tremendous potential for genome sequencing to improve clinical diagnosis and care once it becomes routinely accessible, but this will require formalizing research methods into clinical best practices in the areas of sequence data generation, analysis, interpretation and reporting. The CLARITY Challenge was designed to spur convergence in methods for diagnosing genetic disease starting from clinical case history and genome sequencing data. DNA samples were obtained from three families with heritable genetic disorders and genomic sequence data were donated by sequencing platform vendors. The challenge was to analyze and interpret these data with the goals of identifying disease-causing variants and reporting the findings in a clinically useful format. Participating contestant groups were solicited broadly, and an independent panel of judges evaluated their performance.
RESULTS:
A total of 30 international groups were engaged. The entries reveal a general convergence of practices on most elements of the analysis and interpretation process. However, even given this commonality of approach, only two groups identified the consensus candidate variants in all disease cases, demonstrating a need for consistent fine-tuning of the generally accepted methods. There was greater diversity of the final clinical report content and in the patient consenting process, demonstrating that these areas require additional exploration and standardization.
CONCLUSIONS:
The CLARITY Challenge provides a comprehensive assessment of current practices for using genome sequencing to diagnose and report genetic diseases. There is remarkable convergence in bioinformatic techniques, but medical interpretation and reporting are areas that require further development by many groups