14 research outputs found

    Non-synonymous variations in cancer and their effects on the human proteome: workflow for NGS data biocuration and proteome-wide analysis of TCGA data

    Background: Next-generation sequencing (NGS) technologies have produced petabytes of data scattered across archives, databases and sometimes isolated hard disks that are inaccessible for browsing and analysis. Curated secondary databases are expected to help organize some of this Big Data, allowing users to better navigate, search and compute on it. Results: To address this challenge, we have implemented an NGS biocuration workflow and are analyzing short-read sequences and associated metadata from cancer patients to better understand the human variome. Curation of variation and related information from control (normal tissue) and case (tumor) samples will provide comprehensive background information for genomic medicine research and application studies. Our approach uses a CloudBioLinux virtual machine upstream of the High-performance Integrated Virtual Environment (HIVE), which encapsulates the Curated Short Read archive (CSR) and a proteome-wide variation effect analysis tool (SNVDis). As a proof of concept, we have curated and analyzed control and case breast cancer datasets from The Cancer Genome Atlas (TCGA), an NCI cancer genomics program. Our efforts include reviewing and recording available clinical information on patients in CSR, mapping the reads to the reference, identifying non-synonymous single-nucleotide variations (nsSNVs), and integrating the data with tools that analyze the effect of nsSNVs on the human proteome. We have also developed a novel phylogenetic analysis algorithm that uses SNV positions to classify the patient population. The workflow described here lays the foundation for analyzing short-read sequence data to identify rare and novel SNVs that are not present in dbSNP, and thereby for a more comprehensive understanding of the human variome. Variation results for single genes as well as for the entire study are available from the CSR website (hive.biochemistry.gwu.edu/tools/csr/SRARecords_Curated.php).
Conclusions: The availability of thousands of sequenced patient samples provides a rich repository of sequence information that can be used to identify individual-level SNVs and their effects on the human proteome beyond what the dbSNP database provides.
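The abstract above does not say how nsSNVs are distinguished from synonymous changes once reads are mapped. The core check can be sketched as follows: substitute the alternate base into its codon and compare the translated amino acids under the standard genetic code. This is a minimal illustration, not the HIVE or SNVDis implementation; it assumes the variant has already been placed within a codon read on the coding strand, and the function name `classify_snv` and its arguments are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids for codons enumerated in TCAG order
# (first base varies slowest, third base fastest); "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snv(ref_codon: str, offset: int, alt_base: str):
    """Return (ref_aa, alt_aa, is_nonsynonymous) for one base change.

    Hypothetical interface: ref_codon is read on the coding strand and
    offset is the 0-based position of the variant inside that codon.
    """
    ref_codon, alt_base = ref_codon.upper(), alt_base.upper()
    if ref_codon not in CODON_TABLE or offset not in (0, 1, 2) or alt_base not in BASES:
        raise ValueError("need a valid codon, an offset in 0..2 and a base in TCAG")
    if ref_codon[offset] == alt_base:
        raise ValueError("alternate base equals the reference base")
    # Build the mutant codon and translate both versions.
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    return ref_aa, alt_aa, ref_aa != alt_aa


print(classify_snv("GAG", 1, "T"))  # glutamate -> valine: ('E', 'V', True)
print(classify_snv("CTG", 2, "A"))  # leucine -> leucine: ('L', 'L', False)
```

In a real pipeline the codon and offset would come from transcript annotation, and strand, splice sites and multi-base variants would need separate handling.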

    Detection of a Low Level and Heterogeneous B Cell Immune Response in Peripheral Blood of Acute Borreliosis Patients With High Throughput Sequencing

    The molecular diagnosis of acute Borreliosis is complicated, and better strategies to improve the diagnostic process are needed. High Throughput Sequencing (HTS) of human B cell repertoires after, e.g., Dengue virus infection or influenza vaccination has revealed antigen-associated "CDR3 signatures" that may have the potential to support diagnosis of infectious diseases. The human B cell immune response to Borrelia burgdorferi sensu lato, the causative agent of Borreliosis, has mainly been studied at the antibody level, while less attention has been given to the cellular part of the humoral immune response. There are indications that Borrelia actively influence the B cell immune response and that this response is therefore not directly comparable to responses induced by other infections. The main goal of this study was to identify B cell features that could be used to support the diagnosis of Borreliosis. We therefore characterized the B cell immune response in these patients by combining multicolor flow cytometry, sequencing of single Borrelia-reactive B cell receptors (BCRs), and deep sequencing of B cell repertoires. Our phenotyping experiments showed no significant difference between the B cell subpopulations of acute Borreliosis patients and those of controls. BCR sequences from individual epitope-reactive B cells had little in common with each other. HTS did, however, show a higher overlap of complementarity determining region 3 (CDR3) amino acid (aa) sequences between samples from different timepoints in patients than in controls. This indicates that HTS is sensitive enough to detect ongoing B cell immune responses in these patients. Although each individual's repertoire was dominated by rather unique clones, clustering of bulk BCR repertoire sequences revealed a higher overlap of IgG BCR repertoire sequences between acute patients than between controls.
Although we identified a few Borrelia-associated CDR3aa sequences, they appear to be largely unique to each patient and are therefore not suitable as biomarkers.
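The abstract reports CDR3 aa sequence overlap between timepoints without naming the metric. One common way to quantify such overlap is sketched below, using the Jaccard index (shared unique sequences divided by all unique sequences) averaged over all pairs of samples. The choice of Jaccard, the function names and the example sequences are assumptions for illustration, not the study's method or data.

```python
from itertools import combinations
from statistics import mean
from typing import Iterable, List, Set


def jaccard_overlap(a: Iterable[str], b: Iterable[str]) -> float:
    """Shared unique CDR3 aa sequences divided by all unique sequences."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def mean_pairwise_overlap(samples: List[Set[str]]) -> float:
    """Average Jaccard overlap over every pair of samples (e.g. timepoints)."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        raise ValueError("need at least two samples")
    return mean(jaccard_overlap(a, b) for a, b in pairs)


# Hypothetical CDR3 amino acid sets for one individual at three timepoints.
timepoints = [
    {"CARDYW", "CASSLGW", "CTRGGW"},
    {"CARDYW", "CTRGGW", "CAKDLW"},
    {"CARDYW", "CVRPFW"},
]
print(round(mean_pairwise_overlap(timepoints), 3))  # -> 0.333
```

Comparing this within-individual value for patients against the same value for controls mirrors the kind of comparison described above; abundance-weighted indices (for example Morisita-Horn) are an alternative when clone frequencies matter.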

    New science, synthesis, scholarship, and strategic vision for society

    Harvard Forest LTER (HFR) is a two-decade-long integrated research and educational program investigating how forest dynamics respond to natural and human disturbances and environmental changes over broad spatial and temporal scales. HFR engages >30 researchers, >200 graduate and undergraduate students, and dozens of institutions in research on fundamental and applied ecological questions of national and international relevance. Through LTER I–IV, HFR has added historical perspectives, expanded its scope to the New England region, integrated the social, biological, and physical sciences, and developed education and outreach programs for K-12, undergraduate, and graduate students, as well as for managers, decision-makers, and media professionals.

    Classification of Breast Cancer Patients and NCI-60 Cell Lines Using a Novel SNV-based Phylogenetic Analysis Method

    Next-generation sequencing data can be mapped to a reference genome to identify single-nucleotide polymorphisms/variations (SNPs/SNVs; called SNPs hereafter). In theory, SNPs can be compared across several samples and the differences used to build phylogenetic trees depicting relatedness among the samples. In practice, however, this is difficult because there is currently no stand-alone tool that takes SNP data directly as input and produces phylogenetic trees. In response to this need, the PhyloSNP application was created with two analysis methods: 1) a quantitative method that either creates a presence/absence matrix that can be used directly to generate phylogenetic trees, or creates a tree from a shrunken genome alignment (which includes additional bases surrounding each SNP position), and 2) a qualitative method that clusters samples based on the frequency of the different bases found at a particular position. The algorithms were used to generate trees from Poliovirus, Burkholderia and human cancer genomics NGS datasets. The NCI-60 cell lines have been extensively researched [1-3]. Illumina exome data from these lines were aligned and profiled using the High-performance Integrated Virtual Environment (HIVE), and phylogenetic trees were generated with PhyloSNP and FastTree from the discovered SNPs. These trees were used to determine whether there were any noticeable relationships between the cancer lines. Several melanoma cell lines (ME:MDA_MB_435, ME:MDA_N, ME:UACC_62, and ME:M14) and breast cancer cell lines (BR:HS578T, BR:T47D, and BR:MCF7) grouped together in both the total-variation and the non-dbSNP-variation trees with near 100% bootstrap support (10,000 replicates). Preliminary findings suggest that the observed phylogenetic clustering of these seven cell lines is at least partially due to altered motifs and a common involvement of the TP53 gene in each of these cancer types. However, more in-depth analysis is required to elucidate these potential findings.
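The presence/absence idea in the quantitative method can be sketched in a few lines: take the union of all SNP calls as columns, mark each sample 1 or 0 per column, and compare rows. This is not PhyloSNP's code. The variant key `(chromosome, position, alt base)`, the function names and the example calls are assumptions, and the pairwise Hamming distance is a simple stand-in dissimilarity; a distance-based tree method such as neighbor joining could then be applied to it (not shown).

```python
from typing import Dict, List, Set, Tuple

# Assumed variant key: (chromosome, 1-based position, alternate base).
Variant = Tuple[str, int, str]


def presence_absence_matrix(calls: Dict[str, Set[Variant]]):
    """Return (variant columns, {sample: 0/1 row}) over the union of all calls."""
    variants: List[Variant] = sorted(set().union(*calls.values()))
    matrix = {
        sample: [1 if v in sample_calls else 0 for v in variants]
        for sample, sample_calls in sorted(calls.items())
    }
    return variants, matrix


def hamming_distances(matrix: Dict[str, List[int]]) -> Dict[Tuple[str, str], int]:
    """Count the variant columns at which each pair of samples differs."""
    samples = sorted(matrix)
    return {
        (a, b): sum(x != y for x, y in zip(matrix[a], matrix[b]))
        for i, a in enumerate(samples)
        for b in samples[i + 1:]
    }


# Hypothetical SNP calls for three samples.
calls = {
    "s1": {("chr17", 100, "T"), ("chr1", 5, "G")},
    "s2": {("chr17", 100, "T")},
    "s3": {("chr1", 5, "G"), ("chr2", 7, "A")},
}
variants, matrix = presence_absence_matrix(calls)
print(matrix)                     # {'s1': [1, 1, 0], 's2': [0, 1, 0], 's3': [1, 0, 1]}
print(hamming_distances(matrix))  # {('s1', 's2'): 1, ('s1', 's3'): 2, ('s2', 's3'): 3}
```

Samples that share more variant calls end up closer together, which is the property a tree built from such a matrix relies on.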

    Dear Lawyer Bao: Everyday Problems, Legal Advice, and State Power in China
