458 research outputs found

    Genomics and Privacy: Implications of the New Reality of Closed Data for the Field

    Open source and open data have been driving forces in bioinformatics in the past. However, privacy concerns may soon change the landscape, limiting future access to important data sets, including personal genomics data. Here we survey this situation in some detail, describing, in particular, how the large scale of the data from personal genomic sequencing makes it especially hard to share data, exacerbating the privacy problem. We also go over various aspects of genomic privacy: first, there is basic identifiability of subjects having their genome sequenced. However, even for individuals who have consented to be identified, there is the prospect of very detailed future characterization of their genotype, which, unanticipated at the time of their consent, may be more personal and invasive than the release of their medical records. We go over various computational strategies for dealing with the issue of genomic privacy. One can "slice" and reformat datasets to allow them to be partially shared while securing the most private variants. This is particularly applicable to functional genomics information, which can be largely processed without variant information. For handling the most private data there are a number of legal and technological approaches: for example, modifying the informed consent procedure to acknowledge that privacy cannot be guaranteed, and/or employing a secure cloud computing environment. Cloud computing in particular may allow access to the data in a more controlled fashion than the current practice of downloading and computing on large datasets. Furthermore, it may be particularly advantageous for small labs, given that the burden of many privacy issues falls disproportionately on them in comparison to large corporations and genome centers. Finally, we discuss how education of future genetics researchers will be important, with curriculums emphasizing privacy and data security. However, teaching personal genomics with identifiable subjects in the university setting will, in turn, create additional privacy issues and social conundrums.

    The role of disorder in interaction networks: a structural analysis

    Recent studies have emphasized the value of including structural information into the topological analysis of protein networks. Here, we utilized structural information to investigate the role of intrinsic disorder in these networks. Hub proteins tend to be more disordered than other proteins (i.e. the proteome average); however, we find this only true for those with one or two binding interfaces ('single'-interface hubs). In contrast, the distribution of disordered residues in multi-interface hubs is indistinguishable from the overall proteome. Surprisingly, we find that the binding interfaces in single-interface hubs are highly structured, as is the case for multi-interface hubs. However, the binding partners of single-interface hubs tend to have a higher level of disorder than the proteome average, suggesting that their binding promiscuity is related to the disorder of their binding partners. In turn, the higher level of disorder of single-interface hubs can be partly explained by their tendency to bind to each other in a cascade. A good illustration of this trend can be found in signaling pathways and, more specifically, in kinase cascades. Finally, our findings have implications for the current controversy related to party and date-hubs.

    IQSeq: Integrated Isoform Quantification Analysis Based on Next-Generation Sequencing

    With the recent advances in high-throughput RNA sequencing (RNA-Seq), biologists are able to measure transcription with unprecedented precision. One problem that can now be tackled is that of isoform quantification: here one tries to reconstruct the abundances of isoforms of a gene. We have developed a statistical solution for this problem, based on analyzing a set of RNA-Seq reads, and a practical implementation, available from archive.gersteinlab.org/proj/rnaseq/IQSeq, in a tool we call IQSeq (Isoform Quantification in next-generation Sequencing). Here, we present theoretical results which IQSeq is based on, and then use both simulated and real datasets to illustrate various applications of the tool. In order to measure the accuracy of an isoform-quantification result, one would try to estimate the average variance of the estimated isoform abundances for each gene (based on resampling the RNA-seq reads), and IQSeq has a particularly fast algorithm (based on the Fisher Information Matrix) for calculating this, achieving a speedup of times compared to brute-force resampling. IQSeq also calculates an information theoretic measure of overall transcriptome complexity to describe isoform abundance for a whole experiment. IQSeq has many features that are particularly useful in RNA-Seq experimental design, allowing one to optimally model the integration of different sequencing technologies in a cost-effective way. In particular, the IQSeq formalism integrates the analysis of different sample (i.e. read) sets generated from different technologies within the same statistical framework. It also supports a generalized statistical partial-sample-generation function to model the sequencing process. This allows one to have a modular, "plugin-able" read-generation function to support the particularities of the many evolving sequencing technologies.
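
The abstract above mentions an information-theoretic measure of transcriptome complexity without defining it. As a rough illustration only, the sketch below uses Shannon entropy over each gene's isoform abundances and averages it across genes. The choice of Shannon entropy, the function names `isoform_entropy` and `mean_entropy`, and the input layout are all assumptions made for this example; they are not taken from IQSeq.

```python
import math


def isoform_entropy(abundances):
    """Shannon entropy (in bits) of one gene's isoform abundances.

    Assumption: this is a generic complexity measure chosen for
    illustration, not necessarily the one IQSeq computes.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    # Normalise to fractions and skip zero-abundance isoforms (0 * log 0 = 0).
    fractions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log2(p) for p in fractions)


def mean_entropy(genes):
    """Average per-gene entropy over a dict of gene name -> abundances."""
    if not genes:
        raise ValueError("at least one gene is required")
    return sum(isoform_entropy(v) for v in genes.values()) / len(genes)


# Hypothetical abundances: one gene split evenly over two isoforms,
# one gene dominated by a single isoform.
example = {"geneA": [10.0, 10.0], "geneB": [7.0, 0.0]}
complexity = mean_entropy(example)
```

A gene expressed from a single isoform contributes zero entropy, while a gene split evenly across k isoforms contributes log2(k) bits, so the average rises as expression is spread over more isoforms.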

    The oestrogen receptor alpha-regulated lncRNA NEAT1 is a critical modulator of prostate cancer

    The androgen receptor (AR) plays a central role in establishing an oncogenic cascade that drives prostate cancer progression. Some prostate cancers escape androgen dependence and are often associated with an aggressive phenotype. The oestrogen receptor alpha (ERα) is expressed in prostate cancers, independent of AR status. However, the role of ERα remains elusive. Using a combination of chromatin immunoprecipitation (ChIP) and RNA-sequencing data, we identified an ERα-specific non-coding transcriptome signature. Among putatively ERα-regulated intergenic long non-coding RNAs (lncRNAs), we identified nuclear enriched abundant transcript 1 (NEAT1) as the most significantly overexpressed lncRNA in prostate cancer. Analysis of two large clinical cohorts also revealed that NEAT1 expression is associated with prostate cancer progression. Prostate cancer cells expressing high levels of NEAT1 were recalcitrant to androgen or AR antagonists. Finally, we provide evidence that NEAT1 drives oncogenic growth by altering the epigenetic landscape of target gene promoters to favour transcription.

    Impact of the SPOP Mutant Subtype on the Interpretation of Clinical Parameters in Prostate Cancer.

    Purpose: Molecular characterization of prostate cancer, including The Cancer Genome Atlas, has revealed distinct subtypes with underlying genomic alterations. One of these core subtypes, SPOP (speckle-type POZ protein) mutant prostate cancer, has previously only been identifiable via DNA sequencing, which has made the impact on prognosis and routinely used risk stratification parameters unclear. Methods: We have developed a novel gene expression signature, classifier (Subclass Predictor Based on Transcriptional Data), and decision tree to predict the SPOP mutant subclass from RNA gene expression data and classify common prostate cancer molecular subtypes. We then validated and further interrogated the association of prostate cancer molecular subtypes with pathologic and clinical outcomes in retrospective and prospective cohorts of 8,158 patients. Results: The subclass predictor based on transcriptional data model showed high sensitivity and specificity in multiple cohorts across both RNA sequencing and microarray gene expression platforms. We predicted approximately 8% to 9% of cases to be SPOP mutant from both retrospective and prospective cohorts. We found that the SPOP mutant subclass was associated with lower frequency of positive margins, extraprostatic extension, and seminal vesicle invasion at prostatectomy; however, SPOP mutant cancers were associated with higher pretreatment serum prostate-specific antigen (PSA). The association between SPOP mutant status and higher PSA level was validated in three independent cohorts. Despite high pretreatment PSA, the SPOP mutant subtype was associated with a favorable prognosis with improved metastasis-free survival, particularly in patients with high-risk preoperative PSA levels. 
    Conclusion: Using a novel gene expression model and a decision tree algorithm to define prostate cancer molecular subclasses, we found that the SPOP mutant subclass is associated with higher preoperative PSA, fewer adverse pathologic features, and favorable prognosis. These findings suggest a paradigm in which the interpretation of common risk stratification parameters, particularly PSA, may be influenced by the underlying molecular subtype of prostate cancer.
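
The abstract above reports sensitivity and specificity for the subclass predictor. For readers unfamiliar with the two metrics, here is a minimal sketch of how they are computed from paired labels. The function name `sensitivity_specificity` and the toy labels are hypothetical and do not come from the study.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for paired boolean labels.

    truth[i] is True when sample i carries the mutation according to
    the reference assay; predicted[i] is the classifier's call.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = fn = tn = fp = 0
    for t, p in zip(truth, predicted):
        if t and p:
            tp += 1
        elif t and not p:
            fn += 1
        elif not t and not p:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    # Sensitivity: fraction of true mutants that were detected.
    # Specificity: fraction of non-mutants correctly left uncalled.
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels (made up for illustration).
truth = [True, True, False, False, False]
predicted = [True, False, False, False, True]
sens, spec = sensitivity_specificity(truth, predicted)
```

With a subclass predicted in roughly 8% to 9% of cases, specificity matters a great deal: even a small false-positive rate among the large non-mutant majority can outnumber the true positives.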

    ABEMUS: platform specific and data informed detection of somatic SNVs in cfDNA

    MOTIVATION: The use of liquid biopsies for cancer patients enables the non-invasive tracking of treatment response and tumor dynamics through single or serial blood draws. Next generation sequencing assays allow for the simultaneous interrogation of extended sets of somatic single nucleotide variants (SNVs) in circulating cell free DNA (cfDNA), a mixture of DNA molecules originating from both normal and tumor tissue cells. However, low circulating tumor DNA (ctDNA) fractions, together with sequencing background noise and potential tumor heterogeneity, challenge the ability to confidently call SNVs. RESULTS: We present a computational methodology, called Adaptive Base Error Model in Ultra-deep Sequencing data (ABEMUS), which combines platform-specific genetic knowledge and empirical signal to readily detect and quantify somatic SNVs in cfDNA. We tested the capability of our method to analyze data generated using different platforms with distinct sequencing error properties, and we compared the performance of ABEMUS with other popular SNV callers on both synthetic and real cancer patient sequencing data. Results show that ABEMUS performs better in most of the tested conditions, proving its reliability in calling somatic SNVs with low variant allele frequencies in plasma samples with low ctDNA levels. AVAILABILITY: ABEMUS is cross-platform and can be installed as an R package. The source code is maintained on GitHub at http://github.com/cibiobcg/abemus and it is also available at the official CRAN R repository. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
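
The general idea of calling low-frequency SNVs against an empirical background can be sketched as follows: estimate, per position, how much non-reference signal control samples show, and keep only case variants whose allele fraction exceeds that level. This is a simplified illustration under stated assumptions, not the ABEMUS algorithm. The function names, the nearest-rank quantile, the 0.95 quantile and the minimum depth of 100 are all hypothetical choices.

```python
import math


def allele_fraction(alt_reads, depth):
    """Fraction of reads supporting the alternative allele."""
    return alt_reads / depth if depth > 0 else 0.0


def background_threshold(control_afs, quantile=0.95):
    """Nearest-rank empirical quantile of control allele fractions."""
    if not control_afs:
        raise ValueError("need at least one control allele fraction")
    ordered = sorted(control_afs)
    rank = max(1, math.ceil(quantile * len(ordered)))
    return ordered[rank - 1]


def call_snvs(case_pileup, control_afs_by_pos, quantile=0.95, min_depth=100):
    """Return (position, allele fraction) pairs exceeding the background.

    case_pileup maps position -> (alt_reads, depth); control_afs_by_pos
    maps position -> allele fractions seen in control samples.
    Positions without controls or with low depth are skipped.
    """
    calls = []
    for pos, (alt, depth) in sorted(case_pileup.items()):
        if depth < min_depth or pos not in control_afs_by_pos:
            continue
        af = allele_fraction(alt, depth)
        if af > background_threshold(control_afs_by_pos[pos], quantile):
            calls.append((pos, af))
    return calls


# Made-up numbers for illustration only.
controls = {
    "chr1:100": [0.0, 0.001, 0.002, 0.001],
    "chr1:200": [0.0, 0.0, 0.001, 0.0],
}
case = {"chr1:100": (2, 2000), "chr1:200": (30, 3000), "chr1:300": (50, 50)}
calls = call_snvs(case, controls)
```

Using a per-position threshold derived from controls, as opposed to a single global cut-off, lets noisy positions demand stronger evidence while quiet positions can support calls at very low allele fractions.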

    Comparison and calibration of transcriptome data from RNA-Seq and tiling arrays

    Background: Tiling arrays have been the tool of choice for probing an organism's transcriptome without prior assumptions about the transcribed regions, but RNA-Seq is becoming a viable alternative as the costs of sequencing continue to decrease. Understanding the relative merits of these technologies will help researchers select the appropriate technology for their needs. Results: Here, we compare these two platforms using a matched sample of poly(A)-enriched RNA isolated from the second larval stage of C. elegans. We find that the raw signals from these two technologies are reasonably well correlated but that RNA-Seq outperforms tiling arrays in several respects, notably in exon boundary detection and dynamic range of expression. By exploring the accuracy of sequencing as a function of depth of coverage, we found that about 4 million reads are required to match the sensitivity of two tiling array replicates. The effects of cross-hybridization were analyzed using a "nearest neighbor" classifier applied to array probes; we describe a method for determining potential "black list" regions whose signals are unreliable. Finally, we propose a strategy for using RNA-Seq data as a gold standard set to calibrate tiling array data. All tiling array and RNA-Seq data sets have been submitted to the modENCODE Data Coordinating Center. Conclusions: Tiling arrays effectively detect transcript expression levels at a low cost for many species while RNA-Seq provides greater accuracy in several regards. Researchers will need to carefully select the technology appropriate to the biological investigations they are undertaking. It will also be important to reconsider a comparison such as ours as sequencing technologies continue to evolve.

    Unraveling the clonal hierarchy of somatic genomic aberrations

    Defining the chronology of molecular alterations may identify milestones in carcinogenesis. To unravel the temporal evolution of aberrations from clinical tumors, we developed CLONET, which upon estimation of tumor admixture and ploidy infers the clonal hierarchy of genomic aberrations. Comparative analysis across 100 sequenced genomes from prostate, melanoma, and lung cancers established diverse evolutionary hierarchies, demonstrating the early disruption of tumor-specific pathways. The analyses highlight the diversity of clonal evolution within and across tumor types that might be informative for risk stratification and patient selection for targeted therapies. CLONET addresses heterogeneous clinical samples seen in the setting of precision medicine. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-014-0439-6) contains supplementary material, which is available to authorized users.