Oculus: faster sequence alignment by streaming read compression
Abstract
Background
Despite significant advancement in alignment algorithms, the exponential growth of nucleotide sequencing throughput threatens to outpace bioinformatic analysis. Computation may become the bottleneck of genome analysis if growing alignment costs are not mitigated by further improvement in algorithms. Much gain has been gleaned from indexing and compressing alignment databases, but many widely used alignment tools process input reads sequentially and are oblivious to any underlying redundancy in the reads themselves.
Results
Here we present Oculus, a software package that attaches to standard aligners and exploits read redundancy by performing streaming compression, alignment, and decompression of input sequences. This nearly lossless process (> 99.9%) led to alignment speedups of up to 270% across a variety of data sets, while requiring a modest amount of memory. We expect that streaming read compressors such as Oculus could become a standard addition to existing RNA-Seq and ChIP-Seq alignment pipelines, and potentially other applications in the future as throughput increases.
Conclusions
Oculus efficiently condenses redundant input reads and wraps existing aligners to provide nearly identical SAM output in a fraction of the aligner runtime. It includes a number of useful features, such as tunable performance and fidelity options, compatibility with FASTA or FASTQ files, and adherence to the SAM format. The platform-independent C++ source code is freely available online at http://code.google.com/p/oculus-bio.
http://deepblue.lib.umich.edu/bitstream/2027.42/112673/1/12859_2012_Article_5548.pd
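The compress, align, decompress idea described in the Oculus abstract can be illustrated with a minimal Python sketch. This is not the Oculus implementation (which is written in C++ and wraps real aligners): `compress_reads`, `toy_align`, and `align_with_compression` are hypothetical names, and the exact-match "aligner" is a stand-in chosen only so the example runs on its own.

```python
from collections import OrderedDict


def compress_reads(reads):
    """Group read names by identical sequence, preserving first-seen order."""
    groups = OrderedDict()
    for name, seq in reads:
        groups.setdefault(seq, []).append(name)
    return groups


def toy_align(seq, reference):
    """Hypothetical stand-in for an aligner: exact-match offset, or -1."""
    return reference.find(seq)


def align_with_compression(reads, reference, aligner=toy_align):
    """Align each distinct sequence once, then copy the result to every
    read that shares that sequence (the 'decompression' step)."""
    groups = compress_reads(reads)
    results = {}
    aligner_calls = 0
    for seq, names in groups.items():
        position = aligner(seq, reference)
        aligner_calls += 1
        for name in names:
            results[name] = position
    return results, aligner_calls


reads = [("r1", "ACGT"), ("r2", "GGTA"), ("r3", "ACGT")]
results, calls = align_with_compression(reads, "TTACGTGGTA")
```

In this toy input two of the three reads are identical, so the aligner is invoked twice rather than three times; the saving grows with the redundancy of the read set.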
The lncRNA landscape of breast cancer reveals a role for DSCAM-AS1 in breast cancer progression.
Molecular classification of cancers into subtypes has resulted in an advance in our understanding of tumour biology and treatment response across multiple tumour types. However, to date, cancer profiling has largely focused on protein-coding genes, which comprise <1% of the genome. Here we leverage a compendium of 58,648 long noncoding RNAs (lncRNAs) to subtype 947 breast cancer samples. We show that lncRNA-based profiling categorizes breast tumours by their known molecular subtypes. We identify a cohort of breast cancer-associated and oestrogen-regulated lncRNAs, and investigate the role of the top prioritized oestrogen receptor (ER)-regulated lncRNA, DSCAM-AS1. We demonstrate that DSCAM-AS1 mediates tumour progression and tamoxifen resistance and identify hnRNPL as an interacting protein involved in the mechanism of DSCAM-AS1 action. By highlighting the role of DSCAM-AS1 in breast cancer biology and treatment resistance, this study provides insight into the potential clinical implications of lncRNAs in breast cancer.
Machine Learning Framework to Identify Individuals at Risk of Rapid Progression of Coronary Atherosclerosis: From the PARADIGM Registry.
Background: Rapid coronary plaque progression (RPP) is associated with incident cardiovascular events. To date, no method exists for the identification of individuals at risk of RPP at a single point in time. This study integrated coronary computed tomography angiography-determined qualitative and quantitative plaque features within a machine learning (ML) framework to determine its performance for predicting RPP. Methods and Results: Qualitative and quantitative coronary computed tomography angiography plaque characterization was performed in 1083 patients who underwent serial coronary computed tomography angiography from the PARADIGM (Progression of Atherosclerotic Plaque Determined by Computed Tomographic Angiography Imaging) registry. RPP was defined as an annual progression of percentage atheroma volume ≥1.0%. We employed the following ML models: model 1, clinical variables; model 2, model 1 plus qualitative plaque features; model 3, model 2 plus quantitative plaque features. ML models were compared with the atherosclerotic cardiovascular disease risk score, Duke coronary artery disease score, and a logistic regression statistical model. A total of 224 patients (21%) were identified as RPP. Feature selection in ML identified quantitative computed tomography variables as the higher-ranking features, followed by qualitative computed tomography variables and clinical/laboratory variables. ML model 3 exhibited the highest discriminatory performance to identify individuals who would experience RPP when compared with atherosclerotic cardiovascular disease risk score, the other ML models, and the statistical model (area under the receiver operating characteristic curve in ML model 3, 0.83 [95% CI 0.78-0.89], versus atherosclerotic cardiovascular disease risk score, 0.60 [0.52-0.67]; Duke coronary artery disease score, 0.74 [0.68-0.79]; ML model 1, 0.62 [0.55-0.69]; ML model 2, 0.73 [0.67-0.80]; all P<0.001; statistical model, 0.81 [0.75-0.87], P=0.128).
Conclusions: Based on an ML framework, quantitative atherosclerosis characterization has been shown to be the most important feature when compared with clinical, laboratory, and qualitative measures in identifying patients at risk of RPP.
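The models above are compared by area under the receiver operating characteristic curve (AUC). As a reminder of what that number measures, the following minimal Python sketch computes AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy scores are illustrative assumptions and are unrelated to the PARADIGM data.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison of positive and negative scores.

    labels must be 0 (negative) or 1 (positive); ties count as 0.5.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: two positive and two negative cases.
auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values of 0.60 to 0.83 should be read.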
Efficacy and safety of once-daily nevirapine- or efavirenz-based antiretroviral therapy in HIV-associated tuberculosis: a randomized clinical trial
Background: Nevirapine (NVP) can be safely and effectively administered once-daily but has not been assessed in human immunodeficiency virus (HIV)-infected patients with tuberculosis (TB). We studied the safety and efficacy of once-daily NVP, compared with efavirenz (EFV; standard therapy); both drugs were administered in combination with 2 nucleoside reverse-transcriptase inhibitors. Methods: An open-label, noninferiority, randomized controlled clinical trial was conducted at 3 sites in southern India. HIV-infected patients with TB were treated with a standard short-course anti-TB regimen (2EHRZ3/4RH3; [2 months of Ethambutol, Isoniazid, Rifampicin, Pyrazinamide/4 months of Isoniazid and Rifampicin] thrice weekly) and randomized to receive once-daily EFV at a dose of 600 mg or NVP at a dose of 400 mg (after 14 days of 200 mg administered once daily) with didanosine 250/400 mg and lamivudine 300 mg after 2 months. Sputum smears and mycobacterial cultures were performed every month. CD4+ cell count, viral load, and liver function test results were monitored periodically. Primary outcome was a composite of death, virological failure, default, or serious adverse event (SAE) at 24 weeks. Both intent-to-treat and per protocol analyses were done, and planned interim analyses were performed. Results: A total of 116 patients (75% [87 patients] of whom had pulmonary TB), with a mean age of 36 years, a median CD4+ cell count of 84 cells/mm³, and a median viral load of 310,000 copies/mL, were randomized. At 24 weeks, 50 of 59 patients in the EFV group and 37 of 57 patients in the NVP group had virological suppression (P = .024). There were no deaths, 1 SAE, and 5 treatment failures in the EFV arm, compared with 5 deaths, 2 SAEs, and 10 treatment failures in the NVP arm. The trial was halted by the data and safety monitoring board at the second interim analysis.
Favorable TB treatment outcomes were observed in 93% of the patients in the EFV arm and 84% of the patients in the NVP arm (P = .058). Conclusions: Compared with a regimen of didanosine, lamivudine, and EFV, a regimen of once-daily didanosine, lamivudine, and NVP was inferior and was associated with more frequent virologic failure and death.
Molecular profiling of ETS and non-ETS aberrations in prostate cancer patients from northern India
BACKGROUND: Molecular stratification of prostate cancer (PCa) based on genetic aberrations, including ETS or RAF gene rearrangements, PTEN deletion, and SPINK1 over-expression, shows clear prognostic and diagnostic utility. Gene rearrangements involving ETS transcription factors are frequent pathogenetic somatic events observed in PCa. The incidence of ETS rearrangements in Caucasian PCa patients has been reported; however, their occurrence in the Indian population is largely unknown. The aim of this study was to determine the prevalence of the ETS and RAF kinase gene rearrangements, SPINK1 over-expression, and PTEN deletion in this cohort. METHODS: In this multi-center study, formalin-fixed paraffin embedded (FFPE) PCa specimens (n = 121) were procured from four major medical institutions in India. The tissues were sectioned and molecular profiling was done using immunohistochemistry (IHC), RNA in situ hybridization (RNA-ISH) and/or fluorescence in situ hybridization (FISH). RESULTS: ERG over-expression was detected in 48.9% (46/94) PCa specimens by IHC, which was confirmed in a subset of cases by FISH. Among other ETS family members, while ETV1 transcript was detected in one case by RNA-ISH, no alteration in ETV4 was observed. SPINK1 over-expression was observed in 12.5% (12/96) and PTEN deletion in 21.52% (17/79) of the total PCa cases. Interestingly, PTEN deletion was found in 30% of the ERG-positive cases (P = 0.017) but in only one case with SPINK1 over-expression (P = 0.67). BRAF and RAF1 gene rearrangements were detected in ~1% and ~4.5% of the PCa cases, respectively. CONCLUSIONS: This is the first report on comprehensive molecular profiling of the major spectrum of the causal aberrations in Indian men with PCa. Our findings suggest that ETS gene rearrangement and SPINK1 over-expression patterns in the North Indian population largely resembled those observed in the Caucasian population but differed from Japanese and Chinese PCa patients. The molecular profiling data presented in this study could help in clinical decision-making for the pursuit of surgery, diagnosis, and in selection of therapeutic intervention.
Prostate 75:1051–1062, 2015. © 2015 The Authors. The Prostate, published by Wiley Periodicals, Inc.
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/111808/1/pros22989.pd
The long non-coding RNA PCAT-1 promotes prostate cancer cell proliferation through cMyc.
Long non-coding RNAs (lncRNAs) represent an emerging layer of cancer biology, contributing to tumor proliferation, invasion, and metastasis. Here, we describe a role for the oncogenic lncRNA PCAT-1 in prostate cancer proliferation through cMyc. We find that PCAT-1-mediated proliferation is dependent on cMyc protein stabilization, and using expression profiling, we observed that cMyc is required for a subset of PCAT-1-induced expression changes. The PCAT-1-cMyc relationship is mediated through the post-transcriptional activity of the MYC 3' untranslated region, and we characterize a role for PCAT-1 in the disruption of MYC-targeting microRNAs. To further elucidate a role for post-transcriptional regulation, we demonstrate that targeting PCAT-1 with miR-3667-3p, which does not target MYC, is able to reverse the stabilization of cMyc by PCAT-1. This work establishes a basis for the oncogenic role of PCAT-1 in cancer cell proliferation and is the first study to implicate lncRNAs in the regulation of cMyc in prostate cancer.
Analysis of long non-coding RNAs highlights tissue-specific expression patterns and epigenetic profiles in normal and psoriatic skin
Abstract
Background
Although analysis pipelines have been developed to use RNA-seq to identify long non-coding RNAs (lncRNAs), inference of their biological and pathological relevance remains a challenge. As a result, most transcriptome studies of autoimmune disease have only assessed protein-coding transcripts.
Results
We used RNA-seq data from 99 lesional psoriatic, 27 uninvolved psoriatic, and 90 normal skin biopsies, and applied computational approaches to identify and characterize expressed lncRNAs. We detect 2,942 previously annotated and 1,080 novel lncRNAs which are expected to be skin specific. Notably, over 40% of the novel lncRNAs are differentially expressed and the proportions of differentially expressed transcripts among protein-coding mRNAs and previously-annotated lncRNAs are lower in psoriasis lesions versus uninvolved or normal skin. We find that many lncRNAs, in particular those that are differentially expressed, are co-expressed with genes involved in immune related functions, and that novel lncRNAs are enriched for localization in the epidermal differentiation complex. We also identify distinct tissue-specific expression patterns and epigenetic profiles for novel lncRNAs, some of which are shown to be regulated by cytokine treatment in cultured human keratinocytes.
Conclusions
Together, our results implicate many lncRNAs in the immunopathogenesis of psoriasis, and our results provide a resource for lncRNA studies in other autoimmune diseases.
http://deepblue.lib.umich.edu/bitstream/2027.42/110307/1/13059_2014_Article_570.pd
Automatic segmentation of multiple cardiovascular structures from cardiac computed tomography angiography images using deep learning.
OBJECTIVES: To develop, demonstrate and evaluate an automated deep learning method for multiple cardiovascular structure segmentation. BACKGROUND: Segmentation of cardiovascular images is resource-intensive. We design an automated deep learning method for the segmentation of multiple structures from Coronary Computed Tomography Angiography (CCTA) images. METHODS: Images from a multicenter registry of patients who underwent clinically indicated CCTA were used. The proximal ascending and descending aorta (PAA, DA), superior and inferior vena cavae (SVC, IVC), pulmonary artery (PA), coronary sinus (CS), right ventricular wall (RVW) and left atrial wall (LAW) were annotated as ground truth. The U-net-derived deep learning model was trained, validated and tested in a 70:20:10 split. RESULTS: The dataset comprised 206 patients, with 5.130 billion pixels. Mean age was 59.9 ± 9.4 years, and 42.7% of patients were female. An overall median Dice score of 0.820 (0.782, 0.843) was achieved. Median Dice scores for PAA, DA, SVC, IVC, PA, CS, RVW and LAW were 0.969 (0.979, 0.988), 0.953 (0.955, 0.983), 0.937 (0.934, 0.965), 0.903 (0.897, 0.948), 0.775 (0.724, 0.925), 0.720 (0.642, 0.809), 0.685 (0.631, 0.761) and 0.625 (0.596, 0.749) respectively. Apart from the CS, there were no significant differences in performance between sexes or age groups. CONCLUSIONS: An automated deep learning model demonstrated segmentation of multiple cardiovascular structures from CCTA images with reasonable overall accuracy when evaluated on a pixel level.
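The segmentation results above are reported as Dice scores. For readers unfamiliar with the metric, the following minimal Python sketch computes the standard Dice coefficient, 2|A∩B| / (|A| + |B|), for two binary masks flattened to lists of 0/1 pixels. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for this illustration; the abstract does not say how the authors handled that case.

```python
def dice_score(pred, truth):
    """Dice coefficient between two equal-length binary masks (0/1 values)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labelled as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground pixels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# Toy 4-pixel masks: one overlapping foreground pixel out of two in each.
score = dice_score([1, 1, 0, 0], [1, 0, 1, 0])
```

A score of 1.0 means the predicted and annotated masks coincide exactly, and 0.0 means they share no foreground pixels.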