6 research outputs found

    FQStat: a parallel architecture for very high-speed assessment of sequencing quality metrics

    Background: High-throughput DNA/RNA sequencing has revolutionized biological and clinical research. Sequencing is widely used and, owing to reduced cost and advanced technologies, generates very large amounts of data. Quickly assessing the quality of giga-to-tera base levels of sequencing data has become a routine but important task. Identifying and eliminating low-quality sequence data is crucial for the reliability of downstream analysis results. There is a need for a high-speed tool that uses optimized parallel programming for batch processing and simply gauges the quality of sequencing data from multiple datasets, independent of any other processing steps.
    Results: FQStat is a stand-alone, platform-independent software tool that assesses the quality of FASTQ files using parallel programming. Based on the machine architecture and input data, FQStat automatically determines the number of cores and the amount of memory to be allocated per file for optimum performance. Our results indicate that in a core-limited case, core assignment overhead exceeds the benefit of additional cores. In a core-unlimited case, performance reaches a saturation point as additional cores are assigned per file. We also show that memory allocation per file has a lower priority for performance than the allocation of cores. FQStat’s output is summarized as an HTML web page, a tab-delimited text file, and high-resolution images. FQStat calculates and plots read count, read length, quality score, and high-quality base statistics. FQStat identifies and marks low-quality sequencing data to suggest removal from downstream analysis. We applied FQStat to real sequencing data to optimize performance and to demonstrate its capabilities. We also compared FQStat’s performance to similar quality control (QC) tools that utilize parallel programming and attained improvements in run time.
    Conclusions: FQStat is a user-friendly tool with a graphical interface that employs a parallel programming architecture and automatically optimizes its performance to generate quality control statistics for sequencing data. Unlike existing tools, these statistics are calculated for multiple datasets and separately at the “lane,” “sample,” and “experiment” level to identify subsets of the samples with low quality, thereby preventing the loss of complete samples when reliable data can still be obtained. Includes 6 supplemental files.
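
    The kind of per-file statistics described in this abstract (read count, read length, quality score, high-quality bases) can be illustrated with a minimal Python sketch. This is not FQStat's code: the function names, the Phred+33 encoding, the Q30 threshold for a "high-quality" base, and the one-process-per-file scheme are all assumptions made for illustration, and the abstract's adaptive assignment of cores and memory per file is not modeled.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Dict, Iterable, List

PHRED_OFFSET = 33   # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding
HIGH_QUALITY = 30   # assumption: a "high-quality base" has Phred score >= 30


def fastq_stats(lines: Iterable[str]) -> Dict[str, float]:
    """Summarize 4-line FASTQ records (header, sequence, '+', qualities)."""
    reads = bases = qual_sum = high_q = 0
    for i, line in enumerate(lines):
        if i % 4 != 3:
            continue  # only the quality line is needed; it has one char per base
        scores = [ord(ch) - PHRED_OFFSET for ch in line.rstrip("\r\n")]
        reads += 1
        bases += len(scores)
        qual_sum += sum(scores)
        high_q += sum(s >= HIGH_QUALITY for s in scores)
    return {
        "read_count": reads,
        "mean_read_length": bases / reads if reads else 0.0,
        "mean_quality": qual_sum / bases if bases else 0.0,
        "high_quality_fraction": high_q / bases if bases else 0.0,
    }


def stats_for_file(path: str) -> Dict[str, float]:
    """Stream one uncompressed FASTQ file through fastq_stats."""
    with open(path, "r", encoding="ascii") as handle:
        return fastq_stats(handle)


def stats_for_files(paths: List[str], workers: int = 4) -> List[Dict[str, float]]:
    """Process files in parallel, one file per worker; results keep input order."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(stats_for_file, paths))
```

    Calling `fastq_stats` on an open file handle or a list of lines returns the four summary values for that input; `stats_for_files` would need to be called from under an `if __name__ == "__main__":` guard on platforms that spawn worker processes.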

    BORIS expression in ovarian cancer precursor cells alters the CTCF cistrome and enhances invasiveness through GALNT14

    High-grade serous carcinoma (HGSC) is the most aggressive and predominant form of epithelial ovarian cancer and the leading cause of gynecological cancer death. We have previously shown that CTCFL (also known as BORIS, Brother of the Regulator of Imprinted Sites) is expressed in most ovarian cancers, and is associated with global and promoter-specific DNA hypomethylation, advanced tumor stage, and poor prognosis. To explore its role in HGSC, we expressed BORIS in human fallopian tube secretory epithelial cells (FTSEC), the presumptive cells of origin for HGSC. BORIS-expressing cells exhibited increased motility and invasion, and BORIS expression was associated with alterations in several cancer-associated gene expression networks, including fatty acid metabolism, TNF signaling, cell migration, and ECM-receptor interactions. Importantly, GALNT14, a glycosyltransferase gene implicated in cancer cell migration and invasion, was highly induced by BORIS, and GALNT14 knockdown significantly abrogated BORIS-induced cell motility and invasion. In addition, in silico analyses provided evidence for BORIS and GALNT14 co-expression in several cancers. Finally, ChIP-seq demonstrated that expression of BORIS was associated with de novo and enhanced binding of CTCF at hundreds of loci, many of which correlated with activation of transcription at target genes, including GALNT14. Taken together, our data indicate that BORIS may promote cell motility and invasion in HGSC via upregulation of GALNT14, and suggest BORIS as a potential therapeutic target in this malignancy.

    KEGG2Net: Deducing gene interaction networks and acyclic graphs from KEGG pathways

    The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database provides manually curated biological pathways that involve genes (or gene products), metabolites, chemical compounds, maps, and other entries. However, most applications and datasets in omics are gene- or protein-centric and require pathway representations that include only the direct and indirect interactions between genes. Furthermore, special methodologies, such as Bayesian networks, require acyclic representations of graphs. We developed KEGG2Net, a web resource that generates a network involving only the genes represented on a KEGG pathway, with all of the direct and indirect gene-gene interactions deduced from the pathway. KEGG2Net offers four different methods to remove cycles from the resulting gene interaction networks, converting them into directed acyclic graphs (DAGs). We generated synthetic gene expression data using the gene interaction networks deduced from the KEGG pathways and performed a comparative analysis of the cycle removal methods by testing the fitness of their DAGs to the data and by counting the edges they eliminate. Our results indicate that an ensemble method for cycle removal is the best-performing approach for converting the gene interaction networks into DAGs. The resulting gene interaction networks and DAGs are provided in multiple user-friendly formats that can be used in other applications, and as images for quick and easy visualisation. The KEGG2Net web portal converts KEGG maps for any organism into gene-gene interaction networks and corresponding DAGs representing all of the direct and indirect interactions among the genes.
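
    To make the idea of converting a gene interaction network into a DAG concrete, the following Python sketch removes cycles by dropping the back edges found during a depth-first search. The abstract does not name KEGG2Net's four cycle removal methods, so this generic technique should not be read as one of them; the function name and the adjacency-list representation are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]  # adjacency list: source gene -> target genes


def remove_back_edges(graph: Graph) -> Tuple[Graph, List[Tuple[str, str]]]:
    """Return an acyclic copy of `graph` and the list of edges that were dropped.

    Uses recursion, so very deep graphs may need an iterative rewrite.
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    state = {n: 0 for n in nodes}  # 0 = unvisited, 1 = on DFS stack, 2 = finished
    dag: Graph = {n: [] for n in nodes}
    removed: List[Tuple[str, str]] = []

    def visit(node: str) -> None:
        state[node] = 1
        for target in graph.get(node, []):
            if state[target] == 1:
                # Target is an ancestor on the current path: this edge closes a cycle.
                removed.append((node, target))
                continue
            dag[node].append(target)
            if state[target] == 0:
                visit(target)
        state[node] = 2

    for node in sorted(nodes):  # sorted start order keeps the result deterministic
        if state[node] == 0:
            visit(node)
    return dag, removed
```

    Which edges are dropped depends on the traversal order, which is one reason a comparison of several removal strategies by the number of eliminated edges, as described in the abstract, is informative.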

    Epigenetic activation of POTE genes in ovarian cancer

    The POTE gene family consists of 14 homologous genes localized to autosomal pericentromeres, and a subset of POTEs are cancer-testis antigen (CTA) genes. POTEs are over-expressed in epithelial ovarian cancer (EOC), including the high-grade serous subtype (HGSC), and expression of individual POTEs correlates with chemoresistance and reduced survival in HGSC. The mechanisms driving POTE overexpression in EOC and other cancers are unknown. Here, we investigated the role of epigenetics in regulating POTE expression, with a focus on DNA hypomethylation. Consistent with their pericentromeric localization, Pan-POTE expression in EOC correlated with expression of the pericentromeric repeat NBL2, which was not the case for non-pericentromeric CTAs. POTE genomic regions contain LINE-1 (L1) sequences, and Pan-POTE expression correlated with both global and POTE-specific L1 hypomethylation in EOC. Analysis of individual POTEs using RNA-seq and DNA methylome data from fallopian tube epithelia (FTE) and HGSC revealed that POTEs C, E, and F have increased expression in HGSC in conjunction with DNA hypomethylation at 5’ promoter or enhancer regions. Moreover, using patient-matched samples, POTEs C/E/F showed a further increase in expression in recurrent HGSC in conjunction with 5’ hypomethylation. Experiments using decitabine treatment and DNMT knockout cell lines verified a functional contribution of DNA methylation to POTE repression, and epigenetic drugs targeting histone deacetylases (HDACs) and histone methyltransferases (HMTs), given in combination with decitabine, further increased POTE expression. In summary, several alterations of the cancer epigenome, including pericentromeric activation, global and locus-specific L1 hypomethylation, and locus-specific 5’ CpG hypomethylation, converge to promote POTE expression in ovarian cancer.