yrGATE: a web-based gene-structure annotation tool for the identification and dissemination of eukaryotic genes
Your Gene structure Annotation Tool for Eukaryotes (yrGATE) provides an Annotation Tool and Community Utilities for worldwide web-based community genome and gene annotation. Annotators can evaluate gene structure evidence derived from multiple sources to create gene structure annotations. Administrators regulate the acceptance of annotations into published gene sets. yrGATE is designed to facilitate rapid and accurate annotation of emerging genomes as well as to confirm, refine, or correct currently published annotations. yrGATE is highly portable and supports different standard input and output formats. The yrGATE software and usage cases are available at
xGDB: open-source computational infrastructure for the integrated evaluation and analysis of genome features
The eXtensible Genome Data Broker (xGDB) provides a software infrastructure consisting of integrated tools for the storage, display, and analysis of genome features in their genomic context. Common features include gene structure annotations, spliced alignments, mapping of repetitive sequence, and microarray probes, but the software supports inclusion of any property that can be associated with a genomic location. The xGDB distribution and user support utilities are available online at the xGDB project website, http://xgdb.sourceforge.net/
ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking
Summary: Unsupervised class discovery is a highly useful technique in cancer research, where intrinsic groups sharing biological characteristics may exist but are unknown. The consensus clustering (CC) method provides quantitative and visual stability evidence for estimating the number of unsupervised classes in a dataset. ConsensusClusterPlus implements the CC method in R and extends it with new functionality and visualizations including item tracking, item-consensus and cluster-consensus plots. These new features provide users with detailed information that enables more specific decisions in unsupervised class discovery
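The stability evidence mentioned above can be illustrated with a small sketch. It assumes the usual resampling formulation of consensus clustering: cluster many random subsamples, then record how often each pair of items lands in the same cluster when both are drawn. This is not the ConsensusClusterPlus API (that package is written in R); the function names `kmeans` and `consensus_matrix`, the plain k-means step, and the toy data are all hypothetical choices made for illustration.

```python
import random
from itertools import combinations


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means on tuples of floats; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def consensus_matrix(points, k, n_resamples=50, frac=0.8, seed=0):
    """Fraction of resamples in which items i and j co-cluster, given both were drawn."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]  # times i and j shared a cluster
    sampled = [[0] * n for _ in range(n)]   # times i and j were drawn together
    for _ in range(n_resamples):
        idx = rng.sample(range(n), max(k, int(frac * n)))
        labels = kmeans([points[i] for i in idx], k, rng)
        for (i, li), (j, lj) in combinations(zip(idx, labels), 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if li == lj:
                together[i][j] += 1
                together[j][i] += 1
    return [
        [
            1.0 if i == j else (together[i][j] / sampled[i][j] if sampled[i][j] else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups on a line.
    toy = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
    for row in consensus_matrix(toy, k=2):
        print(" ".join(f"{value:.2f}" for value in row))
```

Consensus values near 1 or 0 for most pairs suggest a stable partition at that k; many intermediate values suggest k is a poor fit. Repeating the computation over several values of k is how such a matrix would typically be used to compare candidate class counts.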
Identification of germline population variants misclassified as cancer-associated somatic variants
Introduction: Databases used for clinical interpretation in oncology rely on genetic data derived primarily from patients of European ancestry, leading to biases in cancer genetics research and clinical practice. One practical issue that arises in this context is the potential misclassification of multi-ancestral population variants as tumor-associated because they are not represented in reference genomes against which tumor sequencing data is aligned. Methods: To systematically find misclassified variants, we compared somatic variants in census genes from the Catalogue of Somatic Mutations in Cancer (COSMIC) V99 with multi-ancestral population variants from the Genome Aggregation Databases’ Linkage Disequilibrium (GnomAD). By comparing genomic coordinates, reference, and alternate alleles, we could identify misclassified variants in genes associated with cancer. Results: We found 192 of 208 genes in COSMIC’s cancer-associated census genes (92.31%) to be associated with variant misclassifications. Among the 1,906,732 variants in COSMIC, 6,957 variants (0.36%) aligned with normal population variants in GnomAD, concerning for misclassification. The African / African American ancestral population included the greatest number of misclassified variants and also had the greatest number of unique misclassified variants. Conclusion: The direct, systematic comparison of variants from COSMIC for co-occurrence in GnomAD supports a more accurate interpretation of tumor sequencing data and reduces bias related to genomic ancestry
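The comparison by genomic coordinates, reference allele, and alternate allele described above amounts to matching variants on a four-part key. The sketch below shows one way that matching could look. It is not the authors' pipeline: the `Variant` type, the `normalize` and `shared_variants` helpers, the chromosome-prefix handling, and the example records are hypothetical, and real COSMIC or GnomAD exports would need their own parsing and allele normalization.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    chrom: str  # chromosome name, with or without a "chr" prefix
    pos: int    # 1-based position on the chromosome
    ref: str    # reference allele
    alt: str    # alternate allele


def normalize(v: Variant) -> Variant:
    """Make keys comparable: drop a leading 'chr' and upper-case the alleles."""
    chrom = v.chrom[3:] if v.chrom.lower().startswith("chr") else v.chrom
    return Variant(chrom, v.pos, v.ref.upper(), v.alt.upper())


def shared_variants(somatic: Iterable[Variant], population: Iterable[Variant]) -> List[Variant]:
    """Return somatic variants whose (chrom, pos, ref, alt) also occurs in the population set."""
    population_keys = {normalize(v) for v in population}
    return [v for v in somatic if normalize(v) in population_keys]


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    somatic = [
        Variant("chr17", 1000, "G", "A"),
        Variant("chr17", 2000, "C", "T"),
        Variant("12", 3000, "A", "G"),
    ]
    population = [
        Variant("17", 1000, "g", "a"),   # same site and alleles as the first record
        Variant("17", 2000, "C", "G"),   # same site, different alternate allele
    ]
    for v in shared_variants(somatic, population):
        print(v)
```

Requiring all four fields to agree keeps a population variant at the same position but with a different alternate allele from being counted as a match.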
Cafeteria diet-induced obesity causes oxidative damage in white adipose
Obesity continues to be one of the most prominent public health dilemmas in the world. The complex interaction among the varied causes of obesity makes it a particularly challenging problem to address. While typical high-fat purified diets successfully induce weight gain in rodents, we have described a more robust model of diet-induced obesity based on feeding rats a diet consisting of highly palatable, energy-dense human junk foods – the “cafeteria” diet (CAF, 45-53% kcal from fat). We previously reported that CAF-fed rats became hyperphagic, gained more weight, and developed more severe hyperinsulinemia, hyperglycemia, and glucose intolerance compared to the lard-based 45% kcal from fat high fat diet–fed group. In addition, the CAF diet-fed group displayed a higher degree of inflammation in adipose and liver, mitochondrial dysfunction, and an increased concentration of lipid-derived, pro-inflammatory mediators. Building upon our previous findings, we aimed to determine mechanisms that underlie physiologic findings in the CAF diet. We investigated the effect of CAF diet-induced obesity on adipose tissue specifically using expression arrays and immunohistochemistry. Genomic evidence indicated the CAF diet induced alterations in the white adipose gene transcriptome, with notable suppression of glutathione-related genes and pathways involved in mitigating oxidative stress. Immunohistochemical analysis indicated a doubling in adipose lipid peroxidation marker 4-HNE levels compared to rats that remained lean on control standard chow diet. Our data indicates that the CAF diet drives an increase in oxidative damage in white adipose tissue that may affect tissue homeostasis. Oxidative stress drives activation of inflammatory kinases that can perturb insulin signaling leading to glucose intolerance and diabetes
BlackOPs: Increasing confidence in variant detection through mappability filtering
Identifying variants using high-throughput sequencing data is currently a challenge because true biological variants can be indistinguishable from technical artifacts. One source of technical artifact results from incorrectly aligning experimentally observed sequences to their true genomic origin (‘mismapping’) and inferring differences in mismapped sequences to be true variants. We developed BlackOPs, an open-source tool that simulates experimental RNA-seq and DNA whole exome sequences derived from the reference genome, aligns these sequences by custom parameters, detects variants and outputs a blacklist of positions and alleles caused by mismapping. Blacklist
ABRA: improved coding indel detection via assembly-based realignment
Motivation: Variant detection from next-generation sequencing (NGS) data is an increasingly vital aspect of disease diagnosis, treatment and research. Commonly used NGS-variant analysis tools generally rely on accurately mapped short reads to identify somatic variants and germ-line genotypes. Existing NGS read mappers have difficulty accurately mapping short reads containing complex variation (i.e. more than a single base change), thus making identification of such variants difficult or impossible. Insertions and deletions (indels) in particular have been an area of great difficulty. Indels are frequent and can have substantial impact on function, which makes their detection all the more imperative. Results: We present ABRA, an assembly-based realigner, which uses an efficient and flexible localized de novo assembly followed by global realignment to more accurately remap reads. This results in enhanced performance for indel detection as well as improved accuracy in variant allele frequency estimation. Availability and implementation: ABRA is implemented in a combination of Java and C/C++ and is freely available for download at https://github.com/mozack/abra. Contact: [email protected]; [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online
Integrative Analysis of miRNAs Identifies Clinically Relevant Epithelial and Stromal Subtypes of Head and Neck Squamous Cell Carcinoma
PURPOSE: The objective of this study is to characterize the role of miRNAs in the classification of head and neck squamous cell carcinoma (HNSCC). EXPERIMENTAL DESIGN: Here, we analyzed 562 HNSCC samples, 88 from a novel cohort and 474 from The Cancer Genome Atlas, using miRNA microarray and miRNA sequencing, respectively. Using an integrative correlations method followed by miRNA expression-based hierarchical clustering, we validated miRNA clusters across cohorts. Evaluation of clusters by logistic regression and gene ontology approaches revealed subtype-based clinical and biological characteristics. RESULTS: We identified two independently validated and statistically significant (P < 0.01) tumor subtypes and named them "epithelial" and "stromal" based on associations with functional target gene ontology relating to differing stages of epithelial cell differentiation. miRNA-based subtypes were correlated with individual gene expression targets based on miRNA seed sequences, as well as with miRNA families and clusters including the miR-17 and miR-200 families. These correlated genes defined pathways relevant to normal squamous cell function and pathophysiology. miRNA clusters statistically associated with differential mutation patterns including higher proportions of TP53 mutations in the stromal class and higher NSD1 and HRAS mutation frequencies in the epithelial class. miRNA classes correlated with previously reported gene expression subtypes, clinical characteristics, and clinical outcomes in a multivariate Cox proportional hazards model with stromal patients demonstrating worse prognoses (HR, 1.5646; P = 0.006). CONCLUSIONS: We report a reproducible classification of HNSCC based on miRNA that associates with known pathologically altered pathways and mutations of squamous tumors and is clinically relevant
SigFuge: Single gene clustering of RNA-seq reveals differential isoform usage among cancer samples
High-throughput sequencing technologies, including RNA-seq, have made it possible to move beyond gene expression analysis to study transcriptional events including alternative splicing and gene fusions. Furthermore, recent studies in cancer have suggested the importance of identifying transcriptionally altered loci as biomarkers for improved prognosis and therapy. While many statistical methods have been proposed for identifying novel transcriptional events with RNA-seq, nearly all rely on contrasting known classes of samples, such as tumor and normal. Few tools exist for the unsupervised discovery of such events without class labels. In this paper, we present SigFuge for identifying genomic loci exhibiting differential transcription patterns across many RNA-seq samples. SigFuge combines clustering with hypothesis testing to identify genes exhibiting alternative splicing, or differences in isoform expression. We apply SigFuge to RNA-seq cohorts of 177 lung and 279 head and neck squamous cell carcinoma samples from the Cancer Genome Atlas, and identify several cases of differential isoform usage including CDKN2A, a tumor suppressor gene known to be inactivated in a majority of lung squamous cell tumors. By not restricting attention to known sample stratifications, SigFuge offers a novel approach to unsupervised screening of genetic loci across RNA-seq cohorts. SigFuge is available as an R package through Bioconductor
SWISS MADE: Standardized WithIn Class Sum of Squares to Evaluate Methodologies and Dataset Elements
Contemporary high dimensional biological assays, such as mRNA expression microarrays, regularly involve multiple data processing steps, such as experimental processing, computational processing, sample selection, or feature selection (i.e. gene selection), prior to deriving any biological conclusions. These steps can dramatically change the interpretation of an experiment. Evaluation of processing steps has received limited attention in the literature. It is not straightforward to evaluate different processing methods and investigators are often unsure of the best method. We present a simple statistical tool, Standardized WithIn class Sum of Squares (SWISS), that allows investigators to compare alternate data processing methods, such as different experimental methods, normalizations, or technologies, on a dataset in terms of how well they cluster a priori biological classes. SWISS uses Euclidean distance to determine which method does a better job of clustering the data elements based on a priori classifications. We apply SWISS to three different gene expression applications. The first application uses four different datasets to compare different experimental methods, normalizations, and gene sets. The second application, using data from the MicroArray Quality Control (MAQC) project, compares different microarray platforms. The third application compares different technologies: a single Agilent two-color microarray versus one lane of RNA-Seq. These applications give an indication of the variety of problems that SWISS can be helpful in solving. The SWISS analysis of one-color versus two-color microarrays provides investigators who use two-color arrays the opportunity to review their results in light of a single-channel analysis, with all of the associated benefits offered by this design. Analysis of the MAQC data shows differential intersite reproducibility by array platform. SWISS also shows that one lane of RNA-Seq clusters data by biological phenotypes as well as a single Agilent two-color microarray
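A standardized within-class sum of squares can be sketched as the ratio of within-class to total squared Euclidean distance, so that lower values mean the a priori classes are tighter relative to the overall spread. The abstract does not state the formula, so that ratio is an assumption here, and the function name `swiss_score` and the toy data are hypothetical rather than taken from the SWISS software.

```python
from typing import Hashable, List, Sequence


def swiss_score(samples: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> float:
    """Within-class sum of squares divided by total sum of squares (assumed definition)."""
    if len(samples) != len(labels) or not samples:
        raise ValueError("samples and labels must be non-empty and the same length")
    n_features = len(samples[0])

    def centroid(rows: List[Sequence[float]]) -> List[float]:
        return [sum(row[j] for row in rows) / len(rows) for j in range(n_features)]

    def sum_of_squares(rows: List[Sequence[float]], center: List[float]) -> float:
        # Sum of squared Euclidean distances from each row to the given center.
        return sum((x - c) ** 2 for row in rows for x, c in zip(row, center))

    all_rows = list(samples)
    total = sum_of_squares(all_rows, centroid(all_rows))
    if total == 0:
        raise ValueError("all samples are identical; the score is undefined")

    within = 0.0
    for label in set(labels):
        members = [row for row, lab in zip(samples, labels) if lab == label]
        within += sum_of_squares(members, centroid(members))
    return within / total


if __name__ == "__main__":
    # Hypothetical toy data: the same four samples scored under two labelings.
    data = [(0, 0), (0, 2), (10, 0), (10, 2)]
    print(swiss_score(data, ["a", "a", "b", "b"]))  # labels follow the large separation
    print(swiss_score(data, ["a", "b", "a", "b"]))  # labels cut across it
```

To compare two processing methods in this spirit, the same samples and class labels would be scored once per processed version of the data, and the version with the lower score would be the one whose classes cluster more tightly.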