174 research outputs found

    A framework for interpreting type I error rates from a product-term model of interaction applied to quantitative traits

    Get PDF
    Adequate control of type I error rates will be necessary in the increasing genome‐wide search for interactive effects on complex traits. After observing unexpected variability in type I error rates from SNP‐by‐genome interaction scans, we sought to characterize this variability and test the ability of heteroskedasticity‐consistent standard errors to correct it. We performed 81 SNP‐by‐genome interaction scans using a product‐term model on quantitative traits in a sample of 1,053 unrelated European Americans from the NHLBI Family Heart Study, and additional scans on five simulated datasets. We found that the interaction‐term genomic inflation factor (lambda) showed inflation and deflation that varied with sample size and allele frequency; that similar lambda variation occurred in the absence of population substructure; and that lambda was strongly related to heteroskedasticity but not to minor non‐normality of phenotypes. Heteroskedasticity‐consistent standard errors narrowed the range of lambda, with HC3 outperforming HC0, but in individual scans tended to create new P‐value outliers related to sparse two‐locus genotype classes. We explain the lambda variation as a result of non‐independence of test statistics coupled with stochastic biases in test statistics due to a failure of the test to reach asymptotic properties. We propose that one way to interpret lambda is by comparison to an empirical distribution generated from data simulated under the null hypothesis and without population substructure. We further conclude that the interaction‐term lambda should not be used to adjust test statistics and that heteroskedasticity‐consistent standard errors come with limitations that may outweigh their benefits in this setting

    Multilingual Speech Recognition With A Single End-To-End Model

    Full text link
    Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the sub-word unit, lexicon and word inventories are typically language specific. In contrast, sequence-to-sequence models are well suited for multilingual ASR because they encapsulate an acoustic, pronunciation and language model jointly in a single network. In this work we present a single sequence-to-sequence ASR model trained on 9 different Indian languages, which have very little overlap in their scripts. Specifically, we take a union of language-specific grapheme sets and train a grapheme-based sequence-to-sequence model jointly on data from all languages. We find that this model, which is not explicitly given any information about language identity, improves recognition performance by 21% relative compared to analogous sequence-to-sequence models trained on each language individually. By modifying the model to accept a language identifier as an additional input feature, we further improve performance by an additional 7% relative and eliminate confusion between different languages.Comment: Accepted in ICASSP 201

    State-of-the-art Speech Recognition With Sequence-to-Sequence Models

    Full text link
    Attention-based encoder-decoder architectures such as Listen, Attend, and Spell (LAS), subsume the acoustic, pronunciation and language model components of a traditional automatic speech recognition (ASR) system into a single neural network. In previous work, we have shown that such architectures are comparable to state-of-theart ASR systems on dictation tasks, but it was not clear if such architectures would be practical for more challenging tasks such as voice search. In this work, we explore a variety of structural and optimization improvements to our LAS model which significantly improve performance. On the structural side, we show that word piece models can be used instead of graphemes. We also introduce a multi-head attention architecture, which offers improvements over the commonly-used single-head attention. On the optimization side, we explore synchronous training, scheduled sampling, label smoothing, and minimum word error rate optimization, which are all shown to improve accuracy. We present results with a unidirectional LSTM encoder for streaming recognition. On a 12, 500 hour voice search task, we find that the proposed changes improve the WER from 9.2% to 5.6%, while the best conventional system achieves 6.7%; on a dictation task our model achieves a WER of 4.1% compared to 5% for the conventional system.Comment: ICASSP camera-ready versio

    Truncated and Helix-Constrained Peptides with High Affinity and Specificity for the cFos Coiled-Coil of AP-1

    Get PDF
    Protein-based therapeutics feature large interacting surfaces. Protein folding endows structural stability to localised surface epitopes, imparting high affinity and target specificity upon interactions with binding partners. However, short synthetic peptides with sequences corresponding to such protein epitopes are unstructured in water and promiscuously bind to proteins with low affinity and specificity. Here we combine structural stability and target specificity of proteins, with low cost and rapid synthesis of small molecules, towards meeting the significant challenge of binding coiled coil proteins in transcriptional regulation. By iteratively truncating a Jun-based peptide from 37 to 22 residues, strategically incorporating i-->i+4 helix-inducing constraints, and positioning unnatural amino acids, we have produced short, water-stable, alpha-helical peptides that bind cFos. A three-dimensional NMR-derived structure for one peptide (24) confirmed a highly stable alpha-helix which was resistant to proteolytic degradation in serum. These short structured peptides are entropically pre-organized for binding with high affinity and specificity to cFos, a key component of the oncogenic transcriptional regulator Activator Protein-1 (AP-1). They competitively antagonized the cJun–cFos coiled-coil interaction. Truncating a Jun-based peptide from 37 to 22 residues decreased the binding enthalpy for cJun by ~9 kcal/mol, but this was compensated by increased conformational entropy (TDS ≀ 7.5 kcal/mol). This study demonstrates that rational design of short peptides constrained by alpha-helical cyclic pentapeptide modules is able to retain parental high helicity, as well as high affinity and specificity for cFos. These are important steps towards small antagonists of the cJun-cFos interaction that mediates gene transcription in cancer and inflammatory diseases

    Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images

    Get PDF
    Beyond sample curation and basic pathologic characterization, the digitized H&E-stained images of TCGA samples remain underutilized. To highlight this resource, we present mappings of tumorinfiltrating lymphocytes (TILs) based on H&E images from 13 TCGA tumor types. These TIL maps are derived through computational staining using a convolutional neural network trained to classify patches of images. Affinity propagation revealed local spatial structure in TIL patterns and correlation with overall survival. TIL map structural patterns were grouped using standard histopathological parameters. These patterns are enriched in particular T cell subpopulations derived from molecular measures. TIL densities and spatial structure were differentially enriched among tumor types, immune subtypes, and tumor molecular subtypes, implying that spatial infiltrate state could reflect particular tumor cell aberration states. Obtaining spatial lymphocytic patterns linked to the rich genomic characterization of TCGA samples demonstrates one use for the TCGA image archives with insights into the tumor-immune microenvironment

    Pan-Cancer Analysis of lncRNA Regulation Supports Their Targeting of Cancer Genes in Each Tumor Context

    Get PDF
    Long noncoding RNAs (lncRNAs) are commonly dys-regulated in tumors, but only a handful are known toplay pathophysiological roles in cancer. We inferredlncRNAs that dysregulate cancer pathways, onco-genes, and tumor suppressors (cancer genes) bymodeling their effects on the activity of transcriptionfactors, RNA-binding proteins, and microRNAs in5,185 TCGA tumors and 1,019 ENCODE assays.Our predictions included hundreds of candidateonco- and tumor-suppressor lncRNAs (cancerlncRNAs) whose somatic alterations account for thedysregulation of dozens of cancer genes and path-ways in each of 14 tumor contexts. To demonstrateproof of concept, we showed that perturbations tar-geting OIP5-AS1 (an inferred tumor suppressor) andTUG1 and WT1-AS (inferred onco-lncRNAs) dysre-gulated cancer genes and altered proliferation ofbreast and gynecologic cancer cells. Our analysis in-dicates that, although most lncRNAs are dysregu-lated in a tumor-specific manner, some, includingOIP5-AS1, TUG1, NEAT1, MEG3, and TSIX, synergis-tically dysregulate cancer pathways in multiple tumorcontexts

    Genomic, Pathway Network, and Immunologic Features Distinguishing Squamous Carcinomas

    Get PDF
    This integrated, multiplatform PanCancer Atlas study co-mapped and identified distinguishing molecular features of squamous cell carcinomas (SCCs) from five sites associated with smokin

    Pan-cancer Alterations of the MYC Oncogene and Its Proximal Network across the Cancer Genome Atlas

    Get PDF
    Although theMYConcogene has been implicated incancer, a systematic assessment of alterations ofMYC, related transcription factors, and co-regulatoryproteins, forming the proximal MYC network (PMN),across human cancers is lacking. Using computa-tional approaches, we define genomic and proteo-mic features associated with MYC and the PMNacross the 33 cancers of The Cancer Genome Atlas.Pan-cancer, 28% of all samples had at least one ofthe MYC paralogs amplified. In contrast, the MYCantagonists MGA and MNT were the most frequentlymutated or deleted members, proposing a roleas tumor suppressors.MYCalterations were mutu-ally exclusive withPIK3CA,PTEN,APC,orBRAFalterations, suggesting that MYC is a distinct onco-genic driver. Expression analysis revealed MYC-associated pathways in tumor subtypes, such asimmune response and growth factor signaling; chro-matin, translation, and DNA replication/repair wereconserved pan-cancer. This analysis reveals insightsinto MYC biology and is a reference for biomarkersand therapeutics for cancers with alterations ofMYC or the PMN
    • 

    corecore