
    A Comparative Study on Regularization Strategies for Embedding-based Neural Networks

    This paper aims to compare different regularization strategies to address a common phenomenon, severe overfitting, in embedding-based neural networks for NLP. We chose two widely studied neural models and tasks as our testbed. We tried several frequently applied or newly proposed regularization strategies, including penalizing weights (embeddings excluded), penalizing embeddings, re-embedding words, and dropout. We also emphasized incremental hyperparameter tuning and the combination of different regularizations. The results provide a picture of how to tune hyperparameters for neural NLP models. Comment: EMNLP '1
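The weight penalty, embedding penalty, and dropout strategies listed above can be sketched in plain Python. This is a minimal illustration, not the paper's code: the coefficient names `lam_w` and `lam_e`, the unhalved squared-L2 form, and inverted dropout are assumptions, and re-embedding words is not shown.

```python
import random


def squared_l2(values):
    """Sum of squares of a flat list of parameters."""
    return sum(v * v for v in values)


def regularized_loss(data_loss, weights, embeddings, lam_w=1e-4, lam_e=0.0):
    """Data loss plus separate L2 penalties on weights and on embeddings.

    lam_e=0 penalizes the weights only (embeddings excluded);
    lam_e>0 additionally penalizes every row of the embedding matrix.
    """
    penalty_w = lam_w * squared_l2(weights)
    penalty_e = lam_e * sum(squared_l2(row) for row in embeddings)
    return data_loss + penalty_w + penalty_e


def dropout(activations, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

Keeping two separate coefficients lets the weight and embedding penalties be tuned independently, one hyperparameter at a time.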

    Computational modeling for identification of low-frequency single nucleotide variants

    Indiana University-Purdue University Indianapolis (IUPUI)
    Reliable detection of low-frequency single nucleotide variants (SNVs) is of great significance in many applications. In cancer genetics, somatic variants from tumor biopsies tend to have low frequencies because of contamination with normal tissue and tumor heterogeneity. Circulating tumor DNA monitoring faces the same challenge, because tumor DNA makes up only a small percentage of the DNA in blood. Moreover, in population genetics, pooled sequencing is cost-effective compared with individual sequencing, but pooling dilutes the signal of variants from any single individual. Detecting low-frequency variants is difficult and can be confounded by multiple sources of error, especially next-generation sequencing artifacts. Existing methods have limited sensitivity and mainly focus on frequencies around 5%; most fail to consider differential, context-specific sequencing artifacts. To address this challenge, we developed a computational and experimental framework, RareVar, to reliably identify low-frequency SNVs from high-throughput sequencing data. To optimize performance, RareVar uses a supervised learning framework to model artifacts originating from different components of a specific sequencing pipeline. This is enabled by a customized, comprehensive benchmark dataset enriched with known low-frequency SNVs from the sequencing pipeline of interest. A genomic-context-specific sequencing error model was trained on the benchmark data to characterize systematic sequencing artifacts and to derive a position-specific detection limit for sensitive low-frequency SNV detection. Further, a machine-learning algorithm used sequencing quality features to refine SNV candidates for higher specificity. RareVar outperformed existing approaches, especially at frequencies of 0.5% to 5%. We further explored the influence of the statistical model on position-specific error modeling and found the zero-inflated negative binomial to be the best-performing statistical distribution. When we replicated the analyses on an Illumina MiSeq benchmark dataset, our method adapted seamlessly to technologies with different biochemistries. RareVar enables sensitive detection of low-frequency SNVs across different sequencing platforms and will facilitate research and clinical applications such as pooled sequencing, early cancer detection, prognostic assessment, metastasis monitoring, and identification of relapse or acquired resistance.
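The abstract names a zero-inflated negative binomial (ZINB) as the best-performing error distribution and describes deriving a position-specific detection limit, but it does not spell out the procedure. The sketch below is one plausible reading under stated assumptions, not RareVar's implementation: it treats the number of error-supporting reads at a position as ZINB with hypothetical fitted parameters (`mean`, `size`, `pi_zero`) and takes the detection limit to be the smallest alternate-read count whose tail probability under errors alone is at most `alpha`.

```python
import math


def nb_pmf(k, mean, size):
    """Negative binomial P(X = k) with mean `mean` > 0 and dispersion `size` (r > 0)."""
    p = size / (size + mean)  # NB(r, p) form: variance = mean + mean**2 / size
    log_coef = math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
    return math.exp(log_coef + size * math.log(p) + k * math.log1p(-p))


def zinb_pmf(k, mean, size, pi_zero):
    """Zero-inflated NB: extra probability mass pi_zero at 0, NB(mean, size) otherwise."""
    nb = (1.0 - pi_zero) * nb_pmf(k, mean, size)
    return nb + pi_zero if k == 0 else nb


def detection_limit(mean, size, pi_zero, alpha=1e-3, max_k=10_000):
    """Smallest read count k with P(X >= k) <= alpha under the (assumed) error model."""
    cdf = 0.0  # P(X <= k - 1), accumulated as k increases
    for k in range(max_k + 1):
        if 1.0 - cdf <= alpha:
            return k
        cdf += zinb_pmf(k, mean, size, pi_zero)
    raise ValueError("alpha not reached within max_k")


# Hypothetical per-position fits: {position: (mean_error_reads, size, pi_zero)}
fits = {101: (4.0, 2.0, 0.3), 102: (0.5, 1.0, 0.6)}
limits = {pos: detection_limit(*params, alpha=1e-3) for pos, params in fits.items()}
```

In practice the per-position parameters would come from fitting the model to benchmark data; the values in `fits` are made up for illustration, and a noisier position receives a higher limit.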

    PASSPORT-seq: A Novel High-Throughput Bioassay to Functionally Test Polymorphisms in Micro-RNA Target Sites

    Next-generation sequencing (NGS) studies have identified large numbers of genetic variants that are predicted to alter miRNA-mRNA interactions. We developed a novel high-throughput bioassay, PASSPORT-seq, that can functionally test hundreds of these variants in miRNA binding sites (mirSNPs) in parallel. The results are highly reproducible across both technical and biological replicates. The utility of the bioassay was demonstrated by testing 100 mirSNPs in HEK293, HepG2, and HeLa cells. Results for several of the variants were validated in all three cell lines using traditional individual luciferase assays. Fifty-five mirSNPs were functional in at least one of the three cell lines (FDR ≤ 0.05); 11, 36, and 27 of them were functional in HEK293, HepG2, and HeLa cells, respectively. Only four of the variants were functional in all three cell lines, which demonstrates the cell-type-specific effects of mirSNPs and the importance of testing mirSNPs in multiple cell lines. Using PASSPORT-seq, we functionally tested 111 variants in the 3' UTRs of 17 pharmacogenes that are predicted to alter miRNA regulation. Thirty-three of the variants tested were functional in at least one cell line.
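The abstract reports significance at FDR ≤ 0.05 without naming the correction procedure. As an assumption, the sketch below uses the Benjamini-Hochberg step-up procedure, a common way to control the false discovery rate, applied to one hypothetical p-value per mirSNP within a cell line; it is not necessarily the authors' method.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return the indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= (k / m) * q
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff
    return sorted(order[:cutoff])
```

Because the rule is step-up, a variant whose p-value misses its own rank threshold is still called significant if a larger p-value further down the ranking passes its threshold.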

    Programmable heating and quenching for non-equilibrium thermochemical synthesis


    A Reinforcement Learning-assisted Genetic Programming Algorithm for Team Formation Problem Considering Person-Job Matching

    An efficient team is essential for a company to successfully complete new projects. To solve the team formation problem considering person-job matching (TFP-PJM), a 0-1 integer programming model is constructed that considers the effects of both person-job matching and team members' willingness to communicate on team efficiency, with the person-job matching score calculated using intuitionistic fuzzy numbers. Then, a reinforcement learning-assisted genetic programming algorithm (RL-GP) is proposed to improve solution quality. RL-GP adopts ensemble population strategies: before the population evolves at each generation, the agent selects one of four population search modes based on the information obtained, achieving a sound balance between exploration and exploitation. In addition, surrogate models are used to evaluate the formation plans generated by individuals, which speeds up the algorithm's learning process. A series of comparison experiments is then conducted to verify the overall performance of RL-GP and the effectiveness of the improved strategies within the algorithm. The hyper-heuristic rules obtained through efficient learning can be used as decision-making aids when forming project teams. This study reveals the advantages of applying reinforcement learning methods, ensemble strategies, and surrogate models within the GP framework. The diversity and intelligent selection of search patterns, along with fast fitness evaluation, are distinctive features that enable RL-GP to be deployed in real-world enterprise environments. Comment: 16 page
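The abstract says an agent picks one of four population search modes each generation but does not specify the learning method. As a swapped-in stand-in, not RL-GP's actual agent, the sketch below uses a simple epsilon-greedy bandit; the class name, the reward (for example, the improvement in best fitness after a generation), and the parameter values are hypothetical.

```python
import random


class ModeSelector:
    """Epsilon-greedy choice among population search modes (hypothetical sketch)."""

    def __init__(self, n_modes=4, epsilon=0.1, step=0.2, seed=None):
        self.values = [0.0] * n_modes  # running value estimate for each mode
        self.epsilon = epsilon  # probability of exploring a random mode
        self.step = step  # constant step size for value updates
        self.rng = random.Random(seed)

    def select(self):
        """Pick a mode: explore with probability epsilon, otherwise exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))  # explore
        best = max(self.values)
        return self.values.index(best)  # exploit; ties go to the lowest index

    def update(self, mode, reward):
        """Move the chosen mode's estimate toward the observed reward."""
        self.values[mode] += self.step * (reward - self.values[mode])
```

Each generation would call `select()`, run the chosen search mode, and pass the resulting reward to `update()`. The constant step size is a design choice that lets estimates track rewards that drift as the population evolves.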

    RareVar: A Framework for Detecting Low-Frequency Single-Nucleotide Variants

    Accurate identification of low-frequency somatic point mutations in tumor samples has important clinical utility. Although high-throughput sequencing technology enables capturing such variants while sequencing primary tumor samples, our ability to detect them accurately is compromised when the variant frequency is close to the sequencer error rate. Most current experimental and bioinformatic strategies target mutations with ≥5% allele frequency, which limits our ability to understand cancer etiology and tumor evolution. We present an experimental and computational modeling framework, RareVar, to reliably identify low-frequency single-nucleotide variants from high-throughput sequencing data under standard experimental protocols. The RareVar protocol includes a benchmark design that pools DNA from already sequenced individuals at various concentrations to target variants at desired frequencies, 0.5%-3% in our case. By applying a generalized linear model-based, position-specific error model, followed by machine-learning-based variant calibration, our approach outperforms existing methods. Our method can be applied to most capture and sequencing platforms without modifying the experimental protocol.
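The pooling benchmark design comes down to simple arithmetic. The sketch below is a minimal illustration assuming diploid genomes, that each individual's share of pooled DNA maps directly to its share of sequenced alleles, and no amplification bias; the function names are hypothetical. For example, a heterozygous carrier contributing 1% of the pool's DNA yields an expected variant allele frequency of 0.01 × 1/2 = 0.5%, the low end of the 0.5%-3% range.

```python
def pooled_allele_frequency(fractions, genotypes, ploidy=2):
    """Expected variant allele frequency in a DNA pool.

    fractions: share of total pool DNA from each individual (must sum to 1)
    genotypes: number of variant alleles each individual carries (0..ploidy)
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    if any(g < 0 or g > ploidy for g in genotypes):
        raise ValueError("genotypes must lie in 0..ploidy")
    return sum(f * g / ploidy for f, g in zip(fractions, genotypes))


def fraction_for_target(target_vaf, genotype, ploidy=2):
    """DNA fraction a single carrier must contribute to reach target_vaf on its own."""
    if genotype <= 0:
        raise ValueError("carrier must have at least one variant allele")
    return target_vaf * ploidy / genotype
```

Under these assumptions, a heterozygous carrier would need to supply 6% of the pooled DNA for its variant to reach 3%.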