71 research outputs found

    HanoiT: Enhancing Context-aware Translation via Selective Context

    Full text link
    Context-aware neural machine translation aims to use the document-level context to improve translation quality. However, not all words in the context are helpful. The irrelevant or trivial words may bring some noise and distract the model from learning the relationship between the current sentence and the auxiliary context. To mitigate this problem, we propose a novel end-to-end encoder-decoder model with a layer-wise selection mechanism to sift and refine the long document context. To verify the effectiveness of our method, extensive experiments and extra quantitative analysis are conducted on four document-level machine translation benchmarks. The experimental results demonstrate that our model significantly outperforms previous models on all datasets via the soft selection mechanism

    A community computational challenge to predict the activity of pairs of compounds

    No full text
    Recent therapeutic successes have renewed interest in drug combinations, but experimental screening approaches are costly and often identify only small numbers of synergistic combinations. The DREAM consortium launched an open challenge to foster the development of in silico methods to computationally rank 91 compound pairs, from the most synergistic to the most antagonistic, based on gene-expression profiles of human B cells treated with individual compounds at multiple time points and concentrations. Using scoring metrics based on experimental dose-response curves, we assessed 32 methods (31 community-generated approaches and SynGen), four of which performed significantly better than random guessing. We highlight similarities between the methods. Although the accuracy of predictions was not optimal, we find that computational prediction of compound-pair activity is possible, and that community challenges can be useful to advance the field of in silico compound-synergy prediction

    A 28GHz low-noise amplifier based on 40nm CMOS

    No full text
    This dissertation first introduces the theoretical basis of low-noise amplifiers, which includes active devices in CMOS process, passive devices, common topologies and performance indicators of LNAs. Then the design process of the LNA is introduced, including the selection of bias voltage and active device parameters, the design of on-chip inductor, the design of reducing noise factor and the design of matching network.Master of Science (Integrated Circuit Design

    Quantification of Uncertainty in Peptide-MHC Binding Prediction Improves High-Affinity Peptide Selection for Therapeutic Design

    No full text
    The computational identification of peptides that can bind the major histocompatibility complex (MHC) with high affinity is an essential step in developing personal immunotherapies and vaccines. We introduce PUFFIN, a deep residual network-based computational approach that quantifies uncertainty in peptide-MHC affinity prediction that arises from observational noise and the lack of relevant training examples. With PUFFIN's uncertainty metrics, we define binding likelihood, the probability a peptide binds to a given MHC allele at a specified affinity threshold. Compared to affinity point estimates, we find that binding likelihood correlates better with the observed affinity and reduces false positives in high-affinity peptide design. When applied to examine an existing peptide vaccine, PUFFIN identifies an alternative vaccine formulation with higher binding likelihood. PUFFIN is freely available for download at http://github.com/gifford-lab/PUFFIN. Machine-learning models that predict the binding affinity of a peptide-MHC pair are essential in peptide-based therapeutic design, but state-of-the-art methods provide point estimates of affinity that do not consider measurement noise and model uncertainty. We introduce PUFFIN, a method that quantifies the prediction uncertainty and prioritizes peptides with “binding likelihood” to achieve improved accuracy in high-affinity peptide selection for therapeutic design.National Institute of Health (Grant R01CA218094

    Predicting the impact of non-coding variants on DNA methylation

    No full text
    DNA methylation plays a crucial role in the establishment of tissue-specific gene expression and the regulation of key biological processes. However, our present inability to predict the effect of genome sequence variation on DNA methylation precludes a comprehensive assessment of the consequences of non-coding variation. We introduce CpGenie, a sequence-based framework that learns a regulatory code of DNA methylation using a deep convolutional neural network and uses this network to predict the impact of sequence variation on proximal CpG site DNA methylation. CpGenie produces allele-specific DNA methylation prediction with single-nucleotide sensitivity that enables accurate prediction of methylation quantitative trait loci (meQTL). We demonstrate that CpGenie prioritizes validated GWAS SNPs, and contributes to the prediction of functional non-coding variants, including expression quantitative trait loci (eQTL) and disease-associated mutations. CpGenie is publicly available to assist in identifying and interpreting regulatory non-coding variants.National Cancer Institute (U.S.) (R01HG008363)National Cancer Institute (U.S.) (U01HG007037)NVIDIA Corporatio

    DeepLigand: accurate prediction of MHC class I ligands using peptide embedding

    No full text
    Motivation: The computational modeling of peptide display by class I major histocompatibility complexes (MHCs) is essential for peptide-based therapeutics design. Existing computational methods for peptide-display focus on modeling the peptide-MHC-binding affinity. However, such models are not able to characterize the sequence features for the other cellular processes in the peptide display pathway that determines MHC ligand selection. Results: We introduce a semi-supervised model, DeepLigand that outperforms the state-of-the-art models in MHC Class I ligand prediction. DeepLigand combines a peptide language model and peptide binding affinity prediction to score MHC class I peptide presentation. The peptide language model characterizes sequence features that correspond to secondary factors in MHC ligand selection other than binding affinity. The peptide embedding is learned by pre-training on natural ligands, and can discriminate between ligands and non-ligands in the absence of binding affinity prediction. Although conventional affinity-based models fail to classify peptides with moderate affinities, DeepLigand discriminates ligands from non-ligands with consistently high accuracy.National Institutes of Health (U.S.) (Grant R01CA218094

    Machine learning models for functional genomics and therapeutic design

    No full text
    This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019Cataloged from student-submitted PDF version of thesis.Includes bibliographical references (pages 213-230).Due to the limited size of training data available, machine learning models for biology have remained rudimentary and inaccurate despite the significant advance in machine learning research. With the recent advent of high-throughput sequencing technology, an exponentially growing number of genomic and proteomic datasets have been generated. These large-scale datasets admit the training of high-capacity machine learning models to characterize sophisticated features and produce accurate predictions on unseen examples. In this thesis, we attempt to develop advanced machine learning models for functional genomics and therapeutics design, two areas with ample data deposited in public databases and tremendous clinical implications. The shared theme of these models is to learn how the composition of a biological sequence encodes a functional phenotype and then leverage such knowledge to provide insight for target discovery and therapeutic design.First, we design three machine learning models that predict transcription factor binding and DNA methylation, two fundamental epigenetic phenotypes closely tied to gene regulation, from DNA sequence alone. We show that these epigenetic phenotypes can be well predicted from the sequence context. Moreover, the predicted change in phenotype between the reference and alternate allele of a genetic variant accurately reflect its functional impact and improves the identification of regulatory variants causal for complex diseases. Second, we devise two machine learning models that improve the prediction of peptides displayed by the major histocompatibility complex (MHC) on the cell surface. Computational modeling of peptide-display by MHC is central in the design of peptide-based therapeutics.Our first machine learning model introduces the capacity to quantify uncertainty in the computational prediction and proposes a new metric for peptide prioritization that reduces false positives in high-affinity peptide design. The second model improves the state-of-the-art performance in MHC-ligand prediction by employing a deep language model to learn the sequence determinants for auxiliary processes in MHC-ligand selection, such as proteasome cleavage, that are omitted by existing methods due to the lack of labeled data. Third, we develop machine learning frameworks to model the enrichment of an antibody sequence in phage-panning experiments against a target antigen. We show that antibodies with low specificity can be reduced by a computational procedure using machine learning models trained for multiple targets. Moreover, machine learning can help to design novel antibody sequences with improved affinity.by Haoyang ZengPh. D.Ph.D. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Scienc

    Whole genome WAVE for TF binding

    No full text
    Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015.Cataloged from PDF version of thesis.Includes bibliographical references (pages 41-44).With the advent of high-throughput sequencing technology, Genome Wide Association Studies (GWAS) have identified thousands of genetic variants that are associated with disease and complex traits. Many of these variants reside in the non-coding region of the genome, and affect gene expression and downstream cellular phenotype by disrupting the regulatory machinery of the cell. For example these variants can alter the binding of the transcription factors (TF). In this thesis we present Whole-genome regulAtory Variant Evaluation (WAVE), a computational method that models the TF binding ChIP-seq signal solely from DNA sequence and predicts genetic a variant's effect on TF binding. Applying WAVE to two important transcription factors, NFnB and CTCF, we show that WAVE accurately predicts ChIP-seq signal on held-out chromosome. WAVE discovers the DNA motif of the target TF as well as the binding co-factors, displaying substantially greater expressiveness in modeling TF binding than conventional motif-based approaches. Furthermore, with AUC larger than 0.7 in the most stringent control scenario, WAVE outperformed existing motif-based approaches in predicting genetic variants associated with allele-specific binding.by Haoyang Zeng.S.M
    • …
    corecore