
    Framewise phone classification using support vector machines

    We describe the use of Support Vector Machines for phonetic classification on the TIMIT corpus. Unlike previous work, in which entire phonemes are classified, our system operates in a framewise manner and is intended for use as the front-end of a hybrid system similar to ABBOT. We therefore avoid the problems of classifying variable-length vectors. Our frame-level phone classification accuracy on the complete TIMIT test set is competitive with other results from the literature. In addition, we address the serious problem of scaling Support Vector Machines by using the Kernel Fisher Discriminant.
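
    Because the system classifies individual frames, every training example is a fixed-length vector regardless of utterance length. The Python sketch below shows one common way to build such vectors by concatenating each frame with its neighbours. The symmetric context window of `k` frames and the edge-padding rule are assumptions for illustration, not details given in the abstract, and `stack_context` is a hypothetical helper name.

```python
from typing import List

Frame = List[float]


def stack_context(frames: List[Frame], k: int) -> List[Frame]:
    """Turn T frames into T fixed-length vectors of (2k + 1) concatenated frames.

    Frames beyond the utterance boundary are replaced by the nearest edge frame
    (an assumed padding rule). Each output vector would be paired with the phone
    label of its centre frame and passed to a frame-level classifier such as an SVM.
    """
    num_frames = len(frames)
    stacked: List[Frame] = []
    for t in range(num_frames):
        window: Frame = []
        for offset in range(-k, k + 1):
            # Clamp the neighbour index so edge frames are repeated.
            idx = min(max(t + offset, 0), num_frames - 1)
            window.extend(frames[idx])
        stacked.append(window)
    return stacked


# Toy utterance: three frames of 2-dimensional acoustic features.
utterance = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
vectors = stack_context(utterance, k=1)
print(vectors[0])  # [1.0, 2.0, 1.0, 2.0, 3.0, 4.0]
```

    Utterances of any length yield vectors of identical dimension (here 6), so a standard SVM can be trained on them without any special handling of variable-length inputs.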

    Quantitative miRNA expression analysis: comparing microarrays with next-generation sequencing

    Recently, next-generation sequencing has been introduced as a promising new platform for assessing transcript copy numbers, whereas the existing microarray technology is considered less reliable for absolute, quantitative expression measurements. Nonetheless, results from the two technologies have so far been compared only on biological data, leading to the conclusion that, although they are somewhat correlated, their expression values differ significantly. Here, we use synthetic RNA samples resembling human microRNA samples and find that microarray expression measures actually correlate better with sample RNA content than expression measures obtained from sequencing data. In addition, microarrays appear highly sensitive and perform equivalently to next-generation sequencing in terms of reproducibility and relative ratio quantification.
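
    With synthetic samples the true RNA content of each sequence is known, so each platform can be scored by how closely its measurements track that content. The sketch below assumes Pearson correlation on log2-transformed values as the agreement measure; the abstract does not state which statistic was used, and the numbers are made-up toy values, not data from the study.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def log_agreement(known_amounts: Sequence[float], measured: Sequence[float]) -> float:
    """Correlate log2 known input with log2 measured signal (all values must be > 0)."""
    return pearson([math.log2(v) for v in known_amounts],
                   [math.log2(v) for v in measured])


# Toy spike-in design: known relative amounts of four synthetic miRNAs.
known = [1.0, 2.0, 4.0, 8.0]
platform_a = [10.0, 21.0, 39.0, 82.0]  # hypothetical signal that tracks input closely
platform_b = [5.0, 30.0, 12.0, 60.0]   # hypothetical signal with sequence-specific bias
print(log_agreement(known, platform_a) > log_agreement(known, platform_b))  # True
```

    Working on the log scale means a platform that reports a constant multiple of the true amount still scores a correlation of 1, which separates proportional tracking from absolute calibration.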

    Catalase improves saccharification of lignocellulose by reducing lytic polysaccharide monooxygenase-associated enzyme inactivation

    Objectives: Efficient enzymatic saccharification of plant cell wall material is key to the industrial processing of agricultural and forestry waste, such as straw and wood chips, into fuels and chemicals. Results: Saccharification assays were performed on steam-pretreated wheat straw under ambient and O2-deprived environments, in the absence and presence of a lytic polysaccharide monooxygenase (LPMO) and catalase. A kinetic model was used to calculate catalytic rate constants and first-order inactivation rate constants of the cellulases from reaction progress curves. The addition of an LPMO significantly (P < 0.01, Student's t-test) enhanced the rate of glucose release from 2.8 to 6.9 h−1 under ambient O2 conditions. However, it also significantly (P < 0.01, Student's t-test) increased the rate of inactivation of the enzyme mixture, reducing its performance half-life from 65 to 35 h. Decreasing O2 levels or, strikingly, adding catalase significantly reduced (P < 0.01, Student's t-test) enzyme inactivation and, as a consequence, increased the efficiency of the cellulolytic enzyme cocktail. Conclusion: Oxidative inactivation of commercial cellulase mixtures is a significant factor influencing overall saccharification efficiency, and catalase can be added to protect these mixtures from inactivation.
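
    Under a first-order inactivation model, active enzyme decays as E(t) = E0 * exp(-k * t), so the performance half-life and the inactivation rate constant are linked by t_1/2 = ln(2) / k. The Python sketch below back-calculates the rate constants implied by the reported half-lives of 65 h and 35 h. It assumes the reported half-lives follow directly from this relation; the helper names are illustrative only.

```python
import math


def rate_from_half_life(half_life_h: float) -> float:
    """First-order inactivation constant k (h^-1) from t_1/2 = ln(2) / k."""
    return math.log(2) / half_life_h


def remaining_activity(k_per_h: float, time_h: float) -> float:
    """Fraction of initial activity left after time_h hours: exp(-k * t)."""
    return math.exp(-k_per_h * time_h)


k_without_lpmo = rate_from_half_life(65.0)  # reported half-life without LPMO
k_with_lpmo = rate_from_half_life(35.0)     # reported half-life with LPMO
print(f"k without LPMO: {k_without_lpmo:.4f} h^-1")  # k without LPMO: 0.0107 h^-1
print(f"k with LPMO:    {k_with_lpmo:.4f} h^-1")     # k with LPMO:    0.0198 h^-1
print(f"ratio: {k_with_lpmo / k_without_lpmo:.2f}")  # ratio: 1.86
```

    The drop from 65 h to 35 h therefore corresponds to an inactivation rate constant roughly 1.86 times larger; by construction, `remaining_activity` returns 0.5 after exactly one half-life.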

    A novel metagenome-derived viral RNA polymerase and its application in a cell-free expression system for metagenome screening

    The mining of genomes from non-cultivated microorganisms using metagenomics is a powerful tool for discovering novel proteins and other valuable biomolecules. However, function-based metagenome searches are often limited by the time-consuming expression of active proteins in various heterologous host systems. Here we report the initial characterization of a novel single-subunit bacteriophage RNA polymerase, EM1 RNAP, identified from a metagenome data set obtained from an elephant dung microbiome. EM1 RNAP and its promoter sequence are distantly related to T7 RNA polymerase. Using EM1 RNAP and a translation-competent Escherichia coli extract, we have developed an efficient medium-throughput pipeline and protocol for expressing metagenome-derived genes and producing proteins in a cell-free system; protein production in this system is sufficient for initial testing of the predicted activities. With this approach, we identified and verified 12 enzymes acting on bis(2-hydroxyethyl) terephthalate (BHET) in a completely clone-free manner, and we propose an in vitro high-throughput metagenomic screening method.

    Predicting Class II MHC-Peptide binding: a kernel based approach using similarity scores

    BACKGROUND: Modelling the interaction between potentially antigenic peptides and Major Histocompatibility Complex (MHC) molecules is a key step in identifying potential T-cell epitopes. For Class II MHC alleles, the binding groove is open at both ends, which causes ambiguity in the positional alignment between groove and peptide and creates uncertainty as to which parts of the peptide interact with the MHC. Moreover, antigenic peptides have variable lengths, making naive modelling methods difficult to apply. This paper introduces a kernel method that handles variable-length peptides effectively by quantifying similarities between peptide sequences and integrating these into the kernel. RESULTS: The kernel approach presented here shows increased prediction accuracy, with a significantly higher number of true positives and true negatives on multiple MHC class II alleles, when tested on data sets from MHCPEP [1], MHCBN [2], and MHCBench [3]. Evaluation by cross-validation, when segregating binders and non-binders, produced an average area under the ROC curve (A_ROC) of 0.824 for the MHCBench data sets (up from 0.756) and an average A_ROC of 0.96 for multiple alleles of the MHCPEP database. CONCLUSION: The method improves on existing state-of-the-art methods for MHC class II peptide binding prediction by using a custom, knowledge-based representation of peptides. Similarity scores, in contrast to a fixed-length, pocket-specific representation of amino acids, provide a flexible and powerful way of modelling MHC binding and can easily be applied to other dynamic sequence problems.
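
    One standard way to turn pairwise similarity scores between variable-length sequences into a valid (positive semi-definite) kernel is the empirical kernel map: represent each peptide by its vector of similarity scores to a fixed set of reference peptides and take inner products of those vectors. The Python sketch below uses that construction with a deliberately simple similarity, the best count of identical residues over all ungapped alignment offsets, which loosely mirrors the open-groove register ambiguity. Both choices are assumptions for illustration, not the paper's own knowledge-based scoring. It also includes a rank-based A_ROC (Mann-Whitney) function of the kind used to score binder/non-binder separation; all peptides and scores are toy values.

```python
from typing import List, Sequence


def ungapped_similarity(a: str, b: str) -> int:
    """Best number of identical residues over all ungapped offsets of b against a.

    Sliding one peptide along the other is a toy stand-in for the unknown
    register of a peptide in the open-ended class II binding groove.
    """
    best = 0
    for shift in range(-(len(b) - 1), len(a)):
        matches = sum(
            1
            for i, residue in enumerate(b)
            if 0 <= i + shift < len(a) and a[i + shift] == residue
        )
        best = max(best, matches)
    return best


def empirical_map(peptide: str, references: Sequence[str]) -> List[int]:
    """Fixed-length feature vector: similarity of the peptide to each reference."""
    return [ungapped_similarity(peptide, ref) for ref in references]


def similarity_kernel(p: str, q: str, references: Sequence[str]) -> int:
    """Inner product of empirical maps; positive semi-definite by construction."""
    return sum(x * y for x, y in zip(empirical_map(p, references),
                                     empirical_map(q, references)))


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve as the fraction of (binder, non-binder) pairs
    ranked correctly, counting ties as one half."""
    wins = 0.0
    for s_pos in pos_scores:
        for s_neg in neg_scores:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy reference set and peptides of different lengths (illustrative strings only).
refs = ["PKYVKQNTLKLAT", "GELIGILNAAKVPAD", "AAYAAAKAAALAA"]
print(similarity_kernel("KYVKQNTLK", "PKYVKQNTLKLATG", refs))
print(auc([0.9, 0.8], [0.1, 0.85]))  # 0.75
```

    Because every peptide is mapped to a vector whose length equals the number of references, peptides of any length become comparable, and the resulting kernel can be plugged into any standard kernel machine.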

    Retrospective evaluation of whole exome and genome mutation calls in 746 cancer samples

    Funder: NCI U24CA211006
    Abstract: The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) curated consensus somatic mutation calls using whole exome sequencing (WES) and whole genome sequencing (WGS), respectively. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2,658 cancers across 38 tumour types, we compare WES and WGS side-by-side in 746 TCGA samples and find that ~80% of mutations overlap in covered exonic regions. We estimate that low variant allele fraction (VAF < 15%) and clonal heterogeneity contribute up to 68% of private WGS mutations and 71% of private WES mutations. We observe that ~30% of private WGS mutations trace to mutations identified by a single variant caller in the WES consensus efforts. WGS captures ~50% more variation in exonic regions and also captures mutations, unobserved by WES, in loci with variable GC content. Together, our analysis highlights technological divergences between two reproducible somatic variant detection efforts.
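
    The side-by-side comparison amounts to set operations on call sets restricted to regions covered by both assays. The Python sketch below assumes variants are keyed by (chromosome, position, ref, alt), that overlap is measured against the union of covered calls, and that "low VAF" means below 15% as in the abstract. The abstract does not say which denominator its ~80% figure uses, and the toy calls are invented for illustration.

```python
from typing import Dict, List, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)
Region = Tuple[str, int, int]        # (chromosome, start, end), half-open interval


def in_covered(variant: Variant, regions: List[Region]) -> bool:
    """True if the variant lies inside any covered region."""
    chrom, pos = variant[0], variant[1]
    return any(c == chrom and start <= pos < end for c, start, end in regions)


def compare_calls(wes: Dict[Variant, float], wgs: Dict[Variant, float],
                  regions: List[Region], vaf_cutoff: float = 0.15) -> Dict[str, float]:
    """Compare two somatic call sets (variant -> VAF) within covered regions."""
    wes_cov = {v: f for v, f in wes.items() if in_covered(v, regions)}
    wgs_cov = {v: f for v, f in wgs.items() if in_covered(v, regions)}
    shared = wes_cov.keys() & wgs_cov.keys()
    union = wes_cov.keys() | wgs_cov.keys()
    private_wes = wes_cov.keys() - wgs_cov.keys()
    private_wgs = wgs_cov.keys() - wes_cov.keys()

    def low_vaf_fraction(private, calls) -> float:
        # Share of private calls whose VAF falls below the cutoff.
        if not private:
            return 0.0
        return sum(1 for v in private if calls[v] < vaf_cutoff) / len(private)

    return {
        "overlap": len(shared) / len(union) if union else 0.0,
        "low_vaf_private_wes": low_vaf_fraction(private_wes, wes_cov),
        "low_vaf_private_wgs": low_vaf_fraction(private_wgs, wgs_cov),
    }


covered = [("chr1", 100, 200)]
wes_calls = {("chr1", 110, "A", "T"): 0.40, ("chr1", 120, "G", "C"): 0.10,
             ("chr1", 300, "C", "A"): 0.50}  # last call lies outside the covered region
wgs_calls = {("chr1", 110, "A", "T"): 0.45, ("chr1", 150, "T", "G"): 0.08,
             ("chr1", 160, "C", "T"): 0.30}
print(compare_calls(wes_calls, wgs_calls, covered))
# {'overlap': 0.25, 'low_vaf_private_wes': 1.0, 'low_vaf_private_wgs': 0.5}
```

    Choosing a different denominator (for example, the smaller call set instead of the union) changes the overlap figure, so the definition matters when comparing against reported percentages.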
