Improving recognition accuracy on CVSD speech under mismatched conditions
Emerging technology in mobile communications is seeing increasingly wide acceptance as a preferred choice for last-mile communication. A wide range of signal compression techniques have been developed to suit the smaller bandwidths available on mobile communication channels, but speech recognition methods have succeeded mostly in controlled speech environments. Designing speech recognition systems for mobile communications is nonetheless crucial for providing voice-enabled command and control and for applications such as Mobile Voice Commerce. Continuously Variable Slope Delta (CVSD) modulation, a technique for low-bitrate coding of speech, has been in use for over 30 years, particularly in military wireless environments, and has now also been adopted by Bluetooth. CVSD is particularly suitable for Internet and mobile environments because of its robustness against transmission errors, its simplicity of implementation and the absence of any need for synchronization. In this paper, we study some characteristics of CVSD speech in the context of robust recognition of compressed speech, and present two methods for improving recognition accuracy in Automatic Speech Recognition (ASR) systems. First, we study how the features extracted from CVSD speech for ASR relate to the corresponding features computed from Pulse Code Modulation (PCM) speech, and apply this relation to correct the CVSD features and improve recognition accuracy. Second, we show that ASR performed directly on the bit-streams gives good recognition accuracy, and that combining it with our feature-correction approach gives better accuracy.
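To make the "one bit per sample, adaptive step" idea behind CVSD concrete, here is a minimal sketch of a generic CVSD encoder and decoder. It assumes a textbook scheme: the step size grows while the last few bits are identical (slope overload) and decays otherwise. All constants in `CvsdParams` are illustrative assumptions, not the Bluetooth or military parameter sets, and the code is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class CvsdParams:
    # Illustrative constants only (assumed); real codecs define their own.
    run_length: int = 3      # identical consecutive bits that signal slope overload
    step_min: float = 0.01
    step_max: float = 0.5
    step_inc: float = 0.05   # added to the step size during an overload run
    decay: float = 0.9       # multiplicative step decay otherwise


def _update(history: List[int], estimate: float, step: float,
            p: CvsdParams) -> Tuple[float, float]:
    """Adapt the step size from recent bits, then move the integrator by one step."""
    recent = history[-p.run_length:]
    if len(recent) == p.run_length and len(set(recent)) == 1:
        step = min(step + p.step_inc, p.step_max)   # slope overload: grow the step
    else:
        step = max(step * p.decay, p.step_min)      # otherwise let it shrink
    estimate += step if history[-1] == 1 else -step
    return estimate, step


def cvsd_encode(samples: Sequence[float], p: Optional[CvsdParams] = None) -> List[int]:
    """Emit one bit per sample: 1 if the input is at or above the running estimate."""
    p = p or CvsdParams()
    bits: List[int] = []
    estimate, step = 0.0, p.step_min
    for x in samples:
        bits.append(1 if x >= estimate else 0)
        estimate, step = _update(bits, estimate, step, p)
    return bits


def cvsd_decode(bits: Sequence[int], p: Optional[CvsdParams] = None) -> List[float]:
    """Rebuild the waveform by running the same step adaptation as the encoder."""
    p = p or CvsdParams()
    history: List[int] = []
    out: List[float] = []
    estimate, step = 0.0, p.step_min
    for b in bits:
        history.append(b)
        estimate, step = _update(history, estimate, step, p)
        out.append(estimate)
    return out
```

Usage: `approx = cvsd_decode(cvsd_encode(samples))`. The decoder needs only the bit stream and the shared constants, which illustrates why CVSD is simple to implement; a single corrupted bit perturbs the estimate only locally rather than a whole frame.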
Demonstration Study: A Protocol to Combine Online Tools and Databases for Identifying Potentially Repurposable Drugs
Traditional methods for discovering and developing new drugs can be very time-consuming and expensive because they involve several stages, such as compound identification and pre-clinical and clinical trials, before a drug is approved by the U.S. Food and Drug Administration (FDA). Drug repurposing, namely using currently FDA-approved drugs as therapeutics for diseases other than those for which they were originally prescribed, is therefore emerging as a faster and more cost-effective alternative to current drug discovery methods. In this paper, we describe a three-step in silico protocol for analyzing transcriptomics data using online databases and bioinformatics tools to identify potentially repurposable drugs. The efficacy of this protocol was evaluated by comparing its predictions with the findings of two case studies of recently reported repurposed drugs: the HIV drug zidovudine for the treatment of dry age-related macular degeneration, and the antidepressant imipramine for small-cell lung carcinoma. The proposed protocol successfully identified the published findings, demonstrating the efficacy of this method. In addition, it yielded several novel predictions that have not yet been published, including the finding that imipramine could potentially treat Severe Acute Respiratory Syndrome (SARS), a disease that currently has no treatment or vaccine. Since this in silico protocol is simple to use and does not require advanced computer skills, we believe any motivated participant with access to these databases and tools would be able to apply it to large datasets to identify other potentially repurposable drugs in the future.
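The abstract does not spell out the three steps. One common ingredient of transcriptomics-based repurposing is asking whether a drug's expression signature reverses a disease signature. The sketch below assumes each signature is a mapping from gene to log fold change and scores a drug by the fraction of shared genes it moves in the opposite direction. This is a deliberately simple, hypothetical scoring rule, not the protocol described in the paper. Gene and drug names are made up.

```python
from typing import Dict, List


def reversal_score(disease: Dict[str, float], drug: Dict[str, float]) -> float:
    """Fraction of shared genes whose drug-induced change opposes the disease change."""
    # Only genes measured (and non-zero) in both signatures are compared.
    shared = [g for g in disease if g in drug and disease[g] != 0 and drug[g] != 0]
    if not shared:
        return 0.0
    opposite = sum(1 for g in shared if (disease[g] > 0) != (drug[g] > 0))
    return opposite / len(shared)


def rank_drugs(disease: Dict[str, float], drugs: Dict[str, Dict[str, float]]) -> List[str]:
    """Order candidate drugs from most to least signature-reversing."""
    return sorted(drugs, key=lambda name: reversal_score(disease, drugs[name]), reverse=True)


disease_sig = {"GENE_A": 2.0, "GENE_B": -1.5, "GENE_C": 1.0}
drug_sigs = {
    "drug_y": {"GENE_A": 1.0, "GENE_B": -0.2, "GENE_C": 0.3},   # mimics the disease
    "drug_x": {"GENE_A": -1.0, "GENE_B": 0.8, "GENE_C": -0.5},  # reverses it
}
print(rank_drugs(disease_sig, drug_sigs))  # ['drug_x', 'drug_y']
```

A rank-correlation score over all genes is a common alternative to this sign-agreement fraction. The fraction is used here only because it is easy to check by hand.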
Transmembrane helix prediction using amino acid property features and latent semantic analysis
Prediction of transmembrane (TM) helices by statistical methods suffers from a lack of sufficient training data. Current best methods use hundreds or even thousands of free parameters, which are tuned to fit the little data available for training. Further, they are often restricted to the generally accepted "cytoplasmic-transmembrane-extracellular" topology and cannot adapt to membrane proteins that do not conform to it. Recent crystal structures of channel proteins have revealed novel architectures, showing that this topology may not be as universal as previously believed. Thus, there is a need for methods that can better predict TM helices, even in novel topologies and families.
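The title names amino acid property features and latent semantic analysis (LSA), but the abstract gives no details. As a hedged illustration of the general LSA idea only, the sketch below treats sliding sequence windows as "documents" and coarse residue property classes as "terms", builds a term-by-window count matrix, and reduces it with a truncated singular value decomposition. The three-class grouping, the window length and the function names are assumptions. This is not TMpro's actual feature set.

```python
import numpy as np

# Hypothetical coarse grouping of the 20 amino acids (an assumption for illustration).
PROPERTY_CLASS = {
    **{aa: "hydrophobic" for aa in "AVILMFWC"},
    **{aa: "polar" for aa in "GSTYNQPH"},
    **{aa: "charged" for aa in "DEKR"},
}
CLASSES = ["hydrophobic", "polar", "charged"]


def window_property_matrix(sequence: str, window: int = 16) -> np.ndarray:
    """Rows = property classes ('terms'), columns = sliding windows ('documents')."""
    n_windows = max(len(sequence) - window + 1, 0)
    counts = np.zeros((len(CLASSES), n_windows))
    for j in range(n_windows):
        for aa in sequence[j:j + window]:
            cls = PROPERTY_CLASS.get(aa)
            if cls is not None:  # skip unknown residue codes such as 'X'
                counts[CLASSES.index(cls), j] += 1
    return counts


def lsa_embed(matrix: np.ndarray, rank: int) -> np.ndarray:
    """Project each window (column) onto the top `rank` singular directions."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Each row of the result is one window's reduced-dimension representation.
    return (np.diag(s[:rank]) @ vt[:rank]).T
```

Usage: `lsa_embed(window_property_matrix(seq), rank=2)` gives one low-dimensional vector per window, which a downstream classifier could label as TM or non-TM. Keeping only a few singular directions is the step that limits the number of free parameters.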
A pilot study on the prevalence of DNA palindromes in breast cancer genomes
Background
DNA palindromes are a distinctive type of repeat sequence present in the human genome. Each consists of a sequence of nucleotides in which the second half is the complement of the first half in reverse order (that is, the reverse complement). These palindromic sequences may play a significant role in DNA replication, transcription and gene regulation. In human cancers they occur frequently, clustering at specific genomic locations that undergo gene amplification and tumorigenesis. Moreover, some studies have shown that palindromes are clustered in amplified regions of breast cancer genomes, especially on chromosomes (chr) 8 and 11. With large numbers of personal genomes and cancer genomes becoming available, it is now possible to study the association of palindromes with disease using computational methods. Here, we conducted a pilot study on chromosomes 8 and 11 of cancer genomes to computationally identify differentially occurring palindromes.
Methods
We processed 69 breast cancer genomes from The Cancer Genome Atlas, including serum-normal and tumor genomes, together with genomes from the 1000 Genomes Project as a control group. The Biological Language Modelling Toolkit (BLMT) computes palindromes in whole genomes. We developed a computational pipeline that integrates BLMT to compute and compare the prevalence of palindromes in personal genomes.
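BLMT's own interface is not described here, so the following is only a hypothetical stand-alone sketch of what "computing palindromes" means for a DNA string. It scans every center between two adjacent bases and extends outward while the flanking bases are complements of each other, reporting each maximal palindrome of at least a minimum length. The function name and the `min_len` default are assumptions. Bases other than A, C, G and T (such as N) never match, so they stop the extension.

```python
from typing import List, Tuple

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def find_palindromes(seq: str, min_len: int = 6) -> List[Tuple[int, str]]:
    """Return (start, sequence) for each maximal reverse-complement palindrome."""
    seq = seq.upper()
    hits: List[Tuple[int, str]] = []
    # DNA palindromes have even length, so centers lie between two bases.
    for center in range(1, len(seq)):
        left, right = center - 1, center
        while left >= 0 and right < len(seq) and COMPLEMENT.get(seq[left]) == seq[right]:
            left -= 1
            right += 1
        length = right - left - 1
        if length >= min_len:
            hits.append((left + 1, seq[left + 1:right]))
    return hits
```

For example, `find_palindromes("ttGAATTCaa")` returns `[(0, "TTGAATTCAA")]`. The EcoRI site GAATTC is extended by the flanking TT/AA pair because TTGAATTCAA is its own reverse complement. Comparing such hit lists between tumor and normal sequences of the same region is one way to look for differentially occurring palindromes.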
Results
We carried out a pilot study on chr 8 and chr 11, taking into account single nucleotide polymorphisms, insertions and deletions. Of the palindromes that showed any variation in cancer genomes, 38% of those near breast cancer genes were among the most differentiated palindromes in tumors (i.e., they ranked in the top 25% by our heuristic measure).
Conclusions
These results may shed light on the prevalence of palindromes in oncogenes and on mutations in palindromic regions that could contribute to genomic rearrangements and breast cancer progression.
Active machine learning for transmembrane helix prediction
Background
About 30% of genes code for membrane proteins, which are involved in a wide variety of crucial biological functions. Despite their importance, experimentally determined membrane protein structures account for only about 1.7% of the structures deposited in the Protein Data Bank, owing to the difficulty of crystallizing membrane proteins. Algorithms that can identify proteins whose high-resolution structures would aid in predicting the structures of many previously unresolved proteins are therefore of potentially high value. Active learning is a supervised machine learning approach well suited to this domain, where there are a large number of sequences but very few with known structures. In essence, active learning seeks to identify proteins whose structure, if revealed experimentally, is maximally predictive of others.
Results
An active learning approach is presented for selecting a minimal set of proteins whose structures can aid in determining the transmembrane (TM) helices of the remaining proteins. TMpro, an algorithm for high-accuracy TM helix prediction that we previously developed, is coupled with active learning. We show that with a well-designed selection procedure, high accuracy can be achieved with only a few proteins. TMpro trained on a single protein achieved an F-score of 94% on benchmark evaluation and 91% on the MPtopo dataset, which correspond to state-of-the-art TM helix prediction accuracies that are usually achieved by training on over 100 proteins.
Conclusion
Active learning is suitable for bioinformatics applications in which manually characterized data are not a comprehensive representation of all possible data and may in fact be a very sparse subset of it. It aids in selecting data instances that, when characterized experimentally, can improve the accuracy of computational characterization of the remaining raw data. The results presented here also demonstrate that the feature extraction method of TMpro is well designed, achieving a very good separation between TM and non-TM segments.
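The abstract does not describe TMpro's selection procedure. As a hedged illustration of "pick the few proteins that are most predictive of the rest", the sketch below uses a generic greedy coverage heuristic (the facility-location objective). Each new pick is the protein that most increases the total similarity of every pool member to its closest selected protein. It assumes each protein is already summarized by a fixed-length feature vector, and the function name is hypothetical.

```python
from typing import List

import numpy as np


def greedy_representative_selection(features: np.ndarray, k: int) -> List[int]:
    """Greedily pick k rows that best 'cover' the pool under cosine similarity."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T  # pairwise cosine similarity
    n = features.shape[0]
    selected: List[int] = []
    # coverage[j] = similarity of item j to its closest already-selected item.
    coverage = np.full(n, -np.inf)
    for _ in range(min(k, n)):
        best_idx, best_score = -1, -np.inf
        for cand in range(n):
            if cand in selected:
                continue
            # Total coverage of the pool if this candidate were added.
            score = np.maximum(coverage, sim[cand]).sum()
            if score > best_score:
                best_idx, best_score = cand, score
        selected.append(best_idx)
        coverage = np.maximum(coverage, sim[best_idx])
    return selected
```

With `k=1`, the function returns the single most central protein. This is loosely analogous to training on one well-chosen protein, though an uncertainty-based criterion would be an equally reasonable choice of active learning strategy.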
Mycobacterium tuberculosis and Clostridium difficile interactomes: demonstration of rapid development of computational system for bacterial interactome prediction
Background
Protein-protein interaction (PPI) networks (interactomes) of most organisms, except for some model organisms, are largely unknown. Experimental methods, including high-throughput techniques, are highly resource intensive. Computational discovery of PPIs can therefore accelerate biological discovery by presenting the "most promising" pairs of proteins that are likely to interact. For many bacteria, the genome sequence, and thereby the genomic context of the proteome, is readily available; for some of these proteomes, localization and functional annotations are also available, but interactomes are not. We present here a method for rapidly developing a computational system to predict the interactomes of bacterial proteomes. While other studies have presented methods to transfer interologs across species, we propose transferring computational models to benefit from cross-species annotations, thereby predicting many more novel interactions even in the absence of interologs. Mycobacterium tuberculosis (Mtb) and Clostridium difficile (CD) are used to demonstrate the approach.
Results
We developed random forest classifiers over features derived from Gene Ontology (GO) annotations and genetic context scores provided by the STRING database to predict Mtb and CD interactions independently. The Mtb classifier gave a precision of 94% and a recall of 23% on a held-out test set. The Mtb model was then run on all 8 million protein pairs of the Mtb proteome, yielding 708 new interactions at 94% expected precision, or 1,595 new interactions at 80% expected precision. The CD classifier gave a precision of 90% and a recall of 16% on a held-out test set. The CD model was run on all 8 million protein pairs of the CD proteome, yielding 143 new interactions at 90% expected precision, or 580 new interactions at 80% expected precision. We also compared our predictions with STRING database interactions for CD and Mtb, and with interactions recently identified for Mtb by a bacterial two-hybrid system. To demonstrate the utility of transferring computational models, we applied the Mtb model to predict interactions among CD protein pairs; this cross-species model yielded a precision of 88% at a recall of 8%. To demonstrate the transfer of features from other organisms in the absence of feature-based and interaction-based information, we transferred missing feature values from Mtb orthologs into the CD data. By transferring these data from orthologs (not interologs), we showed that a large number of interactions can be predicted.
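Reporting interaction counts "at 94% expected precision" and "at 80% expected precision" implies choosing a classifier score cut-off on held-out data and then applying it to every candidate pair. The sketch below shows that bookkeeping under stated assumptions. A trained model (the random forest itself is omitted) has already produced one score per pair, and the cut-off is taken as the lowest score at which held-out precision still meets the target. The function names are hypothetical, and this is not necessarily how the authors set their thresholds.

```python
from typing import Optional, Sequence


def threshold_for_precision(scores: Sequence[float], labels: Sequence[int],
                            target: float) -> Optional[float]:
    """Lowest score cut-off whose held-out precision is at least `target` (None if unreachable)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best: Optional[float] = None
    true_pos = 0
    for i, (score, label) in enumerate(ranked):
        true_pos += label
        # Evaluate only at the end of a run of tied scores, so "score >= cut-off"
        # covers exactly the pairs counted so far.
        last_in_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_in_tie and true_pos / (i + 1) >= target:
            best = score
    return best


def count_predictions(scores: Sequence[float], cutoff: float) -> int:
    """Number of candidate pairs called interacting at this cut-off."""
    return sum(1 for s in scores if s >= cutoff)
```

Usage: `cutoff = threshold_for_precision(heldout_scores, heldout_labels, 0.94)`, then `count_predictions(proteome_scores, cutoff)`. Lowering the target to 0.80 moves the cut-off down and admits more pairs, which is the trade-off behind the two sets of counts reported above.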
Conclusions
A (partial) bacterial interactome can be discovered rapidly using the existing GO and STRING features associated with the organism. Cross-species interactome development can be used even when there are too few known interactions to build a computational prediction system: a computational model of one or more well-studied organisms can make the initial interactome prediction for the target organism. We have also demonstrated that annotations can be transferred from orthologs in well-studied organisms, enabling accurate predictions for organisms with no annotations. These approaches can serve as building blocks for addressing the challenges of feature coverage and missing interactions in rapid interactome discovery for bacterial organisms.
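Transferring missing feature values from orthologs can be pictured as a simple fill-in step. The minimal sketch below assumes features are stored per protein as a name-to-value mapping in which `None` marks a missing value. It also assumes an ortholog map from each target (CD) protein to a source (Mtb) protein. The protein identifiers, feature names and function name are hypothetical. In a PPI setting, the filled per-protein values would then feed the pairwise features.

```python
from typing import Dict, Optional

# protein id -> feature name -> value (None = missing)
FeatureTable = Dict[str, Dict[str, Optional[float]]]


def transfer_from_orthologs(target: FeatureTable, source: FeatureTable,
                            orthologs: Dict[str, str]) -> FeatureTable:
    """Fill missing target features with values from each protein's ortholog."""
    filled: FeatureTable = {}
    for protein, feats in target.items():
        donor = source.get(orthologs.get(protein, ""), {})
        merged: Dict[str, Optional[float]] = {name: None for name in feats}
        # Ortholog values first, so the target's own observed values override them.
        merged.update({k: v for k, v in donor.items() if v is not None})
        merged.update({k: v for k, v in feats.items() if v is not None})
        filled[protein] = merged
    return filled
```

Proteins without an ortholog keep their gaps, so any downstream classifier still has to tolerate missing values.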
Availability
The predictions for all Mtb and CD proteins are available for browsing and download at http://severus.dbmi.pitt.edu/TB and http://severus.dbmi.pitt.edu/CD, respectively.
Malignant Pleural Mesothelioma Interactome with 364 Novel Protein-Protein Interactions
Malignant pleural mesothelioma (MPM) is an aggressive cancer affecting the outer lining of the lung, with a median survival of less than one year. We constructed an ‘MPM interactome’ with over 300 computationally predicted protein-protein interactions (PPIs) and over 2400 known PPIs of 62 literature-curated genes whose activity affects MPM. Known PPIs of the 62 MPM-associated genes were derived from the Biological General Repository for Interaction Datasets (BioGRID) and the Human Protein Reference Database (HPRD). Novel PPIs were predicted by applying the HiPPIP algorithm, which computes features of protein pairs such as cellular localization, molecular function, biological process membership, genomic location of the gene, and gene expression in microarray experiments, and classifies each pair as interacting or non-interacting with a random forest model. We validated five novel predicted PPIs experimentally. The interactome is significantly enriched with genes differentially expressed in MPM tumors compared with normal pleura and with other thoracic tumors, genes whose high expression has been correlated with unfavorable prognosis in lung cancer, genes differentially expressed upon crocidolite exposure, and exosome-derived proteins identified from malignant mesothelioma cell lines. Twenty-eight of the interactors of MPM proteins are targets of 147 U.S. Food and Drug Administration (FDA)-approved drugs. By comparing disease-associated versus drug-induced differential expression profiles, we identified five potentially repurposable drugs, namely cabazitaxel, primaquine, pyrimethamine, trimethoprim and gliclazide. Preclinical studies may be conducted in vitro to validate these computational results. Interactome analysis of disease-associated genes is a powerful approach with high translational impact.
It shows how MPM-associated genes identified by various high-throughput studies are functionally linked, leading to clinically translatable results such as repurposed drugs. The PPIs are available on a webserver with an interactive user interface, visualization and advanced search capabilities.
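The abstract reports that the interactome is "significantly enriched" with several gene sets but does not name the test. A one-sided hypergeometric test is a standard choice for this kind of overlap question, and the sketch below assumes that test. N is the number of background genes, K the number of those in a reference set (for example, genes differentially expressed in MPM tumors), n the number of interactome genes, and k the observed overlap. The function computes P(X ≥ k) exactly with integer binomial coefficients.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided P(X >= k): overlap of an n-gene list with a K-gene set out of N genes."""
    upper = min(n, K)
    # Sum the hypergeometric probability mass for every overlap size from k upward.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

For example, `hypergeom_enrichment_p(N=20000, K=1500, n=400, k=60)` asks how surprising an overlap of 60 genes would be by chance. These numbers are invented for illustration and are not taken from the study. When several gene sets are tested, as here, the resulting p-values would normally be corrected for multiple testing.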