1,633 research outputs found

    WAVELET ANALYSIS OF SHORT GLOBULAR HOMOLOGOUS PROTEINS IN MESOPHILE AND THERMOPHILE PROKARYOTES

    This study looked to identify features related to thermal stability and function in the amino acid chains of short globular proteins from mesophile and thermophile species, within the constraint that the protein fold to perform a specific function. To do so, 540 homologous pairs of proteins were studied. The amino acid chains were converted to hydrophobicity signals by assigning a hydropathy score to each residue in the polypeptide. The hydrophobicity signals were passed through a wavelet packet transform and the resulting spectra analyzed. Bootstrapping was used to generate a control data set to determine if the true ordering of amino acids codes for a non-random fluctuation in hydropathy along the length of the polypeptide. A method to relate the spectral characteristics to the function of a protein making use of gene ontologies was developed as a proof of concept. As a group, mesophile and thermophile proteins have very similar total power. However, on a protein-to-protein basis the thermophile contains a greater total power in 489 of the 540 pairs (90.56%). The hydrophobicity scale used in this study is strongly correlated with Gibbs free energy. The total power of a protein is also strongly correlated to the Gibbs free energy, so that the thermophile protein contains a greater free energy than its corresponding mesophile partner. It has been noted in the experimental literature that thermophile proteins are stabilized by increasing their Gibbs free energy. The statistical measures skew and kurtosis were adapted so that a spectrum of skew and kurtosis values was generated for each protein. These values indicate that the fluctuation in hydropathy is non-random and position dependent. Thermophile proteins have larger power at frequency bands 21 through 31 (average intervals of 100 to 77 amino acids), and 44 to 56 (on average 46 to 19 amino acids), which may contribute to their having greater total power in 90.56% of the pairs.
Increases to the fluctuation in hydropathy within certain lengths throughout the total amino acid chain of a protein may be a means of raising the temperature at which a protein denatures.
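
As a rough illustration of the pipeline this abstract describes (sequence, then hydropathy signal, then wavelet packet spectrum, then power), the Python sketch below may help. The abstract names neither the hydropathy scale nor the wavelet, so the Kyte-Doolittle scale, the Haar filter, and every function name here are assumptions chosen for illustration, not the study's actual method.

```python
import math

# Kyte-Doolittle hydropathy scale (an assumed choice; the abstract
# does not say which scale the study used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_signal(sequence):
    """Map each residue of a one-letter sequence to its hydropathy score."""
    return [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]


def haar_split(x):
    """One orthonormal Haar step: pairwise sums and differences over sqrt(2).

    A trailing unpaired sample (odd length) is dropped.
    """
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def haar_packet(x, levels):
    """Full wavelet packet tree: split every node, not only the approximation.

    Returns the 2**levels leaf nodes in natural (not frequency) order.
    Energy is preserved exactly when len(x) is divisible by 2**levels.
    """
    nodes = [list(x)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_split(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes


def band_power(nodes):
    """Power of each packet band: the sum of squared coefficients."""
    return [sum(c * c for c in node) for node in nodes]


if __name__ == "__main__":
    signal = hydropathy_signal("AILVKDEG")  # hypothetical 8-residue peptide
    bands = band_power(haar_packet(signal, levels=2))
    print("per-band power:", bands)
    print("total power:", sum(bands))
```

Because the Haar step is orthonormal, the band powers sum to the power of the original signal, so a "total power" comparison between two homologs can be made either in the signal domain or across bands.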

    Feature-based time-series analysis

    This work presents an introduction to feature-based time-series analysis. The time series as a data type is first described, along with an overview of the interdisciplinary time-series analysis literature. I then summarize the range of feature-based representations for time series that have been developed to aid interpretable insights into time-series structure. Particular emphasis is given to emerging research that facilitates wide comparison of feature-based representations that allow us to understand the properties of a time-series dataset that make it suited to a particular feature-based representation or analysis algorithm. The future of time-series analysis is likely to embrace approaches that exploit machine learning methods to partially automate human learning to aid understanding of the complex dynamical patterns in the time series we measure from the world. Comment: 28 pages, 9 figures

    Network and multi-scale signal analysis for the integration of large omic datasets: applications in Populus trichocarpa

    Poplar species are promising sources of cellulosic biomass for biofuels because of their fast growth rate, high cellulose content and moderate lignin content. There is a growing movement toward integrating multiple layers of ’omics data in a systems biology approach to understand gene-phenotype relationships and assist in plant breeding programs. This dissertation involves the use of network and signal processing techniques for the combined analysis of these various data types, for the goals of (1) increasing fundamental knowledge of P. trichocarpa and (2) facilitating the generation of hypotheses about target genes and phenotypes of interest. A data integration “Lines of Evidence” method is presented for the identification and prioritization of target genes involved in functions of interest. A new post-GWAS method, Pleiotropy Decomposition, is presented, which extracts pleiotropic relationships between genes and phenotypes from GWAS results, allowing for identification of genes with signatures favorable to genome editing. Continuous wavelet transform signal processing analysis is applied in the characterization of genome distributions of various features (including variant density, gene density, and methylation profiles) in order to identify chromosome structures such as the centromere. This yielded approximate centromere locations on all P. trichocarpa chromosomes, which had previously not been adequately reported in the scientific literature. Discrete wavelet transform signal processing followed by correlation analysis was applied to genomic features from various data types including transposable element density, methylation density, SNP density, gene density, centromere position and putative ancestral centromere position. Subsequent correlation analysis of the resulting wavelet coefficients identified scale-specific relationships between these genomic features, and provided insights into the evolution of the genome structure of P. trichocarpa.
These methods have provided strategies both to increase fundamental knowledge about the P. trichocarpa system and to identify new target genes related to biofuels targets. We intend that these approaches will ultimately be used in the designing of better plants for more efficient and sustainable production of bioenergy.

    Genome-Wide Discovery of Somatic Regulatory Variants in Diffuse Large B-Cell Lymphoma

    Diffuse large B-cell lymphoma (DLBCL) is an aggressive cancer originating from mature B-cells. Prognosis is strongly associated with molecular subgroup, although the driver mutations that distinguish the two main subgroups remain poorly defined. Through an integrative analysis of whole genomes, exomes, and transcriptomes, we have uncovered genes and non-coding loci that are commonly mutated in DLBCL. Our analysis has identified novel cis-regulatory sites, and implicates recurrent mutations in the 3′ UTR of NFKBIZ as a novel mechanism of oncogene deregulation and NF-κB pathway activation in the activated B-cell (ABC) subgroup. Small amplifications associated with over-expression of FCGR2B (the Fcγ receptor protein IIB), primarily in the germinal centre B-cell (GCB) subgroup, correlate with poor patient outcomes suggestive of a novel oncogene. These results expand the list of subgroup driver mutations that may facilitate implementation of improved diagnostic assays and could offer new avenues for the development of targeted therapeutics.

    Modified Dynamic Time Warping for Hierarchical Clustering

    Time series clustering is the process of grouping sequential correspondences into similar clusters. The key feature behind clustering time series data lies in the similarity/distance function used to identify the sequential matches. Dynamic Time Warping (DTW) is one of the common distance measures that have demonstrated competitive results compared to other functions. DTW aims to find the shortest path in the process of identifying sequential matches. DTW relies on dynamic programming to obtain the shortest path, selecting the smaller distance at each step. However, in the case of equivalent distances, DTW selects the path randomly. Hence, the selection can be misguided by this randomization, which significantly affects the matching quality. This is because randomization may lead to a longer path that drifts away from the optimum path. This paper proposes a modified DTW that aims to enhance the dynamic selection of the shortest path when handling equivalent distances. Experiments were conducted using twenty UCR benchmark datasets. The proposed modified DTW was also compared, in terms of precision, recall and f-measure, with state-of-the-art competitive distance measures including the original DTW, the Minkowski distance measure and the Euclidean distance measure. The results showed that the proposed modified DTW yields superior results compared to the standard DTW, using either Minkowski or Euclidean. This demonstrates the effectiveness of the proposed modification, in which optimizing the shortest path has enhanced the performance of clustering. The proposed modified DTW can be used to obtain a good clustering method for any time series data.
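
For readers unfamiliar with the technique, the Python sketch below shows the standard dynamic-programming form of DTW with a warping-path backtrack. It breaks ties between equal-cost predecessors deterministically by preferring the diagonal step. That rule is an assumption chosen for illustration only; it is not the modification this paper proposes, which the abstract does not specify. The function name `dtw` and the absolute-difference local cost are likewise illustrative.

```python
def dtw(a, b):
    """Classic DTW between two numeric sequences.

    Returns (distance, path), where path is a list of (i, j) index pairs
    aligning a[i] with b[j]. Local cost is the absolute difference.
    """
    n, m = len(a), len(b)
    inf = float("inf")

    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # diagonal: advance both sequences
                cost[i - 1][j],      # up: advance a only
                cost[i][j - 1],      # left: advance b only
            )

    # Backtrack from the end. Candidates are listed diagonal-first, and
    # min() returns the first minimum, so ties resolve to the diagonal
    # (an assumed deterministic rule, not the paper's method).
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    distance, path = dtw([1, 2, 3], [1, 2, 2, 3])
    print(distance)  # 0.0
    print(path)      # [(0, 0), (1, 1), (1, 2), (2, 3)]
```

The tie-breaking choice affects only which path is returned, not the accumulated distance itself; any step that reads the path (for example, averaging aligned points within a cluster) is where different tie rules can produce different results.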

    A visual analytics approach to feature discovery and subspace exploration in protein flexibility matrices

    The vast amount of information generated by domain scientists makes the transition from data to knowledge difficult and often impedes important discoveries. For example, the knowledge gained from protein flexibility data sets can speed advances in genetic therapies and drug discovery. However, these models generate so much data that large scale analysis by traditional methods is almost impossible. This hinders biomedical advances. Visual analytics is a new field that can help alleviate this problem. Visual analytics attempts to seamlessly integrate human abilities in pattern recognition, domain knowledge, and synthesis with automatic analysis techniques. I propose a novel, visual analytics pipeline and prototype which eases discovery, comparison, and exploration in the outputs of complex computational biology datasets. The approach utilizes automatic feature extraction by image segmentation to locate regions of interest in the data, visually presents the features to users in an intuitive way, and provides rich interactions for multi-resolution visual exploration. Functionality is also provided for subspace exploration based on automatic similarity calculation and comparative visualizations. The effectiveness of feature discovery and subspace exploration is shown through a user study and user scenarios. Feedback from analysts confirms the suitability of the proposed solution to domain tasks.

    Genomics and proteomics: a signal processor's tour

    The theory and methods of signal processing are becoming increasingly important in molecular biology. Digital filtering techniques, transform domain methods, and Markov models have played important roles in gene identification, biological sequence analysis, and alignment. This paper contains a brief review of molecular biology, followed by a review of the applications of signal processing theory. This includes the problem of gene finding using digital filtering, and the use of transform domain methods in the study of protein binding spots. The relatively new topic of noncoding genes, and the associated problem of identifying ncRNA buried in DNA sequences, are also described. This includes a discussion of hidden Markov models and context free grammars. Several new directions in genomic signal processing are briefly outlined at the end.

    Hidden Markov Models

    Hidden Markov Models (HMMs), although known for decades, have become widely used in recent years and are still under development. This book presents theoretical issues and a variety of HMM applications in speech recognition and synthesis, medicine, neurosciences, computational biology, bioinformatics, seismology, environment protection and engineering. I hope that readers will find this book useful and helpful for their own research.