124 research outputs found

    On the pH-optimum of Activity and Stability of Proteins

    Biological macromolecules evolved to perform their function in a specific cellular environment (subcellular compartments or tissues); therefore, they should be adapted to the biophysical characteristics of the corresponding environment, one of them being the characteristic pH. Many macromolecular properties, such as activity and stability, are pH dependent. However, only activity is biologically important, while stability may not be crucial for the corresponding reaction. Here, we show that the pH-optimum of activity (the pH of maximal activity) is correlated with the pH-optimum of stability (the pH of maximal stability) on a set of 310 proteins with available experimental data. We speculate that such a correlation is needed to allow the corresponding macromolecules to tolerate the small pH fluctuations that are inevitable with cellular function. Our findings rationalize the efforts of correlating the pH of maximal stability and the characteristic pH of subcellular compartments, as only the pH of activity is subject to evolutionary pressure. In addition, our analysis confirmed the previous observation that the pH-optima of activity and stability are not correlated with the isoelectric point, pI, or with the optimal temperature.

    Wavelet Transform-Based Phylogenetic Analysis of Protein Sequences

    With the acceleration of gene sequencing studies, large amounts of biological data are emerging. Analyzing these data contributes greatly to understanding metabolic disorders in the organism and to increasing the efficiency of drugs. For this purpose, it is critical to classify the data accurately, quickly and at low cost according to their characteristics and relationships. Besides experimental methods, machine learning and bioinformatics methods are used; artificial neural networks, support vector machines and flexible calculation methods are frequently used. However, the effectiveness of these methods on biosequence data depends on choosing the most appropriate parameters and on how protein sequences are converted into numerical sequences. When the sequences are transformed using amino acid frequencies, the properties of the amino acids are ignored. Taking the physicochemical properties of amino acids (hydrophobicity, hydrophilicity, ...) into account therefore increases the performance of classification techniques. The phylogenetic tree is the best method to visualize the classification among species. In this project, the wavelet transform used in the analysis of digital signals was adapted to protein sequences defined by hydrophobicity values. Each protein sequence was treated as a signal and decomposed by the wavelet transform into approximation and detail components; the similarities between these components were calculated, and the phylogenetic tree of the species was constructed. As an application, phylogenetic trees of the ND5 protein sequences of 22 species were created in MATLAB R2017 using the Neighbor-Joining (NJ) and Unweighted Pair Group Method with Arithmetic Mean (UPGMA) methods.
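
The pipeline described in this abstract (sequence to hydrophobicity signal, wavelet decomposition into approximation and detail components, then a similarity measure) might be sketched roughly as follows. The abstract does not name a hydrophobicity scale, a wavelet family, or a distance measure, so the Kyte-Doolittle scale, the single-level Haar wavelet, the Euclidean distance, and all function names below are assumptions made purely for illustration.

```python
import math

# Kyte-Doolittle hydropathy index (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_signal(sequence):
    """Map a protein sequence to a numeric signal, skipping unknown residues."""
    return [KYTE_DOOLITTLE[aa] for aa in sequence.upper() if aa in KYTE_DOOLITTLE]


def haar_dwt(signal):
    """Single-level Haar transform; returns (approximation, detail)."""
    if len(signal) % 2:
        # Pad odd-length signals by repeating the last sample.
        signal = signal + [signal[-1]]
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_distance(seq_a, seq_b):
    """Euclidean distance between Haar coefficients over the common length."""
    a_approx, a_detail = haar_dwt(hydrophobicity_signal(seq_a))
    b_approx, b_detail = haar_dwt(hydrophobicity_signal(seq_b))
    n = min(len(a_approx), len(b_approx))
    squared = sum((x - y) ** 2 for x, y in zip(a_approx[:n], b_approx[:n]))
    squared += sum((x - y) ** 2 for x, y in zip(a_detail[:n], b_detail[:n]))
    return math.sqrt(squared)
```

Computing `wavelet_distance` for every pair of sequences would give a distance matrix of the kind that NJ or UPGMA tree construction takes as input; truncating to the common length is a simplification, and a real analysis would need a considered way of handling sequences of different lengths.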

    Prediction of eukaryotic protein subcellular multi-localisation with a combined KNN-SVM ensemble classifier

    Proteins may exist in or shift among two or more different subcellular locations, and this phenomenon is closely related to biological function. It is challenging to deal with multiple locations during eukaryotic protein subcellular localisation prediction with routine methods; therefore, a reliable and automatic ensemble classifier for protein subcellular localisation is needed. We propose a new ensemble classifier that combines the KNN (K-nearest neighbour) and SVM (support vector machine) algorithms to predict the subcellular localisation of eukaryotic proteins from their GO (gene ontology) annotations. This method was developed by fusing basic individual classifiers through a voting system. The overall prediction accuracies obtained via the jackknife test and the resubstitution test were 70.5% and 77.6%, respectively, for eukaryotic proteins; these are significantly higher than those of other methods presented in previous studies and indicate that our strategy better predicts eukaryotic protein subcellular localisation.
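
The voting step that fuses the individual classifiers could look something like the sketch below. The abstract does not specify the voting rule, so the strict-majority threshold, the fallback to the most-voted labels, and the name `fuse_by_voting` are all assumptions; the base KNN and SVM predictors are taken as given and only their outputs (sets of location labels) are combined here.

```python
from collections import Counter


def fuse_by_voting(predictions, min_votes=None):
    """Fuse multi-location predictions from several base classifiers.

    predictions: list of sets of location labels, one set per classifier.
    min_votes: votes a label needs to be kept (default: strict majority).
    """
    if not predictions:
        return set()
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1
    # Each classifier contributes at most one vote per label.
    votes = Counter(label for labels in predictions for label in set(labels))
    fused = {label for label, count in votes.items() if count >= min_votes}
    if not fused and votes:
        # Assumed fallback: keep the top-voted label(s) so that a protein
        # is not left without any predicted location.
        top = max(votes.values())
        fused = {label for label, count in votes.items() if count == top}
    return fused
```

Because each classifier votes on labels independently, a protein can end up with more than one fused location, which is what makes this kind of scheme usable for the multi-location case the abstract describes.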

    Decision fusion in healthcare and medicine: a narrative review

    Objective: To provide an overview of the decision fusion (DF) technique and describe its applications in healthcare and medicine at the prevention, diagnosis, treatment and administrative levels. Background: The rapid development of technology over the past 20 years has led to an explosion in data growth in various industries, such as healthcare. Big data analysis within healthcare systems is essential for arriving at a value-based decision over a period of time. Diversity and uncertainty in big data analytics have made it impossible to analyze data using conventional data mining techniques, and thus alternative solutions are required. DF is a form of data fusion that could increase the accuracy of diagnosis and facilitate the interpretation, summarization and sharing of information. Methods: We conducted a review of articles published between January 1980 and December 2020 from various databases such as Google Scholar, IEEE, PubMed, Science Direct, Scopus and Web of Science using the keywords decision fusion (DF), information fusion, healthcare, medicine and big data. A total of 141 articles were included in this narrative review. Conclusions: Given the importance of big data analysis in reducing costs and improving the quality of healthcare, along with the potential role of DF in big data analysis, it is recommended to understand the full potential of this technique, including its advantages, challenges and applications, before its use. Future studies should focus on describing the methodology and types of data used for its applications within the healthcare sector.

    The Emerging Trends of Multi-Label Learning

    Exabytes of data are generated daily by humans, leading to a growing need for new efforts in dealing with the grand challenges that big data brings to multi-label learning. For example, extreme multi-label classification is an active and rapidly growing research area that deals with classification tasks with an extremely large number of classes or labels, and utilizing massive data with limited supervision to build a multi-label classification model is becoming valuable for practical applications. Besides these, there are tremendous efforts on how to harvest the strong learning capability of deep learning to better capture the label dependencies in multi-label learning, which is the key for deep learning to address real-world classification tasks. However, there has been a lack of systematic studies that focus explicitly on analyzing the emerging trends and new challenges of multi-label learning in the era of big data. It is imperative to call for a comprehensive survey to fulfill this mission and delineate future research directions and new applications. Comment: Accepted to TPAMI 202

    Bioinformatics Applications Based On Machine Learning

    The great advances in information technology (IT) have implications for many sectors, such as bioinformatics, and have considerably increased their possibilities. This book presents a collection of 11 original research papers, all of them related to the application of IT-related techniques within the bioinformatics sector: from new applications created by adapting and applying existing techniques to the creation of new methodologies to solve existing problems.

    In silico analysis of mitochondrial proteins

    The important role of mitochondria in the eukaryotic cell has long been appreciated, but their exact composition and the biological processes taking place in mitochondria are not yet fully understood. The two main factors that slow down the progress in this field are inefficient recognition and imprecise annotation of mitochondrial proteins. Therefore, we developed a new computational tool, YimLoc, which effectively predicts mitochondrial proteins from genomic sequences. This tool integrates the strengths of existing predictors and yields higher performance than any individual predictor. We applied YimLoc to ~60 fungal genomes in order to address the controversy about the localization of beta oxidation in these organisms. Our results show that, in contrast to previous studies, most fungal groups do possess mitochondrial beta oxidation. This work also revealed the diversity of beta oxidation in fungi, which correlates with their utilization of fatty acids as energy and carbon sources. Further, we conducted an investigation of the key component of the mitochondrial beta oxidation pathway, the acyl-CoA dehydrogenase (ACAD). We combined subcellular localization prediction with subfamily classification and phylogenetic inference of ACAD enzymes from 250 species covering all three domains of life. Our study suggests that ACAD genes are an ancient family with innovative evolutionary strategies to generate a large enzyme toolset for utilizing the most diverse fatty acids and amino acids. Finally, to enable the prediction of mitochondrial proteins from data beyond genome sequences, we designed the tool TESTLoc, which uses expressed sequence tags (ESTs) as input. TESTLoc performs significantly better than known tools. In addition to providing two new tools for subcellular localization designed for different data, our studies demonstrate the power of combining subcellular localization prediction with other in silico analyses to gain insights into the function of mitochondrial proteins. Most importantly, this work proposes clear hypotheses that are easily testable, with great potential for advancing our knowledge of mitochondrial metabolism.

    Particle Filtering Methods for Subcellular Motion Analysis

    Advances in fluorescent probing and microscopic imaging technology have revolutionized biology in the past decade and have opened the door for studying subcellular dynamical processes. However, accurate and reproducible methods for processing and analyzing the images acquired for such studies are still lacking. Since manual image analysis is time consuming, potentially inaccurate, and poorly reproducible, many biologically highly relevant questions are either left unaddressed, or are answered with great uncertainty. The subject of this thesis is particle filtering methods and their application for multiple object tracking in different biological imaging applications. Particle filtering is a technique for implementing recursive Bayesian filtering by Monte Carlo sampling. A fundamental concept behind the Bayesian approach for performing inference is the possibility to encode the information about the imaging system, possible noise sources, and the system dynamics in terms of probability density functions. In this thesis, a set of novel PF based metho
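
As a rough illustration of recursive Bayesian filtering by Monte Carlo sampling, the sketch below implements a basic bootstrap particle filter for a single object moving in one dimension. It is not the method developed in the thesis: the random-walk motion model, the Gaussian measurement noise, the parameter values, and the function names are all assumptions chosen to keep the example small.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500,
                              process_std=0.2, obs_std=0.5, rng=None):
    """Estimate a 1D random-walk state from noisy position measurements."""
    rng = rng or random.Random(0)
    # Prior belief: particles scattered around an assumed start at the origin.
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: push every particle through the (assumed) motion model.
        particles = [x + rng.gauss(0.0, process_std) for x in particles]
        # Update: weight each particle by the Gaussian likelihood of z.
        weights = [math.exp(-0.5 * ((z - x) / obs_std) ** 2) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed; fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # Estimate: posterior mean under the weighted particle set.
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


def simulate(n_steps=30, process_std=0.2, obs_std=0.5, seed=1):
    """Generate a synthetic random-walk track and noisy observations of it."""
    rng = random.Random(seed)
    x, truth, observed = 0.0, [], []
    for _ in range(n_steps):
        x += rng.gauss(0.0, process_std)
        truth.append(x)
        observed.append(x + rng.gauss(0.0, obs_std))
    return truth, observed
```

Calling `simulate()` and passing the observations to `bootstrap_particle_filter` gives one position estimate per time step. The motion model and the likelihood are where knowledge about the system dynamics and the imaging noise would be encoded as probability densities; tracking multiple objects in real image data requires considerably more machinery than this single-object sketch.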

    Computational Optimizations for Machine Learning

    The present book contains the 10 articles finally accepted for publication in the Special Issue “Computational Optimizations for Machine Learning” of the MDPI journal Mathematics, which cover a wide range of topics connected to the theory and applications of machine learning, neural networks and artificial intelligence. These topics include, among others, various types of machine learning classes, such as supervised, unsupervised and reinforcement learning, deep neural networks, convolutional neural networks, GANs, decision trees, linear regression, SVM, K-means clustering, Q-learning, temporal difference, deep adversarial networks and more. It is hoped that the book will be interesting and useful to those developing mathematical algorithms and applications in the domain of artificial intelligence and machine learning as well as for those having the appropriate mathematical background and willing to become familiar with recent advances of machine learning computational optimization mathematics, which has nowadays permeated into almost all sectors of human life and activity