14 research outputs found

    A Novel Approach for Predicting Type IV Secretion System (T4SS) Effector Proteins

    No full text
    Thesis (Ph.D.), Computer Science, Washington State University. Type IV secretion systems (T4SS) are multi-protein complexes in some bacterial pathogens that are used to secrete effector proteins directly into host cells. Upon entry, these effectors manipulate the host cell's machinery, resulting in serious illness or even death of the host. Therefore, identification of T4SS effectors is an important subject in bioinformatics. In recent years, multiple scoring and machine learning-based methods have been suggested for effector prediction. These approaches have used different sets of features, and their predictions have been inconsistent. In this work, an optimal set of features is first presented for predicting T4SS effector proteins using a multi-level feature selection approach. Next we focus on the best way to use these optimal features by designing several machine learning classifiers, comparing the results with those of others, and obtaining de novo results. We chose the pathogen Legionella pneumophila strain Philadelphia-1, a cause of Legionnaires' disease, for these experiments. An important contribution was the development of a new comprehensive and user-friendly software package called OPT4e, for Optimal-features Predictor for T4SS Effector proteins. OPT4e was used to predict candidate effectors from the proteomes of Anaplasma phagocytophilum strains HZ and HGE-1. A. phagocytophilum, the causative agent of anaplasmosis in humans, is currently an important pathogen for research because of the scarcity of known effectors. OPT4e predicted 48 and 46 candidates for strains HZ and HGE-1, respectively, with 16 and 18 most probable effectors. Two new algorithms, t-Tree and t-Forest, were developed as variations of the decision tree and random forest algorithms. The new algorithms improved on the original algorithms by accounting for the relevance of features to the output classes, in addition to the standard Gini index, when creating split points. Known T4SS effector proteins for L. pneumophila were used to test the new algorithms as well as several variations of these algorithms. Finally, a method for predicting protein secondary structure using the DAgger algorithm was considered as a possible improvement to OPT4e, and parallelization of PSSM protein profile calculations is presented, tested, and discussed.
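
    The abstract above mentions the standard Gini index used by decision trees when creating split points. As a minimal sketch of that standard criterion only, the Python below scores candidate thresholds on a single numeric feature. The function names are hypothetical, and the feature-relevance term that t-Tree and t-Forest add is not described in the abstract, so it is not modeled here.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity, 1 - sum(p_k^2), of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(values, labels, threshold):
    """Size-weighted Gini impurity of the two groups produced by `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def best_threshold(values, labels):
    """Return (threshold, score) with the lowest weighted Gini impurity.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns None when the feature has fewer than two distinct values.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return None
    scored = [(t, split_gini(values, labels, t)) for t in candidates]
    return min(scored, key=lambda pair: pair[1])
```

    For example, with feature values [1, 2, 8, 9] and labels [0, 0, 1, 1], the midpoint 5.0 separates the two classes completely, so its weighted impurity is 0.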

    Diffraction Influence on the Field of View and Resolution of Three-Dimensional Integral Imaging

    No full text

    Using an optimal set of features with a machine learning-based approach to predict effector proteins for Legionella pneumophila

    No full text
    Type IV secretion systems exist in a number of bacterial pathogens and are used to secrete effector proteins directly into host cells to change the host environment, making it hospitable for the bacteria. In recent years, several machine learning algorithms have been developed to predict effector proteins, potentially facilitating experimental verification. However, inconsistencies exist between their results. Previously we analysed the disparate sets of predictive features used in these algorithms to determine an optimal set of 370 features for effector prediction. This study focuses on the best way to use these optimal features by designing three machine learning classifiers, comparing our results with those of others, and obtaining de novo results. We chose the pathogen Legionella pneumophila strain Philadelphia-1, a cause of Legionnaires' disease, because it has many validated effector proteins and others have developed machine learning prediction tools for it. While all of our models give good results, indicating that our optimal features are quite robust, Model 1, which uses all 370 features with a support vector machine, has slightly better accuracy. Moreover, Model 1 predicted 472 proteins that are deemed highly probable to be effectors and that include 94% of known effectors. Although the results of our three models agree well with those of other researchers, their models only predicted 126 and 311 candidate effectors.
    Published copy: Ashari, Z., K.A. Brayton, and S.L. Broschat. (2019). Using an optimal set of features with a machine learning-based approach to predict effector proteins for Legionella pneumophila. PLoS ONE, Vol. 14, No. 1, e0202312. doi:10.1371/journal.pone.0202312. PMCID: PMC6347213

    Prediction of T4SS Effector Proteins for Anaplasma phagocytophilum Using OPT4e, A New Software Tool

    No full text
    Type IV secretion systems (T4SS) are used by a number of bacterial pathogens to attack the host cell. The complex protein structure of the T4SS is used to directly translocate effector proteins into host cells, often causing fatal diseases in humans and animals. Identification of effector proteins is the first step in understanding how they function to cause virulence and pathogenicity. Accurate prediction of effector proteins via a machine learning approach can assist in the process of their identification. The main goal of this study is to predict a set of candidate effectors for the tick-borne pathogen Anaplasma phagocytophilum, the causative agent of anaplasmosis in humans. To our knowledge, we present the first computational study for effector prediction with a focus on A. phagocytophilum. In a previous study, we systematically selected a set of optimal features from more than 1,000 possible protein characteristics for predicting T4SS effector candidates. This was followed by a study of the features using the proteome of Legionella pneumophila strain Philadelphia deduced from its complete genome. In this manuscript we introduce the OPT4e software package, for Optimal-features Predictor for T4SS Effector proteins. An earlier version of OPT4e was verified using cross-validation tests, accuracy tests, and comparison with previous results for L. pneumophila. We use OPT4e to predict candidate effectors from the proteomes of A. phagocytophilum strains HZ and HGE-1 and predict 48 and 46 candidates, respectively, with 16 and 18 deemed most probable as effectors. These latter include the three known validated effectors for A. phagocytophilum.
    Published copy: Esna Ashari, Z., K.A. Brayton, and S.L. Broschat. (2019). Prediction of T4SS effector proteins for Anaplasma phagocytophilum using OPT4e, a new software tool. Frontiers in Microbiology, Vol. 10. doi:10.3389/fmicb.2019.01391. PMCID: PMC6598457. First publication by Frontiers Medi

    Memory-Aware Active Learning in Mobile Sensing Systems

    No full text

    An optimal set of features for predicting type IV secretion system effector proteins for a subset of species based on a multi-level feature selection approach

    No full text
    Type IV secretion systems (T4SS) are multi-protein complexes in a number of bacterial pathogens that can translocate proteins and DNA to the host. Most T4SSs function in conjugation and translocate DNA; however, approximately 13% function to secrete proteins, delivering effector proteins into the cytosol of eukaryotic host cells. Upon entry, these effectors manipulate the host cell's machinery for their own benefit, which can result in serious illness or death of the host. For this reason recognition of T4SS effectors has become an important subject. Much previous work has focused on verifying effectors experimentally, a costly endeavor in terms of money, time, and effort. Having good predictions for effectors will help to focus experimental validations and decrease testing costs. In recent years, several scoring and machine learning-based methods have been suggested for the purpose of predicting T4SS effector proteins. These methods have used different sets of features for prediction, and their predictions have been inconsistent. In this paper, an optimal set of features is presented for predicting T4SS effector proteins using a statistical approach. A thorough literature search was performed to find features that have been proposed. Feature values were calculated for datasets of known effectors and non-effectors for T4SS-containing pathogens for four genera with a sufficient number of known effectors: Legionella pneumophila, Coxiella burnetii, Brucella spp., and Bartonella spp. The features were ranked, and less important features were filtered out. Correlations between remaining features were removed, and dimensional reduction was accomplished using principal component analysis and factor analysis. Finally, the optimal features for each pathogen were chosen by building logistic regression models and evaluating each model. The results based on evaluation of our logistic regression models confirm the effectiveness of our four optimal sets of features, and based on these an optimal set of features is proposed for all T4SS effector proteins.
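
    One step in the abstract above is removing correlations between the features that survive ranking. As a rough sketch of how such a step could look, the Python below walks a ranked list of features and keeps each one only if its Pearson correlation with every already-kept feature stays within a cutoff. The greedy strategy, the 0.9 cutoff, and the function names are assumptions made for illustration; the abstract does not say how the authors removed correlations, and the PCA, factor analysis, and logistic regression steps are not shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def drop_correlated(features, ranking, max_abs_r=0.9):
    """Greedily keep features in ranked order, skipping near-duplicates.

    features:  dict mapping feature name -> list of values (one per protein)
    ranking:   feature names, most important first
    max_abs_r: assumed cutoff on |r| against every feature already kept
    """
    kept = []
    for name in ranking:
        if all(abs(pearson(features[name], features[k])) <= max_abs_r for k in kept):
            kept.append(name)
    return kept
```

    Because the list is processed in ranked order, the higher-ranked member of a correlated pair is the one retained.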