14 research outputs found
A Novel Approach for Predicting Type IV Secretion System (T4SS) Effector Proteins
Thesis (Ph.D.), Computer Science, Washington State University
Type IV secretion systems (T4SS) are multi-protein complexes in some bacterial pathogens that are used to secrete effector proteins directly into host cells. Upon entry, these effectors manipulate the host cell's machinery, resulting in serious illness or even death of the host. Therefore, identification of T4SS effectors is an important subject in bioinformatics. In recent years, multiple scoring and machine learning-based methods have been suggested for effector prediction. These approaches have used different sets of features, and their predictions have been inconsistent. In this work, an optimal set of features is first presented for predicting T4SS effector proteins using a multi-level feature selection approach. Next, we focus on the best way to use these optimal features by designing several machine learning classifiers, comparing the results with those of others, and obtaining de novo results. We chose the pathogen Legionella pneumophila strain Philadelphia-1, a cause of Legionnaires’ disease, for these experiments. An important contribution was the development of a new comprehensive and user-friendly software package called OPT4e, for Optimal-features Predictor for T4SS Effector proteins. OPT4e was used to predict candidate effectors from the proteomes of Anaplasma phagocytophilum strains HZ and HGE-1. A. phagocytophilum is the causative agent of anaplasmosis in humans and is currently a very important pathogen for research because of the scarcity of known effectors. OPT4e predicted 48 and 46 candidates for strains HZ and HGE-1, respectively, with 16 and 18 most probable effectors. Two new algorithms, t-Tree and t-Forest, were developed as variations of the decision tree and random forest algorithms. The new algorithms improved on the original algorithms by accounting for the relevance of features to the output classes, in addition to the standard Gini index, when creating split points. Known T4SS effector proteins for L. pneumophila were used to test the new algorithms as well as several variations of these algorithms. Finally, a method for predicting protein secondary structure using the DAgger algorithm was considered as a possible improvement to OPT4e, and parallelization of PSSM protein profile calculations is presented, tested, and discussed.
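The split criterion mentioned for t-Tree can be illustrated with a minimal sketch. The abstract does not state how feature relevance is combined with the Gini index, so the weighted blend below, the `alpha` parameter, and the `relevance` input are hypothetical assumptions for illustration only and are not the thesis's actual formulation.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(values, labels, threshold):
    """Impurity reduction from splitting one feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


def split_score(values, labels, threshold, relevance, alpha=0.5):
    """Hypothetical blend of Gini gain and a precomputed relevance in [0, 1].

    Assumption: a simple weighted sum; the source does not give the formula.
    """
    return (1 - alpha) * gini_gain(values, labels, threshold) + alpha * relevance
```

With `alpha=0` the score reduces to the ordinary Gini gain used by a standard decision tree, which makes it easy to compare the two behaviours on the same data.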
Mindful active learning
We propose a novel active learning framework for activity recognition using wearable sensors. Our work is unique in that it takes limitations of the oracle into account when selecting sensor data for annotation by the oracle. Our approach is inspired by human beings' limited capacity to respond to prompts on their mobile device. This capacity constraint is manifested not only in the number of queries that a person can respond to in a given time frame but also in the time lag between the query issuance and the oracle response. We introduce the notion of mindful active learning and propose a computational framework, called EMMA, to maximize the active learning performance taking informativeness of sensor data, query budget, and human memory into account. We formulate this optimization problem, propose an approach to model memory retention, discuss the complexity of the problem, and propose a greedy heuristic to solve it. Additionally, we design an approach to perform mindful active learning in batch mode. We demonstrate the effectiveness of our approach using three publicly available activity datasets. We show that the activity recognition accuracy ranges from 21% to 97% depending on memory strength, query budget, and difficulty of the machine learning task. Our results also indicate that EMMA achieves an accuracy level that is, on average, 13.5% higher than the case when only informativeness of the sensor data is considered. Moreover, we show that the performance of our approach is at most 20% less than the experimental upper bound and up to 80% higher than the experimental lower bound. To evaluate the performance of EMMA for batch active learning, we design two instantiations of EMMA that perform active learning in batch mode. We show that these algorithms improve the training time at the cost of reduced accuracy. Also, integrating clustering into the process of selecting sensor observations for batch active learning improves the activity learning performance by 11.1% on average, mainly by reducing the redundancy among the selected sensor observations. We observe that mindful active learning is most beneficial when the query budget is small and/or the oracle's memory is weak.
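The idea of trading off informativeness, query budget, and memory can be sketched as a simple greedy selection. This is not EMMA itself: the abstract does not give the memory model or the objective, so the exponential retention curve, the product utility, and the names `retention`, `select_queries`, and `strength` are all assumptions made for illustration.

```python
import math


def retention(lag, strength):
    """Assumed forgetting curve: recall probability decays exponentially with lag."""
    return math.exp(-lag / strength)


def select_queries(candidates, budget, strength):
    """Greedily pick up to `budget` observations with the highest utility.

    candidates: list of (observation_id, informativeness, lag) tuples.
    Utility is assumed to be informativeness times expected retention.
    """
    scored = [
        (info * retention(lag, strength), obs_id)
        for obs_id, info, lag in candidates
    ]
    scored.sort(reverse=True)
    return [obs_id for _, obs_id in scored[:budget]]
```

Under these assumptions, a weak memory (small `strength`) favours recent observations even when older ones are more informative, while a strong memory makes the selection approach plain informativeness ranking.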
Diffraction Influence on the Field of View and Resolution of Three-Dimensional Integral Imaging
Using an optimal set of features with a machine learning-based approach to predict effector proteins for Legionella pneumophila
Type IV secretion systems exist in a number of bacterial pathogens and are used to secrete effector proteins directly into host cells, altering the host cell environment to make it hospitable for the bacteria. In recent years, several machine learning algorithms have been developed to predict effector proteins, potentially facilitating experimental verification. However, inconsistencies exist between their results. Previously we analysed the disparate sets of predictive features used in these algorithms to determine an optimal set of 370 features for effector prediction. This study focuses on the best way to use these optimal features by designing three machine learning classifiers, comparing our results with those of others, and obtaining de novo results. We chose the pathogen Legionella pneumophila strain Philadelphia-1, a cause of Legionnaires’ disease, because it has many validated effector proteins and others have developed machine learning prediction tools for it. While all of our models give good results, indicating that our optimal features are quite robust, Model 1, which uses all 370 features with a support vector machine, has slightly better accuracy. Moreover, Model 1 predicted 472 proteins that are deemed highly probable to be effectors, and these include 94% of known effectors. Although the results of our three models agree well with those of other researchers, their models only predicted 126 and 311 candidate effectors.
Published copy: Ashari, Z., K.A. Brayton, and S.L. Broschat. (2019). Using an optimal set of features with a machine learning-based approach to predict effector proteins for Legionella pneumophila. PLoS ONE, Vol. 14, No. 1, e0202312. doi:10.1371/journal.pone.0202312. PMCID: PMC6347213
Prediction of T4SS Effector Proteins for Anaplasma phagocytophilum Using OPT4e, A New Software Tool
Type IV secretion systems (T4SS) are used by a number of bacterial pathogens to attack the host cell. The complex protein structure of the T4SS is used to directly translocate effector proteins into host cells, often causing fatal diseases in humans and animals. Identification of effector proteins is the first step in understanding how they function to cause virulence and pathogenicity. Accurate prediction of effector proteins via a machine learning approach can assist in the process of their identification. The main goal of this study is to predict a set of candidate effectors for the tick-borne pathogen Anaplasma phagocytophilum, the causative agent of anaplasmosis in humans. To our knowledge, we present the first computational study for effector prediction with a focus on A. phagocytophilum. In a previous study, we systematically selected a set of optimal features from more than 1,000 possible protein characteristics for predicting T4SS effector candidates. This was followed by a study of the features using the proteome of Legionella pneumophila strain Philadelphia deduced from its complete genome. In this manuscript we introduce the OPT4e software package for Optimal-features Predictor for T4SS Effector proteins. An earlier version of OPT4e was verified using cross-validation tests, accuracy tests, and comparison with previous results for L. pneumophila. We use OPT4e to predict candidate effectors from the proteomes of A. phagocytophilum strains HZ and HGE-1 and predict 48 and 46 candidates, respectively, with 16 and 18 deemed most probable as effectors. These latter include the three known validated effectors for A. phagocytophilum.
Published copy: Esna Ashari, Z., K.A. Brayton, and S.L. Broschat. (2019). Prediction of T4SS effector proteins for Anaplasma phagocytophilum using OPT4e, a new software tool. Frontiers in Microbiology, Vol. 10. doi:10.3389/fmicb.2019.01391. PMCID: PMC6598457.
First publication by Frontiers Medi
An optimal set of features for predicting type IV secretion system effector proteins for a subset of species based on a multi-level feature selection approach
Type IV secretion systems (T4SS) are multi-protein complexes in a number of bacterial pathogens that can translocate proteins and DNA to the host. Most T4SSs function in conjugation and translocate DNA; however, approximately 13% function to secrete proteins, delivering effector proteins into the cytosol of eukaryotic host cells. Upon entry, these effectors manipulate the host cell’s machinery for their own benefit, which can result in serious illness or death of the host. For this reason recognition of T4SS effectors has become an important subject. Much previous work has focused on verifying effectors experimentally, a costly endeavor in terms of money, time, and effort. Having good predictions for effectors will help to focus experimental validations and decrease testing costs. In recent years, several scoring and machine learning-based methods have been suggested for the purpose of predicting T4SS effector proteins. These methods have used different sets of features for prediction, and their predictions have been inconsistent. In this paper, an optimal set of features is presented for predicting T4SS effector proteins using a statistical approach. A thorough literature search was performed to find features that have been proposed. Feature values were calculated for datasets of known effectors and non-effectors for T4SS-containing pathogens for four genera with a sufficient number of known effectors, Legionella pneumophila, Coxiella burnetii, Brucella spp., and Bartonella spp. The features were ranked, and less important features were filtered out. Correlations between remaining features were removed, and dimensional reduction was accomplished using principal component analysis and factor analysis. Finally, the optimal features for each pathogen were chosen by building logistic regression models and evaluating each model. The results based on evaluation of our logistic regression models confirm the effectiveness of our four optimal sets of features, and based on these an optimal set of features is proposed for all T4SS effector proteins.
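The first two stages of the selection procedure described above (ranking and filtering features, then removing correlated ones) can be sketched as follows. This is a minimal illustration under stated assumptions: the importance scores are taken as given, Pearson correlation is assumed as the correlation measure, and the thresholds `min_score` and `max_corr` are hypothetical. The principal component analysis, factor analysis, and logistic regression stages are not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def select_features(columns, scores, min_score, max_corr):
    """Rank features by score, drop weak ones, then drop correlated ones.

    columns: dict mapping feature name to its list of values.
    scores:  dict mapping feature name to an importance score (assumed given).
    """
    ranked = sorted(
        (name for name in columns if scores[name] >= min_score),
        key=lambda name: scores[name],
        reverse=True,
    )
    kept = []
    for name in ranked:
        # Keep a feature only if it is not strongly correlated with any kept one.
        if all(abs(pearson(columns[name], columns[k])) <= max_corr for k in kept):
            kept.append(name)
    return kept
```

Processing features in descending score order means that, within any group of highly correlated features, the one with the highest importance score is the one retained.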