
    Novel approach for identification of influenza virus host range and zoonotic transmissible sequences by determination of host-related associative positions in viral genome segments

    Background: Recent (2013 and 2009) zoonotic transmission of avian or porcine influenza to humans highlights an increase in host range by evading species barriers. Gene reassortment or antigenic shift between viruses from two or more hosts can generate a new life-threatening virus when the new shuffled virus is no longer recognized by antibodies existing within human populations. There is no large-scale study to help understand the underlying mechanisms of host transmission. Furthermore, there is no clear understanding of how different segments of the influenza genome contribute to the final determination of host range. Methods: To obtain insight into the rules underpinning host range determination, various supervised machine learning algorithms were employed to mine reassortment changes in different viral segments in a range of hosts. Our multi-host dataset contained whole segments of 674 influenza strains organized into three host categories: avian, human, and swine. Because some of the sequences were assigned to multiple hosts, the dataset is multi-labeled, and we used a multi-label learning method to identify discriminative sequence sites. Then algorithms such as CBA, Ripper, and decision tree were applied to extract informative and descriptive association rules for each viral protein segment. Results: We found informative rules in all segments that are common within the same host class but varied between different hosts. For example, for infection of an avian host, HA14V and NS1230S were the most important discriminative and combinatorial positions. Conclusion: The high-support combined rules found in this study facilitate host range identification. Our major goal was to detect discriminative genomic positions that were able to identify multi-host viruses, because such viruses are likely to cause pandemics or disastrous epidemics.
    Fatemeh Kargarfard, Ashkan Sami, Manijeh Mohammadi-Dehcheshmeh and Esmaeil Ebrahimi

    Identifying Changes in Selective Constraints: Host Shifts in Influenza

    The natural reservoir of Influenza A is waterfowl. Normally, waterfowl viruses are not adapted to infect and spread in the human population. Sometimes, through reassortment or through whole host shift events, genetic material from waterfowl viruses is introduced into the human population, causing worldwide pandemics. Identifying which mutations allow viruses of avian origin to spread successfully in the human population is of great importance in predicting and controlling influenza pandemics. Here we describe a novel approach to identify such mutations. We use a sitewise non-homogeneous phylogenetic model that explicitly takes into account differences in the equilibrium frequencies of amino acids in different hosts and locations. We identify 172 amino acid sites with strong support and 518 sites with moderate support of different selection constraints in human and avian viruses. The sites that we identify provide an invaluable resource to experimental virologists studying adaptation of avian flu viruses to the human host. Identification of the sequence changes necessary for host shifts would help us predict the pandemic potential of various strains. The method is of broad applicability to investigating changes in selective constraints when the timing of the changes is known.

    Leveraging Machine Learning for the Analysis and Prediction of Influenza A Virus

    Influenza, commonly known as flu, is a respiratory disease that poses a significant challenge to global public health due to its high prevalence and potential for serious health complications. The disease is caused by influenza viruses, among which influenza A viruses are of particular concern. These viruses are known for their rapid transmission, potential to cause severe health issues, and frequent mutations, which underscore the need for ongoing research and surveillance. A key aspect of managing influenza outbreaks includes understanding host origins, antigenic properties, and the ability of influenza A viruses to transmit between species, as this knowledge is critical in forecasting outbreaks and developing effective vaccines. Traditional approaches, such as hemagglutination inhibition assays for antigenicity assessment and phylogenetic analysis to determine genetic relationships, host origins and subtypes, have been fundamental in understanding influenza viruses. These methods, while informative, often face limitations in terms of time, resources, and the ability to keep pace with the rapid evolutionary changes of viruses. To mitigate these limitations, this thesis uses advanced machine learning techniques to analyse critical protein sequence data from influenza A viruses, offering an alternative perspective for unravelling the complexities of influenza, and potentially opening new avenues for analysis without strict reliance on prior biological knowledge. The core of the thesis is the application and refinement of predictive models to determine host origins, subtypes, and antigenic relationships of influenza A viruses. These models are evaluated comprehensively, considering factors such as the impact of incomplete sequences, performance across various host taxonomies and individual hosts, as well as the influence of reference databases on model performance. 
This evaluation illuminates the potential of machine learning to enhance our understanding of influenza A viruses in real-world scenarios, pointing out the ongoing importance of this research in public health.

    Comprehensive analysis of lectin-glycan interactions reveals determinants of lectin specificity

    Lectin-glycan interactions facilitate inter- and intracellular communication in many processes including protein trafficking, host-pathogen recognition, and tumorigenesis promotion. Specific recognition of glycans by lectins is also the basis for a wide range of applications in areas including glycobiology research, cancer screening, and antiviral therapeutics. To provide a better understanding of the determinants of lectin-glycan interaction specificity and support such applications, this study comprehensively investigates specificity-conferring features of all available lectin-glycan complex structures. Systematic characterization, comparison, and predictive modeling of a set of 221 complementary physicochemical and geometric features representing these interactions highlighted specificity-conferring features with potential mechanistic insight. Univariable comparative analyses with weighted Wilcoxon-Mann-Whitney tests revealed strong statistical associations between binding site features and specificity that are conserved across unrelated lectin binding sites. Multivariable modeling with random forests demonstrated the utility of these features for predicting the identity of bound glycans based on generalized patterns learned from non-homologous lectins. These analyses revealed global determinants of lectin specificity, such as sialic acid glycan recognition in deep, concave binding sites enriched for positively charged residues, in contrast to high mannose glycan recognition in fairly shallow but well-defined pockets enriched for non-polar residues. Focused fine specificity analysis of hemagglutinin interactions with human-like and avian-like glycans uncovered features representing both known and novel mutations related to shifts in influenza tropism from avian to human tissues. 
As the approach presented here relies on co-crystallized lectin-glycan pairs for studying specificity, it is limited in its inferences by the quantity, quality, and diversity of the structural data available. Regardless, the systematic characterization of lectin binding sites presented here provides a novel approach to studying lectin specificity and is a step towards confidently predicting new lectin-glycan interactions.

    Understanding the underlying mechanism of HA-subtyping in the level of physic-chemical characteristics of protein

    The evolution of the influenza A virus to increase its host range is a major concern worldwide. Molecular mechanisms of increasing host range are largely unknown. Influenza surface proteins play determining roles in reorganization of host-sialic acid receptors and host range. In an attempt to uncover the physic-chemical attributes which govern HA subtyping, we performed a large-scale functional analysis of over 7000 sequences of 16 different HA subtypes. A large number (896) of physic-chemical protein characteristics were calculated for each HA sequence. Then, 10 different attribute weighting algorithms were used to find the key characteristics distinguishing HA subtypes. Furthermore, to discover machine learning models which can predict HA subtypes, various Decision Tree, Support Vector Machine, Naïve Bayes, and Neural Network models were trained on the calculated protein characteristics dataset as well as 10 trimmed datasets generated by attribute weighting algorithms. The prediction accuracies of the machine learning methods were evaluated by 10-fold cross validation. The results highlighted the frequency of Gln (selected by 80% of attribute weighting algorithms), percentage/frequency of Tyr, percentage of Cys, and frequencies of Try and Glu (selected by 70% of attribute weighting algorithms) as the key features that are associated with HA subtyping. Random Forest tree induction algorithm and RBF kernel function of SVM (scaled by grid search) showed high accuracy of 98% in clustering and predicting HA subtypes based on protein attributes. Decision tree models were successful in monitoring the short mutation/reassortment paths by which influenza virus can gain the key protein structure of another HA subtype and increase its host range in a short period of time with less energy consumption. 
Extracting and mining a large number of amino acid attributes of HA subtypes of influenza A virus through supervised algorithms represents a new avenue for understanding and predicting possible future structure of influenza pandemics.
    Mansour Ebrahimi, Parisa Aghagolzadeh, Narges Shamabadi, Ahmad Tahmasebi, Mohammed Alsharifi, David L. Adelson, Farhid Hemmatzadeh, Esmaeil Ebrahimi

    Avian influenza H9N2-specific changes in the adaptive immune receptor repertoire of Gallus gallus domesticus following vaccination and infectious challenge

    Avian influenza viruses cause major losses to the poultry sector each year and also pose a significant risk for cross-species transmission to humans, where the disease manifestations can be very severe. However, within avian hosts, there is still a limited understanding of the adaptive immune system in the contexts of both health and disease, and the immunological mechanisms which underpin vaccine-induced protective responses against infectious challenge with pathogens such as avian influenza. As the ability of the adaptive immune system to recognise specific antigens is dependent on the T and B cell receptors which together comprise the adaptive immune receptor repertoire, understanding the determinants which shape its specificities and diversity is paramount for both improving our knowledge of the avian immune system and improving current prevention and control strategies such as vaccination. In this thesis, I present a comprehensive analysis of the domestic chicken (Gallus gallus domesticus) adaptive immune receptor repertoire upon infection and/or vaccination with H9N2 avian influenza – a pathogen that is widely prevalent across the world and poses significant risk both to the poultry sector and to human health and wellbeing. At the time of writing, no published research has examined the avian adaptive immune repertoire using high throughput sequencing, and no repertoire studies have been performed in birds that were infected with and/or vaccinated against avian influenza. My analyses thus provide valuable information on the avian adaptive immune system and the impacts of H9N2 infection and/or vaccination on the avian adaptive immune receptor repertoires

    Predicting the animal hosts of coronaviruses from compositional biases of spike protein and whole genome sequences through machine learning

    The COVID-19 pandemic has demonstrated the serious potential for novel zoonotic coronaviruses to emerge and cause major outbreaks. The immediate animal origin of the causative virus, SARS-CoV-2, remains unknown; identifying such origins is a notoriously challenging task for emerging disease investigations. Coevolution with hosts leads to specific evolutionary signatures within viral genomes that can inform likely animal origins. We obtained a set of 650 spike protein and 511 whole genome nucleotide sequences from 225 and 187 viruses belonging to the family Coronaviridae, respectively. We then trained random forest models independently on genome composition biases of spike protein and whole genome sequences, including dinucleotide and codon usage biases, in order to predict animal host (of nine possible categories, including human). In hold-one-out cross-validation, predictive accuracy on unseen coronaviruses consistently reached ∼73%, indicating evolutionary signal in spike proteins to be just as informative as whole genome sequences. However, different composition biases were informative in each case. Applying optimised random forest models to classify human sequences of MERS-CoV and SARS-CoV revealed evolutionary signatures consistent with their recognised intermediate hosts (camelids, carnivores), while human sequences of SARS-CoV-2 were predicted as having bat hosts (suborder Yinpterochiroptera), supporting bats as the suspected origins of the current pandemic. In addition to phylogeny, variation in genome composition can act as an informative approach to predict emerging virus traits as soon as sequences are available. More widely, this work demonstrates the potential in combining genetic resources with machine learning algorithms to address long-standing challenges in emerging infectious diseases.