
    Predicting protein function by machine learning on amino acid sequences – a critical evaluation

    Copyright © 2007 Al-Shahib et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Background: Predicting the function of newly discovered proteins by simply inspecting their amino acid sequence is one of the major challenges of post-genomic computational biology, especially when done without recourse to experimentation or homology information. Machine learning classifiers are able to discriminate between proteins belonging to different functional classes. Until now, however, it has been unclear whether this ability would be transferable to proteins of unknown function, which may show distinct biases compared to experimentally more tractable proteins. Results: Here we show that proteins with known and unknown function do indeed differ significantly. We then show that proteins from different bacterial species also differ to an even larger and very surprising extent, but that functional classifiers nonetheless generalize successfully across species boundaries. We also show that in the case of highly specialized proteomes, classifiers from a different, but more conventional, species may in fact outperform the endogenous species-specific classifier. Conclusion: We conclude that there is a very good prospect of successfully predicting the function of yet uncharacterized proteins using machine learning classifiers trained on proteins of known function.

    Enhanced nasopharyngeal infection and shedding associated with an epidemic lineage of emm3 group A Streptococcus

    Background: A group A Streptococcus (GAS) lineage of genotype emm3, sequence type 15 (ST15) was associated with a six-month upsurge in invasive GAS disease in the UK. The epidemic lineage (Lineage C) had lost two typical emm3 prophages, Φ315.1 and Φ315.2 associated with the superantigen ssa, but gained a different prophage (ΦUK-M3.1) associated with a different superantigen, speC, and a DNase, spd1. Methods and Results: The presence of speC and spd1 in Lineage C ST15 strains enhanced both in vitro mitogenic and DNase activities over non-Lineage C ST15 strains. Invasive disease models in Galleria mellonella and SPEC-sensitive transgenic mice revealed no difference in overall invasiveness of Lineage C ST15 strains compared to non-Lineage C ST15 strains, consistent with clinical and epidemiological analysis. Lineage C strains did, however, markedly prolong murine nasal infection with enhanced nasal and airborne shedding compared to non-Lineage C strains. Deletion of speC or spd1 in two Lineage C strains identified a possible role for spd1 in airborne shedding from the murine nasopharynx. Conclusions: Nasopharyngeal infection and shedding of Lineage C strains were enhanced compared to non-Lineage C strains and this was, in part, mediated by the gain of the DNase spd1 through prophage acquisition.

    Emergence of a novel lineage containing a prophage in emm/M3 group A Streptococcus associated with upsurge in invasive disease in the UK

    A sudden increase in invasive Group A Streptococcus (iGAS) infections associated with emm/M3 isolates during the winter of 2008/09 prompted the initiation of enhanced surveillance in England. In order to characterise the population of emm/M3 GAS within the UK and determine bacterial factors that might be responsible for this upsurge, 442 emm/M3 isolates from cases of invasive and non-invasive infections during the period 2001–2013 were subjected to whole genome sequencing. MLST analysis differentiated emm/M3 isolates into three sequence types (STs): ST15, ST315 and ST406. Analysis of the whole genome SNP-based phylogeny showed that the majority of isolates from the 2008–2009 upsurge period belonged to a distinct lineage characterized by the presence of a prophage carrying the speC exotoxin and spd1 DNase genes but loss of two other prophages considered typical of the emm/M3 lineage. This lineage was significantly associated with the upsurge in iGAS cases and we postulate that the upsurge could be attributed in part to expansion of this novel prophage-containing lineage within the population. The study underlines the importance of prompt genomic analysis of changes in the GAS population, providing an advanced public health warning system for newly emergent, pathogenic strains.

    Integration of genomic and other epidemiologic data to investigate and control a cross-institutional outbreak of Streptococcus pyogenes

    Single-strain outbreaks of Streptococcus pyogenes infections are common and often go undetected. In 2013, two clusters of invasive group A Streptococcus (iGAS) infection were identified in independent but closely located care homes in Oxfordshire, United Kingdom. Investigation included visits to each home, chart review, staff survey, microbiologic sampling, and genome sequencing. S. pyogenes emm type 1.0, the most common circulating type nationally, was identified from all cases yielding GAS isolates. A tailored whole-genome reference population comprising epidemiologically relevant contemporaneous isolates and published isolates was assembled. Data were analyzed independently using whole-genome multilocus sequencing and single-nucleotide polymorphism analyses. Six isolates from staff and residents of the homes formed a single cluster that was separated from the reference population by both analytical approaches. No further cases occurred after mass chemoprophylaxis and enhanced infection control. Our findings demonstrate the ability of two independent analytical approaches to enable robust conclusions from nonstandardized whole-genome analysis to support public health practice.

    Enzyme classification with peptide programs: a comparative study

    Background: Efficient and accurate prediction of protein function from sequence is one of the standing problems in Biology. The generalised use of sequence alignments for inferring function promotes the propagation of errors, and there are limits to its applicability. Several machine learning methods have been applied to predict protein function, but they lose much of the information encoded by protein sequences because they need to transform them to obtain data of fixed length. Results: We have developed a machine learning methodology, called peptide programs (PPs), to deal directly with protein sequences and compared its performance with that of Support Vector Machines (SVMs) and BLAST in detailed enzyme classification tasks. Overall, the PPs and SVMs had a similar performance in terms of Matthews Correlation Coefficient, but the PPs had generally a higher precision. BLAST performed globally better than both methodologies, but the PPs had better results than BLAST and SVMs for the smaller datasets. Conclusion: The higher precision of the PPs in comparison to the SVMs suggests that dealing with sequences is advantageous for detailed protein classification, as precision is essential to avoid annotation errors. The fact that the PPs performed better than BLAST for the smaller datasets demonstrates the potential of the methodology, but the drop in performance observed for the larger datasets indicates that further development is required. Possible strategies to address this issue include partitioning the datasets into smaller subsets and training individual PPs for each subset, or training several PPs for each dataset and combining them using a bagging strategy.
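The abstract above compares classifiers by Matthews Correlation Coefficient (MCC) and by precision. As a rough illustration of how two classifiers can have similar MCC while differing in precision, here is a minimal Python sketch. The confusion-matrix counts are hypothetical values chosen for illustration and are not taken from the study; the function names are also assumptions.

```python
import math


def precision(tp: int, fp: int) -> float:
    """Fraction of positive predictions that are correct."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts (NOT from the paper): classifier A is conservative,
# classifier B predicts the positive class more liberally.
clf_a = dict(tp=40, tn=935, fp=5, fn=20)
clf_b = dict(tp=55, tn=915, fp=25, fn=5)

for name, c in (("A", clf_a), ("B", clf_b)):
    print(name,
          "precision=%.3f" % precision(c["tp"], c["fp"]),
          "MCC=%.3f" % mcc(**c))
```

With these made-up counts the two MCC values are close, while classifier A makes far fewer false-positive assignments. That is the kind of difference the abstract points to when it notes that precision is essential to avoid annotation errors.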

    Patient-provider interaction from the perspectives of type 2 diabetes patients in Muscat, Oman: a qualitative study

    Background: Patients' expectations and perceptions of the medical encounter and interactions are important tools in diabetes management. Some problems regarding the interaction during encounters may be related to a lack of communication skills on the part of either the physician or the patient. This study aimed at exploring the perceptions of type 2 diabetes patients regarding the medical encounters and quality of interactions with their primary health-care providers. Methods: Four focus group discussions (two women and two men groups) were conducted among 27 purposively selected patients (13 men and 14 women) from six primary health-care centres in Muscat, Oman. Qualitative content analysis was applied. Results: The patients identified several weaknesses in patient-provider communication: unfriendly welcoming; interrupted consultation privacy; poor attention and eye contact; failure to encourage patients to ask questions on the providers' side; and inability to participate in medical dialogue or express concerns on the patients' side. Other barriers and difficulties related to issues of patient-centeredness, organization of diabetes clinics, health education and professional competency regarding diabetes care were also identified. Conclusion: The diabetes patients' experiences with the primary health-care providers showed dissatisfaction with the services. We suggest appropriate training for health-care providers with regard to diabetes care and development of communication skills with emphasis on a patient-centred approach. An efficient use of available resources in diabetes clinics and distribution of responsibilities between team members in close collaboration with patients and their families seem necessary. Further exploration of the providers' work situation and barriers to good interaction is needed. Our findings can help the policy makers in Oman, and countries with similar health systems, to improve the quality and organizational efficiency of diabetes care services.

    Unique arbuscular mycorrhizal fungal communities uncovered in date palm plantations and surrounding desert habitats of Southern Arabia

    The main objective of this study was to shed light on the previously unknown arbuscular mycorrhizal fungal (AMF) communities in Southern Arabia. We explored AMF communities in two date palm (Phoenix dactylifera) plantations and the natural vegetation of their surrounding arid habitats. The plantations were managed traditionally in an oasis and according to conventional guidelines at an experimental station. Based on spore morphotyping, the AMF communities under the date palms appeared to be quite diverse at both plantations and more similar to each other than to the communities under the ruderal plant, Polygala erioptera, growing at the experimental station on the dry strip between the palm trees, and to the communities uncovered under the native vegetation (Zygophyllum hamiense, Salvadora persica, Prosopis cineraria, inter-plant area) of adjacent undisturbed arid habitat. AMF spore abundance and species richness were higher under date palms than under the ruderal and native plants. Sampling in a remote sand dune area under Heliotropium kotschyi yielded only two AMF morphospecies and only after trap culturing. Overall, 25 AMF morphospecies were detected encompassing all study habitats. Eighteen belonged to the genus Glomus including four undescribed species. Glomus sinuosum, a species typically found in undisturbed habitats, was the most frequently occurring morphospecies under the date palms. Using molecular tools, it was also found as a phylogenetic taxon associated with date palm roots. These roots were associated with nine phylogenetic taxa, among them eight from Glomus group A, but the majority could not be assigned to known morphospecies or to environmental sequences in public databases. Some phylogenetic taxa seemed to be site specific. Despite the use of group-specific primers and efficient trapping systems with a bait plant consortium, surprisingly, two of the globally most frequently found species, Glomus intraradices and Glomus mosseae, were detected neither as phylogenetic taxa in the date palm roots nor as spores under the date palms, the intermediate ruderal plant, or the surrounding natural vegetation. The results highlight the uniqueness of AMF communities inhabiting these diverse habitats exposed to the harsh climatic conditions of Southern Arabia.