256 research outputs found

    Modeling Bacterial Species: Using Sequence Similarity with Clustering Techniques

    Existing studies have challenged the current definition of named bacterial species, especially in the case of highly recombinogenic bacteria. This has led to considering the use of computational procedures to examine potential bacterial clusters that are not identified by species naming. This paper describes the use of sequence data obtained from MLST databases as input for a k-means algorithm extended to use housekeeping gene sequences as the similarity metric for the clustering process. An implementation of the k-means algorithm has been developed based on an existing source code implementation, and it has been evaluated against MLST data. Results point to potential bacterial clusters that are close to more than one named species and thus may become candidates for alternative classifications accounting for genotypic information. The use of hierarchical clustering with sequence comparison as similarity metric has the potential to find clusters different from named species by using a more informed cluster formation strategy than a conventional nominal variant of the algorithm.
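
    As a rough illustration of clustering sequences directly by a sequence-distance metric, the sketch below uses k-medoids with Hamming distance. This is a swapped-in technique, not the paper's extended k-means: a medoid is used because a "mean" of sequences is not defined here. The function names, the deterministic initialization, and the assumption that sequences are pre-aligned to equal length are all illustrative choices.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def k_medoids(seqs: List[str], k: int, max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Cluster sequences; returns (medoid indices, cluster label per sequence)."""
    if not 0 < k <= len(seqs):
        raise ValueError("k must be between 1 and the number of sequences")
    # Assumption: start from the first k sequences so the result is deterministic.
    medoids = list(range(k))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each sequence joins its nearest medoid.
        labels = [
            min(range(k), key=lambda c: hamming(s, seqs[medoids[c]]))
            for s in seqs
        ]
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                # Empty cluster (e.g. duplicate medoid sequences): keep the old medoid.
                new_medoids.append(medoids[c])
                continue
            best = min(
                members,
                key=lambda i: sum(hamming(seqs[i], seqs[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels
```

    With toy input such as `k_medoids(["AAAAAA", "CCCCCC", "AAAAAT", "CCCCCG", "AAAATT"], k=2)`, the A-rich and C-rich sequences end up in separate clusters. Real MLST data would concatenate the housekeeping gene fragments per isolate before comparison.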

    Bayesian Non-Exhaustive Classification A Case Study: Online Name Disambiguation using Temporal Record Streams

    The name entity disambiguation task aims to partition the records of multiple real-life persons so that each partition contains records pertaining to a unique person. Most of the existing solutions for this task operate in a batch mode, where all records to be disambiguated are initially available to the algorithm. However, more realistic settings require that the name disambiguation task be performed in an online fashion and that the method also be able to identify records of new ambiguous entities having no preexisting records. In this work, we propose a Bayesian non-exhaustive classification framework for solving the online name disambiguation task. Our proposed method uses a Dirichlet process prior with a Normal * Normal * Inverse Wishart data model, which enables identification of new ambiguous entities who have no records in the training data. For online classification, we use a one-sweep Gibbs sampler, which is very efficient and effective. As a case study we consider bibliographic data in a temporal stream format and disambiguate authors by partitioning their papers into homogeneous groups. Our experimental results demonstrate that the proposed method is better than existing methods for performing the online name disambiguation task. Comment: to appear in CIKM 201
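
    To make the role of the Dirichlet process prior concrete, the sketch below computes only the prior class probabilities under the standard Chinese restaurant process view: an incoming record joins an existing entity in proportion to that entity's record count, or opens a new entity in proportion to a concentration parameter. It is not the paper's method; the data likelihood and the Gibbs sweep are omitted, and the function name, the `"<new>"` key, and the `alpha` value are hypothetical.

```python
from typing import Dict


def crp_prior(counts: Dict[str, int], alpha: float) -> Dict[str, float]:
    """Prior probability of each known entity and of a previously unseen one.

    counts: number of records already assigned to each known entity.
    alpha:  Dirichlet process concentration parameter (illustrative value only).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    n = sum(counts.values())
    # Existing entities are weighted by how many records they already hold.
    probs = {label: c / (n + alpha) for label, c in counts.items()}
    # The remaining mass is reserved for an entity with no records so far.
    probs["<new>"] = alpha / (n + alpha)
    return probs
```

    In a full classifier these prior weights would be multiplied by the likelihood of the record under each entity's model and then normalized; a larger `alpha` makes the discovery of new entities more likely.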

    An image classification approach to analyze the suppression of plant immunity by the human pathogen Salmonella Typhimurium

    Background: The enteric pathogen Salmonella is the causative agent of the majority of food-borne bacterial poisonings. Recent research revealed that colonization of plants by Salmonella is an active infection process. Salmonella changes the metabolism and adjusts the plant host by suppressing its defense mechanisms. In this report we developed an automatic algorithm to quantify the symptoms caused by Salmonella infection on Arabidopsis.
    Results: The algorithm is designed to assign each image pixel to one of two classes: healthy and unhealthy. The task is solved in three steps. First, we perform segmentation to divide the image into foreground and background. In the second step, a support vector machine (SVM) is applied to predict the class of each pixel belonging to the foreground. Finally, we refine the result with a neighborhood check in order to omit falsely classified pixels from the second step. The developed algorithm was tested on infection with the non-pathogenic E. coli and the plant pathogen Pseudomonas syringae and used to study the interaction between plants and Salmonella wild type and T3SS mutants. We showed that T3SS mutants of Salmonella are unable to suppress the plant defenses. Results obtained through the automatic analyses were further verified on biochemical and transcriptome levels.
    Conclusion: This report presents an automatic pixel-based classification method for detecting "unhealthy" regions in leaf images. The proposed method was compared to an existing method and showed a higher accuracy. We used this algorithm to study the impact of the human pathogenic bacterium Salmonella Typhimurium on the plant immune system. The comparison between wild type bacteria and T3SS mutants showed similarity in the infection process in animals and in plants. Plant epidemiology is only one possible application of the proposed algorithm; it can be easily extended to other detection tasks that also rely on color information, or even extended to other features.
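
    The third step, the neighborhood check, can be sketched as follows. The abstract does not state the exact rule, so this sketch assumes one: a foreground pixel is flipped to the other class when more of its 8-connected foreground neighbors carry the other class than its own. The label constants and the function name are hypothetical, and the segmentation and SVM steps are assumed to have already produced the label grid.

```python
from typing import List

# Hypothetical label encoding for the per-pixel classification result.
BACKGROUND, HEALTHY, UNHEALTHY = 0, 1, 2


def refine(labels: List[List[int]]) -> List[List[int]]:
    """Relabel foreground pixels that are outvoted by their foreground neighbors."""
    height = len(labels)
    width = len(labels[0]) if height else 0
    out = [row[:] for row in labels]  # read from labels, write to a copy
    for y in range(height):
        for x in range(width):
            own = labels[y][x]
            if own == BACKGROUND:
                continue  # background pixels are never reclassified
            counts = {HEALTHY: 0, UNHEALTHY: 0}
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and labels[ny][nx] != BACKGROUND:
                        counts[labels[ny][nx]] += 1
            other = UNHEALTHY if own == HEALTHY else HEALTHY
            # Assumed rule: flip only when the other class strictly outnumbers this one.
            if counts[other] > counts[own]:
                out[y][x] = other
    return out
```

    Applied to a 3x3 healthy patch with a single unhealthy pixel in the center, this rule relabels the center as healthy, while an isolated foreground pixel surrounded by background keeps its label.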

    Computational Identification of the Plausible Molecular Vaccine Candidates of Multidrug-Resistant Salmonella enterica

    Salmonella enterica serovars are responsible for life-threatening invasive diseases that are common in children and young adults. According to the most recent estimates, there are approximately 11–20 million cases and between 128,000 and 161,000 deaths globally per year. The high incidence of diseases such as typhoid, caused by the serovars Typhi and Paratyphi, and gastroenteritis, caused by the non-typhoidal Salmonellae, has worsened as a growing number of pathogenic strains have become resistant to fluoroquinolones such as ciprofloxacin, or even to third-generation cephalosporins such as ceftriaxone. With vaccination still being one of the chosen methods of eradicating this disease, identifying candidate proteins for effective molecular vaccines remains a challenging issue. In this study, we describe the use of computational tools to analyze and predict potential vaccine candidate(s) for the multi-drug resistant serovars of S. enterica.

    Multilocus Sequence Typing as a Replacement for Serotyping in Salmonella enterica

    Salmonella enterica subspecies enterica is traditionally subdivided into serovars by serological and nutritional characteristics. We used Multilocus Sequence Typing (MLST) to assign 4,257 isolates from 554 serovars to 1,092 sequence types (STs). The majority of the isolates and many STs were grouped into 138 genetically closely related clusters called eBurstGroups (eBGs). Many eBGs correspond to a serovar, for example most Typhimurium are in eBG1 and most Enteritidis are in eBG4, but many eBGs contained more than one serovar. Furthermore, most serovars were polyphyletic and are distributed across multiple unrelated eBGs. Thus, serovar designations confounded genetically unrelated isolates and failed to recognize natural evolutionary groupings. An inability of serotyping to correctly group isolates was most apparent for Paratyphi B and its variant Java. Most Paratyphi B were included within a sub-cluster of STs belonging to eBG5, which also encompasses a separate sub-cluster of Java STs. However, diphasic Java variants were also found in two other eBGs and monophasic Java variants were in four other eBGs or STs, one of which is in subspecies salamae and a second of which includes isolates assigned to Enteritidis, Dublin and monophasic Paratyphi B. Similarly, Choleraesuis was found in eBG6 and is closely related to Paratyphi C, which is in eBG20. However, Choleraesuis var. Decatur consists of isolates from seven other, unrelated eBGs or STs. The serological assignment of these Decatur isolates to Choleraesuis likely reflects lateral gene transfer of flagellar genes between unrelated bacteria plus purifying selection. By confounding multiple evolutionary groups, serotyping can be misleading about the disease potential of S. enterica. Unlike serotyping, MLST recognizes evolutionary groupings and we recommend that Salmonella classification by serotyping should be replaced by MLST or its equivalents.
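
    The grouping of sequence types into related clusters can be illustrated with a simplified eBURST-style rule. The abstract does not give the exact eBG criteria, so this sketch assumes one common rule: two STs are linked when their allele profiles differ at exactly one locus, and groups are the connected components of those links. The ST names and allele numbers used with it are toy values, not real database entries.

```python
from typing import Dict, List, Tuple

Profile = Tuple[int, ...]  # allele number at each locus, in a fixed locus order


def loci_differing(a: Profile, b: Profile) -> int:
    """Count the loci at which two allele profiles carry different alleles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(x != y for x, y in zip(a, b))


def group_by_slv(profiles: Dict[str, Profile]) -> List[List[str]]:
    """Group STs into connected components of single-locus-variant links."""
    names = sorted(profiles)
    parent = {name: name for name in names}

    def find(name: str) -> str:
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Assumed linking rule: profiles differ at exactly one locus.
            if loci_differing(profiles[a], profiles[b]) == 1:
                parent[find(b)] = find(a)

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())
```

    For toy profiles where `ST-A` and `ST-B` differ at one locus, `ST-B` and `ST-C` differ at one locus, and `ST-D` shares nothing with the rest, the function places the first three in one group and leaves `ST-D` on its own, even though `ST-A` and `ST-C` differ at two loci.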