
    Global Analysis of Circulating Immune Cells by Matrix-Assisted Laser Desorption Ionization Time-of-Flight Mass Spectrometry

    Background: MALDI-TOF mass spectrometry is currently used in microbiological diagnosis to characterize bacterial populations. Our aim was to determine whether this technique could be applied to intact eukaryotic cells, and in particular to cells involved in the immune response. Methodology/Principal Findings: A comparison of frozen monocytes, T lymphocytes and polymorphonuclear leukocytes revealed specific peak profiles. We also found that twenty cell types had specific profiles, permitting the establishment of a cell database. The circulating immune cells, namely monocytes, T lymphocytes and polymorphonuclear cells, were distinct from tissue immune cells such as monocyte-derived macrophages and dendritic cells. In addition, MALDI-TOF mass spectrometry made it easy to identify the signatures of monocytes and T lymphocytes in peripheral mononuclear cells. Conclusions/Significance: This method was rapid and easy to perform and, unlike flow cytometry, did not require any additional components such as specific antibodies. The MALDI-TOF mass spectrometry approach could be extended t
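    The abstract does not detail the computational side of profile matching. As a hedged illustration of how binned peak profiles might be compared against a reference database, the sketch below uses cosine similarity over normalized, binned intensity vectors; the m/z range, bin width, function names, and synthetic data are assumptions for illustration only, not the authors' pipeline.

```python
import numpy as np

def bin_spectrum(mz, intensity, mz_min=2000, mz_max=20000, bin_width=10):
    """Reduce a raw MALDI-TOF spectrum to a fixed-length, normalized binned intensity vector."""
    edges = np.arange(mz_min, mz_max + bin_width, bin_width)
    profile, _ = np.histogram(mz, bins=edges, weights=intensity)
    norm = np.linalg.norm(profile)
    return profile / norm if norm > 0 else profile

def identify_cell_type(query_profile, reference_db):
    """Return the reference cell type whose profile is most similar (cosine) to the query."""
    scores = {cell_type: float(np.dot(query_profile, ref))
              for cell_type, ref in reference_db.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Illustrative usage with synthetic spectra (a real database would hold measured
# profiles of monocytes, T lymphocytes, polymorphonuclear cells, etc.).
rng = np.random.default_rng(0)
reference_db = {name: bin_spectrum(rng.uniform(2000, 20000, 200), rng.random(200))
                for name in ["monocyte", "T lymphocyte", "polymorphonuclear"]}
query = bin_spectrum(rng.uniform(2000, 20000, 200), rng.random(200))
print(identify_cell_type(query, reference_db))
```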

    Computational Metagenomics: Network, Classification and Assembly

    Due to the rapid advance of DNA sequencing technologies over the past ten years, large numbers of short DNA reads can be obtained quickly and cheaply; a single Illumina HiSeq machine, for example, can produce several terabytes of data within a week. Metagenomics is a new scientific field that involves the analysis of genomic DNA sequences obtained directly from the environment, enabling studies of novel microbial systems, and it was made possible by high-throughput sequencing technologies. The analysis of the resulting data requires sophisticated computational analyses and data mining. In clinical settings, a fundamental goal of metagenomics is to help diagnose and treat disease. One major bottleneck so far is how to analyze these huge, noisy data sets quickly and precisely. My PhD research focuses on developing algorithms and tools to tackle these challenging and interesting computational problems.
    From the functional perspective, a metagenomic sample can be represented as a weighted metabolic network, in which the nodes are molecules, the edges are enzymes encoded by genes, and the weights can be considered the number of organisms providing the functions. One goal of functional comparison between metagenomic samples is to find differentially abundant metabolic subnetworks between two groups under comparison. We have developed a statistical network analysis tool, MetaPath, which uses a greedy search algorithm to find maximum-weight subnetworks and a nonparametric permutation test to measure their statistical significance. Unlike previous approaches, MetaPath explicitly searches for significant subnetworks in the global network, enabling us to detect signatures at a finer level. In addition, we developed statistical methods that take into account the topology of the network when testing the significance of the subnetworks.
    Another computational problem involves classifying anonymous DNA sequences obtained from metagenomic samples. There are several challenges here: (1) the classification labels follow a hierarchical tree structure, in which the leaves are most specific and the internal nodes are more general; how can we classify novel sequences that do not belong to leaf categories (species) but do belong to internal groups (e.g., phylum)? (2) For each classification, how can we compute a confidence score, so that users can trade off sensitivity against specificity? (3) How can we analyze billions of data items quickly? We have developed a novel hierarchical classifier, MetaPhyler, for the classification of anonymous DNA reads. Through simulation, MetaPhyler models the distribution of pairwise similarities within different hierarchical groups using nonparametric density estimation. The confidence score is computed as a likelihood ratio. For a query DNA sequence of arbitrary length, its similarity can be calculated through linear approximation. Through benchmark comparisons, we have shown that MetaPhyler is significantly faster and more accurate than previous tools.
    DNA sequencing machines can only produce very short strings (e.g., 100 bp) relative to the size of a genome (a typical bacterial genome is about 5 Mbp). One of the most challenging computational tasks is the assembly of millions of short reads into longer contigs, which are used as the basis of subsequent computational analyses.
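    MetaPath's actual search and testing code is not reproduced here. As an illustration of the nonparametric permutation-testing idea described above, the following minimal sketch shuffles sample labels and recomputes a candidate subnetwork's abundance-difference score to obtain an empirical p-value; the data layout, scoring function, and names are illustrative assumptions rather than MetaPath's interface.

```python
import numpy as np

def subnetwork_score(abundance_case, abundance_control, subnetwork_edges):
    """Total abundance difference (case minus control) summed over a candidate subnetwork's edges."""
    return sum(abundance_case[e] - abundance_control[e] for e in subnetwork_edges)

def permutation_pvalue(case_samples, control_samples, subnetwork_edges, n_perm=10000, seed=0):
    """Nonparametric test: shuffle sample labels and recompute the subnetwork score.
    case_samples, control_samples: arrays of shape (n_samples, n_edges)."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([case_samples, control_samples])
    n_case = case_samples.shape[0]
    observed = subnetwork_score(case_samples.mean(axis=0),
                                control_samples.mean(axis=0), subnetwork_edges)
    hits = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        perm_case, perm_ctrl = pooled[idx[:n_case]], pooled[idx[n_case:]]
        score = subnetwork_score(perm_case.mean(axis=0),
                                 perm_ctrl.mean(axis=0), subnetwork_edges)
        if score >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Illustrative usage with synthetic per-sample edge abundances
rng = np.random.default_rng(1)
case, control = rng.poisson(6.0, size=(12, 50)), rng.poisson(5.0, size=(12, 50))
print(permutation_pvalue(case, control, subnetwork_edges=[0, 3, 7, 9], n_perm=2000))
```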
    For the assembly problem, we have developed a comparative metagenomic assembler, MetaCompass, which leverages previously sequenced genomes and produces long contigs through read mapping (alignment) and assembly. Given the availability of thousands of existing bacterial genomes, MetaCompass first chooses the best subset of reference genomes for a particular sample based on its taxonomic composition. The reads are then aligned against these genomes using MUMmer-map or Bowtie2. Afterwards, we use a greedy algorithm for the minimum set-cover problem to build long contigs, and the consensus sequences are computed by majority rule. We also propose an iterative approach to improve performance. Finally, MetaCompass has been successfully evaluated and tested on over 20 terabytes of metagenomic data generated by the Human Microbiome Project. In addition, to facilitate the identification and characterization of antibiotic resistance genes, we have created the Antibiotic Resistance Genes Database (ARDB), which provides a centralized compendium of information on antibiotic resistance. Furthermore, we have applied our tools to the analysis of a novel oral microbiome data set and have discovered interesting functional mechanisms and ecological changes underlying the transition of the human mouth from health to periodontal disease at a systems level.
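    As a hedged sketch of the greedy minimum set-cover heuristic and majority-rule consensus mentioned above, the code below repeatedly picks the candidate reference region covering the most still-uncovered reads, then calls one consensus base per aligned column. The data structures and names are illustrative assumptions, not MetaCompass's implementation.

```python
from collections import Counter

def greedy_set_cover(reads, candidate_regions):
    """Greedy heuristic for minimum set cover: repeatedly pick the candidate region
    that covers the most still-uncovered reads.
    reads: set of read IDs; candidate_regions: dict region_id -> set of read IDs covered."""
    uncovered = set(reads)
    chosen = []
    while uncovered:
        best = max(candidate_regions, key=lambda r: len(candidate_regions[r] & uncovered))
        gained = candidate_regions[best] & uncovered
        if not gained:          # remaining reads map to no candidate region
            break
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered

def majority_consensus(aligned_columns):
    """Majority-rule consensus: aligned_columns is a list of per-position base lists."""
    return "".join(Counter(col).most_common(1)[0][0] if col else "N"
                   for col in aligned_columns)

# Illustrative usage
reads = {"r1", "r2", "r3", "r4"}
regions = {"refA:0-500": {"r1", "r2"}, "refB:100-600": {"r2", "r3"}, "refC:0-300": {"r4"}}
print(greedy_set_cover(reads, regions))
print(majority_consensus([["A", "A", "C"], ["T", "T"], ["G"]]))
```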

    Developing Algorithms for Quantifying the Super Resolution Microscopic Data: Applications to the Quantification of Protein-Reorganization in Bacteria Responding to Treatment by Silver Ions

    Histone-like nucleoid structuring proteins (HNS) play significant roles in shaping chromosomal DNA, regulating transcriptional networks in microbes, and mediating bacterial responses to environmental changes such as temperature fluctuations. In this work, the intracellular organization of HNS proteins in E. coli was investigated using super-resolution fluorescence microscopy, which surpasses conventional microscopy by 10–20 fold in spatial resolution. More importantly, changes in the spatial distribution of HNS proteins in E. coli upon the addition of silver ions to the growth medium were explored. To quantify the spatial distribution of HNS in bacteria and its changes, an automatic method based on Voronoi diagrams was implemented. The HNS protein localizations obtained from super-resolution fluorescence microscopy were segmented and clustered based on several quantitative parameters, such as molecular areas, molecular densities, and mean inter-molecular distances of the k-th rank, all computed from the Voronoi diagrams. These parameters, together with the associated clustering analysis, allowed us to quantify how the spatial organization of HNS proteins responds to silver and provided insight into how microbes adapt to new environments.
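    The exact implementation is not given in the abstract; the following is a minimal sketch of Voronoi-based quantification for 2D localization data: per-molecule Voronoi cell areas (shoelace formula on angle-ordered vertices), local densities as inverse areas, and the k-th rank neighbor distance. The parameter values, synthetic coordinates, and the median-density clustering threshold are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import Voronoi, cKDTree

def polygon_area(vertices):
    """Area of a convex 2D polygon: order vertices by angle, then apply the shoelace formula."""
    center = vertices.mean(axis=0)
    order = np.argsort(np.arctan2(vertices[:, 1] - center[1], vertices[:, 0] - center[0]))
    x, y = vertices[order, 0], vertices[order, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

def voronoi_descriptors(points, k=5):
    """Per-localization Voronoi areas, densities (1/area), and k-th rank neighbor distances."""
    vor = Voronoi(points)
    areas = np.full(len(points), np.nan)
    for i, region_idx in enumerate(vor.point_region):
        region = vor.regions[region_idx]
        if region and -1 not in region:          # skip unbounded cells at the border
            areas[i] = polygon_area(vor.vertices[region])
    densities = 1.0 / areas
    dists, _ = cKDTree(points).query(points, k=k + 1)   # nearest neighbor at index 0 is the point itself
    return areas, densities, dists[:, k]

# Illustrative: flag localizations denser than the median density as cluster candidates
points = np.random.default_rng(1).uniform(0, 1000, size=(500, 2))   # synthetic coordinates (nm)
areas, dens, kdist = voronoi_descriptors(points)
in_cluster = dens > np.nanmedian(dens)
```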

    Perturbation Detection Through Modeling of Gene Expression on a Latent Biological Pathway Network: A Bayesian hierarchical approach

    Cellular response to a perturbation is the result of a dynamic system of biological variables linked in a complex network. A major challenge in drug and disease studies is identifying the key factors of a biological network that are essential in determining the cell's fate. Here our goal is the identification of perturbed pathways from high-throughput gene expression data. We develop a three-level hierarchical model, where (i) the first level captures the relationship between gene expression and biological pathways using confirmatory factor analysis, (ii) the second level models the behavior within an underlying network of pathways induced by an unknown perturbation using a conditional autoregressive model, and (iii) the third level is a spike-and-slab prior on the perturbations. We then identify perturbations through posterior-based variable selection. We illustrate our approach using gene transcription drug perturbation profiles from the DREAM7 drug sensitivity prediction challenge data set. Our proposed method identified regulatory pathways that are known to play a causative role and that were not readily resolved using gene set enrichment analysis or exploratory factor models. Simulation results are presented assessing the performance of this model relative to a network-free variant and its robustness to inaccuracies in biological databases.
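    Inference details are beyond the abstract. To make the three-level structure concrete, the sketch below forward-simulates from a model of this form: a spike-and-slab perturbation on pathways, pathway activities smoothed by a conditional autoregressive (CAR) prior over a pathway network, and gene expression generated through a confirmatory-factor-style loading matrix. All dimensions, priors, and parameter values are illustrative assumptions, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative dimensions (assumptions, not the paper's)
n_pathways, n_genes = 6, 60

# Pathway-pathway network (symmetric adjacency) and CAR precision matrix
W = (rng.random((n_pathways, n_pathways)) < 0.4).astype(float)
W = np.triu(W, 1); W = W + W.T
D = np.diag(W.sum(axis=1) + 1e-6)
rho, tau = 0.9, 1.0
precision = tau * (D - rho * W)               # conditional autoregressive (CAR) prior
cov = np.linalg.inv(precision)

# Level 3: spike-and-slab perturbation on pathways
pi_spike, slab_sd = 0.2, 2.0
gamma = rng.random(n_pathways) < pi_spike               # which pathways are perturbed
delta = gamma * rng.normal(0.0, slab_sd, n_pathways)    # perturbation sizes

# Level 2: latent pathway activities smoothed over the pathway network
activity = rng.multivariate_normal(delta, cov)

# Level 1: confirmatory factor analysis - each gene loads on one known pathway
assignment = rng.integers(0, n_pathways, n_genes)
Lambda = np.zeros((n_genes, n_pathways))
Lambda[np.arange(n_genes), assignment] = rng.normal(1.0, 0.2, n_genes)
expression = Lambda @ activity + rng.normal(0.0, 0.5, n_genes)

# In the actual analysis, (gamma, delta, activity) would be inferred from `expression`
# by MCMC, and pathways with high posterior P(gamma_j = 1) reported as perturbed.
print("truly perturbed pathways:", np.where(gamma)[0])
```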

    Development of a framework for the classification of antibiotics adjuvants

    Master's dissertation in Bioinformatics. Throughout the last decades, bacteria have become increasingly resistant to available antibiotics, leading to a growing need for new antibiotics and new drug development methodologies. In the last 40 years there have been no records of the development of new antibiotics, which has begun to narrow the possible alternatives. Therefore, finding new antibiotics and bringing them to market is increasingly challenging. One approach is finding compounds that restore or potentiate the activity of existing antibiotics against biofilm bacteria. As information in this field is very limited and no database covers this theme, machine learning models were used to predict the relevance of documents regarding adjuvants. In this project, the BIOFILMad - Catalog of antimicrobial adjuvants to tackle biofilms application was developed to help researchers save time in their daily research. This application was built using Django and the Django REST Framework for the backend and React for the frontend. For the backend, a database needed to be constructed, since no existing database focuses entirely on this topic; to populate it, a machine learning model was trained to classify articles. Three different algorithms were used: Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR), each combined with two feature-set sizes (945 and 1890 features). When analyzing all metrics, model LR-1 performed best at classifying relevant documents, with an accuracy of 0.8461, a recall of 0.6170, an F1-score of 0.6904, and a precision of 0.7837. This model is the best at correctly predicting relevant documents, as shown by its higher recall compared to the other models. With this model, our database was populated with relevant information. The backend also provides an aggregation feature built with Named Entity Recognition (NER), whose goal is to identify specific entity types, in our case CHEMICAL and DISEASE. Associations between these entities are extracted and presented to the user, saving researchers time; for example, a researcher can see which compounds "Pseudomonas aeruginosa" has already been tested with, thanks to this aggregation feature. The frontend was implemented so the user can access this aggregation feature, see the articles present in the database, use the machine learning models to classify new documents, and insert them into the database if they are relevant.
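    The exact feature extraction and hyperparameters of the LR-1 model are not specified in the abstract. The sketch below assumes TF-IDF features capped at 945 terms feeding a logistic regression, with placeholder texts and labels; it illustrates the document-relevance classification step only, not the BIOFILMad codebase.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Placeholder corpus: abstracts labeled 1 (relevant to antibiotic adjuvants) or 0 (not relevant)
texts = ["compound X restores colistin activity against P. aeruginosa biofilms",
         "a survey of deep learning architectures for image segmentation",
         "EDTA potentiates tobramycin against Staphylococcus aureus biofilms",
         "graph databases for social network analysis"] * 25
labels = [1, 0, 1, 0] * 25

# Assumed pipeline: TF-IDF features (capped at 945 terms) + logistic regression
model = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=945, ngram_range=(1, 2), stop_words="english")),
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])

X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.25,
                                                    random_state=0, stratify=labels)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```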