Global Analysis of Circulating Immune Cells by Matrix-Assisted Laser Desorption Ionization Time-of-Flight Mass Spectrometry
Background: MALDI-TOF mass spectrometry is currently used in microbiological diagnosis to characterize bacterial populations. Our aim was to determine whether this technique could be applied to intact eukaryotic cells, and in particular, to cells involved in the immune response. Methodology/Principal Findings: A comparison of frozen monocytes, T lymphocytes and polymorphonuclear leukocytes revealed specific peak profiles. We also found that twenty cell types had specific profiles, permitting the establishment of a cell database. The circulating immune cells, namely monocytes, T lymphocytes and polymorphonuclear cells, were distinct from tissue immune cells such as monocyte-derived macrophages and dendritic cells. In addition, MALDI-TOF mass spectrometry was valuable to easily identify the signatures of monocytes and T lymphocytes in peripheral mononuclear cells. Conclusions/Significance: This method was rapid and easy to perform, and unlike flow cytometry, it did not require any additional components such as specific antibodies. The MALDI-TOF mass spectrometry approach could be extended t
The Computational Diet: A Review of Computational Methods Across Diet, Microbiome, and Health.
Food and human health are inextricably linked. As such, revolutionary impacts on health have been derived from advances in the production and distribution of food relating to food safety and fortification with micronutrients. During the past two decades, it has become apparent that the human microbiome has the potential to modulate health, including in ways that may be related to diet and the composition of specific foods. Despite the excitement and potential surrounding this area, fully understanding the complexity of the gut microbiome, the chemical composition of food, and their interplay in situ remains a daunting task. However, recent advances in high-throughput sequencing, metabolomics profiling, compositional analysis of food, and the emergence of electronic health records provide new sources of data that can contribute to addressing this challenge. Computational science will play an essential role in this effort as it will provide the foundation to integrate these data layers and derive insights capable of revealing and understanding the complex interactions between diet, gut microbiome, and health. Here, we review the current knowledge on diet-health-gut microbiota, relevant data sources, bioinformatics tools, machine learning capabilities, as well as the intellectual property and legislative regulatory landscape. We provide guidance on employing machine learning and data analytics, identify gaps in current methods, and describe new scenarios to be unlocked in the next few years in the context of current knowledge
Computational Metagenomics: Network, Classification and Assembly
Owing to rapid advances in DNA sequencing technologies over the past 10 years, large numbers of short DNA reads can be obtained quickly and cheaply. For example, a single Illumina HiSeq machine can produce several terabytes of data within a week. Metagenomics is a new scientific field that involves the analysis of genomic DNA sequences obtained directly from the environment, enabling studies of novel microbial systems. Metagenomics was made possible by high-throughput sequencing technologies. The analysis of the resulting data requires sophisticated computational analyses and data mining. In clinical settings, a fundamental goal of metagenomics is to help diagnose and cure disease. One major bottleneck so far is how to analyze the huge, noisy data sets quickly and precisely. My PhD research focuses on developing algorithms and tools to tackle these challenging and interesting computational problems.
From the functional perspective, a metagenomic sample can be represented as a weighted metabolic network, in which the nodes are molecules, the edges are enzymes encoded by genes, and the weights can be considered as the number of organisms providing the functions. One goal of functional comparison between metagenomic samples is to find differentially abundant metabolic subnetworks between the two groups under comparison. We have developed a statistical network analysis tool, MetaPath, which uses a greedy search algorithm to find the maximum-weight subnetwork and a nonparametric permutation test to measure its statistical significance. Unlike previous approaches, MetaPath explicitly searches for significant subnetworks in the global network, enabling us to detect signatures at a finer level. In addition, we developed statistical methods that take into account the topology of the network when testing the significance of the subnetworks.
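The nonparametric permutation test mentioned in this abstract can be illustrated with a minimal sketch. This is not MetaPath's implementation: the function name `permutation_p_value`, the difference-of-means statistic, and the toy abundance values are all assumptions made for illustration.

```python
import random

def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """One-sided permutation p-value for mean(group_a) - mean(group_b).

    Hypothetical helper: the statistic MetaPath actually permutes is not
    given in the abstract, so a plain difference of means is assumed here.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_permutations):
        # Shuffle the group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        stat = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        if stat >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Toy subnetwork weights in two sample groups (made-up numbers).
p_value = permutation_p_value([5, 6, 7, 8], [1, 2, 3, 4], n_permutations=2000)
print(p_value)
```

Shuffling the pooled values is equivalent to reassigning group labels at random, which is what makes the test free of distributional assumptions.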
Another computational problem involves classifying anonymous DNA sequences obtained from metagenomic samples. There are several challenges here: (1) The classification labels follow a hierarchical tree structure, in which the leaves are most specific and the internal nodes are more general. How can we classify novel sequences that do not belong to leaf categories (species) but do belong to internal groups (e.g., phylum)? (2) For each classification, how can we compute a confidence score so that users can trade off sensitivity against specificity? (3) How can we analyze billions of data items quickly? We have developed a novel hierarchical classifier (MetaPhyler) for the classification of anonymous DNA reads. Through simulation, MetaPhyler models the distribution of pairwise similarities within different hierarchical groups with nonparametric density estimation. The confidence score is computed as a likelihood ratio. For a query DNA sequence of arbitrary length, its similarity can be calculated through linear approximation. In benchmark comparisons, we have shown that MetaPhyler is significantly faster and more accurate than previous tools.
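A likelihood-ratio confidence score built on nonparametric density estimates can be sketched as follows. The abstract does not say which estimator or which reference distributions MetaPhyler uses, so the Gaussian kernel, the bandwidth, the function names, and the toy similarity values below are assumptions.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density function estimated from samples with Gaussian kernels."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

def likelihood_ratio(similarity, within_group, outside_group, bandwidth=0.05):
    """Ratio of the two estimated densities at the observed similarity.

    Values above 1 favour membership in the group (assumed interpretation).
    """
    p_in = gaussian_kde(within_group, bandwidth)(similarity)
    p_out = gaussian_kde(outside_group, bandwidth)(similarity)
    return p_in / p_out if p_out > 0 else float("inf")

# Made-up simulated similarities for sequence pairs inside and outside a group.
within = [0.90, 0.92, 0.95, 0.88]
outside = [0.60, 0.65, 0.70, 0.72]
high = likelihood_ratio(0.91, within, outside)
low = likelihood_ratio(0.66, within, outside)
print(high > 1, low < 1)
```

Thresholding the ratio at different values is one way to move along the sensitivity/specificity tradeoff described above.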
DNA sequencing machines can only produce very short strings (e.g., 100bp) relative to the size of a genome (e.g., a typical bacterial genome is 5Mbp). One of the most challenging computational tasks is the assembly of millions of short reads into longer contigs, which are used as the basis of subsequent computational analyses. In this project, we have developed a comparative metagenomic assembler (MetaCompass), which utilizes genomes that have already been sequenced and produces long contigs through read mapping (alignment) and assembly. Given the availability of thousands of existing bacterial genomes, MetaCompass first chooses, for a particular sample, a best subset as reference based on the taxonomic composition. Then, the reads are aligned against these genomes using MUMmer-map or Bowtie2. Afterwards, we use a greedy algorithm for the minimum set-cover problem to build long contigs, and the consensus sequences are computed by majority rule. We also propose an iterative approach to improve the performance. Finally, MetaCompass has been evaluated and tested on over 20 terabytes of metagenomic data sets generated by the Human Microbiome Project.
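The greedy approach to the minimum set-cover problem named in this abstract can be sketched in a few lines. This is a generic textbook version, not MetaCompass code: the function name, the read identifiers, and the idea of representing each aligned read by the reference positions it covers are assumptions.

```python
def greedy_set_cover(universe, candidates):
    """Greedy approximation: repeatedly pick the set covering most uncovered items.

    `candidates` maps a (hypothetical) read id to the set of reference
    positions it covers. Returns the chosen ids and any positions that no
    candidate could cover.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(candidates, key=lambda name: len(candidates[name] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            break  # the remaining positions are not covered by any candidate
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered

# Toy alignment: four reads over reference positions 0..9.
reads = {
    "r1": set(range(0, 5)),
    "r2": set(range(3, 8)),
    "r3": set(range(6, 10)),
    "r4": set(range(2, 6)),
}
chosen, uncovered = greedy_set_cover(range(10), reads)
print(chosen, uncovered)
```

The greedy choice does not guarantee the smallest possible cover, but it is simple and fast, which matters at the data volumes the abstract describes.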
In addition, to facilitate the identification and characterization of antibiotic resistance genes, we have created the Antibiotic Resistance Genes Database (ARDB), which provides a centralized compendium of information on antibiotic resistance. Furthermore, we have applied our tools to the analysis of a novel oral microbiome data set and have discovered interesting functional mechanisms and ecological changes underlying the transition from health to periodontal disease in the human mouth at a system level.
Developing Algorithms for Quantifying the Super Resolution Microscopic Data: Applications to the Quantification of Protein-Reorganization in Bacteria Responding to Treatment by Silver Ions
Histone-like nucleoid structuring proteins (HNS) play significant roles in shaping chromosomal DNA, in regulating transcriptional networks in microbes, and in bacterial responses to environmental changes such as temperature fluctuations. In this work, the intracellular organization of HNS proteins in E. coli bacteria was investigated using super-resolution fluorescence microscopy, which surpasses conventional microscopy by 10–20 fold in spatial resolution. More importantly, the changes in the spatial distribution of HNS proteins in E. coli upon addition of silver ions to the growth medium were explored. To quantify the spatial distribution of HNS in bacteria and its changes, an automatic method based on Voronoi diagrams was implemented. The HNS proteins localized by super-resolution fluorescence microscopy were segmented and clustered based on several quantitative parameters, such as molecular areas, molecular densities, and mean inter-molecular distances of the k-th rank, all of which were computed from the Voronoi diagrams. These parameters, as well as the associated clustering analysis, allowed us to quantify how the spatial organization of HNS proteins responds to silver, and provided insight into how microbes adapt to new environments.
Perturbation Detection Through Modeling of Gene Expression on a Latent Biological Pathway Network: A Bayesian hierarchical approach
Cellular response to a perturbation is the result of a dynamic system of
biological variables linked in a complex network. A major challenge in drug and
disease studies is identifying the key factors of a biological network that are
essential in determining the cell's fate.
Here our goal is the identification of perturbed pathways from
high-throughput gene expression data. We develop a three-level hierarchical
model, where (i) the first level captures the relationship between gene
expression and biological pathways using confirmatory factor analysis, (ii) the
second level models the behavior within an underlying network of pathways
induced by an unknown perturbation using a conditional autoregressive model,
and (iii) the third level is a spike-and-slab prior on the perturbations. We
then identify perturbations through posterior-based variable selection.
We illustrate our approach using gene transcription drug perturbation
profiles from the DREAM7 drug sensitivity prediction challenge data set. Our
proposed method identified regulatory pathways that are known to play a
causative role and that were not readily resolved using gene set enrichment
analysis or exploratory factor models. Simulation results are presented
assessing the performance of this model relative to a network-free variant and
its robustness to inaccuracies in biological databases
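
For readers unfamiliar with the third level of the model, a generic spike-and-slab prior can be written as below. This is the textbook form, not necessarily the authors' parameterization: the symbols $\theta_j$ (perturbation of pathway $j$), $\gamma_j$ (inclusion indicator), $\pi$, and $\tau^2$ are assumptions, and $\delta_0$ denotes a point mass at zero.

```latex
\begin{align}
\gamma_j &\sim \mathrm{Bernoulli}(\pi), \\
\theta_j \mid \gamma_j &\sim (1-\gamma_j)\,\delta_0 + \gamma_j\,\mathcal{N}(0, \tau^2).
\end{align}
```

Under this form, posterior-based variable selection amounts to flagging pathway $j$ as perturbed when the posterior inclusion probability $P(\gamma_j = 1 \mid \text{data})$ exceeds a chosen threshold; the threshold itself is not stated in the abstract.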
Development of a framework for the classification of antibiotics adjuvants
Master's dissertation in Bioinformatics
Throughout the last decades, bacteria have become increasingly resistant to available
antibiotics, leading to a growing need for new antibiotics and new drug development
methodologies. There are no records of new antibiotics being developed in the last
40 years, which has begun to narrow the available alternatives. Therefore, finding new
antibiotics and bringing them to market is increasingly challenging. One approach is finding
compounds that restore or leverage the activity of existing antibiotics against biofilm bacteria.
Because information in this field is very limited and no database covers this topic,
machine learning models were used to predict the relevance of documents about
adjuvants.
In this project, the BIOFILMad - Catalog of antimicrobial adjuvants to tackle biofilms
application was developed to help researchers save time in their daily research. This
application was constructed using Django and Django REST Framework for the backend
and React for the frontend.
As for the backend, a database needed to be constructed since no database entirely
focuses on this topic. For that, a machine learning model was trained to help us classify
articles. Three different algorithms were used, Support-Vector Machine (SVM), Random
Forest (RF), and Logistic Regression (LR), each combined with two feature-set sizes,
945 and 1890 features. Across all metrics, model LR-1 performed
the best for classifying relevant documents, with an accuracy score of 0.8461, a recall score
of 0.6170, an f1-score of 0.6904, and a precision score of 0.7837. This model is the best at
correctly identifying the relevant documents, as shown by its higher recall score compared
to the other models. With this model, our database was populated with relevant information.
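
As a quick consistency check on the reported metrics, the f1-score is the harmonic mean of precision and recall. The short sketch below recomputes it from the two reported values; the function and variable names are illustrative, not taken from the dissertation.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values reported for model LR-1 in the abstract.
reported_precision = 0.7837
reported_recall = 0.6170

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 4))  # agrees with the reported f1-score of 0.6904
```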
Our backend has a unique feature, the aggregation feature, built with Named
Entity Recognition (NER). The goal is to identify specific entity types; in our case, these are CHEMICAL and DISEASE. Associations between these entities are then presented
to the user, saving researchers time. For example, a researcher can
see which compounds "pseudomonas aeruginosa" has already been tested with thanks to this
aggregation feature.
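
One simple way such an aggregation could work is document-level co-occurrence: every CHEMICAL entity found in an article is associated with every DISEASE entity found in the same article. The abstract does not describe how BIOFILMad links entities, so this rule, the function name, and the placeholder entities below are assumptions.

```python
from collections import defaultdict

def aggregate_associations(documents):
    """Map each DISEASE entity to the CHEMICAL entities seen in the same document.

    `documents` is a list of documents, each given as a list of
    (entity_text, entity_label) pairs, as an NER step might produce.
    """
    associations = defaultdict(set)
    for entities in documents:
        chemicals = {text.lower() for text, label in entities if label == "CHEMICAL"}
        diseases = {text.lower() for text, label in entities if label == "DISEASE"}
        for disease in diseases:
            associations[disease] |= chemicals
    return dict(associations)

# Placeholder NER output for three hypothetical articles.
docs = [
    [("Compound A", "CHEMICAL"), ("Disease X", "DISEASE")],
    [("Compound B", "CHEMICAL"), ("disease x", "DISEASE")],
    [("Compound C", "CHEMICAL"), ("Disease Y", "DISEASE")],
]
result = aggregate_associations(docs)
print(sorted(result["disease x"]))
```

Lower-casing the entity text merges spelling variants such as "Disease X" and "disease x" into a single key, which keeps the lookup for one entity in one place.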
The frontend was implemented so the user could access this aggregation feature, see the
articles present in the database, use the machine learning models to classify new documents,
and insert them in the database if they are relevant.