159 research outputs found

    Events Recognition System for Water Treatment Works

    Supplying drinking water in sufficient quantity and at the required quality is a challenging task for water companies. Success depends largely on ensuring a continuously high level of water treatment at Water Treatment Works (WTWs). Processes at WTWs are therefore highly automated and controlled. Reliable and rapid detection of faulty sensor data and failure events in WTW processes is of prime importance for their efficient and effective operation. For this reason, the vast majority of WTWs operated in the UK use event detection systems that automatically generate alarms when abnormal behaviour is detected in observed signals, to ensure early detection of process failures. The event detection systems usually deployed at WTWs apply thresholds to the monitored signals to recognise faulty processes. The research described in this thesis investigates new methods for near real-time event detection at WTWs, applying statistical process control and machine learning techniques to the automated recognition of failure events in WTW processes. The resulting novel Hybrid CUSUM Event Recognition System (HC-ERS) makes use of new online sensor data validation and pre-processing techniques and utilises two distinct detection methodologies: the first for fault detection on individual signals and the second for the recognition of faulty processes and events at WTWs. The fault detection methodology automatically detects abnormal behaviour of observed water quality parameters in near real-time, using data from the corresponding sensors that are validated and pre-processed online. The methodology utilises CUSUM control charts to predict the presence of faults by tracking the variation of each signal individually to identify abnormal shifts in its mean. The basic CUSUM methodology was refined by investigating optimised interdependent parameters for each signal individually.
The combined predictions of CUSUM fault detection on individual signals serve as the basis for the second, event detection methodology. This methodology automatically identifies faulty WTW processes, that is, failure events at WTWs, in near real-time, utilising the faults previously detected by CUSUM on individual signals. The method applies Random Forest classifiers to predict the presence of an event in WTW processes. All methods have been developed to be generic and to generalise well across different drinking water treatment processes at WTWs. HC-ERS proved effective in detecting failure events at WTWs, as demonstrated by its application to real water quality signals with historical events from a UK WTW. The methodology achieved a peak F1 value of 0.84 and generated 0.3 false alarms per week. These results demonstrate the ability of the method to detect failure events in WTW processes automatically and reliably in near real-time, and show promise for practical application of the HC-ERS in industry. The combination of both methodologies presents a unique contribution to the field of near real-time event detection at WTW
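
The CUSUM step described in this abstract can be illustrated with a minimal sketch of a standard two-sided tabular CUSUM chart. This is not the thesis's refined HC-ERS variant: the function name `cusum_alarms`, the reference value `k`, the decision threshold `h`, and the sample signal are all assumptions chosen only for illustration.

```python
def cusum_alarms(values, target, k, h):
    """Two-sided tabular CUSUM (textbook form, not the HC-ERS variant).

    values: sequence of readings from one signal
    target: in-control mean of the signal (assumed known here)
    k:      reference value (slack); deviations smaller than k are ignored
    h:      decision threshold; an alarm is raised when a sum exceeds h
    Returns the indices at which an alarm is raised.
    """
    upper = 0.0  # accumulates evidence of an upward shift in the mean
    lower = 0.0  # accumulates evidence of a downward shift in the mean
    alarms = []
    for i, x in enumerate(values):
        upper = max(0.0, upper + (x - target - k))
        lower = max(0.0, lower + (target - x - k))
        if upper > h or lower > h:
            alarms.append(i)
            upper = lower = 0.0  # restart both sums after an alarm
    return alarms


# Hypothetical signal: stable around 10, then the mean shifts to about 11.
signal = [10.0, 10.1, 9.9, 10.0, 10.2, 11.0, 11.1, 10.9, 11.2, 11.0]
alarms = cusum_alarms(signal, target=10.0, k=0.25, h=1.5)
print(alarms)
```

In a two-stage design like the one the abstract describes, per-signal alarm flags of this kind would presumably form the input features of the downstream classifier; how the thesis encodes them is not stated here.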

    Vol. 11, No. 2 (Full Issue)


    Small-Sample Analysis and Inference of Networked Dependency Structures from Complex Genomic Data

    This thesis is concerned with the statistical modelling and inference of genetic networks. Association structures and mutual influences are an important topic in systems biology. Gene expression data are high-dimensional, while sample sizes are small ("small n, large p"). The analysis of interaction structures by means of graphical models is therefore an ill-posed (inverse) problem whose solution requires regularisation methods. I propose novel estimators for covariance structures and (partial) correlations. These are based either on resampling procedures or on shrinkage for variance reduction. In the latter method, the optimal shrinkage intensity is computed analytically. Compared with the classical sample covariance matrix, this estimator in particular has desirable properties in terms of increased efficiency and smaller mean squared error. In addition, the resulting parameter estimates are always positive definite and well conditioned. To determine the network topology, the concept of graphical Gaussian models is used, which can represent both marginal and conditional independencies. A model selection method is presented that is based on a multiple testing procedure with control of the false discovery rate, in which the underlying null distribution is estimated adaptively. The proposed framework is computationally efficient and performs very well in comparison with competing methods, both in simulations and in applications to molecular data
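
To make the idea of edge selection under false discovery rate control concrete, here is a minimal sketch using the classical Benjamini-Hochberg step-up procedure. This is a simpler stand-in, not the thesis's method, which estimates the null distribution adaptively. The function name `benjamini_hochberg`, the level `q`, and the p-values are hypothetical; each p-value is assumed to test one candidate edge (one partial correlation) of the network.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the null hypothesis
    (here: 'no edge') is rejected at FDR level q.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank r (1-based) with p_(r) <= r * q / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff = rank
    # Reject the hypotheses with the `cutoff` smallest p-values.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for ten candidate edges.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(p, q=0.05)
print(flags)
```

Edges whose flag is True would be kept in the inferred graph; all others are treated as absent.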

    Bi-Directional Testing for Change Point Detection in Poisson Processes

    Point processes often serve as a natural language to chronicle an event's temporal evolution, and significant changes in the flow, synonymous with non-stationarity, are usually triggered by assignable and frequently preventable causes, often heralding devastating ramifications. Examples include amplified restlessness of a volcano, increased frequencies of airplane crashes, hurricanes, mining mishaps, among others. Guessing these time points of changes, therefore, merits utmost care. Switching the way time traditionally propagates, we posit a new genre of bidirectional tests which, despite a frugal construct, prove to be exceedingly efficient in culling out non-stationarity under a wide spectrum of environments. A journey surveying a lavish class of intensities, ranging from the tralatitious power laws to the deucedly germane rough steps, tracks the established unidirectional forward and backward test's evolution into a p-value induced dual bidirectional test, the best member of the proffered category. Niched within a hospitable Poissonian framework, this dissertation, through a prudent harnessing of the bidirectional category's classification prowess, incites a refreshing alternative to estimating changes plaguing a soporific flow, by conducting a sequence of tests. Validation tools, predominantly graphical, rid the structure of forbidding technicalities, aggrandizing the swath of applicability. Extensive simulations, conducted especially under hostile premises of hard non-stationarity detection, document minimal estimation error and reveal the algorithm's obstinate versatility at its most unerring

    Identification of genetic drivers of colorectal cancer via bioinformatics and machine learning

    Machine learning methods have been widely used in a range of areas within genetics and genomics. Machine learning is perhaps one of the most useful tools for the interpretation of large genomic data sets and has been used to annotate and analyse a wide variety of genomic sequence elements, owing to its ability to analyse and learn how to extract insights from large heterogeneous data sets. In this work, we mainly focus on identifying gene markers that are associated with an increased risk of colorectal cancer (CRC), one of the most common cancers worldwide and one of those with the highest mortality. We look into feature selection methods based on the relevance of variants to the development of hereditary diseases. With this approach, we aim to find relevant frequently occurring variants as well as rare variant occurrences, and in this way identify potentially valuable disease biomarkers. We analysed 8339 different variants and determined 765 to be relevant to CRC. We also use feature clustering methods to identify co-occurrence between certain genetic variants, which allows us to identify genetic links and non-co-occurring variants that are both rare and associated with an increased risk of developing CRC. Using this method, we determined four different co-occurring variant groups, with an additional one composed of independent variants. We expect the identification of these gene markers to allow for better clinical management of patients, namely through the identification of genetic predispositions to CRC, which will allow for better risk assessment and change the type and frequency of the exams to be performed. This will have a strong impact not only on the patients' clinical screening but also on that of their family members, and can allow for early identification of tumours or even benign lesions, thereby contributing to CRC prevention.
We believe that this study will contribute to the overall understanding of the causes of CRC and will further advance the study of its prevention. We also expect to give insights into how to identify the biological mechanisms underlying gene variant occurrences, not only for CRC but also for other hereditary cancer syndromes.

    From tools and databases to clinically relevant applications in miRNA research

    While early research in particular focused on the small portion of the human genome that encodes proteins, it became apparent that molecules responsible for many key functions are also encoded in the remaining regions. Originally, non-coding RNAs, i.e., molecules that are not translated into proteins, were thought to be composed of only two classes (ribosomal RNAs and transfer RNAs). However, starting from the early 1980s many other non-coding RNA classes were discovered. In the past two decades, small non-coding RNAs (sncRNAs), and in particular microRNAs (miRNAs), have become essential molecules in biological and biomedical research. In this thesis, five aspects of miRNA research have been addressed. Starting from the development of advanced computational software to analyze miRNA data (1), an in-depth understanding of human and non-human miRNAs was generated and databases hosting this knowledge were created (2). In addition, the effects of technological advances were evaluated (3). We also contributed to the understanding of how miRNAs act in an orchestrated manner to target human genes (4). Finally, based on the insights gained from the tools and resources of the mentioned aspects, we evaluated the suitability of miRNAs as biomarkers (5). With the establishment of next-generation sequencing, the primary goal of this thesis was the creation of an advanced bioinformatics analysis pipeline for high-throughput miRNA sequencing data, primarily focused on human. Consequently, miRMaster, a web-based software solution to analyze hundreds of sequencing samples within a few hours, was implemented. The tool was implemented in a way that it could support different sequencing technologies and library preparation techniques. This flexibility allowed miRMaster to build a sizeable user base, resulting in over 120,000 processed samples and 1.5 billion processed reads as of July 2021, and thereby laid the basis for the second goal of this thesis.
Indeed, the implementation of a feature allowing users to share their uploaded data contributed strongly to the generation of a detailed annotation of the human small non-coding transcriptome. This annotation was integrated into a new miRNA database, miRCarta, modelling thousands of miRNA candidates and corresponding read expression profiles. A subset of these candidates was then evaluated in the context of different diseases and validated. The thereby gained knowledge was subsequently used to validate additional miRNA candidates and to generate an estimate of the number of miRNAs in human. The large collection of samples, gathered over many years with miRMaster was also integrated into a web server evaluating miRNA arm shifts and switches, miRSwitch. Finally, we published an updated version of miRMaster, expanding its scope to other species and adding additional downstream analysis capabilities. The second goal of this thesis was further pursued by investigating the distribution of miRNAs across different human tissues and body fluids, as well as the variability of miRNA profiles over the four seasons of the year. Furthermore, small non-coding RNAs in zoo animals were examined and a tissue atlas of small non-coding RNAs for mice was generated. The third goal, the assessment of technological advances, was addressed by evaluating the new combinatorial probe-anchor synthesis-based sequencing technology published by BGI, analyzing the effect of RNA integrity on sequencing data, analyzing low-input library preparation protocols, and comparing template-switch based library preparation protocols to ligation-based ones. In addition, an antibody-based labeling sequencing chemistry, CoolMPS, was investigated. Deriving an understanding of the orchestrated regulation by miRNAs, the fourth goal of this thesis, was pursued in a first step by the implementation of a web server visualizing miRNA-gene interaction networks, miRTargetLink. 
Subsequently, miRPathDB, a database incorporating pathways affected by miRNAs and their targets, was implemented, as well as miEAA 2.0, a web server offering quick miRNA set enrichment analyses in over 130,000 categories spanning 10 different species. In addition, miRSNPdb, a database evaluating the effects of single nucleotide polymorphisms and variants in miRNAs or in their target genes, was created. Finally, the fifth goal of the thesis, the evaluation of the suitability of miRNAs as biomarkers for human diseases, was tackled by investigating the expression profiles of miRNAs with machine learning. An Alzheimer's disease cohort with over 400 individuals was analyzed, as well as another neurodegenerative disease cohort with multiple time points of Parkinson's disease patients and healthy controls. Furthermore, a lung cancer cohort covering 3,000 individuals was examined to evaluate the suitability of an early detection test. In addition, we evaluated the expression profile changes induced by aging on a cohort of 1,334 healthy individuals and over 3,000 diseased patients. Altogether, the herein described tools, databases and research papers present valuable advances and insights into the miRNA research field and have been used and cited by the research community over 2,000 times as of July 2021.

    Quantitative Analysis of Proteome Dynamics in Chinese Hamster Ovary cells

    The overall goal of this research was to better understand the mechanisms underlying the physiology of CHO cells, the most important mammalian host for recombinant protein production. The publication of the complete genome of CHO cells allowed the use of mass-spectrometry-based proteomic tools to study protein expression. Among several different sample preparation methods for mass spectrometry, in-gel trypsin digest and FASP were found to be the most robust for high-coverage CHO proteome analysis. Global changes in protein expression between exponential and stationary phases were determined using SILAC for the parental GS K-O and the producing E22 cell lines. More than 4000 proteins were quantified and more than 100 proteins showed statistically significant differences. Proteins up-regulated in the exponential phase control cell cycle and DNA replication, while proteins up-regulated in the stationary phase are involved in stress response and signalling, making them interesting targets for cellular engineering. In addition to quantifying relative changes in protein expression between the two phases of cell culture, more than 4000 protein copy numbers were calculated for the parental and producing cell lines using the TPA method. Protein turnover, described as the balance between protein synthesis and degradation, was calculated for more than 3000 cellular proteins. Combining these two parameters allowed the determination of the top 10 proteins, corresponding to 20% of the global turnover rate. Production of the monoclonal antibody was the top priority, causing a metabolic burden on cells. KEGG and GO annotation suggests that 600 up-regulated proteins in the E22 producing cell line explained its clonal selection based on highest growth and productivity. Interestingly, no major differences in amino acid and codon usage were found between the parental and producing cell lines.
In summary, a large-scale proteomic data set containing qualitative, quantitative and dynamic information on protein expression for industrially relevant CHO cell lines

    Diving into the depth of primary motor cortex: a high-resolution investigation of the motor system using 7Tesla fMRI

    Dissertation submitted for the degree of Master in Biomedical Engineering. Human behaviour is grounded in our ability to perform complex tasks. While human motor function has been studied for over a century, the cortical processes underlying motor behaviour are still under debate. Central to the execution of action is the primary motor cortex (M1), which has previously been considered to be responsible for the execution of movements planned in the premotor cortex, yet recent studies point to more complex roles for M1 in orchestrating motor-related information. The purpose of this project is to study the functional properties of the primary motor cortex using ultra-high-field fMRI. The spatial resolution made possible by a high-field magnet allows us to investigate novel questions such as the existence of cortical columns, the functional organization pattern for single fingers, and the functional involvement of M1 in motor imagery and observation. Thirteen young healthy subjects participated in this study. Functional and anatomical high-resolution images were acquired. Four functional scans were acquired for the different tasks: motor execution, motor imagery, movement observation, and rest. The paradigm used was randomized finger tapping. Image analysis was performed with the Brainvoyager QX program. Using the novel high-resolution cortical grid sampling analysis tools, different cortical laminae of human M1 were examined. Our results reveal a distributed pattern (intermingled with somatotopic "hot spots") for single-finger activity in M1. Furthermore, we show novel evidence of columnar structures in M1 and show that non-motor tasks such as motor imagery and action observation also activate this region. We conclude that the primary motor cortex has unexpectedly complex roles in the processing of movement-related information, not only because of its involvement in tasks that do not imply muscle movement, but also because of its intriguing organization pattern

    A communal catalogue reveals Earth's multiscale microbial diversity

    Our growing awareness of the microbial world's importance and diversity contrasts starkly with our limited understanding of its fundamental structure. Despite recent advances in DNA sequencing, a lack of standardized protocols and common analytical frameworks impedes comparisons among studies, hindering the development of global inferences about microbial life on Earth. Here we present a meta-analysis of microbial community samples collected by hundreds of researchers for the Earth Microbiome Project. Coordinated protocols and new analytical methods, particularly the use of exact sequences instead of clustered operational taxonomic units, enable bacterial and archaeal ribosomal RNA gene sequences to be followed across multiple studies and allow us to explore patterns of diversity at an unprecedented scale. The result is both a reference database giving global context to DNA sequence data and a framework for incorporating data from future studies, fostering increasingly complete characterization of Earth's microbial diversity. Peer reviewed