
    Back to basics: simplifying microbial communities to interpret complex interactions

    Microbes are everywhere and contribute to many essential processes relevant for planet Earth, ranging from biogeochemical cycles to complex human behavior. The means to achieve these colossal tasks for such small and, at first glance, simple organisms rely on their ability to assemble into heterogeneous communities in which populations with different taxonomies and functions coexist and complement each other. Some microbes are of particular interest for human civilization and have long been used for everyday tasks, such as the production of bread and wine. More recently, large-scale industrial and civil projects have taken advantage of the transformative capabilities of microbial communities, with key examples being biogas reactors, mining and wastewater treatment. Decades of classical microbiology, based on pure culture isolates and their physiological characterization, have built the foundations of modern microbial ecology. Molecular analysis of microbes and microbial communities has generated an understanding that for many microbial populations cultivation is hard to achieve and that breaking a community apart impacts its function. These limitations have driven the development of technical tools that bring us directly in contact with communities in their natural environment. In the mid-2000s the recently established "omics" techniques were quickly adapted into their "meta-omics" versions, enabling direct analysis of microbial samples without culture. Every class of molecules (DNA, RNA, protein, metabolite, etc.) can now, in theory, be analyzed from the entire community within a given sample. Metagenomics uses community DNA to build the phylogenetic picture and the genetic potential, whereas metatranscriptomics and metaproteomics employ RNA and proteins, respectively, to interrogate the gene expression of the community. Finally, meta-metabolomics can close the loop and describe the metabolic activity of the microbes. Here, we combined the four aforementioned major meta-omics disciplines in a gene- and population-centric perspective to reiterate the same Aristotelian question underlying microbial ecology: how is it possible that the whole is more than the sum of its parts? Alongside the detailed answers provided by the individual communities in various environments, we also tried to learn something about biology itself. We first addressed, in a saccharolytic and methane-producing minimalistic consortium (SEM1b), the strain-specific interplay engaged in (hemi)cellulose degradation, explaining the ubiquity of Coprothermobacter proteolyticus in biogas reactors. Through the genetic potential of the C. proteolyticus-affiliated COPR1 population, we showed the putative acquisition, via horizontal gene transfer, of a gene cassette for hemicellulose degradation. Moreover, we showed how the gene expression of these COPR1 genes was both coherent with the release of hemicellulose by another population of the community (RCLO1) and synchronized with the gene expression of the orthologous genes of an already known hemicellulolytic population (CLOS1). Finally, we demonstrated that the purified COPR1 protein (a glycoside hydrolase family 16 enzyme) showed endoglucanase activity on several hemicellulose substrates. Secondly, we explored the combined application of absolute omics-based quantification of RNA and proteins, using SEM1b as a benchmark community due to its lower complexity (fewer than 12 populations) and relatively well-resolved biology.
We subsequently demonstrated that the uncultured bacterial populations in SEM1b followed the protein-to-RNA ratio (10^2-10^4) expected from previously analyzed cultured bacteria in exponential phase. In contrast, an archaeal population from SEM1b showed values in the range 10^3-10^5, the same as what has been reported for eukaryotes (yeast and human) in the literature. In addition, we modeled the linearity (k) between genome-centric transcriptomes and proteomes over time and used it to predict the essential metabolic populations of the SEM1b community through converging and parallel k-trends, which was subsequently confirmed via classical pathway analysis. Finally, we estimated translation and protein degradation rates, concluding that some cellular processes that require rapid tuning (e.g. metabolism and motility) are (also) regulated post-transcriptionally. Thirdly, we sought to apply our approach of collapsing complex datasets into simple metrics, in order to identify underlying community trends, to a more complex, "real-world" microbiome. To do this, we resolved more than one year of weekly sampling from a lipid-accumulating community (Shif-LAO) that inhabits a wastewater treatment plant in Shifflange (Luxembourg), and showed an extreme genetic redundancy and turnover, in contrast to a more conservative trend in functions. Moreover, we demonstrated how temporal patterns (e.g. seasonality) in both gene counts and gene expression are linked with the physico-chemical parameters associated with the corresponding samples. Furthermore, we built the static reaction network underlying the whole community over the complete dataset (51 temporal samples). From this, we characterized the sub-network for lipid accumulation and showed that its most highly expressed nodes were defined by resource competition between different taxa (deduced via inverse taxonomic richness and gene expression over time). In contrast, the nitrogen metabolism sub-network exhibited a dominant taxon and a keystone ammonia monooxygenase, the first enzyme of ammonia oxidation, which may lead to the production of nitrous oxide (a potent greenhouse gas). Overall, the results presented in this thesis build a comprehensive repertoire of interactions in microbial communities, ranging from a simple consortium (tens of populations) to a natural, complex microbiome (hundreds of populations). These were ultimately uncovered using an array of techniques, including unsupervised gene expression clustering, pathway analysis, reaction networks, co-expression networks, eigengenes and linearity trends between transcriptome and proteome. Moreover, we learnt that to achieve a full understanding of microbial ecology and detailed interactions, we need to integrate all the meta-omics layers, quantified with absolute measurements. However, when scaling these approaches to real-world communities, the massive amounts of generated data bring new challenges and necessitate simplifying strategies to reduce complexity and extrapolate ecological trends.
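
    As an illustration of the linearity modeling mentioned above, here is a minimal sketch (assuming per-population absolute transcript and protein abundances at matched time points; all population names and numbers are hypothetical) of fitting the slope k between log-scaled transcriptome and proteome per population:

```python
import numpy as np

def linearity_k(rna, protein):
    """Fit protein = k * rna + b in log10 space for one population
    and return the slope k."""
    x = np.log10(np.asarray(rna, dtype=float))
    y = np.log10(np.asarray(protein, dtype=float))
    k, _intercept = np.polyfit(x, y, deg=1)
    return k

# Hypothetical populations sampled at five matched time points.
populations = {
    "COPR1": ([1e3, 2e3, 8e3, 3e4, 9e4], [1e6, 1.8e6, 7e6, 2.5e7, 8e7]),
    "CLOS1": ([5e2, 1e3, 4e3, 2e4, 5e4], [2e5, 5e5, 1.5e6, 9e6, 2e7]),
}
for name, (rna, prot) in populations.items():
    # Converging or parallel k-trends across populations hint at shared
    # metabolic roles, as described in the abstract above.
    print(name, round(linearity_k(rna, prot), 2))
```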

    The Telecommunications and Data Acquisition Report

    Deep Space Network advanced systems, very large scale integration architecture for decoders, radar interface and control units, microwave time delays, microwave antenna holography, and a radio frequency interference survey are among the topics discussed.

    Parallel and scalable combinatorial string algorithms on distributed memory systems

    Methods for processing and analyzing DNA and genomic data are built upon combinatorial graph and string algorithms. The advent of high-throughput DNA sequencing is enabling the generation of billions of reads per experiment. Classical and sequential algorithms can no longer deal with these growing data sizes - which for the last 10 years have greatly outpaced advances in processor speeds. Processing and analyzing state-of-the-art genomic data sets require the design of scalable and efficient parallel algorithms and the use of large computing clusters. Suffix arrays and trees are fundamental string data structures, which lie at the foundation of many string algorithms, with important applications in text processing, information retrieval, and computational biology. Consequently, the parallel construction of these indices is an actively studied problem. However, prior approaches lacked good worst-case run-time guarantees and exhibited poor scaling and overall performance. In this work, we present our distributed-memory parallel algorithms for indexing large datasets, including algorithms for the distributed construction of suffix arrays, LCP arrays, and suffix trees. We formulate a generalized version of the All-Nearest-Smaller-Values problem, provide an optimal distributed solution, and apply it to the distributed construction of suffix trees - yielding a work-optimal parallel algorithm. Our algorithms for distributed suffix array and suffix tree construction improve the state of the art by simultaneously improving worst-case run-time bounds and achieving superior practical performance. Next, we introduce a novel distributed string index, the Distributed Enhanced Suffix Array (DESA), which is based on the suffix and LCP arrays and augments them with additional distributed data structures. The DESA is designed to allow efficient pattern search queries in distributed memory while requiring at most O(n/p) memory per process. We present efficient distributed-memory parallel algorithms for querying, as well as for the efficient construction of this distributed index. Finally, we present our work on distributed-memory algorithms for clustering de Bruijn graphs and their application to solving a grand-challenge metagenomic dataset.
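
    The All-Nearest-Smaller-Values problem generalized in this work is simple to state sequentially; a single-process Python sketch (illustrative only, not the distributed algorithm described above) shows the primitive that, applied to an LCP array, yields the suffix tree topology:

```python
def all_nearest_smaller_values(a):
    """For each index i, return the index of the nearest element to the
    left of i that is strictly smaller than a[i], or -1 if none exists.
    Runs in O(n) total using a monotone stack."""
    result = []
    stack = []  # indices whose values form a strictly increasing sequence
    for i, v in enumerate(a):
        while stack and a[stack[-1]] >= v:
            stack.pop()
        result.append(stack[-1] if stack else -1)
        stack.append(i)
    return result

# Applied to an LCP array (and mirrored for the right side), ANSV gives
# the parent structure needed to build a suffix tree from a suffix array.
print(all_nearest_smaller_values([3, 1, 4, 1, 5, 9, 2, 6]))
# -> [-1, -1, 1, -1, 3, 4, 3, 6]
```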

    Efficient approximate string matching techniques for sequence alignment

    One of the outstanding milestones achieved in recent years in the field of biotechnology research has been the development of high-throughput sequencing (HTS). Because it is at the moment technically impossible to decode a genome as a whole, HTS technologies read billions of relatively short chunks of a genome at random locations. Such reads then need to be located within a reference for the species being studied (that is, aligned or mapped to the genome): for each read one identifies regions in the reference that share a large sequence similarity with it, therefore indicating what the read's point or points of origin may be. HTS technologies are able to re-sequence a human individual (i.e. to establish the differences between his/her individual genome and the reference genome for the human species) in a very short period of time. They have also paved the way for the development of a number of new protocols and methods, leading to novel insights in genomics and biology in general. However, HTS technologies also pose a challenge to traditional data analysis methods; this is due to the sheer amount of data to be processed and the need for improved alignment algorithms that can generate accurate results quickly. This thesis tackles the problem of sequence alignment as a step within the analysis of HTS data. Its contributions focus on both the methodological aspects and the algorithmic challenges towards efficient, scalable, and accurate HTS mapping. From a methodological standpoint, this thesis strives to establish a comprehensive framework able to assess the quality of HTS mapping results. In order to do so, one has to understand the source and nature of mapping conflicts, and explore the accuracy limits inherent in how sequence alignment is performed for current HTS technologies. From an algorithmic standpoint, this work introduces state-of-the-art index structures and approximate string matching algorithms. They contribute novel insights that can be used in practical applications towards efficient and accurate read mapping. In more detail, we first present methods able to reduce the storage space taken by indexes for genome-scale references, while still providing fast query access in order to support effective search algorithms. Second, we describe novel filtering techniques that vastly reduce the computational requirements of sequence mapping, but are nonetheless capable of giving strict algorithmic guarantees on the completeness of the results. Finally, this thesis presents new incremental algorithmic techniques able to combine several approximate string matching algorithms; this leads to efficient and flexible search algorithms allowing the user to reach arbitrary search depths. All algorithms and methodological contributions of this thesis have been implemented as components of a production aligner, the GEM-mapper, which is publicly available, widely used worldwide and cited by a sizeable body of literature. It offers flexible and accurate sequence mapping while outperforming other HTS mappers both as to running time and to the quality of the results it produces.
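
    One classic filtering idea of the kind described above is pigeonhole filtering: a read occurring with at most k errors must contain at least one of k+1 non-overlapping pieces exactly, so exact piece hits generate a small set of candidate loci to verify. A hedged Python sketch (for illustration; not necessarily the exact scheme used in the GEM-mapper):

```python
import re

def pigeonhole_candidates(read, reference, k):
    """Split `read` into k+1 pieces; any occurrence with <= k errors must
    contain at least one piece exactly, so exact piece hits yield
    candidate alignment positions to verify."""
    n = len(read)
    piece_len = n // (k + 1)
    candidates = set()
    for j in range(k + 1):
        start = j * piece_len
        end = start + piece_len if j < k else n
        piece = read[start:end]
        for m in re.finditer(re.escape(piece), reference):
            # Anchor the whole read at the position implied by the piece.
            candidates.add(m.start() - start)
    return sorted(c for c in candidates if 0 <= c <= len(reference) - n)

# Each candidate must still be verified, e.g. with a banded edit-distance
# computation; the filter only bounds where verification is needed.
ref = "ACGTACGTTACGATCGATCG"
print(pigeonhole_candidates("ACGTTACG", ref, k=1))  # -> [0, 4]
```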

    New Algorithms for Fast and Economic Assembly: Advances in Transcriptome and Genome Assembly

    Great efforts have been devoted to deciphering the sequence composition of the genomes and transcriptomes of diverse organisms. Continuing advances in high-throughput sequencing technologies have led to a decline in associated costs, facilitating a rapid increase in the amount of available genetic data. In particular, genome studies have undergone a fundamental paradigm shift where genome projects are no longer limited by sequencing costs, but rather by the computational problems associated with assembly. There is an urgent demand for more efficient and more accurate methods. Most recently, "hybrid" methods that integrate short- and long-read data have been devised to address this need. LazyB is a new, low-cost hybrid genome assembler. It starts from a bipartite overlap graph between long reads and restrictively filtered short-read unitigs. This graph is translated into a long-read overlap graph. By design, unitigs are both unique and almost free of assembly errors. As a consequence, only few spurious overlaps are introduced into the graph. Instead of the more conventional approach of removing tips, bubbles, and other local features, LazyB extracts subgraphs whose global properties approach a disjoint union of paths in multiple steps, utilizing properties of proper interval graphs. A prototype implementation of LazyB, entirely written in Python, not only yields significantly more accurate assemblies of the yeast, fruit fly, and human genomes compared to state-of-the-art pipelines, but also requires much less computational effort. An optimized C++ implementation dubbed MuCHSALSA further significantly reduces resource demands. Advances in RNA-seq have facilitated tremendous insights into the role of both coding and non-coding transcripts. Yet, the complete and accurate annotation of the transcriptomes of even model organisms has remained elusive. RNA-seq produces reads significantly shorter than the average distance between related splice events and presents high noise levels and other biases. The computational reconstruction remains a critical bottleneck. Ryūtō implements an extension of common splice graphs facilitating the integration of reads spanning multiple splice sites and paired-end reads bridging distant transcript parts. The decomposition of read coverage patterns is modeled as a minimum-cost flow problem. Using phasing information from multi-splice and paired-end reads, nodes with uncertain connections are decomposed step-wise via linear programming. Ryūtō's performance compares favorably with state-of-the-art methods on both simulated and real-life datasets. Despite ongoing research and our own contributions, progress on traditional single-sample assembly has brought no major breakthrough. Multi-sample RNA-seq experiments provide more information which, however, is challenging to utilize due to the large amount of accumulating errors. An extension to Ryūtō enables the reconstruction of consensus transcriptomes from multiple RNA-seq data sets, incorporating consensus calling at low-level features. Benchmarks show stable improvements already with 3 replicates. Ryūtō outperforms competing approaches, providing a better and user-adjustable sensitivity-precision trade-off. Ryūtō consistently improves assembly on replicates, demonstrably also when mixing conditions or time series and for differential expression analysis. Ryūtō's approach to guided assembly is equally unique.
It allows users to adjust results based on the quality of the guide, even for multi-sample assembly.
    Thesis outline: 1 Preface (1.1 Assembly: A vast and fast evolving field; 1.2 Structure of this Work; 1.3 Available); 2 Introduction (2.1 Mathematical Background; 2.2 High-Throughput Sequencing; 2.3 Assembly; 2.4 Transcriptome Expression); 3 From LazyB to MuCHSALSA - Fast and Cheap Genome Assembly (3.1 Background; 3.2 Strategy; 3.3 Data preprocessing; 3.4 Processing of the overlap graph; 3.5 Post Processing of the Path Decomposition; 3.6 Benchmarking; 3.7 MuCHSALSA – Moving towards the future); 4 Ryūtō - Versatile, Fast, and Effective Transcript Assembly (4.1 Background; 4.2 Strategy; 4.3 The Ryūtō core algorithm; 4.4 Improved Multi-sample transcript assembly with Ryūtō); 5 Conclusion & Future Work (5.1 Discussion and Outlook; 5.2 Summary and Conclusion)
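
    The minimum-cost-flow view of coverage decomposition can be illustrated on a toy splice graph (a sketch with networkx; node names and coverage values are invented, and this is not Ryūtō's implementation):

```python
import networkx as nx

# Toy splice graph: nodes are exons, edge capacities are observed read
# coverage. Pushing the gene's total coverage from source to sink as a
# minimum-cost flow decomposes coverage into transcript paths.
G = nx.DiGraph()
G.add_node("src", demand=-100)  # total coverage entering the gene
G.add_node("snk", demand=100)
for u, v, cov in [
    ("src", "e1", 100), ("e1", "e2", 60), ("e1", "e3", 40),
    ("e2", "e4", 60), ("e3", "e4", 40), ("e4", "snk", 100),
]:
    G.add_edge(u, v, capacity=cov, weight=1)

flow = nx.min_cost_flow(G)
for u, targets in flow.items():
    for v, f in targets.items():
        if f:
            print(f"{u} -> {v}: {f}")
# Decomposing this flow into the paths src-e1-e2-e4-snk (60 units) and
# src-e1-e3-e4-snk (40 units) recovers two transcripts with abundances.
```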

    Bioinformatics approaches to study antibiotics resistance emergence across levels of biological organization.

    The Review on Antimicrobial Resistance predicts that in thirty years infections with antibiotic-resistant microorganisms will become one of the leading causes of death. The discovery of new antibiotics has so far been too slow to ensure the continued usefulness of antibiotics in the face of growing resistance. Therefore, efforts to curb resistance emergence gain in importance. These efforts comprise two complementary strategies. The first focuses on the mechanisms of resistance emergence, in the hope that this would enable the development of pharmacological agents constraining resistance emergence. The second aims at improving antibiotic use practices, based on studies of the impact of antibiotics on resistance emergence within patient populations. Antibiotic resistance emerges in bacterial cells, negatively influences the human gut microbiome, and transfers between people. Hence, antibiotic resistance has impacts across several levels of biological organization. This thesis describes four projects concerning various aspects of antibiotic resistance. The first two projects deal with basic resistance emergence mechanisms, on the level of bacterial strains and bacterial consortia, whereas the other two deal with finding better practices for antibiotic use on a population level. During the first project, I analyzed changes in the genomes of MRSA strains isolated from several patients throughout antibiotic therapies and developing MRSA infections. I observed changes in the number and types of virulence factors responsible for interacting with the human body, which are attributed to mobile genetic elements. In the second project, I showed that, prompted by antibiotic therapy, resistance within the human gut microbiome transfers from bacterial genomes onto plasmids, prophages, and free phages. Hence, resistance emergence depends not only on the antibiotic therapy but also on the state of the gut microbiome, which in turn results from the patient's overall health and previous antibiotic therapies. The third project, SATURN, employed machine learning methods on a large set of data regarding patients' demographics, comorbidities, antibiotic therapies, surgeries, and colonization with multi-drug resistant bacteria. The final classifiers were made available on the AskSaturn website, where doctors can compare antibiotic therapies based on the probability of colonization with multi-drug resistant bacteria. The fourth project, Tübiom, focused on the antibiotic-influenced gut microbiomes of the healthy population. The first two projects rely on genome and metagenome sequencing data, for which I designed specialized bioinformatics analysis pipelines. The latter two projects use mixed data, which were analyzed with machine learning algorithms; these projects also involved web development and data visualization. Although each of the projects requires different data and methods, each provides a crucial part of a pipeline aiming at utilizing gut microbiome information in medical practice to constrain resistance emergence.
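
    In outline, a SATURN-style classifier might look like the following scikit-learn sketch (feature names, label construction and model choice are all invented for illustration; the project's actual data and models are not reproduced here):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500
# Invented patient-level features: age, comorbidity count, days of prior
# antibiotic exposure, and whether a surgery took place.
X = np.column_stack([
    rng.integers(18, 95, n),
    rng.integers(0, 6, n),
    rng.integers(0, 30, n),
    rng.integers(0, 2, n),
])
# Invented label: colonization with multi-drug resistant bacteria.
y = (X[:, 2] + 5 * X[:, 3] + rng.normal(0, 5, n) > 15).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("AUC:", cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean())
# In a deployment like AskSaturn, predict_proba would supply the
# per-therapy colonization probability shown to clinicians.
```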

    Assessing cost efficiency and economies of scale in the European banking system: a Bayesian stochastic frontier approach

    Cost efficiency of banks is a key indicator that provides valuable insight to researchers and policymakers about the functioning of the financial intermediation process, as well as the overall performance of the entire financial system. This thesis focuses on the cost efficiency of the European banking market, for which we identify fourteen nation-specific frontiers and also perform cross-country comparisons under a common frontier assumption. Our interest in the subject is twofold. At the national level, cost efficiency influences the relative competitiveness of banks, setting the profile of the national banking industry with direct implications for economic growth. At the European Union level, financial, institutional and regulatory integration raises questions about the existence of a common cost frontier or the presence of economies of scale, as these could encourage banks to take advantage of the single market and consolidate. The empirical approach uses a more general Bayesian stochastic frontier model that allows for a continuous shift from the individual frontiers of each country to the common European frontier through varying priors. Results show differences in the frontiers of the countries that we studied, and the selected banks exhibit economies of scale greater than one more often than not, irrespective of size.
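
    A minimal sketch of a Bayesian stochastic cost frontier in PyMC, assuming the textbook composed-error form log C = X*beta + v + u with symmetric noise v and non-negative inefficiency u (the thesis's varying-prior specification, which shifts between national and common frontiers, is more general than this):

```python
import numpy as np
import pymc as pm

# Simulated stand-in data: X holds log outputs / input prices per bank.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 2))
log_cost = (1.0 + X @ np.array([0.7, 0.3])
            + rng.normal(0, 0.1, n)            # symmetric noise v
            + np.abs(rng.normal(0, 0.2, n)))   # inefficiency u >= 0

with pm.Model():
    beta0 = pm.Normal("beta0", mu=0, sigma=10)
    beta = pm.Normal("beta", mu=0, sigma=10, shape=2)
    sigma_v = pm.HalfNormal("sigma_v", sigma=1)
    u = pm.HalfNormal("u", sigma=0.5, shape=n)  # bank-level inefficiency
    mu = beta0 + pm.math.dot(X, beta) + u
    pm.Normal("obs", mu=mu, sigma=sigma_v, observed=log_cost)
    idata = pm.sample(1000, tune=1000, chains=2)

# The posterior mean of u per bank estimates its distance from the cost
# frontier; scale economies follow from the fitted beta coefficients.
```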