837 research outputs found

Effective Reproducible Research with Org-Mode and Git

Author: Legrand Arnaud
Stanisic Luka
Publication venue: HAL CCSD
Publication date: 01/01/2014
Field of study

In this article we address the question of developing a lightweight and effective workflow for conducting experimental research on modern parallel computer systems in a reproducible way. Our workflow simply builds on two well-known tools (Org-mode and Git) and makes it possible to address issues such as provenance tracking, experimental setup reconstruction, and replicable analysis. Although this workflow can still be improved and cannot be seen as a final solution, we have been using it for two years now and we have recently published a fully reproducible article, which demonstrates the effectiveness of our proposal.

CiteSeerX

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Self-reinoculation with fecal flora changes microbiota density and composition leading to an altered bile-acid profile in the mouse small intestine

Author: Bogatyrev Said R.
Ismagilov Rustem F.
Rolando Justin C.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/10/2019
Field of study

Background: The upper gastrointestinal tract plays a prominent role in human physiology as the primary site for enzymatic digestion and nutrient absorption, immune sampling, and drug uptake. Alterations to the small intestine microbiome have been implicated in various human diseases, such as non-alcoholic steatohepatitis and inflammatory bowel conditions. Yet, the physiological and functional roles of the small intestine microbiota in humans remain poorly characterized because of the complexities associated with its sampling. Rodent models are used extensively in microbiome research and enable the spatial, temporal, compositional, and functional interrogation of the gastrointestinal microbiota and its effects on the host physiology and disease phenotype. Classical, culture-based studies have documented that fecal microbial self-reinoculation (via coprophagy) affects the composition and abundance of microbes in the murine proximal gastrointestinal tract. This pervasive self-reinoculation behavior could be a particularly relevant study factor when investigating small intestine microbiota. Modern microbiome studies either do not take self-reinoculation into account, or assume that approaches such as single housing mice or housing on wire mesh floors eliminate it. These assumptions have not been rigorously tested with modern tools. Here, we used quantitative 16S rRNA gene amplicon sequencing, quantitative microbial functional gene content inference, and metabolomic analyses of bile acids to evaluate the effects of self-reinoculation on microbial loads, composition, and function in the murine upper gastrointestinal tract. Results: In coprophagic mice, continuous self-exposure to the fecal flora had substantial quantitative and qualitative effects on the upper gastrointestinal microbiome. 
These differences in microbial abundance and community composition were associated with an altered profile of the small intestine bile acid pool, and, importantly, could not be inferred from analyzing large intestine or stool samples. Overall, the patterns observed in the small intestine of non-coprophagic mice (reduced total microbial load, low abundance of anaerobic microbiota, and bile acids predominantly in the conjugated form) resemble those typically seen in the human small intestine. Conclusions: Future studies need to take self-reinoculation into account when using mouse models to evaluate gastrointestinal microbial colonization and function in relation to xenobiotic transformation and pharmacokinetics or in the context of physiological states and diseases linked to small intestine microbiome and to small intestine dysbiosis.

Caltech Authors

CaltechDATA (California Institute of Technology Research Data Repository)

Reproducible and User-Controlled Software Environments in HPC with Guix

Author: C Boettiger
C Ruiz
E Jeanvoine
Luka Stanisic
M Gavish
PV Gorp
Publication venue
Publication date: 01/01/2015
Field of study

Support teams of high-performance computing (HPC) systems often find themselves between a rock and a hard place: on one hand, they understandably administer these large systems in a conservative way, but on the other hand, they try to satisfy their users by deploying up-to-date tool chains as well as libraries and scientific software. HPC system users often have no guarantee that they will be able to reproduce results at a later point in time, even on the same system: software may have been upgraded, removed, or recompiled under their feet, and they have little hope of being able to reproduce the same software environment elsewhere. We present GNU Guix and the functional package management paradigm and show how it can improve reproducibility and sharing among researchers with representative use cases. Comment: 2nd International Workshop on Reproducibility in Parallel Computing (RepPar), Aug 2015, Vienna, Austria. http://reppar.org

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

MDC Repository

HAL-Rennes 1

An Effective Git And Org-Mode Based Workflow For Reproducible Research

Author: Arnaud Legrand
Drummond C.
Luka Stanisic
Ruiz Sanabria C. C.
Vincent Danjean
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date
Field of study

Crossref

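Contribution à la convergence d'infrastructure entre le calcul haute performance et le traitement de données à large échelle [Contribution to infrastructure convergence between high-performance computing and large-scale data processing]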

Author: Mercier Michael
Publication venue: HAL CCSD
Publication date: 01/07/2019
Field of study

The amount of produced data, either in the scientific community or the commercial world, is constantly growing. The field of Big Data has emerged to handle large amounts of data on distributed computing infrastructures. High-Performance Computing (HPC) infrastructures are traditionally used for the execution of compute-intensive workloads. However, the HPC community is also facing an increasing need to process large amounts of data derived from high-definition sensors and large physics apparatus. The convergence of the two fields, HPC and Big Data, is currently taking place. In fact, the HPC community already uses Big Data tools, which are not always integrated correctly, especially at the level of the file system and the Resource and Job Management System (RJMS). In order to understand how we can leverage HPC clusters for Big Data usage, and what the challenges are for the HPC infrastructures, we have studied multiple aspects of the convergence. We first provide a survey of software provisioning methods, with a focus on data-intensive applications. We contribute a new RJMS collaboration technique called BeBiDa, which is based on 50 lines of code whereas similar solutions use at least 1000 times more. We evaluate this mechanism under real conditions and in a simulated environment with our simulator Batsim. Furthermore, we provide extensions to Batsim to support I/O, and showcase the development of a generic file system model along with a Big Data application model. This allows us to complement the BeBiDa real-conditions experiments with simulations while enabling us to study file system dimensioning and trade-offs. All the experiments and analyses of this work have been done with reproducibility in mind. Based on this experience, we propose to integrate the development workflow and data analysis in the reproducibility mindset, and give feedback on our experiences with a list of best practices.

Self-reinoculation with fecal flora changes microbiota density and composition leading to an altered bile-acid profile in the mouse small intestine

Author: Bogatyrev Said R.
Ismagilov Rustem F.
Rolando Justin C.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 12/02/2020
Field of study

Packaging data analytical work reproducibly using R (and friends)

Author
Publication venue: 'PeerJ'
Publication date
Field of study

Crossref

An Introduction to Programming for Bioscientists: A Python-based Primer

Author: Ekmekci Berk
McAnany Charles E.
Mura Cameron
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 17/05/2016
Field of study

Computing has revolutionized the biological sciences over the past several decades, such that virtually all contemporary research in the biosciences utilizes computer programs. The computational advances have come on many fronts, spurred by fundamental developments in hardware, software, and algorithms. These advances have influenced, and even engendered, a phenomenal array of bioscience fields, including molecular evolution and bioinformatics; genome-, proteome-, transcriptome- and metabolome-wide experimental studies; structural genomics; and atomistic simulations of cellular-scale molecular assemblies as large as ribosomes and intact viruses. In short, much of post-genomic biology is increasingly becoming a form of computational biology. The ability to design and write computer programs is among the most indispensable skills that a modern researcher can cultivate. Python has become a popular programming language in the biosciences, largely because (i) its straightforward semantics and clean syntax make it a readily accessible first language; (ii) it is expressive and well-suited to object-oriented programming, as well as other modern paradigms; and (iii) the many available libraries and third-party toolkits extend the functionality of the core language into virtually every biological domain (sequence and structure analyses, phylogenomics, workflow management systems, etc.). This primer offers a basic introduction to coding, via Python, and it includes concrete examples and exercises to illustrate the language's usage and capabilities; the main text culminates with a final project in structural bioinformatics. A suite of Supplemental Chapters is also provided. 
Starting with basic concepts, such as that of a 'variable', the Chapters methodically advance the reader to the point of writing a graphical user interface to compute the Hamming distance between two DNA sequences. Comment: 65 pages total, including 45 pages text, 3 figures, 4 tables, numerous exercises, and 19 pages of Supporting Information; currently in press at PLOS Computational Biology

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

FigShare