
20 research outputs found

Galaxy CloudMan: delivering cloud compute clusters

Author: Afgan Enis
Baker Dannon
Chapman Brad
Coraor Nate
Nekrutenko Anton
Taylor James
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

Fast and accurate genome-wide predictions and structural modeling of protein-protein interactions using Galaxy.

Author: Baker Dannon
Bouvier Dave
Coraor Nate
Gruening Bjoern
Guerler Aysam
Nekrutenko Anton
Schatz Michael
Shank Stephen
van den Beek Marius
Zehr Jordan
Publication venue: Cold Spring Harbor Laboratory
Publication date: 19/03/2021

Cold Spring Harbor Laboratory Institutional Repository


Training Infrastructure as a Service

Author: Bacon Wendi
Bretaudeau Anthony
Coraor Nate
Cuccuru Gianmauro
Davis John
Gladman Simon
Grüning Björn
Hillman-Jackson Jennifer
Hiltemann Saskia
Hyde Cameron
Rasche Helena
Serrano-Solano Beatriz
Stubbs Andrew
Zhou Miaomiao
Publication date: 28/12/2022

Background: Hands-on training, whether in bioinformatics or other domains, often requires significant technical resources and knowledge to set up and run. Instructors must have access to powerful compute infrastructure that can support resource-intensive jobs running efficiently. Often this is achieved using a private server where there is no contention for the queue. However, this places a significant prerequisite knowledge or labor barrier for instructors, who must spend time coordinating deployment and management of compute resources. Furthermore, with the increase of virtual and hybrid teaching, where learners are located in separate physical locations, it is difficult to track student progress as efficiently as during in-person courses.
Findings: Originally developed by Galaxy Europe and the Gallantries project, together with the Galaxy community, we have created Training Infrastructure-as-a-Service (TIaaS), aimed at providing user-friendly training infrastructure to the global training community. TIaaS provides dedicated training resources for Galaxy-based courses and events. Event organizers register their course, after which trainees are transparently placed in a private queue on the compute infrastructure, which ensures jobs complete quickly, even when the main queue is experiencing high wait times. A built-in dashboard allows instructors to monitor student progress.
Conclusions: TIaaS provides a significant improvement for instructors and learners, as well as infrastructure administrators. The instructor dashboard makes remote events not only possible but also easy. Students experience continuity of learning, as all training happens on Galaxy, which they can continue to use after the event. In the past 60 months, 504 training events with over 24,000 learners have used this infrastructure for Galaxy training.

Open Research Online

EUR Research Repository

HAL-Rennes 1

The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update

Author: Afgan E. (Enis)
Baker D. (Dannon)
Batut B. (Bérénice)
Blankenberg D. (Daniel)
Bouvier D. (Dave)
Chilton J. (John)
Clements D. (Dave)
Coraor N. (Nate)
Čech M. (Martin)
Goecks J. (Jeremy)
Grüning B.A. (Björn A.)
Guerler A. (Aysam)
Hillman-Jackson J. (Jennifer)
Hiltemann S. (Saskia)
Jalili V. (Vahid)
Nekrutenko A. (Anton)
Rasche H. (Helena)
Soranzo N. (Nicola)
Taylor J. (James)
Van Den Beek M. (Marius)
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2018

Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three key challenges of data-driven biomedical science: making analyses accessible to all researchers, ensuring analyses are completely reproducible, and making it simple to communicate analyses so that they can be reused and extended. During the last two years, the Galaxy team and the open-source community around Galaxy have made substantial improvements to Galaxy's core framework, user interface, tools, and training materials. Framework and user interface improvements now enable Galaxy to be used for analyzing tens of thousands of datasets, and >5500 tools are now available from the Galaxy ToolShed. The Galaxy community has led an effort to create numerous high-quality tutorials focused on common types of genomic analyses. The Galaxy developer and user communities continue to grow and be integral to Galaxy's development. The number of Galaxy public servers, developers contributing to the Galaxy framework and its tools, and users of the main Galaxy server have all increased substantially

Crossref

EUR Research Repository

Erasmus University Digital Repository

Fast and accurate genome-wide predictions and structural modeling of protein–protein interactions using Galaxy

Author: Anton Nekrutenko
Aysam Guerler
Bjoern Gruening
Dannon Baker
Dave Bouvier
Jordan D. Zehr
Marius van den Beek
Michael C. Schatz
Nate Coraor
Stephen D. Shank
Publication venue: Springer Science and Business Media LLC
Publication date: 01/06/2023

Background: Protein–protein interactions play a crucial role in almost all cellular processes. Identifying interacting proteins reveals insight into living organisms and yields novel drug targets for disease treatment. Here, we present a publicly available, automated pipeline to predict genome-wide protein–protein interactions and produce high-quality multimeric structural models.
Results: Application of our method to the Human and Yeast genomes yields protein–protein interaction networks similar in quality to common experimental methods. We identified and modeled Human proteins likely to interact with the papain-like protease of SARS-CoV2’s non-structural protein 3. We also produced models of SARS-CoV2’s spike protein (S) interacting with myelin-oligodendrocyte glycoprotein receptor and dipeptidyl peptidase-4.
Conclusions: The presented method is capable of confidently identifying interactions while providing high-quality multimeric structural models for experimental validation. The interactome modeling pipeline is available at usegalaxy.org and usegalaxy.eu.

Directory of Open Access Journals

Jupyter and Galaxy: Easing entry barriers into complex data analyses for biomedical researchers

Author: Anton Nekrutenko (29194)
Björn A. Grüning (3196326)
Boris Rebolledo-Jaramillo (498907)
Carl Eberhard (284222)
Eric Rasche (4046965)
James Taylor (158393)
John Chilton (2562886)
Nate Coraor (4046962)
Rolf Backofen (11285)
Torsten Houwaart (1611853)
Publication date: 01/05/2017

What does it take to convert a heap of sequencing data into a publishable result? First, common tools are employed to reduce primary data (sequencing reads) to a form suitable for further analyses (i.e., the list of variable sites). The subsequent exploratory stage is much more ad hoc and requires the development of custom scripts and pipelines, making it problematic for biomedical researchers. Here, we describe a hybrid platform combining common analysis pathways with the ability to explore data interactively. It aims to fully encompass and simplify the "raw data-to-publication" pathway and make it reproducible.

Directory of Open Access Journals

The Francis Crick Institute

Congruence between Galaxy and Jupyter as a function of their pros and cons.

Author: Anton Nekrutenko (29194)
Björn A. Grüning (3196326)
Boris Rebolledo-Jaramillo (498907)
Carl Eberhard (284222)
Eric Rasche (4046965)
James Taylor (158393)
John Chilton (2562886)
Nate Coraor (4046962)
Rolf Backofen (11285)
Torsten Houwaart (1611853)


The Francis Crick Institute

Reanalysis of data from [14] using Galaxy and Jupyter.

Author: Anton Nekrutenko (29194)
Björn A. Grüning (3196326)
Boris Rebolledo-Jaramillo (498907)
Carl Eberhard (284222)
Eric Rasche (4046965)
James Taylor (158393)
John Chilton (2562886)
Nate Coraor (4046962)
Rolf Backofen (11285)
Torsten Houwaart (1611853)

A. Workflow used in the analysis. As an input, the workflow takes a collection of paired Illumina datasets and outputs an unfiltered list of variable sites.
B. Galaxy history showing all steps of these analyses. It only contains 12 steps because we use dataset collections to combine multiple similar datasets into a small number of history entries. This significantly simplifies processing. For example, collection 313 contains all 312 paired-end Illumina datasets generated for this study. This allows us to deal with just one history item instead of 312. The next item in the history is a collection of BAM datasets generated by mapping each read-pair from collection 313 against the human genome (hg38) with bwa-mem. These BAM datasets are de-duplicated (collection 627), filtered (by only retaining reads mapping to mitochondrial DNA, with mapping quality of 20 or higher, and mapped in a proper pair; collection 941), realigned to mitigate misalignment around indels or structural variant calls (collection 1098), and used to call variants with Naive Variant Caller [21]. Finally, we use Variant Annotator to process VCF datasets generated by Naive Variant Caller and to create a list of variants (collection 1412) and the concatenation tool to reduce collection 1412 into a single table (dataset 1413). This dataset is used for further processing with Jupyter.
C. The relationship of minor allele frequencies for heteroplasmic sites between tissues (panels A and B) and individuals (panels C and D).
D. Estimates for bottleneck size with (red) and without (blue) accounting for mitotic segregation.
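
The read filter described in panel B (mitochondrial reads only, mapping quality of 20 or higher, mapped in a proper pair) can be illustrated with a minimal sketch. This is not the Galaxy tool the authors used; the `Read` record, the function names, and the contig name `chrM` are assumptions made for illustration. Only the three criteria come from the caption, and the proper-pair test relies on the standard SAM flag bit 0x2.

```python
from typing import NamedTuple

# Standard SAM flag bit meaning "read mapped in a proper pair".
PROPER_PAIR_FLAG = 0x2


class Read(NamedTuple):
    """Hypothetical minimal alignment record (not a real library type)."""
    name: str
    flag: int        # SAM FLAG field
    reference: str   # SAM RNAME field (contig the read maps to)
    mapq: int        # SAM MAPQ field


def keep_read(read, mito_name="chrM", min_mapq=20):
    """Return True if the read passes all three criteria from the caption.

    mito_name="chrM" is an assumption about how the mitochondrial
    contig is named in the reference being used.
    """
    return (
        read.reference == mito_name
        and read.mapq >= min_mapq
        and bool(read.flag & PROPER_PAIR_FLAG)
    )


def filter_reads(reads, **criteria):
    """Keep only the reads that satisfy keep_read."""
    return [r for r in reads if keep_read(r, **criteria)]


# Made-up example records, for illustration only.
example_reads = [
    Read("r1", 99, "chrM", 60),  # flag 99 includes the proper-pair bit
    Read("r2", 99, "chr1", 60),  # not mitochondrial
    Read("r3", 99, "chrM", 10),  # mapping quality below 20
    Read("r4", 65, "chrM", 60),  # flag 65 lacks the proper-pair bit
]
kept = filter_reads(example_reads)
```

In a real pipeline the same predicate would be applied to records read from each BAM dataset in the collection rather than to hand-written tuples.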

The Francis Crick Institute

The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update

Author: Afgan Enis
Baker Dannon
Blankenberg Daniel
Bouvier Dave
Čech Martin
Chilton John
Clements Dave
Coraor Nate
Eberhard Carl
Goecks Jeremy
Grüning Björn
Guerler Aysam
Hillman-Jackson Jennifer
Nekrutenko Anton
Rasche Eric
Soranzo Nicola
Taylor James
Turaga Nitesh
Van Den Beek Marius
Von Kuster Greg
Publication venue: Oxford University Press (OUP)
Publication date: 02/05/2016

High-throughput data production technologies, particularly 'next-generation' DNA sequencing, have ushered in widespread and disruptive changes to biomedical research. Making sense of the large datasets produced by these technologies requires sophisticated statistical and computational methods, as well as substantial computational power. This has led to an acute crisis in life sciences, as researchers without informatics training attempt to perform computation-dependent analyses. Since 2005, the Galaxy project has worked to address this problem by providing a framework that makes advanced computational tools usable by non-experts. Galaxy seeks to make data-intensive research more accessible, transparent and reproducible by providing a Web-based environment in which users can perform computational analyses and have all of the details automatically tracked for later inspection, publication, or reuse. In this report we highlight recently added features enabling biomedical analyses on a large scale.

PubMed Central

Full-text Institutional Repository of the Ruđer Bošković Institute