Ten Simple Rules for Reproducible Research in Jupyter Notebooks
Reproducibility of computational studies is a hallmark of scientific
methodology. It enables researchers to build with confidence on the methods and
findings of others, reuse and extend computational pipelines, and thereby drive
scientific progress. Since many experimental studies rely on computational
analyses, biologists need guidance on how to set up and document reproducible
data analyses or simulations.
In this paper, we address several questions about reproducibility. For
example, what are the technical and non-technical barriers to reproducible
computational studies? What opportunities and challenges do computational
notebooks offer to overcome some of these barriers? What tools are available
and how can they be used effectively?
We have developed a set of rules to serve as a guide to scientists with a
specific focus on computational notebook systems, such as Jupyter Notebooks,
which have become a tool of choice for many applications. Notebooks combine
detailed workflows with narrative text and visualization of results. Combined
with software repositories and open source licensing, notebooks are powerful
tools for transparent, collaborative, reproducible, and reusable data analyses.
Notes on Notebooks: Is Jupyter the Bringer of Jollity?
As the interactive computational notebook becomes a more prominent code development medium, we examine advantages and disadvantages of this particular source code format. We specify the structure of a coding notebook layout. We describe complexities in notebook programming; some of these are incidental whereas others may be inherent complexities. We outline how we envisage research and development might proceed to advance the cause of notebook programming.
ScreenTrack: Using a Visual History of a Computer Screen to Retrieve Documents and Web Pages
Computers are used for various purposes, so frequent context switching is
inevitable. In this setting, retrieving the documents, files, and web pages
that have been used for a task can be a challenge. While modern applications
provide a history of recent documents for users to resume work, this is not
sufficient to retrieve all the digital resources relevant to a given primary
document. The histories currently available do not take into account the
complex dependencies among resources across applications. To address this
problem, we tested the idea of using a visual history of a computer screen to
retrieve digital resources within a few days of their use through the
development of ScreenTrack. ScreenTrack is software that captures screenshots
of a computer at regular intervals. It then generates a time-lapse video from
the captured screenshots and lets users retrieve a recently opened document or
web page from a screenshot after recognizing the resource by its appearance. A
controlled user study found that participants were able to retrieve requested
information more quickly with ScreenTrack than under the baseline condition
with existing tools. A follow-up study showed that the participants used
ScreenTrack to retrieve previously used resources and to recover the context
for task resumption.
Comment: CHI 2020, 10 pages, 7 figures
Paths Explored, Paths Omitted, Paths Obscured: Decision Points & Selective Reporting in End-to-End Data Analysis
Drawing reliable inferences from data involves many, sometimes arbitrary,
decisions across phases of data collection, wrangling, and modeling. As
different choices can lead to diverging conclusions, understanding how
researchers make analytic decisions is important for supporting robust and
replicable analysis. In this study, we pore over nine published research
studies and conduct semi-structured interviews with their authors. We observe
that researchers often base their decisions on methodological or theoretical
concerns, but subject to constraints arising from the data, expertise, or
perceived interpretability. We confirm that researchers may experiment with
choices in search of desirable results, but also identify other reasons why
researchers explore alternatives yet omit findings. In concert with our
interviews, we also contribute visualizations for communicating decision
processes throughout an analysis. Based on our results, we identify design
opportunities for strengthening end-to-end analysis, for instance via tracking
and meta-analysis of multiple decision paths.
Finishing the euchromatic sequence of the human genome
The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.
Fludarabine, cytarabine, granulocyte colony-stimulating factor, and idarubicin with gemtuzumab ozogamicin improves event-free survival in younger patients with newly diagnosed AML and overall survival in patients with NPM1 and FLT3 mutations
Purpose
To determine the optimal induction chemotherapy regimen for younger adults with newly diagnosed AML without known adverse risk cytogenetics.
Patients and Methods
One thousand thirty-three patients were randomly assigned to intensified (fludarabine, cytarabine, granulocyte colony-stimulating factor, and idarubicin [FLAG-Ida]) or standard (daunorubicin and Ara-C [DA]) induction chemotherapy, with one or two doses of gemtuzumab ozogamicin (GO). The primary end point was overall survival (OS).
Results
There was no difference in remission rate after two courses between FLAG-Ida + GO and DA + GO (complete remission [CR] + CR with incomplete hematologic recovery 93% v 91%) or in day 60 mortality (4.3% v 4.6%). There was no difference in OS (66% v 63%; P = .41); however, the risk of relapse was lower with FLAG-Ida + GO (24% v 41%; P < .001) and 3-year event-free survival was higher (57% v 45%; P < .001). In patients with an NPM1 mutation (30%), 3-year OS was significantly higher with FLAG-Ida + GO (82% v 64%; P = .005). NPM1 measurable residual disease (MRD) clearance was also greater, with 88% versus 77% becoming MRD-negative in peripheral blood after cycle 2 (P = .02). Three-year OS was also higher in patients with a FLT3 mutation (64% v 54%; P = .047). Fewer transplants were performed in patients receiving FLAG-Ida + GO (238 v 278; P = .02). There was no difference in outcome according to the number of GO doses, although NPM1 MRD clearance was higher with two doses in the DA arm. Patients with core binding factor AML treated with DA and one dose of GO had a 3-year OS of 96% with no survival benefit from FLAG-Ida + GO.
Conclusion
Overall, FLAG-Ida + GO significantly reduced relapse without improving OS. However, exploratory analyses show that patients with NPM1 and FLT3 mutations had substantial improvements in OS. By contrast, in patients with core binding factor AML, outcomes were excellent with DA + GO with no FLAG-Ida benefit.