
    A very simple and fast way to access and validate algorithms in reproducible research

    The reproducibility of research in bioinformatics refers to the notion that new methodologies/algorithms and scientific claims should be published together with their data and source code, so that other researchers may verify the findings and build further knowledge upon them. The replication and corroboration of research results are key to the scientific process, and many journals are currently discussing the matter and taking concrete steps in this direction. In this journal itself, a recent opinion note has highlighted the increasing importance of this topic in bioinformatics and computational biology, inviting the community to discuss the matter further. In agreement with that article, we propose here another step in that direction with a tool that allows the automatic generation of a web interface, named web-demo, directly from source code in a very simple and straightforward way. We believe this contribution can help make research not only reproducible but also more easily accessible. A web-demo associated with a published paper can accelerate an algorithm's validation with real data, spreading its use with just a few clicks.
    Authors: Georgina Stegmayer, Milton Damián Pividori, Diego Humberto Milone (Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional, CONICET-UNL, Argentina)
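    The core mechanism behind such a tool can be sketched as generating an HTML form from a function's signature. The following is a hypothetical Python illustration of that general idea, not the published web-demo tool; the names `generate_form` and `smooth` are invented for the example.

```python
import inspect

def generate_form(func, action="/run"):
    """Build a minimal HTML form from a function's signature, one text
    input per parameter -- the basic idea behind auto-generated web
    interfaces for algorithms. Hypothetical sketch, not the actual tool."""
    rows = []
    for name, p in inspect.signature(func).parameters.items():
        # Pre-fill the input with the parameter's default, if it has one
        default = "" if p.default is inspect.Parameter.empty else p.default
        rows.append(
            f'<label>{name}: <input name="{name}" value="{default}"></label>'
        )
    body = "<br>".join(rows)
    return (f'<form method="post" action="{action}">'
            f'{body}<br><input type="submit" value="Run"></form>')

# Example: expose a toy algorithm as a web form
def smooth(signal_csv, window=5):
    ...

html = generate_form(smooth)
```

    A real implementation would additionally serve the form over HTTP, parse the submitted values, and invoke the function, but the signature-to-form mapping is the step that makes the generation automatic.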

    FAST: FAST Analysis of Sequences Toolbox.

    FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open-source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU's Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R, and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics make FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought.
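    The grep-on-records idea can be illustrated with a minimal sketch: where grep matches lines, a fasgrep-style tool matches whole Multi-FastA records by their descriptions. The Python below is an invented analogue for illustration only; `fasta_records` and `fasgrep_like` are not part of FAST.

```python
import re

def fasta_records(lines):
    """Yield (description, sequence) pairs from Multi-FastA text lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)

def fasgrep_like(records, pattern):
    """grep-style selection: keep records whose description matches
    the regex (a rough fasgrep analogue)."""
    rx = re.compile(pattern)
    return [(h, s) for h, s in records if rx.search(h)]
```

    The unit of selection is the record rather than the line, which is what lets such tools compose into pipelines over sequence data the way the Textutils compose over text.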

    cSPider – Evaluation of a Free and Open-Source Automated Tool to Analyze Corticomotor Silent Period

    Background: The corticomotor silent period (CSP), as assessed noninvasively by transcranial magnetic stimulation (TMS) of the primary motor cortex, has been found to reflect intracortical inhibitory mechanisms. Analysis of the CSP is mostly conducted manually. However, this approach is time-consuming, and comparison of results from different laboratories may be compromised by inter-rater variability in analysis. No open-source program for automated analysis is currently available. Methods/Results: Here, we describe an in-house automated tool for assessing the CSP (cSPider) and its cross-validation against manual analysis. Results from the automated routine were compared with results of the manual evaluation. We found high inter-method reliability between automated and manual analysis (p<0.001) and a significantly reduced time for CSP analysis (median = 10.3 sec for automated analysis of 10 CSPs vs. median = 270 sec for manual analysis of 10 CSPs). cSPider can be downloaded free of charge. Conclusion: cSPider allows automated analysis of the CSP in a reliable and time-efficient manner. Use of this open-source tool may help to improve comparison of data from different laboratories.
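    A simple threshold-based approach gives a flavor of how CSP duration can be estimated automatically from a rectified EMG trace. The sketch below is hypothetical and is not cSPider's actual algorithm; the function name, the baseline-relative criterion, and the 0.25 threshold are all assumptions for illustration.

```python
import numpy as np

def detect_csp(emg, fs, stim_idx, rel_thresh=0.25):
    """Estimate silent-period duration (ms) as the longest run of
    rectified-EMG samples after the stimulus that stays below a
    fraction of the mean pre-stimulus baseline activity."""
    rect = np.abs(np.asarray(emg, dtype=float))   # full-wave rectification
    baseline = rect[:stim_idx].mean()             # pre-stimulus activity level
    below = rect[stim_idx:] < rel_thresh * baseline
    best = cur = 0
    for quiet in below:
        cur = cur + 1 if quiet else 0             # length of current quiet run
        best = max(best, cur)
    return best / fs * 1000.0                     # samples -> milliseconds
```

    Applying one fixed, scripted criterion like this to every trace is what removes the inter-rater variability that manual cursor placement introduces.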

    Many methods, many microbes: methodological diversity and standardization in the deep subseafloor biosphere

    Standardization is widely assumed to be important to advance science. This assumption is typically embedded in initiatives to devise infrastructure and policies to support scientific work. This paper examines a movement of scientists advocating methods standardization in an emerging scientific domain, the deep subseafloor biosphere. This movement is not primarily motivated by the usual rationales for standardization, but instead by the aim of intervening in the politics of an infrastructure upon which the domain depends: scientific ocean drilling cruises. This infrastructure is shared and contested with other domains, and the movement regards standardization as a critical step in reconfiguring the infrastructure to secure a greater share of resources for the deep subseafloor biosphere. The movement encounters two tensions: one between the perceived benefits of standardization and those of methodological diversity, and another between perceived benefits for the domain and a lack of incentives for individuals to perform the necessary standardization work.

    Combining techniques for screening and evaluating interaction terms on high-dimensional time-to-event data

    BACKGROUND: Molecular data, e.g. arising from microarray technology, is often used for predicting survival probabilities of patients. For multivariate risk prediction models on such high-dimensional data, there are established techniques that combine parameter estimation and variable selection. One big challenge is to incorporate interactions into such prediction models. In this feasibility study, we present building blocks for evaluating and incorporating interaction terms in high-dimensional time-to-event settings, especially for settings in which it is computationally too expensive to check all possible interactions. RESULTS: We use a boosting technique for estimation of effects and the following building blocks for pre-selecting interactions: (1) resampling, (2) random forests and (3) orthogonalization as a data pre-processing step. In a simulation study, the strategy that uses all building blocks is able to detect true main effects and interactions with high sensitivity in different kinds of scenarios. The main challenge is posed by interactions composed of variables that do not represent main effects, but our findings are promising in this regard as well. Results on real-world data illustrate that effect sizes of interactions frequently may not be large enough to improve prediction performance, even though the interactions are potentially of biological relevance. CONCLUSION: Screening interactions through random forests is feasible and useful when one is interested in finding relevant two-way interactions. The other building blocks also contribute considerably to an enhanced pre-selection of interactions. We determined the limits of interaction detection in terms of necessary effect sizes. Our study emphasizes the importance of making full use of existing methods in addition to establishing new ones.
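    The forest-based screening step can be sketched as follows: rank variables by random-forest importance and form candidate pairs only among the top-ranked ones, so that downstream boosting never has to enumerate all p·(p-1)/2 interactions. This is a deliberately simplified stand-in, assuming a continuous outcome and scikit-learn's RandomForestRegressor (the study targets time-to-event data and a more elaborate resampling scheme); `screen_interactions` is an invented name.

```python
import numpy as np
from itertools import combinations
from sklearn.ensemble import RandomForestRegressor

def screen_interactions(X, y, top_k=5, random_state=0):
    """Pre-select candidate two-way interactions: fit a random forest,
    keep the top_k most important variables, and return all pairs among
    them as (i, j) index tuples for a downstream boosting model."""
    rf = RandomForestRegressor(n_estimators=200, random_state=random_state)
    rf.fit(X, y)
    top = np.argsort(rf.feature_importances_)[::-1][:top_k]
    return list(combinations(sorted(top.tolist()), 2))
```

    With top_k variables retained, only top_k·(top_k-1)/2 interaction terms need to be evaluated, which is what makes the screening tractable when p is in the thousands.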

    DataPackageR: Reproducible data preprocessing, standardization and sharing using R/Bioconductor for collaborative data analysis [version 2; referees: 2 approved, 1 approved with reservations]

    A central tenet of reproducible research is that scientific results are published along with the underlying data and software code necessary to reproduce and verify the findings. A host of tools and software have been released that facilitate such workflows, and scientific journals have increasingly demanded that code and primary data be made available with publications. There has been little practical advice, however, on implementing reproducible research workflows for large 'omics' or systems biology data sets used by teams of analysts working in collaboration. In such instances it is important to ensure all analysts use the same version of a data set for their analyses. Yet instantiating relational databases and standard operating procedures can be unwieldy, with high "startup" costs and poor adherence to procedures when they deviate substantially from an analyst's usual workflow. Ideally, a reproducible research workflow should fit naturally into an individual's existing workflow, with minimal disruption. Here, we provide an overview of how we have leveraged popular open-source tools, including Bioconductor, Rmarkdown, git version control, and R, and specifically R's package system combined with a new tool, DataPackageR, to implement a lightweight reproducible research workflow for preprocessing large data sets, suitable for sharing among small-to-medium sized teams of computational scientists. Our primary contribution is the DataPackageR tool, which decouples time-consuming data processing from data analysis while leaving a traceable record of how raw data is processed into analysis-ready data sets. The software ensures packaged data objects are properly documented, performs checksum verification of these along with basic package version management, and, importantly, leaves a record of data processing code in the form of package vignettes. Our group has implemented this workflow to manage, analyze and report on pre-clinical immunological trial data from multi-center, multi-assay studies for the past three years.
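    The checksum-verification idea at the heart of keeping a team on the same data version can be sketched in a few lines. DataPackageR itself is an R package; the Python below is only a language-agnostic illustration of the concept, and `fingerprint`, `verify`, and the JSON manifest layout are invented for this sketch.

```python
import hashlib
import json
import pathlib

def fingerprint(path):
    """SHA-256 digest of a data file's bytes."""
    return hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()

def verify(manifest_path):
    """Check every file listed in a JSON manifest ({path: digest})
    against its recorded digest, returning {path: bool}. Any analyst
    whose data fails this check is not working from the shared version."""
    manifest = json.loads(pathlib.Path(manifest_path).read_text())
    return {f: fingerprint(f) == digest for f, digest in manifest.items()}
```

    Recording the manifest alongside the processed data means a silent change to any packaged object is detected the next time the check runs, without requiring a database or a change to the analyst's workflow.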

    Cross-species Analyses of Intra-species Behavioral Differences in Mammals and Fish

    Multiple species display robust behavioral variance among individuals due to different genetic, genomic, epigenetic, neuroplasticity and environmental factors. Behavioral individuality has been extensively studied in various animal models, including rodents and other mammals. Fish, such as zebrafish (Danio rerio), have recently emerged as powerful aquatic model organisms with overt individual differences in behavioral, nociceptive and other CNS traits. Here, we evaluate individual behavioral differences in mammals and fish, emphasizing the importance of cross-species analyses of intraspecies variance in experimental models of normal and pathological CNS functions. © 2019 IBRO
    Funding: The AVK laboratory is supported by the Southwest University (Chongqing, China) Zebrafish Platform construction funds. This research is supported by the Russian Science Foundation grant 19-15-00053. KAD is supported by the Russian Foundation for Basic Research (RFBR) grant 18-34-00996, a Fellowship of the President of Russia and a Special Rector's Fellowship for SPSU PhD Students. DBR receives the CNPq research productivity grant (305051/2018-0), and his work is also supported by the PROEX/CAPES fellowship grant 23038.004173/2019-93 (Brazil). MP receives funding from the British Academy (UK). BDF is supported by a CAPES Foundation studentship (Brazil). FC is supported by the Father's Foundation and the Fast Data Sharing-2036 programs. AVK is the Chair of the International Zebrafish Neuroscience Research Consortium (ZNRC) Special 2018-2019 Task Force that coordinated this multi-laboratory collaborative project.