30 research outputs found

    Balancing shared infrastructure and domain-specific tools in data management

    Presentation at RepoFringe 2017, 'Balancing shared infrastructure and domain-specific tools in data management', by Andrew Millar, Eilidh Troup and Tomasz Zieliński.

    From blank page to electronic labbook: turning the University wiki into an Electronic Lab Notebook

    Funders and governments increasingly require researchers to make their data Findable, Accessible, Interoperable, and Reusable (FAIR) in order to improve the quality of research and promote open science. The main driving force behind open science is the re-use of data, but data can only be re-used if it is well described. The current culture of providing metadata at the point of sharing is inefficient: researchers treat it as an extra burden, and it tends to produce limited, poor-quality metadata. We are looking at ways in which metadata can be captured as part of the research workflow, as that is the best time to record such information. The University is uniquely positioned to provide data management services with assured longevity and security, as opposed to solutions created or contracted by individual research groups. With the aim of helping scientists make their data reusable, we have focussed on the first step of capturing information electronically when it is created, adapting the Confluence wiki provided by the University to act as a fully functional Electronic Lab Notebook (ELN). The wiki as lab notebook can be tailored to meet the needs of individual groups; it improves communication within the group and reduces the risk of information being lost when researchers or students leave. We present how, by adopting this university service, we can provide an affordable and sustainable solution to early metadata acquisition. A video of this presentation can be viewed at https://media.ed.ac.uk/media/0_3gsk6up

    Practical evaluation of SEEK and OpenBIS for biological data management in SynthSys; first report

    Author contributions: ET and TZ evaluated systems and developed software; IC, PS and AJM provided use cases; AJM and TZ designed the evaluation; ET, TZ and AJM wrote the report with input from all authors. Acknowledgements: We gratefully acknowledge training and support from the SEEK and OpenBIS project teams, who also checked this document. The project evaluated two existing data management systems for a small set of users, who represent diverse needs within the SynthSys Centre, in order to inform wider adoption for biological research. SEEK’s strengths are its support for the Investigation, Study, Assay (ISA) standard and its fine-grained access control, which make SEEK an excellent tool for collaborative work and publishing results. OpenBIS is well suited to automatic metadata processing and incorporation into analysis workflows. Both data management systems provided useful and complementary functionality, so our recommendation is that both are hosted for use in SynthSys. This also aligns well with the EU FAIRDOM project, which is currently integrating SEEK and OpenBIS into one platform.

    Practical evaluation of SEEK and OpenBIS for biological data management in SynthSys; second report.

    Author contributions: ET and TZ modified and configured the software; KT provided servers and services; TZ and AJM wrote the report with input from all authors. The objective of this joint project between University Information Services (IS) and the School of Biological Sciences (SBS) is to evaluate the provision of Biological Data Management systems and their integration with University Research Data Management solutions. The long-term aim is to comply with the University and Funder data mandates, while also adding value to ongoing research in SBS and the wider University. The benefits of streamlined data management would help to balance the investment that the School and its users make in establishing and adopting data management systems.

    Strengths and limitations of period estimation methods for circadian data

    A key step in the analysis of circadian data is to make an accurate estimate of the underlying period. There are many different techniques and algorithms for determining period, each with different assumptions and a different level of complexity. Choosing which algorithm, which implementation and which measures of accuracy to use presents many pitfalls, especially for the non-expert. We have developed the BioDare system, an online service allowing data-sharing (including public dissemination), data-processing and analysis. Circadian experiments are the main focus of BioDare; hence, period analysis is a major feature of the system. Six methods have been incorporated into BioDare: Enright and Lomb-Scargle periodograms, FFT-NLLS, mFourfit, MESA and Spectrum Resampling. Here we review those six techniques, explain the principles behind each algorithm and evaluate their performance. In order to quantify the methods' accuracy, we examine the algorithms against artificial mathematical test signals and model-generated mRNA data. Our re-implementation of each method in Java allows meaningful comparisons of the computational complexity and computing time associated with each algorithm. Finally, we provide guidelines on which algorithms are most appropriate for which data types, and recommendations on experimental design to extract optimal data for analysis.
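To make the idea of period estimation against an artificial test signal concrete, here is a minimal sketch of a classical Lomb-Scargle periodogram scanned over trial periods. It is not BioDare's Java implementation: the function names, the 18 to 34 h search range, the 0.05 h step and the noise-free cosine test signal are all assumptions chosen for illustration.

```python
import math


def lomb_scargle_power(times, values, period):
    """Classical Lomb-Scargle power of `values` at one trial period (hours)."""
    omega = 2.0 * math.pi / period
    mean = sum(values) / len(values)
    centred = [v - mean for v in values]
    # The phase offset tau makes the sine and cosine terms orthogonal.
    tau = math.atan2(
        sum(math.sin(2.0 * omega * t) for t in times),
        sum(math.cos(2.0 * omega * t) for t in times),
    ) / (2.0 * omega)
    cos_terms = [math.cos(omega * (t - tau)) for t in times]
    sin_terms = [math.sin(omega * (t - tau)) for t in times]
    yc = sum(y * c for y, c in zip(centred, cos_terms))
    ys = sum(y * s for y, s in zip(centred, sin_terms))
    cc = sum(c * c for c in cos_terms)
    ss = sum(s * s for s in sin_terms)
    return 0.5 * (yc * yc / cc + ys * ys / ss)


def estimate_period(times, values, min_period=18.0, max_period=34.0, step=0.05):
    """Return the trial period (hours) with the highest periodogram power."""
    n_steps = int(round((max_period - min_period) / step))
    candidates = [min_period + i * step for i in range(n_steps + 1)]
    return max(candidates, key=lambda p: lomb_scargle_power(times, values, p))


# Hypothetical test signal: 5 days of hourly samples, 24 h cosine, no noise.
times = [float(t) for t in range(120)]
signal = [math.cos(2.0 * math.pi * t / 24.0) for t in times]
estimated = estimate_period(times, signal)
print(f"estimated period: {estimated:.2f} h")
```

A grid scan keeps the sketch short; its resolution is limited by the step size, so a finer grid or a local refinement around the peak would be needed for more precise estimates.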

    Impact of baseline trend on period estimates.

    Data sets with different levels of baseline trend and different trend forms were analysed using all the methods; the mean period value is reported in the table (standard deviation in brackets). Data sets were created by taking a standard pulse signal data set (5 days of data, hourly sampled, 80% walking noise level, 24 h underlying period) and adding to it 5 different envelope shapes with increasing amplitude. 1) The trend/envelope shapes: linear increase (lin); exponential increase (exp); inverse parabola (ipar); 2/3 inverse parabola (2/3ipar) and 1/3 parabola (1/3par). 2) The baseline level is defined as the ratio between the trend's total amplitude and the original signal amplitude (0: no trend; 20: the trend is 20 times higher than the signal). See SI Table S7 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0096462#pone.0096462.s014) for the full table.
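The baseline level described in this caption can be illustrated with a short sketch that adds a linear trend to a signal. Several things here are assumptions rather than details from the source: "signal amplitude" is taken as peak-to-trough range, a plain cosine stands in for the pulse template (whose exact shape is not given here), and the function names are hypothetical. Only the linear (lin) shape is shown.

```python
import math


def signal_amplitude(values):
    """Peak-to-trough range, used here as the signal amplitude (assumption)."""
    return max(values) - min(values)


def linear_trend(n_points, total_amplitude):
    """Trend rising linearly from 0 to `total_amplitude` over the series."""
    if n_points < 2:
        return [0.0] * n_points
    return [total_amplitude * i / (n_points - 1) for i in range(n_points)]


def add_baseline_trend(values, level):
    """Add a linear trend whose total amplitude is `level` times the signal's."""
    trend = linear_trend(len(values), level * signal_amplitude(values))
    return [v + t for v, t in zip(values, trend)]


# Stand-in template: 5 days of hourly samples with a 24 h period.
base = [math.cos(2.0 * math.pi * t / 24.0) for t in range(120)]
no_trend = add_baseline_trend(base, 0)       # level 0: signal unchanged
strong_trend = add_baseline_trend(base, 20)  # trend 20 times the signal amplitude
```

Under these assumptions, level 0 leaves the signal untouched and level 20 lifts the final sample by 20 times the signal's range, matching the caption's definition of the ratio.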

    Impact of sampling frequency on period estimates (data sets with walking noise).

    Data sets with different time intervals and selected durations were analysed using all the methods; the mean period value is reported in the table (standard deviations are omitted for clarity). Data sets were created by adding walking noise of 80% of the original signal amplitude to templates of different durations and time intervals between points. The underlying period was 24.08 h for asym data and 24.00 h for the other signals. 1) The base shape of the signal: cosine (cos); pulse (pul); double pulse (dpl); shoulder (shl) and moderate asymmetry (asym); (all) represents aggregated results from all the sets. 2) The time interval (sampling frequency) in the data set, with the data duration in brackets. See SI Table S5 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0096462#pone.0096462.s012) for the full table.
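As a rough illustration of templates with different durations and sampling intervals, the sketch below builds sampling grids and evaluates a cosine template on them. The specific intervals (1, 2 and 4 h) and durations (3 and 5 days) are hypothetical, since the caption does not list the values used, and only the cosine shape is sketched.

```python
import math


def sample_times(duration_h, interval_h):
    """Time points from 0 up to (but not including) `duration_h`, in hours."""
    n_points = int(round(duration_h / interval_h))
    return [i * interval_h for i in range(n_points)]


def cosine_template(times, period_h=24.0):
    """Noise-free cosine template evaluated at the given time points."""
    return [math.cos(2.0 * math.pi * t / period_h) for t in times]


# Hypothetical combinations of sampling interval (hours) and duration (days).
templates = {}
for interval_h in (1, 2, 4):
    for days in (3, 5):
        times = sample_times(days * 24, interval_h)
        templates[(interval_h, days)] = (times, cosine_template(times))
```

Each entry pairs a time grid with its template, so noise can be added and a period estimator run per combination.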

    Impact of sampling frequency on absolute error (uniform noise data set).

    Data sets with different time intervals and selected durations were analysed using all the methods, and the mean absolute error value is plotted. Data sets were created by adding 80% uniform noise to templates of different durations and time intervals between points. The X axis represents time intervals with data duration in brackets. The underlying period was 24.08 h for asym. data and 24.00 h for the other signals. A) cosine data, B) pulse data, C) double pulse data, D) DNFL shoulder data, E) DNFL asymmetry data, F) aggregated results from all the shapes.
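The two ingredients of this figure, uniform noise and mean absolute error, can be sketched as follows. The noise definition is an assumption: each sample is perturbed by a value drawn uniformly from plus or minus `level` times the peak-to-trough signal range, which may differ from the definition used in the paper. The function names and the example estimates are hypothetical.

```python
import math
import random


def add_uniform_noise(values, level, rng):
    """Add noise drawn uniformly from +/- level * signal range (assumption)."""
    bound = level * (max(values) - min(values))
    return [v + rng.uniform(-bound, bound) for v in values]


def mean_absolute_error(estimates, true_period):
    """Mean absolute deviation of period estimates from the underlying period."""
    return sum(abs(e - true_period) for e in estimates) / len(estimates)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
clean = [math.cos(2.0 * math.pi * t / 24.0) for t in range(120)]
noisy = add_uniform_noise(clean, 0.8, rng)

# Hypothetical period estimates (hours) for a signal with a 24.00 h period.
example_estimates = [23.5, 24.5, 24.0]
error = mean_absolute_error(example_estimates, 24.0)
```

In a full evaluation, the estimates would come from running each period estimation method on many noisy replicates of every template.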