Data integration and handling

Abstract

Modern technology allows researchers to generate data at an ever-increasing rate, outpacing their capacity to analyse it. Developing automated support systems for the collection, management and distribution of information is therefore an important step towards reducing error rates and accelerating progress, enabling high-quality research based on large volumes of data. This thesis encompasses five articles describing strategies for the creation of technical research platforms, as well as the technical platforms themselves. The key conclusion of the thesis is that technical solutions to many issues have been available for a long time. These solutions are, however, overlooked or simply ignored if they fail to recognise the social dimensions of the issues they try to solve. The Molecular Methods (MolMeth) database is an example of a technically sound solution that has been only partially successful with regard to social viability: thousands of researchers have used the website to access protocols, but only a handful have shared their own work on MolMeth. Experiences from the Molecular Methods database and other projects have provided a foundation for studies supporting the development of the eB3Kit. The eB3Kit is a portable, robust and scalable informatics platform for structured data management. Deploying the platform enables research groups to carry out advanced research projects with very limited means. With the eB3Kit, researchers can integrate data from a wide variety of sources, including the local laboratory information management system, and analyse it using the Galaksio interface. Galaksio provides user-friendly access to the Galaxy workflow management system, giving eB3Kit users access to tools developed by a far larger user community than the one actively developing the eB3Kit.
Using a workflow management system improves reproducibility and enables bioinformaticians to prepare workflows without directly accessing ethically or commercially sensitive data. The approach is therefore especially well suited to applications where researchers are concerned about privacy, and to disease outbreaks, where persistent storage and analysis capacity must be established quickly.
