17 research outputs found

    Optimization of a parallel permutation testing function for the SPRINT R package

    Get PDF
    The statistical language R and its Bioconductor packages are favoured by many biostatisticians for processing microarray data. The amount of data produced by some analyses has reached the limits of many common bioinformatics computing infrastructures. High Performance Computing (HPC) systems offer a solution to this issue. The Simple Parallel R Interface (SPRINT) is a package that gives biostatisticians easy access to HPC systems and allows parallelised functions to be added to R. Previous work established that the SPRINT implementation of an R permutation testing function scales close to optimally on up to 512 processors of a supercomputer. Access to supercomputers, however, is not always possible, so the work presented here compares the performance of the SPRINT implementation on a supercomputer with benchmarks on a range of platforms, including cloud resources and a common desktop machine with multiprocessing capabilities.
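    The abstract does not reproduce the SPRINT code itself. As a point of reference, a minimal serial sketch of a two-sample permutation test (the kind of computation SPRINT parallelises across processors) might look like the following; the function name and the difference-of-means test statistic are illustrative choices, not taken from the package.

```python
import random

def permutation_test(group_a, group_b, n_perm=10000, seed=0):
    """Two-sample permutation test on the difference of means.

    Returns the fraction of label permutations whose absolute mean
    difference is at least as extreme as the observed difference.
    """
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_perm):
        # Shuffle the pooled values, i.e. permute the group labels.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a
                   - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed:
            count += 1
    return count / n_perm
```

    The outer loop over permutations is embarrassingly parallel, which is why this kind of test scales so well when distributed over many processors.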

    GPX-Macrophage Expression Atlas: A database for expression profiles of macrophages challenged with a variety of pro-inflammatory, anti-inflammatory, benign and pathogen insults

    Get PDF
    BACKGROUND: Macrophages play an integral role in the host immune system, bridging innate and adaptive immunity. As such, they are finely attuned to extracellular and intracellular stimuli and respond by rapidly initiating multiple signalling cascades with diverse effector functions. The macrophage cell is therefore an experimentally and clinically amenable biological system for the mapping of biological pathways. The goal of the macrophage expression atlas is to systematically investigate the pathway biology and interaction network of macrophages challenged with a variety of insults, in particular via infection and activation with key inflammatory mediators. As an important first step towards this, we present a single searchable database resource containing high-throughput macrophage gene expression studies. DESCRIPTION: The GPX Macrophage Expression Atlas (GPX-MEA) is an online resource for gene-expression-based studies of a range of macrophage cell types following treatment with pathogens and immune modulators. GPX-MEA follows the MIAME standard and includes an objective quality score with each experiment. It places special emphasis on rigorously capturing the experimental design and enables the searching of expression data from different microarray experiments. Studies may be queried on the basis of experimental parameters, sample information and quality assessment score. The ability to compare the expression values of individual genes across multiple experiments is provided. In addition, the database offers access to experimental annotation and analysis files and includes experiments and raw data previously unavailable to the research community. CONCLUSION: GPX-MEA is the first example of a quality-scored gene expression database focussed on a macrophage cellular system that allows efficient identification of transcriptional patterns. The resource will provide novel insights into the phenotypic response of macrophages to a variety of benign, inflammatory, and pathogen insults. GPX-MEA is available through the GPX website at

    Where data and journal content collide: what does it mean to 'publish your data'?

    No full text
    References:
    Burnhill, P. (2013). “Tales from The Keepers Registry: Serial Issues About Archiving & the Web”. Serials Review, 39 (2013), 3–20. https://www.era.lib.ed.ac.uk/handle/1842/6682
    Burnhill, P. (2014). “A Legacy of Inspiration and an Enduring Smile”. IASSIST Quarterly, Special Issue: A Pioneer Data Librarian. IASSIST Quarterly 2013: Spring, pp. 18–27, April 2014. http://iassistdata.org/iq/issue/37/1
    This presentation uses two studies to inform and illustrate a PI's thinking on what data could be considered candidates for deposit into the University's Datashare repository. The findings from these two studies have featured in conference presentations and in blogs but have not yet been ‘fixed’ in a journal article. This has also prompted thoughts on when it is sensible to deposit data and the prospect of doing so early under a pre-publication embargo. Three types of data are considered. The first type (Type A) comprises sources of data external to a project: the databases drawn upon in a study, which, although cited in publications, are often not the responsibility of the PI. The second type (Type B) comprises the assembled datasets used in the analyses, the findings from which are reported and used as evidence in published scholarly statements. The third type (Type C) comprises the ‘supplementary data’ (the data behind the graph), which enhance the publication of the results reported in a scholarly statement, forming what might be regarded as a multi-part work on the Web. Study 1 forms part of the Hiberlink project, funded by the Andrew W. Mellon Foundation, into ‘reference rot’, which goes beyond link rot (404, not found) to engage with ‘content drift’ (when the content referenced at the end of the link has evolved, has changed dramatically, or has disappeared completely). This is an exploratory investigation into reference rot for the references (c. 46,000 URIs) made from c. 7,000 doctoral e-theses to web-based resources.
This project is being carried out at the University of Edinburgh (at EDINA and the Language Technology Group in the School of Informatics) jointly with the Research Library at Los Alamos National Laboratory. Its main focus is on references from hundreds of thousands of journal articles to the ‘web at large’; progress is reported at http://hiberlink.org. Study 2 is an ‘unfunded’ investigation associated with the Keepers Registry, a service run for Jisc to monitor which e-journals are being kept safe by the world's archiving organisations, in order to highlight which should be regarded as at risk of loss. As it is ‘indirectly funded’, it is unlikely to be noticed by the research office. This study has involved use of the logs of the UK OpenURL Router, distilling the 10.4m requests made annually by researchers and students in UK universities for articles through this link resolver service. The c. 53,000 online serials identified by ISSN were cross-checked against the reports in the Keepers Registry, showing that fewer than one third of them were being kept safe. Further details are found in Burnhill (2013) and by following links on http://edina.ac.uk. The opportunity is also taken to look at the new research objects that are ‘resident on the Web’, including the implications that may have for the integrity of the scholarly record given the dynamics of the Web. Not only is the Web becoming a dominant means of access to scholarly statement, but it is also enabling rich aggregations of linked content into composite web-based research objects, with data intrinsic to the statement. Moreover, as scholarly statement has become digital, it has become malleable, with challenges to notions of fixity, citation and continuity of access.
This shift to a broader view of scholarly works in digital format should not be regarded as completely new and alien; it builds on an observation made thirty years ago by Sue Dodd in the pre-Web era of the Internet: “In the near future, libraries will have no choice but to become more involved with computerized files and programs. There is no doubt that machine-readable data will play an even greater role in research and development programs of the future” (Dodd, 1982, in Burnhill, 2014).
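    The link rot versus content drift distinction drawn above can be made concrete. The following triage function is a hypothetical illustration, not part of Hiberlink: it classifies a cited URI from its current HTTP status and, where available, a comparison of content digests taken at citation time and now.

```python
def triage_reference(status_code, cited_digest=None, live_digest=None):
    """Classify a cited URI as 'link rot', 'content drift', or 'intact'.

    status_code: HTTP status from dereferencing the URI now
                 (None if the request failed entirely).
    cited_digest / live_digest: optional content digests of the page
    as cited and as served today.
    """
    # Link rot: the URI no longer resolves at all (404, 410, 5xx, ...).
    if status_code is None or status_code >= 400:
        return "link rot"
    # Content drift: the URI resolves, but the content served today
    # differs from the content that was cited.
    if (cited_digest is not None and live_digest is not None
            and cited_digest != live_digest):
        return "content drift"
    return "intact"
```

    In practice, drift detection needs more nuance than a byte-level digest (timestamps, adverts and templates change constantly), but the three-way outcome matches the categories the study describes.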

    Too many copies: confusion between duplication and versioning

    No full text
    Presentation at Open Repositories 2014, Helsinki, Finland, June 9–13, 2014 (General Track, 24x7 Presentations). The session was recorded and is available for watching (this presentation starts at 0:09:19).
    Duplication is a fundamental part of how the internet works. In a cloud environment, resources are expected to be available, always; duplication provides the solution. Several copies scattered around the web ensure that at least one of them is accessible at any given time. Duplication is a good thing. But how do repositories deal with duplication? Should a repository hold more than one copy of a deposit? Open Access mandates encourage this duplication. Institutions ask authors to deposit their copy in the institutional repository. Funders require a copy to be deposited in subject repositories. Publishers, of course, advertise their copy. CRIS systems receive deposits from aggregations and subject repositories. Aggregations collect deposits from institutional and subject repositories. An institutional repository may therefore receive multiple copies of a research publication from different sources. Are these copies identical? How can they be differentiated? Which copy should be kept? Should more than one version of a deposit be kept? Support for versioning is required in order to tag deposits and enable repository managers to select the appropriate copy. Quality of the metadata and annotation, provenance information and trust are important. A global approach that engages all systems and partners of the repository ecosystem is needed.
    Mewissen, Muriel (EDINA, The University of Edinburgh, United Kingdom)
    Stuart, Ian (EDINA, The University of Edinburgh, United Kingdom)
    Rees, Christine (EDINA, The University of Edinburgh, United Kingdom)
    Burnhill, Peter (EDINA, The University of Edinburgh, United Kingdom)
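    The question "are these copies identical?" has a simple first-pass answer: group copies by content digest, so byte-identical duplicates fall together and anything else is a candidate distinct version. This is an illustrative sketch of that idea, not a tool from the presentation; the function names are invented.

```python
import hashlib

def deposit_fingerprint(payload: bytes) -> str:
    """Content digest of a deposit: equal digests mean byte-identical
    duplicates; different digests may be distinct versions (or merely
    different renditions of the same version)."""
    return hashlib.sha256(payload).hexdigest()

def partition_copies(copies):
    """Group copies of one work by content digest.

    copies: dict mapping a source name (IR, subject repository,
    publisher, aggregation, ...) to the payload bytes it supplied.
    Returns a dict mapping digest -> list of source names.
    """
    groups = {}
    for source, payload in copies.items():
        groups.setdefault(deposit_fingerprint(payload), []).append(source)
    return groups
```

    A digest only settles exact duplication; deciding which non-identical copy is the "appropriate" version still needs the metadata quality, provenance and versioning support the presentation calls for.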

    Repository Junction Broker

    No full text
    Poster at Open Repositories 2014, Helsinki, Finland, June 9–13, 2014 (Posters, Demos and Developer "How-To's"). The Repository Junction Broker (RJ Broker) offers a brokering pilot service for the delivery of research output from multiple data suppliers, such as publishers and subject repositories, to multiple institutional repositories (IRs). The RJ Broker accepts data objects, parses the metadata to determine appropriate target repositories, and transfers the data to repositories registered with the service. Notification/alert emails can be provided to repositories that are not registered for direct delivery. A web-based user interface and APIs also allow the browsing and downloading of all Open Access (OA) content in the RJ Broker. The RJ Broker supports Open Access and aims to increase the number of deposits in repositories while minimising the effort required of potential depositors and repository staff, thereby maximising the distribution and exposure of research outputs. Successful trials have taken place and preparations for the service launch are ongoing.
    Mewissen, Muriel (EDINA, The University of Edinburgh, United Kingdom)
    Stuart, Ian (EDINA, The University of Edinburgh, United Kingdom)
    Rees, Christine (EDINA, The University of Edinburgh, United Kingdom)
    Burnhill, Peter (EDINA, The University of Edinburgh, United Kingdom)
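    The broker workflow described above (accept a deposit, match metadata to target repositories, deliver to registered ones, notify the rest) can be sketched as follows. The data shapes and function names here are assumptions for illustration only, not the RJ Broker's actual API.

```python
def route_deposit(metadata, registered, fallback_contacts):
    """Decide where a deposit goes.

    metadata: dict with an "affiliations" list naming the institutions
              associated with the deposit.
    registered: dict of institution -> delivery endpoint for
                repositories registered for direct transfer.
    fallback_contacts: dict of institution -> email for repositories
                       that only receive notifications.
    Returns (deliveries, notifications) as lists of (institution, target).
    """
    deliveries, notifications = [], []
    for affiliation in metadata.get("affiliations", []):
        if affiliation in registered:
            deliveries.append((affiliation, registered[affiliation]))
        elif affiliation in fallback_contacts:
            notifications.append((affiliation, fallback_contacts[affiliation]))
        # Institutions unknown to the broker are silently skipped here;
        # a real service would log or queue them.
    return deliveries, notifications
```

    The design point is that one incoming deposit fans out to many targets, so the per-institution effort of chasing deposits is replaced by a single registration with the broker.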

    Compiling and optimizing for decoupled architectures

    No full text

    Parallel classification and feature selection in microarray data using SPRINT

    Get PDF
    The statistical language R is favoured by many biostatisticians for processing microarray data. In recent times, the quantity of data that can be obtained in experiments has risen significantly, making previously fast analyses time-consuming, or even impossible, with the existing software infrastructure. High performance computing (HPC) systems offer a solution to these problems, but at the expense of increased complexity for the end user. The Simple Parallel R Interface (SPRINT) is a library for R that aims to reduce the complexity of using HPC systems by providing biostatisticians with drop-in parallelised replacements of existing R functions. In this paper we describe parallel implementations of two popular techniques: exploratory clustering analyses using the random forest classifier, and feature selection through identification of differentially expressed genes using the rank product method.
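    As background to the second technique, a minimal serial sketch of the rank product statistic (the quantity whose computation SPRINT distributes) could look like this. The function name and data layout are illustrative, and real implementations add tie handling and a permutation-based significance estimate.

```python
from math import prod

def rank_products(fold_changes):
    """Rank product per gene across replicate experiments.

    fold_changes: list of replicates, each a list of per-gene fold
    changes in the same gene order. Each gene is ranked within each
    replicate (rank 1 = largest fold change); its rank product is the
    geometric mean of those ranks. A small rank product flags a gene
    that is consistently up-regulated across replicates.
    """
    n_genes = len(fold_changes[0])
    k = len(fold_changes)
    ranks_per_rep = []
    for rep in fold_changes:
        # Genes ordered from most to least up-regulated in this replicate.
        order = sorted(range(n_genes), key=lambda g: rep[g], reverse=True)
        ranks = [0] * n_genes
        for r, g in enumerate(order, start=1):
            ranks[g] = r
        ranks_per_rep.append(ranks)
    return [prod(ranks_per_rep[r][g] for r in range(k)) ** (1.0 / k)
            for g in range(n_genes)]
```

    Because each gene's ranks can be combined independently, the statistic parallelises naturally over genes, which is what makes it a good fit for a drop-in HPC replacement.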