
    Enabling Interactive Analytics of Secure Data using Cloud Kotta

    Research, especially in the social sciences and humanities, is increasingly reliant on the application of data science methods to analyze large amounts of (often private) data. Secure data enclaves provide a solution for managing and analyzing private data. However, such enclaves do not readily support discovery science---a form of exploratory or interactive analysis by which researchers execute a range of (sometimes large) analyses in an iterative and collaborative manner. The batch computing model offered by many data enclaves is well suited to executing large compute tasks; however, it is far from ideal for day-to-day discovery science. As researchers must submit jobs to queues and wait for results, the high latencies inherent in queue-based, batch computing systems hinder interactive analysis. In this paper we describe how we have augmented the Cloud Kotta secure data enclave to support collaborative and interactive analysis of sensitive data. Our model uses Jupyter notebooks as a flexible analysis environment and Python language constructs to support the execution of arbitrary functions on private data within this secure framework. (To appear in Proceedings of the Workshop on Scientific Cloud Computing (ScienceCloud 2017), Washington, DC, USA, June 2017; 7 pages.)
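
    The abstract does not spell out the implementation, so what follows is a minimal, hypothetical sketch of the kind of Python construct described: a decorator that marks a function for execution inside the enclave and returns a handle to the eventual result. The decorator name, the thread-pool stand-in, and the data path are assumptions for illustration, not part of the actual Cloud Kotta API.

    import concurrent.futures
    import functools

    # Stand-in for the enclave's compute service; a real deployment would route
    # work to trusted workers that hold access to the protected storage tier.
    _executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)

    def enclave_task(func):
        """Mark a function so calls are dispatched for execution near the data."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # A real implementation would serialize func and its arguments and
            # submit them to the secure environment; here we only simulate that
            # with a local thread pool and return a future as the result handle.
            return _executor.submit(func, *args, **kwargs)
        return wrapper

    @enclave_task
    def count_rows(dataset_path):
        # Analysis code that would run where the private data lives.
        with open(dataset_path) as f:
            return sum(1 for _ in f)

    # future = count_rows("/secure/data/records.csv")   # hypothetical path
    # print(future.result())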

    Can the NHS be a learning healthcare system in the age of digital technology?

    ‘Big data’ is defined by ‘7 V’s’: volume (the most frequently cited [1]), velocity, veracity, variety, volatility, validity and value. In healthcare, ‘big data’ is associated with a step-change in the way information is gathered, analysed and used to facilitate disease management and prevention. With greater electronic data capture, there is enthusiasm for increased safety, efficiency and effectiveness in health and social care through, for example, machine learning and other forms of artificial intelligence (AI). However, factors maintaining and widening the gap between the promise and the reality need to be addressed.

    Reproducible big data science: A case study in continuous FAIRness.

    Big biomedical data create exciting opportunities for discovery, but make it difficult to capture analyses and outputs in forms that are findable, accessible, interoperable, and reusable (FAIR). In response, we describe tools that make it easy to capture, and assign identifiers to, data and code throughout the data lifecycle. We illustrate the use of these tools via a case study involving a multi-step analysis that creates an atlas of putative transcription factor binding sites from terabytes of ENCODE DNase I hypersensitive sites sequencing data. We show how the tools automate routine but complex tasks, capture analysis algorithms in understandable and reusable forms, and harness fast networks and powerful cloud computers to process data rapidly, all without sacrificing usability or reproducibility, thus ensuring that big data are not hard-to-(re)use data. We evaluate our approach via a user study, and show that 91% of participants were able to replicate a complex analysis involving considerable data volumes.
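
    The specific tooling is not named in this summary, so the sketch below is only a generic illustration of one idea mentioned here, assigning identifiers to data throughout the lifecycle: it derives a content-based identifier (a checksum) for a file and records it in a small JSON manifest. The file name and manifest layout are assumptions, not the authors' system.

    import hashlib
    import json
    import pathlib

    def checksum(path, algo="sha256", chunk_size=1 << 20):
        """Return a content-derived identifier such as 'sha256:<hex digest>'."""
        h = hashlib.new(algo)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return f"{algo}:{h.hexdigest()}"

    def add_to_manifest(path, manifest="manifest.json"):
        """Record the file, its identifier, and its size in a JSON manifest."""
        entry = {
            "file": str(path),
            "id": checksum(path),
            "size": pathlib.Path(path).stat().st_size,
        }
        m = pathlib.Path(manifest)
        records = json.loads(m.read_text()) if m.exists() else []
        records.append(entry)
        m.write_text(json.dumps(records, indent=2))
        return entry

    # add_to_manifest("dnase_peaks.bed")   # hypothetical input file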

    New approaches for unsupervised transcriptomic data analysis based on Dictionary learning

    The era of high-throughput data generation enables new access to biomolecular profiles and their exploitation. However, the analysis of such biomolecular data, for example transcriptomic data, suffers from the so-called "curse of dimensionality". This occurs in the analysis of datasets with a significantly larger number of variables than data points. As a consequence, overfitting and unintentional learning of process-independent patterns can occur, which can leave results with little practical significance. A common way of counteracting this problem is to apply dimension reduction methods and then analyse the resulting low-dimensional representation, which has a smaller number of variables. In this thesis, two new methods for the analysis of transcriptomic datasets are introduced and evaluated. Our methods are based on the concepts of Dictionary learning, an unsupervised dimension reduction approach. Unlike many dimension reduction approaches widely applied in transcriptomic data analysis, Dictionary learning does not impose constraints on the components that are to be derived. This allows for great flexibility when adjusting the representation to the data. Further, Dictionary learning belongs to the class of sparse methods. The result of sparse methods is a model with few non-zero coefficients, which is often preferred for its simplicity and ease of interpretation. Sparse methods exploit the fact that the analysed datasets are highly structured. Transcriptomic data are particularly structured, for example because of the connections between genes and pathways. Nonetheless, the application of Dictionary learning in medical data analysis has so far been largely restricted to image analysis. Another advantage of Dictionary learning is that it is an interpretable approach; interpretability is a necessity in biomolecular data analysis to gain a holistic understanding of the investigated processes. Our two new transcriptomic data analysis methods are each designed for one main task: (1) identification of subgroups for samples from mixed populations, and (2) temporal ordering of samples from dynamic datasets, also referred to as "pseudotime estimation". Both methods are evaluated on simulated and real-world data and compared to other methods widely applied in transcriptomic data analysis. Our methods achieve high performance and overall outperform the comparison methods.
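
    The thesis's own methods are not reproduced here; as a small, generic illustration of sparse Dictionary learning used as a dimension reduction step on an expression-like matrix, the sketch below applies scikit-learn's DictionaryLearning to toy data. The matrix shape and hyperparameters are illustrative assumptions.

    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 500))        # 60 samples x 500 genes (toy data)

    dl = DictionaryLearning(
        n_components=8,                   # number of dictionary atoms (latent components)
        alpha=1.0,                        # sparsity penalty on the coefficients
        transform_algorithm="lasso_lars",
        max_iter=100,
        random_state=0,
    )
    codes = dl.fit_transform(X)           # sparse representation, shape (60, 8)
    atoms = dl.components_                # dictionary atoms in gene space, shape (8, 500)

    print(codes.shape, atoms.shape)
    print("fraction of zero coefficients:", np.mean(codes == 0))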

    Precision health approaches: ethical considerations for health data processing

    This thesis provides insights and recommendations on some of the most crucial elements necessary for an effective, legally and ethically sound implementation of precision health approaches in the Swiss context (and beyond), specifically precision medicine and precision public health. In this regard, the thesis recognizes the centrality of data in these two domains, and the ethical and scientific imperative of ensuring the widespread and responsible sharing of high quality health data between the numerous stakeholders involved in healthcare, public health and associated research domains. It also recognizes the need to protect not only the interests of data subjects but also those of data processors: it is only through a comprehensive assessment of the needs and expectations of all stakeholders regarding data sharing activities that sustainable solutions to known ethical and scientific conundrums can be devised and implemented. In addition, the chapters included in this thesis emphasize solutions that can be convincingly applied to real world problems, with the ultimate objective of having a concrete impact on clinical and public health practice and policies, including research activities. The strengths of this thesis reside in a careful and in-depth interdisciplinary assessment of the different issues at stake in precision health approaches, with the elaboration of the least disruptive solutions possible and recommendations that can readily be evaluated and adopted by the relevant stakeholders in these two domains. The thesis has three main objectives: (i) to investigate and identify factors influencing the processing of health data in the Swiss context and to suggest potential solutions and recommendations, since a better understanding of these factors is paramount for an effective implementation of precision health approaches given their strong dependence on high quality and easily accessible health datasets; (ii) to identify and explore the ethical, legal and social issues (ELSI) of innovative participatory disease surveillance systems, which also fall under precision health approaches, to examine how research ethics are coping with this field, and to strengthen the ethical approaches currently used to address these ELSI by providing a robust ethical framework; and (iii) to investigate how precision health approaches might fail to achieve their social justice and health equity goals if the impact of structural racism on these initiatives is not given due consideration, and to provide recommendations and potential actions that could help such approaches adhere to those goals. These three objectives are investigated using both empirical and theoretical research methods. The empirical branch consists of systematic and scoping reviews, both adhering to the PRISMA guidelines, and two interview-based studies carried out with Swiss expert stakeholders. The theoretical branch consists of three chapters, each addressing important aspects of precision health approaches.

    The evaluation and harmonisation of disparate information metamodels in support of epidemiological and public health research

    BACKGROUND: Descriptions of data, metadata, provide researchers with the contextual information they need to achieve research goals. Metadata enable data discovery, sharing and reuse, and are fundamental to managing data across the research data lifecycle. However, challenges associated with data discoverability negatively impact the extent to which these data are known by the wider research community. This, combined with a lack of quality assessment frameworks and limited awareness of the implications of poor quality metadata, is hampering the way in which epidemiological and public health research data are documented and repurposed. Furthermore, the absence of enduring metadata management models to capture consent for record linkage metadata in longitudinal studies can hinder researchers from establishing standardised descriptions of consent. AIM: To examine how metadata management models can be applied to improve the use of research data within the context of epidemiological and public health research. METHODS: A combination of systematic literature reviews, online surveys and qualitative data analyses was used to investigate the current state of the art, identify currently perceived challenges and inform the creation and evaluation of the models. RESULTS: There are three components to this thesis: a) enhancing data discoverability; b) improving metadata quality assessment; and c) improving the capture of consent for record linkage metadata. First, three models for enhancing research data discoverability were examined: data publications, linked data on the World Wide Web and the development of an online public health portal. Second, a novel framework for assessing the quality of epidemiological and public health metadata was created and evaluated. Third, a novel metadata management model to improve the capture of consent for record linkage metadata was created and evaluated. CONCLUSIONS: Findings from these studies have contributed to a set of recommendations for change in research data management policy and practice to enhance stakeholders' research environment.
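
    The quality framework itself is not described in this summary; as a hedged illustration of one dimension such an assessment might score, completeness against an expected field list, the sketch below checks a metadata record against a small, assumed set of fields. The field names and the example record are hypothetical.

    # Illustrative only: not the thesis's actual quality assessment framework.
    EXPECTED_FIELDS = ("title", "description", "creator", "date", "license", "access_conditions")

    def completeness(record):
        """Return the fraction of expected fields present and the list of missing ones."""
        present = [f for f in EXPECTED_FIELDS if record.get(f)]
        missing = sorted(set(EXPECTED_FIELDS) - set(present))
        return len(present) / len(EXPECTED_FIELDS), missing

    record = {"title": "Example cohort study", "creator": "Example University", "date": "2015"}
    score, missing = completeness(record)
    print(f"completeness={score:.2f}, missing={missing}")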