15 research outputs found

The classification of gene products in the molecular biology domain: Realism, objectivity, and the limitations of the Gene Ontology

Author: Mayor Charlie

Background: Controlled vocabularies in the molecular biology domain exist to facilitate data integration across database resources. One such tool is the Gene Ontology (GO), a classification designed to act as a universal index for gene products from any species. The Gene Ontology is used extensively in annotating gene products and analysing gene expression data, yet very little research exists from a library and information science perspective exploring the design principles, philosophy and social role of ontologies in biology. Aim: To explore how molecular biologists, in creating the Gene Ontology, devised guidelines and rules for determining which scientific concepts are included in the ontology, and the criteria for how these concepts are represented. Methods: A domain analysis approach was used to devise a mixed methodology to study the design of the Gene Ontology. Concept analysis of a GO term and a critical discourse analysis of GO developer mailing list texts were used to test whether ontological realism is a tenable basis for constructing objective ontologies. A comparison of the current GO vocabulary construction guidelines and a study of the reasons why GO terms are removed from the ontology further explored the justifications for the design of the Gene Ontology. Finally, a content analysis of published GO papers examined how authors use and cite GO data and terminology. Results: Gene Ontology terms can be presented according to different epistemologies for concepts, indicating that ontological realism is not the only way objective ontologies can be designed. Social roles and the exercise of power were found to play an important role in determining ontology content, and poor synonym control, a lack of clear warrant for deciding terminology and arbitrary decisions to delete and invent new terms undermine the objectivity and universal applicability of the Gene Ontology. 
Authors exhibited poor compliance with GO data citation policies and, in re-wording and misquoting GO terminology, risk exacerbating the semantic problems this controlled vocabulary was designed to solve. Conclusions: The failure of the Gene Ontology to define what is meant by a molecular function, the exercise of power by GO developers in clearing contentious concepts from the ontology, and the strict adherence to ontological realism, which marginalises social and subjective ways of classifying scientific concepts, limit the utility of the ontology as a tool to unify the molecular biology domain. These limitations to the Gene Ontology design could be overcome with the development of lighter, pluralistic, user-controlled ‘open ontologies’ for gene products that can work alongside more traditional, ‘top-down’ developed vocabularies.

City Research Online

Survey: Leakage and Privacy at Inference Time

Author: Jegorova Marija
Kaul Chaitanya
Mayor Charlie
Murray-Smith Roderick
O'Neil Alison Q.
Tsaftaris Sotirios A.
Weir Alexander
Publication date: 01/01/2021

Leakage of data from publicly available Machine Learning (ML) models is an area of growing significance as commercial and government applications of ML can draw on multiple sources of data, potentially including users' and clients' sensitive data. We provide a comprehensive survey of contemporary advances on several fronts, covering involuntary data leakage which is natural to ML models, potential malevolent leakage which is caused by privacy attacks, and currently available defence mechanisms. We focus on inference-time leakage, as the most likely scenario for publicly available models. We first discuss what leakage is in the context of different data, tasks, and model architectures. We then propose a taxonomy across involuntary and malevolent leakage, available defences, followed by the currently available assessment metrics and applications. We conclude with outstanding challenges and open questions, outlining some promising directions for future research

arXiv.org e-Print Archive

Edinburgh Research Explorer

Enlighten

Final Report of the Independent Expert Group for the Unlocking the Value of Data Programme

Author: Birchenall Colin
Daly Angela
Kelly Ronnie
Mayor Charlie
Miyake Esperanza
Shah Ruchir
Sorbie Annie
Weir Alexander
Young Carol
Publication venue: Scottish Government
Publication date: 31/08/2023

This report is the final output of the Independent Expert Group on the Unlocking the Value of Data programme, submitted to the Scottish Government. It is a Ministerial commission, originally commissioned by the former Minister for Business, Trade, Tourism and Enterprise.

University of Strathclyde Institutional Repository

Use of “Hidden in Plain Sight” de-identification methodology in electronic healthcare data provides minimal risk of misidentification: Results from the iCAIRD Safe Haven Artificial Intelligence Platform.

Author: Alexander Weir
Charlie Mayor
David Harrison
James Blackwood
Jaroslaw Dymiter
Katie Wilde
Lesley Anderson
Luke Farrow
Moragh Boyle
Publication venue: Swansea University
Publication date: 01/08/2022

Objectives: To determine the risk of misidentification when using a “Hidden In Plain Sight (HIPS)” Named Entity Recognition (NER) de-identification methodology applied to Scottish healthcare data within The Industrial Centre for Artificial Intelligence Research in Digital Diagnostics (iCAIRD) Safe Haven Artificial Intelligence Platform (SHAIP). Approach: Rather than the traditional redaction of potentially identifiable information in routinely collected healthcare data, our HIPS methodology utilises an NER “find and replace” approach to de-identification that keeps the structure of text intact. This ensures that context is maintained, which is key to the interpretation of free text information and potential Artificial Intelligence applications. To our knowledge these methods have been previously untested on Scottish healthcare data. We therefore assessed this approach in terms of potential risk of misidentification using HIPS on structured Scottish data deployed in SHAIP as part of the iCAIRD programme. Results: Five individual cohorts, with a total of 169,964 patients, were included. For each cohort the HIPS approach was applied, and then compared to actual patient information from within the same region, in order to determine the risk of misidentification. The following fields were included: Forename, Surname, Previous Name, Gender, Date of Birth (DOB), and Postcode. Across the five cohorts and varying combinations of identifiable data fields there were a total of 94 instances of potential misidentification (0.06%). 85/94 (90.4%) of these were for the combination of Gender, Date of Birth and Postcode. Across the five cohorts there were only 3 instances (0.002%) of Forename/Surname/DOB, and 5 instances (0.003%) of Forename/Surname/Postcode potential misidentification amongst the 169,964 patients. Conclusions: The iCAIRD NER HIPS methodology provides an acceptably low misidentification rate. Further work is now required to determine the recall and precision rates. Benefits of this approach include retaining the structure of free text, as well as reducing the ability to detect any potential leaked identifiable data.
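
The percentages quoted in the Results can be re-derived from the counts given in the abstract. The short Python sketch below does only that arithmetic; the function name `pct` and the variable names are illustrative and not part of the study, and the rounding precision is chosen simply to match the figures as reported.

```python
# Counts taken from the abstract; everything else here is illustrative.
TOTAL_PATIENTS = 169_964


def pct(count, total, digits):
    """Return count/total as a percentage rounded to the given number of decimals."""
    return round(100 * count / total, digits)


# 94 potential misidentifications across all five cohorts.
overall_rate = pct(94, TOTAL_PATIENTS, 2)

# 85 of those 94 involved Gender + Date of Birth + Postcode.
gender_dob_postcode_share = pct(85, 94, 1)

# Forename/Surname/DOB and Forename/Surname/Postcode matches.
name_dob_rate = pct(3, TOTAL_PATIENTS, 3)
name_postcode_rate = pct(5, TOTAL_PATIENTS, 3)

print(overall_rate, gender_dob_postcode_share, name_dob_rate, name_postcode_rate)
```

The rates for the name-based combinations are roughly twenty to thirty times lower than the overall rate, which is consistent with the abstract's point that most potential matches come from the Gender, Date of Birth and Postcode combination.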

Directory of Open Access Journals

PubMed Central

Barriers and facilitators of cross-sectoral data linkage to inform healthy public policy and practice: lessons from three case study projects in Scotland.

Author: Charlie Mayor
David Henderson
Denise Brown
Emily Tweed
Kristina Cimova
Mirjam Allik
Nick Watson
Peter Craig
Petra Meier
Publication venue: Swansea University
Publication date: 01/08/2022

Objectives: We sought to describe barriers and facilitators faced by three research projects aiming to link routinely-collected data across various sectors, to produce evidence to inform healthy public policy. We conducted these case studies as a part of a wider research project on cross-sectoral sharing and linkage of secondary data. Approach: We selected the case studies to cover a range of target populations and datasets. The chosen projects investigated (1) the health of care-experienced children; (2) the intersection of homelessness, justice involvement, drug use, and severe mental illness; (3) multi-morbidity among adults receiving social care. Information about timelines and governance processes was collected from lead investigators, including specific barriers and facilitators encountered, using a standardised pro forma and follow-up interviews. Thematic analysis was carried out by the research team, informed by themes identified in a parallel scoping review of existing literature on evidence use for healthy public policy and practice. Results: Each project involved between 6 and 11 agencies, with co-ordination across multiple institutions and geographies proving challenging. Due to challenges encountered, all projects had to amend their original geographical or demographic scope. Forty-four barriers and facilitators to sharing and linkage of cross-sectoral routinely-collected data for public health research were identified. These included but were not limited to: integration of current data in an ever-changing linkage landscape; the need for timely feedback in undertaking the study; standardisation of information governance processes; highlighting the resourcing and funding issues for data linkage projects; the need for data controllers to recognise the value of such projects; and issues relating to staff turnover and workload pressures. Conclusion: The interconnected nature of barriers and facilitators identified by the case studies suggests the importance of a whole-systems approach to cross-sectoral linkage. While literature offers relatively few case studies of cross-sectoral linkage for health research, the value of their insight into the linkage landscape derived from real-life experience is substantial.

Directory of Open Access Journals

PubMed Central

A National Network of Safe Havens: A Scottish Perspective

Author: Anderson Lesley
Banks Christopher
Caldwell Jacqueline
Cole Christian
Duncan Chris
Gao Chuang
Gordon Sharon
Hall Christopher
Hume Alastair
Jefferson Emily
Linksted Pamela
Lumsden Joanne
Mayor Charlie
McGilchrist Mark
Mumtaz Shahzad
Munro Vicky
Sibley Michael
Stables Catherine
Wilde Katie
Wozniak Artur
Zurowski John
Publication venue: JMIR Publications Inc.
Publication date: 09/07/2021

For over a decade, Scotland has implemented and operationalized a system of Safe Havens, which provides secure analytics platforms for researchers to access linked, deidentified electronic health records (EHRs) while managing the risk of unauthorized reidentification. In this paper, a perspective is provided on the state-of-the-art Scottish Safe Haven network, including its evolution, to define the key activities required to scale the Scottish Safe Haven network’s capability to facilitate research and health care improvement initiatives. A set of processes related to EHR data and their delivery in Scotland have been discussed. An interview with each Safe Haven was conducted to understand their services in detail, as well as their commonalities. The results show how Safe Havens in Scotland have protected privacy while facilitating the reuse of the EHR data. This study provides a common definition of a Safe Haven and promotes a consistent understanding among the Scottish Safe Haven network and the clinical and academic research community. We conclude by identifying areas where efficiencies across the network can be made to meet the needs of population-level studies at scale

Aberdeen University Research

PubMed Central

University of Dundee Online Publications

Masses, radii, and orbits of small Kepler planets: The transition from gaseous to rocky planets

Author: A. Miglio
Alan Boss
Alan Gould
Alexandre Santerne
Andrea Dupree
Andrej Prsa
Andrew W. Howard
Avi Shporer
C. Karoff
Charlie Sobeck
Chris Burke
Christopher Henze
Claire Moutou
D. Stello
Daniel C. Fabrycky
Daniel Huber
David Barrado
David Charbonneau
David Ciardi
David G. Koch
David Morrison
David W. Latham
Debra A. Fischer
Dimitar D. Sasselov
Douglas A. Caldwell
Douglas Hudgins
Edna Devore
Elisa V. Quintana
Elisabeth Adams
Eric Agol
Eric B. Ford
Erik Brugamyer
Erik Petigura
Fergal Mullally
Francois Fressin
G. R. Davies
Geoffrey W. Marcy
Gibor S. Basri
Guillaume Hébrard
Guillermo Torres
Hans Kjeldsen
Howard Isaacson
Jack J. Lissauer
Jason F. Rowe
Jason H. Steffen
Jean-Michel Désert
Jeff Coughlin
Jeffrey Van Cleve
Jerome A. Orosz
Jessie Christiansen
Jill Tarter
John Asher Johnson
Jon M. Jenkins
Jonathan J. Fortney
Jorge Lillo-Box
Joseph D. Twicken
Josh Carter
Joshua Winn
Justin R. Crepp
Jørgen Christensen-Dalsgaard
Lars A. Buchhave
Lauren M. Weiss
Leslie Rogers
M. Lundkvist
M. N. Lund
Mark E. Everett
Martin Still
Matthew J. Holman
Michael Endl
Michael R. Haas
Natalie M. Batalha
Paul Robertson
Peter Tenenbaum
Philip W. Lucas
Phillip MacQueen
R. Handberg
Rea Kolbl
Roberto Sanchis-Ojeda
Roger Hunter
Ronald L. Gilliland
S. D. Kawaler
S. Hekker
Samuel N. Quinn
Sara Seager
Sarah Ballard
Sarbani Basu
Stephen T. Bryson
Steve B. Howell
Susan E. Thompson
T. L. Campante
T. R. Bedding
T. R. White
T. S. Metcalfe
Thomas Barclay
Thomas N. Gautier
Timothy M. Brown
Timothy Morton
V. Silva Aguirre
William D. Cochran
William F. Welsh
William J. Borucki
William J. Chaplin
Y. Elsworth
Publication venue: IOP Publishing
Publication date: 01/01/2014

We report on the masses, sizes, and orbits of the planets orbiting 22 Kepler stars. There are 49 planet candidates around these stars, including 42 detected through transits and 7 revealed by precise Doppler measurements of the host stars. Based on an analysis of the Kepler brightness measurements, along with high-resolution imaging and spectroscopy, Doppler spectroscopy, and (for 11 stars) asteroseismology, we establish low false-positive probabilities (FPPs) for all of the transiting planets (41 of 42 have an FPP under 1%), and we constrain their sizes and masses. Most of the transiting planets are smaller than three times the size of Earth. For 16 planets, the Doppler signal was securely detected, providing a direct measurement of the planet's mass. For the other 26 planets we provide either marginal mass measurements or upper limits to their masses and densities; in many cases we can rule out a rocky composition. We identify six planets with densities above 5 g cm⁻³, suggesting a mostly rocky interior for them. Indeed, the only planets that are compatible with a purely rocky composition are smaller than 2 R⊕. Larger planets evidently contain a larger fraction of low-density material (H, He, and H₂O). Peer reviewed. Final accepted version.
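
The density threshold mentioned in the abstract follows from mass and radius alone. As a minimal sketch (not the authors' analysis pipeline), the bulk density of a planet given in Earth units can be computed by treating it as a uniform sphere. The constants are standard approximate values, the function name `bulk_density_g_cm3` is illustrative, and the example planet is hypothetical rather than one of the Kepler planets in the paper.

```python
import math

EARTH_MASS_KG = 5.972e24   # approximate mass of Earth
EARTH_RADIUS_M = 6.371e6   # approximate mean radius of Earth


def bulk_density_g_cm3(mass_earths, radius_earths):
    """Mean density in g/cm^3 of a sphere with mass and radius in Earth units."""
    mass_kg = mass_earths * EARTH_MASS_KG
    radius_m = radius_earths * EARTH_RADIUS_M
    volume_m3 = (4.0 / 3.0) * math.pi * radius_m ** 3
    return mass_kg / volume_m3 / 1000.0  # convert kg/m^3 to g/cm^3


# Sanity check against Earth itself (about 5.5 g/cm^3).
earth_density = bulk_density_g_cm3(1.0, 1.0)

# Hypothetical planet: 4 Earth masses, 1.5 Earth radii.
hypothetical_density = bulk_density_g_cm3(4.0, 1.5)
exceeds_threshold = hypothetical_density > 5.0  # the 5 g/cm^3 cut quoted above

print(round(earth_density, 2), round(hypothetical_density, 2), exceeds_threshold)
```

Because density scales as mass divided by radius cubed, a modest increase in radius at fixed mass lowers the density sharply, which is why the larger planets in the sample are read as carrying more low-density material.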

HAL AMU

University of Birmingham Research Portal

International Migration, Integration and Social Cohesion online publications

University of Southern Queensland ePrints

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

HAL-INSU

University of Hertfordshire Research Archive

Texas ScholarWorks

SteatoSITE: an Integrated Gene-to-Outcome Data Commons for Precision Medicine Research in NAFLD

Author: Alexander Douglas
Bandiera Lucia
Dunbar Donald
Ellis Harriet
Fallowfield Jonathan
Guha Indra
Jimenez-Ramos Maria
Juncker-Jensen Anna
Kendall Timothy
Kohnen Gabriele
Konanahalli Prakash
Mayor Charlie
McColgan Michael
Menolascina Filippo
Minnier Jessica
Oien Karin
Ramachandran Prakash
Turner Frances

Nonalcoholic fatty liver disease (NAFLD) is the commonest cause of chronic liver disease worldwide and a growing healthcare burden. The pathobiology of NAFLD is complex, disease progression is variable and unpredictable, and there are no qualified prognostic biomarkers or licensed pharmacotherapies that can improve clinical outcomes; it represents an unmet precision medicine challenge. We established a retrospective multicentre national cohort of 940 patients, across the complete NAFLD spectrum, integrating quantitative digital pathology, hepatic RNA-sequencing and 5.67 million days of longitudinal electronic health record follow-up into a secure, searchable, open resource (SteatoSITE) to inform rational biomarker and drug development and facilitate personalised medicine approaches for NAFLD. A complementary web-based gene browser was also developed. Here, our initial analysis uncovers disease stage-specific gene expression signatures, pathogenic hepatic cell subpopulations and master regulator networks associated with disease progression in NAFLD. Additionally, we construct novel transcriptional risk prediction tools for the development of future hepatic decompensation events

Repository@Nottingham

An integrated gene-to-outcome multimodal database for metabolic dysfunction-associated steatotic liver disease

Author: Alam Masood
Alexander Douglas
Bandiera Lucia
Dunbar Donald R.
Ellis Harriet
Fallowfield Jonathan A.
Guha Indra Neil
Jimenez-Ramos Maria
Juncker-Jensen Anna
Kendall Timothy J.
Kohnen Gabriele
Konanahalli Prakash
Mayor Charlie
McColgan Michael D.
Menolascina Filippo
Minnier Jessica
Oien Karin A.
Ramachandran Prakash
Turner Frances
Publication venue: Nature Publishing Group
Publication date: 30/10/2023

Metabolic dysfunction-associated steatotic liver disease (MASLD) is the commonest cause of chronic liver disease worldwide and represents an unmet precision medicine challenge. We established a retrospective national cohort of 940 histologically defined patients (55.4% men, 44.6% women; median body mass index 31.3; 32% with type 2 diabetes) covering the complete MASLD severity spectrum, and created a secure, searchable, open resource (SteatoSITE). In 668 cases and 39 controls, we generated hepatic bulk RNA sequencing data and performed differential gene expression and pathway analysis, including exploration of gender-specific differences. A web-based gene browser was also developed. We integrated histopathological assessments, transcriptomic data and 5.67 million days of time-stamped longitudinal electronic health record data to define disease-stage-specific gene expression signatures, pathogenic hepatic cell subpopulations and master regulator networks associated with adverse outcomes in MASLD. We constructed a 15-gene transcriptional risk score to predict future hepatic decompensation events (area under the receiver operating characteristic curve 0.86, 0.81 and 0.83 for 1-, 3- and 5-year risk, respectively). Additionally, thyroid hormone receptor beta regulon activity was identified as a critical suppressor of disease progression. SteatoSITE supports rational biomarker and drug development and facilitates precision medicine approaches for patients with MASLD

Repository@Nottingham