The 3rd DBCLS BioHackathon: improving life science data integration with Semantic Web technologies.
BACKGROUND: BioHackathon 2010 was the third in a series of meetings hosted by the Database Center for Life Sciences (DBCLS) in Tokyo, Japan. The overall goal of the BioHackathon series is to improve the quality and accessibility of life science research data on the Web by bringing together representatives from public databases, analytical tool providers, and cyber-infrastructure researchers to jointly tackle important challenges in the area of in silico biological research. RESULTS: The theme of BioHackathon 2010 was the 'Semantic Web', and all attendees gathered with the shared goal of producing Semantic Web data from their respective resources, and/or consuming or interacting with those data using their tools and interfaces. We discussed topics including guidelines for designing semantic data and interoperability of resources. We consequently developed tools and clients for analysis and visualization. CONCLUSION: We provide a meeting report from BioHackathon 2010, in which we describe the discussions, decisions, and breakthroughs made as we moved towards compliance with Semantic Web technologies - from source provider, through middleware, to the end-consumer. RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief, you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.
Assessing the quality of Wikidata referencing
Wikidata is a versatile and broad-based Knowledge Graph (KG) that leverages the
power of collaborative contributions via an open wiki, augmented by bot accounts,
to curate the content. Wikidata represents over 102 million interlinked data entities,
accompanied by over 1.4 billion statements about the items, accessible to the public
via a SPARQL endpoint and diverse dump formats. The Wikidata data model enables assigning references to every single statement. While the quality of Wikidata
statements has been assessed, the quality of references in this knowledge graph is
not well covered in the literature. To cover the gap, we develop and implement
a comprehensive referencing quality assessment framework based on Linked Data
quality dimensions and criteria. We implement the objective metrics of the assessment framework as the Referencing Quality Scoring System - RQSS. RQSS provides
quantified scores by which the referencing quality can be analyzed and compared.
Due to the scale of Wikidata, we developed a subsetting approach to creating
a comparison platform that systematically samples Wikidata. We have used both
well-defined subsets and random samples to evaluate the quality of references in
Wikidata using RQSS. Based on RQSS, the overall referencing quality in Wikidata
subsets is 0.58 out of 1. Random subsets (representative of Wikidata) have higher
overall scores than topical subsets by 0.05, with Gene Wiki having the highest scores
amongst topical subsets. Regarding referencing quality dimensions, all subsets have
high scores in accuracy, availability, security, and understandability, but have weaker
scores in completeness, verifiability, objectivity, and versatility. RQSS scripts can
be reused to monitor the referencing quality over time. The evaluation shows that
RQSS is practical and provides valuable information, which can be used by Wikidata contributors and WikiProject owners to identify the referencing quality gaps.
Although RQSS is developed based on the Wikidata RDF model, its referencing
quality assessment framework can be generalized to any RDF KG. James Watt Scholarship funding
Visualization Tools for Comparative Genomics applied to Convergent Evolution in Ash Trees
Assembly and analysis of whole genomes is now a routine part of genetic
research, but effective tools for the visualization of whole genomes and their
alignments are few. Here we present two approaches to allow such visualizations
to be done in an efficient and user-friendly manner. These allow researchers to
spot problems and patterns in their data and present them effectively.
First, FluentDNA is developed to tackle single full genome visualization and
assembly tasks by representing nucleotides as colored pixels in a zooming
interface. This enables users to identify features without relying on algorithmic
annotation. FluentDNA also supports visualizing pairwise alignments of well-assembled whole genomes from chromosome to nucleotide resolution.
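To make the idea of nucleotides as colored pixels concrete, here is a minimal sketch; it is not FluentDNA's code. It assumes a hypothetical four-color palette, grey for ambiguous bases, white padding, and a fixed row width, and lays a sequence out as a grid of RGB tuples that any image library could then render.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]

# Hypothetical palette: FluentDNA's actual colors and layout may differ.
PALETTE = {
    "A": (0, 200, 0),
    "C": (0, 0, 255),
    "G": (255, 165, 0),
    "T": (255, 0, 0),
}
UNKNOWN: RGB = (128, 128, 128)  # N or any other non-ACGT symbol
PADDING: RGB = (255, 255, 255)  # fills the tail of the last row


def sequence_to_pixels(seq: str, width: int) -> List[List[RGB]]:
    """Lay a nucleotide string out as rows of `width` colored pixels."""
    if width <= 0:
        raise ValueError("width must be positive")
    colors = [PALETTE.get(base, UNKNOWN) for base in seq.upper()]
    rows = [colors[i:i + width] for i in range(0, len(colors), width)]
    # Pad the final row so the grid is rectangular.
    if rows and len(rows[-1]) < width:
        rows[-1] = rows[-1] + [PADDING] * (width - len(rows[-1]))
    return rows


if __name__ == "__main__":
    for row in sequence_to_pixels("ACGTNacg", width=3):
        print(row)
```

Because every base maps to exactly one pixel, features such as repeats or contaminant blocks show up as visual texture without any prior annotation, which is the property the abstract highlights.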
Second, Pantograph is developed to tackle the problem of visualizing variation
among large numbers of whole genome sequences. This uses a graph genome
approach, which addresses many of the technical challenges of whole genome
multiple sequence alignments by representing aligned sequences as nodes which
can be shared by many individuals. Pantograph is capable of scaling to thousands
of individuals and is applied to SARS and A. thaliana pangenomes.
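The graph genome idea described above, where aligned segments become nodes shared by many individuals, can be sketched with a toy data structure. This is an illustrative assumption, not Pantograph's internal model: the class names, integer node IDs and per-node sets of individuals are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    seq: str                                   # aligned segment shared by carriers
    individuals: Set[str] = field(default_factory=set)


class GraphGenome:
    """Toy graph genome: each individual is a path through shared nodes."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}
        self.paths: Dict[str, List[int]] = {}

    def add_node(self, node_id: int, seq: str) -> None:
        self.nodes[node_id] = Node(seq)

    def add_path(self, individual: str, node_ids: List[int]) -> None:
        # Record which individuals pass through each node.
        for nid in node_ids:
            self.nodes[nid].individuals.add(individual)
        self.paths[individual] = list(node_ids)

    def sequence_of(self, individual: str) -> str:
        """Reconstruct an individual's sequence by walking its path."""
        return "".join(self.nodes[nid].seq for nid in self.paths[individual])

    def shared_nodes(self, min_individuals: int) -> List[int]:
        """Node IDs carried by at least `min_individuals` individuals."""
        return sorted(nid for nid, n in self.nodes.items()
                      if len(n.individuals) >= min_individuals)


if __name__ == "__main__":
    g = GraphGenome()
    for nid, seq in [(1, "ACGT"), (2, "A"), (3, "G"), (4, "TTC")]:
        g.add_node(nid, seq)
    g.add_path("ind1", [1, 2, 4])
    g.add_path("ind2", [1, 3, 4])   # SNP: G instead of A at node 2/3
    print(g.sequence_of("ind2"), g.shared_nodes(3))
```

Storing each shared segment once, rather than once per genome, is what lets this kind of representation scale to many individuals.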
Alongside the development of these new genomics tools, comparative genomic
research was undertaken on worldwide species of ash trees. I assembled 13 ash
genomes and used FluentDNA to quality-check the results, discovering
contaminants and a mitochondrial integration. I annotated protein-coding genes
in 28 ash assemblies and aligned their gene families. Using phylogenetic analysis,
I identified gene duplications that likely occurred in an ancient whole genome
duplication shared by all ash species. I examined the fate of these duplicated
genes, showing that losses are concentrated in a subset of gene families more
often than predicted by a null model simulation. I conclude that convergent
evolution has occurred in the loss and retention of duplicated genes in different
ash species. BBSRC BB/S004661/
Data quality issues in electronic health records for large-scale databases
Data Quality (DQ) in Electronic Health Records (EHRs) plays a decisive role in improving healthcare service quality. Addressing DQ issues in EHRs is a noticeable trend that motivates the introduction of an adaptive framework for interoperability and standards in Large-Scale Database (LSDB) management systems. Traditional approaches struggle to handle large data communications in a way that satisfies consumers' needs, as data are often not captured directly into the Database Management Systems (DBMS) in a timely enough fashion to enable their subsequent use. In addition, large datasets hold considerable value for all the fields in the DBMS. EHR technology provides portfolio management systems that allow HealthCare Organisations (HCOs) to deliver a higher quality of care to their patients than is possible with paper-based records. HCOs increasingly depend on EHRs to run their daily services as ever larger datasets are generated every day. Efficient EHR systems reduce data redundancy and system application failures, and make it easier to generate all necessary reports. However, one of the main challenges in developing efficient EHR systems is the inherent difficulty of coherently managing data from diverse heterogeneous sources. It is challenging in practice to integrate diverse data into a global schema that satisfies the needs of users. Managing EHR systems efficiently with an existing DBMS is difficult because of incompatible and sometimes inconsistent data structures. As a result, no common methodological approach currently exists that effectively solves every data integration problem. These DQ challenges raise the need for an efficient way to integrate large EHRs from diverse heterogeneous sources.
To handle and align a large dataset efficiently, a hybrid algorithm that logically combines Fuzzy-Ontology with a large-scale EHR analysis platform has shown improved accuracy. This study investigated the DQ issues raised and addressed them through interventions to overcome these barriers and challenges, including the provision of EHRs as they pertain to DQ, and combined features to search, extract, filter, clean and integrate data so that users can coherently create new, consistent datasets. The study designed a hybrid method based on Fuzzy-Ontology and performed mathematical simulations based on a Markov chain probability model. Similarity measurement was based on a dynamic Hungarian algorithm, and the work followed the Design Science Research (DSR) methodology, with the aim of increasing the quality of service across HCOs in adaptive frameworks.
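The abstract names a dynamic Hungarian algorithm for similarity measurement without giving details. As a hedged illustration only, the sketch below implements the standard (static) Hungarian algorithm and uses it to pair hypothetical source fields with target-schema fields by maximising total similarity. The similarity scores and the `1 - s` cost transform are assumptions, and the study's dynamic variant is not reproduced here.

```python
from typing import List, Sequence


def hungarian(cost: Sequence[Sequence[float]]) -> List[int]:
    """Minimum-cost assignment of rows to columns (requires rows <= columns).

    Returns assignment[i] = column matched to row i, using the classic
    potentials-based O(n^2 * m) Hungarian algorithm.
    """
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("need at least as many columns as rows")
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials (1-indexed)
    v = [0.0] * (m + 1)  # column potentials; column 0 is a sentinel
    p = [0] * (m + 1)    # p[j] = row currently matched to column j
    way = [0] * (m + 1)  # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:  # grow the alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # update potentials
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while True:  # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def match_by_similarity(sim: Sequence[Sequence[float]]) -> List[int]:
    """Pair rows with columns so total similarity (scores in [0, 1]) is maximal."""
    return hungarian([[1.0 - s for s in row] for row in sim])


if __name__ == "__main__":
    # Hypothetical scores: sources ["patient_id", "dob"] vs
    # targets ["DateOfBirth", "PatientID", "Address"].
    print(match_by_similarity([[0.1, 0.9, 0.0], [0.8, 0.2, 0.1]]))
```

Turning similarities into costs lets a minimum-cost assignment solver produce a one-to-one field mapping; a fuzzy-ontology layer, as described in the abstract, would be one possible source for the similarity scores.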
Discovering lesser known molecular players and mechanistic patterns in Alzheimer's disease using an integrative disease modelling approach
Convergence of exponentially advancing technologies is driving medical research towards life-changing discoveries. By contrast, repeated failures of high-profile drugs against Alzheimer's disease (AD) have made it one of the least successful therapeutic areas. This failure pattern has provoked researchers to grapple with their beliefs about Alzheimer's aetiology. The growing realisation that Amyloid-β and tau are not 'the' but rather 'one of the' factors necessitates the reassessment of pre-existing data to add new perspectives. To enable a holistic view of the disease, integrative modelling approaches are emerging as a powerful technique. Combining data at different scales and modes could considerably increase the predictive power of the integrative model by filling biological knowledge gaps. However, the reliability of the derived hypotheses largely depends on the completeness, quality, consistency, and context-specificity of the data. Thus, there is a need for agile methods and approaches that efficiently interrogate and utilise existing public data. This thesis presents the development of novel approaches and methods that address intrinsic issues of data integration and analysis in AD research. It aims to prioritise lesser-known AD candidates using highly curated and precise knowledge derived from integrated data. Here, much of the emphasis is placed on quality, reliability, and context-specificity. This thesis work showcases the benefit of integrating well-curated and disease-specific heterogeneous data in a semantic web-based framework for mining actionable knowledge. Furthermore, it introduces the challenges encountered while harvesting information from literature and transcriptomic resources. State-of-the-art text-mining methodology is developed to extract miRNAs and their regulatory roles in diseases and genes from the biomedical literature.
To enable meta-analysis of biologically related transcriptomic data, a highly curated metadata database has been developed, which explicates annotations specific to human and animal models. Finally, to corroborate common mechanistic patterns, embedded with novel candidates, across large-scale AD transcriptomic data, a new approach to generating gene regulatory networks has been developed. The work presented here has demonstrated its capability to identify testable mechanistic hypotheses containing previously unknown or emerging knowledge from public data in two major publicly funded projects on Alzheimer's disease, Parkinson's disease and epilepsy.
Open Access Publishing and Scholarly Communication Among Greek Biomedical Scientists
Purpose: The purpose of this research is to study the ways in which open access publishing can improve scholarly communication among biomedical scientists in Greece over a period of about five years, and to provide new roles for health librarians in supporting open access.
Methods: The implementation of Critical Realism as the research philosophy allowed a multi-level analysis of the research object, and a mixture of research tools was used. Supplementary research methods were adopted to provide more accurate and reliable conclusions. The literature review contributed to the identification of the open access publishing context and the relations forming and re-forming within it. Additionally, similar studies were found and the research gaps were identified. Bibliometrics allowed the participation of Greek scientists in world research to be evaluated. The research was conducted in five world databases (PUBMED, SCI, BIOMED CENTRAL, DOAJ, GOOGLE) for two different periods (2006-2007 and 2011). Publishers' agreements provided information about the role of Greek biomedical publishers in the awareness of Greek biomedical scientists on journal-related issues such as copyright. Additionally, journal cost analysis presented publishers' subscription and open access policies and provided an approximation of the costs of access to journals. Web 2.0 offers new scholarly communication channels that seem to be cheaper and effective. The participation of Greek biomedical scientists in social networks such as ResearchGate and LinkedIn was analysed to evaluate the trends towards these new information sources. Case study methodology provided the qualitative and quantitative tools to explain the attitudes and awareness of Greek biomedical stakeholders about open access publishing and open access biomedical journals, and also supported the longitudinal study of the changes. A questionnaire survey among biomedical scientists took place in three phases (2007 to early 2010, September 2010 to May 2011). In addition, Greek biomedical publishers were interviewed in January and February 2010.
Findings: The bibliometric findings indicated an increasing participation of Greek scientists and Greek biomedical journals in world research. Greek biomedical scientists also use social networking as a means of scholarly communication. The questionnaire surveys showed that physicians are the most active researchers and the most familiar with the open access publishing concept. However, across all the phases the majority of Greek biomedical scientists seem to be unaware of aspects of publishing in open access journals, although by the third phase more participants seem to be aware. Greek biomedical publishers seem to approve deposit in repositories and self-archiving under specific terms because, as the analysis of publishers' agreements demonstrated, the publishers want to be the copyright holders, and information about authors' rights is omitted. Biomedical scientists are confused over copyright. As far as cost analyses are concerned, journal prices depend on the publisher (commercial or scientific) and the subscriber (institutional prices are higher than individual ones). The findings were interpreted according to Rogers' diffusion of innovations theory and Lewin's force field analysis.
Conclusions: Open access seems to be acceptable in Greece, but the stakeholders, including libraries, need to co-operate more. Greek academic biomedical libraries can actively reinforce the driving forces and reduce the restraining forces (mainly around copyright) (Lewin's Force Field Analysis) in order to move into the 'refreeze stage'. However, institutional repositories do seem to be an innovation that (according to Rogers' theory) will take time to develop.
An Autoethnography of T9hacks: "Designing a Welcoming Hackathon for Women and Non-Binary Students to Learn and Explore Computing"
Student hackathons are a type of demographic-specific event aimed at college students. Students may attend hackathons because they provide an opportunity for informal learning, networking, and building products for social change. Hackathons are usually designed to give their participants opportunities to learn or expand their technical skill sets. During the process of building a project, participants learn about project management, task delegation, and the organization and production of a hack with a working demo within the limited time span. Hackathons are great at giving their participants informal and incidental learning opportunities. Participants may have different goals or motivations for attending a hackathon that can change how they participate in the event. Student hackathons have been growing in popularity over the last decade and are only becoming more popular as the computing field grows in size and demand. In the 2017-2018 school year, over 71,000 students in North America and Europe and over participated in a student hackathon. In 2017, every US university with a top-ranked computer science department hosted at least one student hackathon. However, despite their popularity with students, research about student hackathons is sparse and little work has been done studying student experiences at these events. There are also fewer women than men attending hackathons; on average, only 23% of the participants are women. This dissertation is situated within the existing hackathon literature and complements the work showing hackathons as places of informal and situated learning. This dissertation focuses on the design of a women and non-binary hackathon, T9Hacks. I founded T9Hacks in the Fall of 2015 and, with a team of undergraduate students, we hosted our first hackathon event in late February 2016.
T9Hacks is open to all students but specifically encourages women and non-binary students to attend through marketing, structure, and strategic use of competition. Our mission has always been to create a welcoming and safe environment where women and non-binary students can learn and explore with computing. I was drawn to autoethnography as a way to analyze, interpret, and attach meaning to the design of T9Hacks. Autoethnography is a form of self-reflection on one's personal experiences within a cultural context to look more deeply at social interactions. Articulating the design choices that the team and I made created a list of design principles and lessons learned (listed below) and can give insight into the inclusive practices of student hackathons. This autoethnography discusses the design of T9Hacks, a women and non-binary hackathon, with regard to its branding, design of competition, and structures that supported our participants. I discuss the name of the event, the graphic design, and the labels, along with the ideologies and values associated with those choices; how the nature and value of prizes and the framing of the contests affected students; and how the professional development and technical resources we provided satisfied students' personal goals for attending the event. These elements of the hackathon changed multiple times through the most change and give insight into the challenges our team faced when trying to design an inclusive and welcoming hackathon. The decisions the T9Hacks team faced can help inform other hackathon designs as well.
Generic autonomic adapter architecture and policy model for semantic socio-cyber-physical collaborative network
The cyber-physical system aims to improve citizens' quality of life by providing intelligent and automated services in a wide variety of sectors such as transportation, healthcare, enterprises, self-driving cars, energy and so forth. Recently, a considerable amount of research has focused on integrating cyber-physical systems in a social context. The idea is to socially connect cyber-physical resources (i.e., physical devices, software elements, networked components, digital content, etc.) so that they can interact and collaborate for autonomous decision-making, as humans do through social networking. However, several challenges remain concerning the design of appropriate methodologies, frameworks and techniques for supporting cyber-physical relations and collaboration within the social context. Most existing social software modelling focuses only on maintaining human-to-human or human-to-object-centric interaction. Existing systems do not recognise how socio-cyber-physical resources can maintain their social status, communicate and interact with both humans and non-human entities. The reason may be the lack of understanding and the limited approaches or methodologies to semantically (with a formal characterisation of the information) represent socio-cyber-physical resource relations and interactions in a collaborative network. This limits data integration, interoperability, and knowledge discovery from the underlying data sources. Semantic Web ontologies combined with a software agent model can help to overcome this limitation by describing and interconnecting socio-cyber-physical objects in a social space. The software agents can act as representatives of these resources to track, manage and update their collaborative activities in a social world. Nevertheless, due to exponential network growth and uncertainties, the states and relations among socio-cyber-physical objects may keep changing when they are in different situations.
Therefore, it is an arduous and error-prone task for humans or traditional software agents to keep track of, manage and maintain the large number of socio-cyber-physical resources and their social dynamics. One potential and flexible solution to this problem is to leverage the autonomic computing approach with social and adaptive goals to make the socio-cyber-physical network self-managed and adaptive. The Autonomic Computing (AC) approach has laid the necessary foundation to tackle this challenge by developing the policy-based Autonomic Adapter (AA) model (e.g., autonomous agent). The AAs can continuously monitor socio-cyber-physical resource status, analyse the situation and make a collaborative decision based on the policy knowledge defined by the system administrator. However, an autonomic computing model must rely on input knowledge to decide self-management operations such as “what”, “where” and “how” to perform the adaptation to the system. Previously, adaptation in different contexts has been done in an ad hoc manner, based on algorithms that predict future circumstances and are embedded in the program code. This approach is inflexible in dynamic and uncertain environments where the system configuration needs to be adjusted frequently. Defining a flexible policy model and integrating policy into a knowledge repository outside the code itself is the most appropriate way to manage autonomic system behaviours at run time. Unfortunately, there has been relatively little work on developing an appropriate policy model and specification language for domain-neutral autonomic systems. To fill these gaps, this thesis makes three core contributions to knowledge. First, we address the establishment of both socio-cyber-physical and human relations and interactions within a social-collaborative network.
To achieve this, we propose a software agent-centric Semantic Social-Collaborative Network (SSCN) that provides the functionality to represent and manage cyber-physical resources in a social network. We discuss how non-human resources can be represented as socially connected nodes and managed by the software agents. The SSCN is supported by an extended ontology model for semantically describing the concepts, properties and relations of human and non-human resources. A Java-based software agent API has been implemented to demonstrate some actions performed on behalf of the non-human resources in a real-world collaborative healthcare system called GRiST (www.egrist.org). Second, we propose a Generic Autonomic Social-Collaborative Framework (GASCF) with a policy-based Autonomic Adapter (AA) architecture. The AAs are capable of monitoring system resources, analysing context information, and acting accordingly using high-level policy. The AAs can also communicate and exchange data with other AAs through a social network to make collaborative decisions, much like human social interaction. Third, we propose an Event-Condition-Action (ECA) rule-based policy model and specification language for AAs by defining the Policy Schema Definition (PSD) and Policy Script Specification (PSS) languages, modelled with XML syntax. Finally, we test and evaluate our approach by applying it to the extended GRiST socio-healthcare service context and the eGRiST clinical decision support system. We demonstrate and evaluate how socio-cyber-physical relations, interaction and autonomous decision-making are achieved by integrating AAs and using policy specification to manage the AAs' behaviour within a socio-cyber-physical medical context.
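An Event-Condition-Action rule fires its action only when a matching event arrives and its condition holds over the current context. The sketch below illustrates that evaluation loop and the idea of keeping policy outside the adapter code. It is a hypothetical Python model, not the thesis's XML-based PSD/PSS languages, and the rule, event and context names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]


@dataclass
class EcaRule:
    """One Event-Condition-Action policy rule (hypothetical representation)."""
    name: str
    event: str                             # event type that triggers the rule
    condition: Callable[[Context], bool]   # guard evaluated on the context
    action: Callable[[Context], str]       # returns a description of the action taken


class AutonomicAdapter:
    """Toy adapter whose behaviour is driven entirely by a swappable rule set."""

    def __init__(self, rules: List[EcaRule]) -> None:
        self.rules = list(rules)

    def load_policy(self, rules: List[EcaRule]) -> None:
        # Replace the policy at run time without changing adapter code.
        self.rules = list(rules)

    def on_event(self, event: str, context: Context) -> List[str]:
        """Fire every rule whose event matches and whose condition holds."""
        return [rule.action(context) for rule in self.rules
                if rule.event == event and rule.condition(context)]


if __name__ == "__main__":
    escalate = EcaRule(
        name="escalate_high_risk",                       # hypothetical rule
        event="assessment_completed",
        condition=lambda c: c["risk_score"] >= 0.7,
        action=lambda c: f"notify clinician about {c['patient']}",
    )
    adapter = AutonomicAdapter([escalate])
    print(adapter.on_event("assessment_completed",
                           {"risk_score": 0.8, "patient": "p1"}))
```

Because the rules are plain data handed to `load_policy`, an administrator could change the adapter's behaviour at run time without editing `on_event`, which mirrors the motivation for externalised policy given in the abstract.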
The evaluation and harmonisation of disparate information metamodels in support of epidemiological and public health research
BACKGROUND: Descriptions of data, or metadata, provide researchers with the contextual information they need to achieve research goals. Metadata enable data discovery, sharing and reuse, and are fundamental to managing data across the research data lifecycle. However, challenges associated with data discoverability negatively impact the extent to which these data are known by the wider research community. Combined with a lack of quality assessment frameworks and limited awareness of the implications of poor-quality metadata, this is hampering the way in which epidemiological and public health research data are documented and repurposed. Furthermore, the absence of enduring metadata management models to capture consent for record linkage metadata in longitudinal studies can hinder researchers from establishing standardised descriptions of consent. AIM: To examine how metadata management models can be applied to improve the use of research data within the context of epidemiological and public health research. METHODS: A combination of systematic literature reviews, online surveys and qualitative data analyses was used to investigate the current state of the art, identify currently perceived challenges and inform the creation and evaluation of the models. RESULTS: There are three components to this thesis: a) enhancing data discoverability; b) improving metadata quality assessment; and c) improving the capture of consent for record linkage metadata. First, three models were examined to enhance research data discoverability: data publications, linked data on the World Wide Web and development of an online public health portal. Second, a novel framework to assess the quality of epidemiological and public health metadata was created and evaluated. Third, a novel metadata management model to improve capture of consent for record linkage metadata was created and evaluated.
CONCLUSIONS: Findings from these studies have contributed to a set of recommendations for change in research data management policy and practice to enhance stakeholders' research environment.