9,035 research outputs found
Cancer Surveillance using Data Warehousing, Data Mining, and Decision Support Systems
This article discusses how data warehousing, data mining, and decision support systems can reduce the national cancer burden or the oral complications of cancer therapies, especially as related to oral and pharyngeal cancers. An information system is presented that will deliver the necessary information technology to clinical, administrative, and policy researchers and analysts in an effective and efficient manner. The system will deliver the technology and knowledge that users need to readily: (1) organize relevant claims data, (2) detect cancer patterns in general and special populations, (3) formulate models that explain the patterns, and (4) evaluate the efficacy of specified treatments and interventions with the formulations. Such a system can be developed through a proven adaptive design strategy, and the implemented system can be tested on State of Maryland Medicaid data (which includes women, minorities, and children)
BioCloud Search EnGene: Surfing Biological Data on the Cloud
The massive production and spread of biomedical data around the web introduces new challenges related to identify computational approaches for providing quality search and browsing of web resources. This papers presents BioCloud Search EnGene (BSE), a cloud application that facilitates searching and integration of the many layers of biological information offered by public large-scale genomic repositories. Grounding on the concept of dataspace, BSE is built on top of a cloud platform that severely curtails issues associated with scalability and performance. Like popular online gene portals, BSE adopts a gene-centric approach: researchers can find their information of interest by means of a simple “Google-like” query interface that accepts standard gene identification as keywords. We present BSE architecture and functionality and discuss how our strategies contribute to successfully tackle big data problems in querying gene-based web resources. BSE is publically available at: http://biocloud-unica.appspot.com/
Semantic processing of EHR data for clinical research
There is a growing need to semantically process and integrate clinical data
from different sources for clinical research. This paper presents an approach
to integrate EHRs from heterogeneous resources and generate integrated data in
different data formats or semantics to support various clinical research
applications. The proposed approach builds semantic data virtualization layers
on top of data sources, which generate data in the requested semantics or
formats on demand. This approach avoids upfront dumping to and synchronizing of
the data with various representations. Data from different EHR systems are
first mapped to RDF data with source semantics, and then converted to
representations with harmonized domain semantics where domain ontologies and
terminologies are used to improve reusability. It is also possible to further
convert data to application semantics and store the converted results in
clinical research databases, e.g. i2b2, OMOP, to support different clinical
research settings. Semantic conversions between different representations are
explicitly expressed using N3 rules and executed by an N3 Reasoner (EYE), which
can also generate proofs of the conversion processes. The solution presented in
this paper has been applied to real-world applications that process large scale
EHR data.Comment: Accepted for publication in Journal of Biomedical Informatics, 2015,
preprint versio
Ontology-assisted database integration to support natural language processing and biomedical data-mining
Successful biomedical data mining and information extraction require a complete picture of biological phenomena such as genes, biological processes, and diseases; as these exist on different levels of granularity. To realize this goal, several freely available heterogeneous databases as well as proprietary structured datasets have to be integrated into a single global customizable scheme. We will present a tool to integrate different biological data sources by mapping them to a proprietary biomedical ontology that has been developed for the purposes of making computers understand medical natural language
FAIRness and Usability for Open-access Omics Data Systems
Omics data sharing is crucial to the biological research community, and the last decade or two has seen a huge rise in collaborative analysis systems, databases, and knowledge bases for omics and other systems biology data. We assessed the FAIRness of NASAs GeneLab Data Systems (GLDS) along with four similar kinds of systems in the research omics data domain, using 14 FAIRness metrics. The range of overall FAIRness scores was 6-12 (out of 14), average 10.1, and standard deviation 2.4. The range of Pass ratings for the metrics was 29-79%, Partial Pass 0-21%, and Fail 7-50%. The systems we evaluated performed the best in the areas of data findability and accessibility, and worst in the area of data interoperability. Reusability of metadata, in particular, was frequently not well supported. We relate our experiences implementing semantic integration of omics data from some of the assessed systems for federated querying and retrieval functions, given their shortcomings in data interoperability. Finally, we propose two new principles that Big Data system developers, in particular, should consider for maximizing data accessibility
Addendum to Informatics for Health 2017: Advancing both science and practice
This article presents presentation and poster abstracts that were mistakenly omitted from the original publication
- …