XML Schema Clustering with Semantic and Hierarchical Similarity Measures
With the growing popularity of XML as a data representation language, collections of XML data have exploded in number. Methods are required to manage these collections and to discover useful information in them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistic and contextual information of the elements but also their hierarchical structural similarity. We support our findings with experiments and analysis.
Method for the semantic indexing of concept hierarchies, uniform representation, use of relational database systems and generic and case-based reasoning
This paper presents a method for semantic indexing and describes its
application in the field of knowledge representation. Starting point of the
semantic indexing is the knowledge represented by concept hierarchies. The goal
is to assign keys to nodes (concepts) that are hierarchically ordered and
syntactically and semantically correct. With the indexing algorithm, keys are
computed such that concepts are partially unifiable with all more specific
concepts and only semantically correct concepts are allowed to be added. The
keys represent terminological relationships. Correctness and completeness of
the underlying indexing algorithm are proven. The use of classical relational
databases for the storage of instances is described. Because of the uniform
representation, inference can be done using case-based reasoning and generic
problem solving methods.
Improving ICD-based semantic similarity by accounting for varying degrees of comorbidity
Finding similar patients is a common objective in precision medicine,
facilitating treatment outcome assessment and clinical decision support.
Choosing widely-available patient features and appropriate mathematical methods
for similarity calculations is crucial. International Statistical
Classification of Diseases and Related Health Problems (ICD) codes are used
worldwide to encode diseases and are available for nearly all patients.
Aggregated as sets consisting of primary and secondary diagnoses they can
display a degree of comorbidity and reveal comorbidity patterns. It is possible
to compute the similarity of patients based on their ICD codes by using
semantic similarity algorithms. These algorithms have been traditionally
evaluated using a single-term expert rated data set.
However, real-world patient data often display varying degrees of documented
comorbidities that might impair algorithm performance. To account for this, we
present a scale term that considers documented comorbidity-variance. In this
work, we compared the performance of 80 combinations of established algorithms
in terms of semantic similarity based on ICD-code sets. The sets have been
extracted from patients with a C25.X (pancreatic cancer) primary diagnosis and
provide a variety of different combinations of ICD-codes. Using our scale term,
we obtained the best results with a combination of level-based information
content, Leacock & Chodorow concept similarity and bipartite graph matching for
the set similarities reaching a correlation of 0.75 with our expert's ground
truth. Our results highlight the importance of accounting for comorbidity
variance while demonstrating how well current semantic similarity algorithms
perform.
Comment: 11 pages, 6 figures, 1 table
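The pipeline named in this abstract (a concept similarity such as Leacock & Chodorow, combined over code sets by bipartite matching) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy hierarchy is a simplified ICD-10-like fragment rather than the official classification, the function names are hypothetical, the matching is brute-forced instead of using a dedicated assignment solver, and the paper's scale term and information-content component are not reproduced because the abstract does not define them.

```python
import math
from itertools import permutations

# Toy, simplified ICD-10-like hierarchy (child -> parent). Illustrative only.
PARENT = {
    "C25.0": "C25",
    "C25.1": "C25",
    "C25": "C15-C26",
    "C15-C26": "ROOT",
    "E11.9": "E11",
    "E11": "E10-E14",
    "E10-E14": "ROOT",
}


def ancestors(code):
    """Return the path from `code` up to ROOT, both inclusive."""
    path = [code]
    while path[-1] != "ROOT":
        path.append(PARENT[path[-1]])
    return path


# Depth of the taxonomy, counted in nodes on the longest root-to-leaf path.
MAX_DEPTH = max(len(ancestors(code)) for code in PARENT)


def path_length(a, b):
    """Number of nodes on the shortest path between a and b in the tree."""
    path_a, path_b = ancestors(a), ancestors(b)
    common = next(node for node in path_a if node in path_b)  # lowest common ancestor
    return path_a.index(common) + path_b.index(common) + 1


def lch(a, b):
    """Leacock & Chodorow similarity: -log(path_length / (2 * depth))."""
    return -math.log(path_length(a, b) / (2 * MAX_DEPTH))


def norm_lch(a, b):
    """Scale the similarity to [0, 1] by dividing by its maximum (identical codes)."""
    return lch(a, b) / math.log(2 * MAX_DEPTH)


def set_similarity(codes_a, codes_b):
    """Best one-to-one matching between two code sets, averaged over the larger set.

    Brute force over permutations; only suitable for small sets.
    """
    codes_a, codes_b = list(codes_a), list(codes_b)
    if not codes_a or not codes_b:
        return 0.0
    small, large = sorted((codes_a, codes_b), key=len)
    best = max(
        sum(norm_lch(s, l) for s, l in zip(small, perm))
        for perm in permutations(large, len(small))
    )
    return best / len(large)


if __name__ == "__main__":
    print(set_similarity({"C25.0", "E11.9"}, {"C25.1", "E11.9"}))
    print(set_similarity({"C25.0"}, {"E11.9"}))
```

Dividing by the size of the larger set penalises unmatched codes, which is one simple way to make patients with very different numbers of documented diagnoses less similar; it is not the scale term proposed in the paper.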
A Neuro-Ontology for the Neurological Examination
Background: The use of clinical data in electronic health records for machine-learning or data analytics depends on the conversion of free text into machine-readable codes. We have examined the feasibility of capturing the neurological examination as machine-readable codes based on UMLS Metathesaurus concepts. Methods: We created a target ontology for capturing the neurological examination using 1100 concepts from the UMLS Metathesaurus. We created a dataset of 2386 test-phrases based on 419 published neurological cases. We then mapped the test-phrases to the target ontology. Results: We were able to map all of the 2386 test-phrases to 601 unique UMLS concepts. A neurological examination ontology with 1100 concepts has sufficient breadth and depth of coverage to encode all of the neurologic concepts derived from the 419 test cases. Using only pre-coordinated concepts, component ontologies of the UMLS, such as HPO, SNOMED CT, and OMIM, do not have adequate depth and breadth of coverage to encode the complexity of the neurological examination. Conclusion: An ontology based on a subset of UMLS has sufficient breadth and depth of coverage to convert deficits from the neurological examination into machine-readable codes using pre-coordinated concepts. The use of a small subset of UMLS concepts for a neurological examination ontology offers the advantage of improved manageability as well as the opportunity to curate the hierarchy and subsumption relationships.
A Biased Topic Modeling Approach for Case Control Study from Health Related Social Media Postings
Online social networks are the hubs of social activity in cyberspace, and using them to exchange knowledge, experiences, and opinions is common. In this work, an advanced topic modeling framework is designed to analyse complex longitudinal health information from social media with minimal human annotation, and Adverse Drug Events and Reaction (ADR) information is extracted and automatically processed by using a biased topic modeling method. This framework improves and extends existing topic modelling algorithms that incorporate background knowledge. Using this approach, background knowledge such as ADR terms and other biomedical knowledge can be incorporated during the text mining process, and scores that indicate the presence of ADR are generated. A case control study has been performed on a data set of Twitter timelines of women who announced their pregnancy; the goal of the study is to compare the ADR risk of medication usage from each medication category during pregnancy.
In addition, to evaluate the predictive power of this approach, another important aspect of personalized medicine was addressed: the prediction of medication usage through the identification of risk groups. During the prediction process, the health information from Twitter timelines, such as diseases, symptoms, treatments, and effects, is summarized by the topic modelling processes, and the summarization results are used for prediction. Dimension reduction and topic similarity measurement are integrated into this framework for timeline classification and prediction. This work could be applied to provide guidelines for FDA drug risk categories. Currently, this process is done based on laboratory results and reported cases.
Finally, a multi-dimensional text data warehouse (MTD) to manage the output from the topic modelling is proposed. Some attempts have also been made to incorporate topic structure (ontology) and the MTD hierarchy. Results demonstrate that the proposed methods show promise and that this system represents a low-cost approach for drug safety early warning.
Dissertation/Thesis Doctoral Dissertation Computer Science 201
Element-centric clustering comparison unifies overlaps and hierarchy
Clustering is one of the most universal approaches for understanding complex
data. A pivotal aspect of clustering analysis is quantitatively comparing
clusterings; clustering comparison is the basis for many tasks such as
clustering evaluation, consensus clustering, and tracking the temporal
evolution of clusters. In particular, the extrinsic evaluation of clustering
methods requires comparing the uncovered clusterings to planted clusterings or
known metadata. Yet, as we demonstrate, existing clustering comparison measures
have critical biases which undermine their usefulness, and no measure
accommodates both overlapping and hierarchical clusterings. Here we unify the
comparison of disjoint, overlapping, and hierarchically structured clusterings
by proposing a new element-centric framework: elements are compared based on
the relationships induced by the cluster structure, as opposed to the
traditional cluster-centric philosophy. We demonstrate that, in contrast to
standard clustering similarity measures, our framework does not suffer from
critical biases and naturally provides unique insights into how the clusterings
differ. We illustrate the strengths of our framework by revealing new insights
into the organization of clusters in two applications: the improved
classification of schizophrenia based on the overlapping and hierarchical
community structure of fMRI brain networks, and the disentanglement of various
social homophily factors in Facebook social networks. The universality of
clustering suggests far-reaching impact of our framework throughout all areas
of science.