Search CORE

15 research outputs found

Efficient similarity-based operations for data integration

Author: Schallehn Eike
Publication venue: Universitätsbibliothek
Publication date: 01/01/2004

Similarity-based operations, similarity join, similarity grouping, data integration. Magdeburg, Univ., Faculty of Computer Science, dissertation, 2004, by Eike Schallehn

Digital University Library Saxony-Anhalt

Annotation-based feature extraction from sets of SBML models

Author: Dagmar Waltemath
Markus Wolfien
Olaf Wolkenhauer
Rebekka Alm
Ron Henkel
Publication venue: Springer Nature
Publication date: 01/01/2015

Background: Model repositories such as BioModels Database provide computational models of biological systems for the scientific community. These models contain rich semantic annotations that link model entities to concepts in well-established bio-ontologies such as Gene Ontology. Consequently, thematically similar models are likely to share similar annotations. Based on this assumption, we argue that semantic annotations are a suitable tool to characterize sets of models. These characteristics improve model classification, allow the identification of additional features for model retrieval tasks, and enable the comparison of sets of models. Results: In this paper we discuss four methods for annotation-based feature extraction from model sets. We tested all methods on sets of models in SBML format which were composed from BioModels Database. To characterize each of these sets, we analyzed and extracted concepts from three frequently used ontologies, namely Gene Ontology, ChEBI and SBO. We find that three of the four methods are suitable to determine characteristic features for arbitrary sets of models: the selected features vary depending on the underlying model set, and they are also specific to the chosen model set. We show that the identified features map onto concepts that are higher up in the hierarchy of the ontologies than the concepts used for model annotations. Our analysis also reveals that the information content of concepts in ontologies and their usage for model annotation do not correlate. Conclusions: Annotation-based feature extraction enables the comparison of model sets, as opposed to existing methods for model-to-keyword comparison or model-to-model comparison.

Springer - Publisher Connector

Fraunhofer-ePrints

Stellenbosch University SUNScholar Repository

Effective Early Termination Techniques for Text Similarity Join Operator

Author: A. Moffat
D.K. Harman
E. Schallehn
G. Salton
G. Özsoyoğlu
S.A. Özel
W. Cohen
W. Meng
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2005

Crossref

Effective early termination techniques for text similarity join operator

Author: Güngör T.
Gürgen Fikret
Ulusoy Özgür
Yolum Pınar
Özturan Can
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2005

This work was presented as a paper at the 20th International Symposium on Computer and Information Sciences, held in Istanbul, Turkey, on 26-28 October 2005. The text similarity join operator joins two relations if their join attributes are textually similar to each other, and it has a variety of application domains, including integration and querying of data from heterogeneous resources, cleansing of data, and mining of data. Although the text similarity join operator is widely used, its processing is expensive due to the huge number of similarity computations performed. In this paper, we incorporate some shortcut evaluation techniques from the Information Retrieval domain, namely the Harman, quit, continue, and maximal similarity filter heuristics, into the previously proposed text similarity join algorithms to reduce the number of similarity computations needed during the join operation. We experimentally evaluate the original and the heuristic-based similarity join algorithms using real data obtained from the DBLP Bibliography database, and observe performance improvements with the continue and maximal similarity filter heuristics. Inst Elec & Elect Engineers, Turkey Sect; Boğaziçi Üniversites

Açık Erişim@BUU

Effective early termination techniques for text similarity join operator

Author: Ulusoy Ö.
Özalp S.A.
Publication venue
Publication date: 01/01/2005

The text similarity join operator joins two relations if their join attributes are textually similar to each other, and it has a variety of application domains, including integration and querying of data from heterogeneous resources, cleansing of data, and mining of data. Although the text similarity join operator is widely used, its processing is expensive due to the huge number of similarity computations performed. In this paper, we incorporate some shortcut evaluation techniques from the Information Retrieval domain, namely the Harman, quit, continue, and maximal similarity filter heuristics, into the previously proposed text similarity join algorithms to reduce the number of similarity computations needed during the join operation. We experimentally evaluate the original and the heuristic-based similarity join algorithms using real data obtained from the DBLP Bibliography database, and observe performance improvements with the continue and maximal similarity filter heuristics. © Springer-Verlag Berlin Heidelberg 2005
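As a rough illustration of why a text similarity join is costly, the following is a minimal Python sketch of a baseline join. It assumes whitespace tokenization, raw term-frequency vectors, cosine similarity, and a hypothetical `threshold` parameter, none of which are taken from the paper. The inverted index only skips pairs that share no term; it is a generic pruning step and does not implement the Harman, quit, continue, or maximal similarity filter heuristics named in the abstract.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text: str) -> Counter:
    """Term-frequency vector from whitespace tokens (an assumed tokenization)."""
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def text_similarity_join(left, right, threshold=0.5):
    """Return (left_text, right_text, similarity) for pairs at or above threshold."""
    right_vecs = [tf_vector(r) for r in right]

    # Inverted index: term -> positions of right-hand tuples containing it.
    index = defaultdict(set)
    for j, vec in enumerate(right_vecs):
        for term in vec:
            index[term].add(j)

    results = []
    for left_text in left:
        left_vec = tf_vector(left_text)
        # Only right-hand tuples sharing at least one term can score above zero.
        candidates = set().union(*(index[t] for t in left_vec if t in index))
        for j in sorted(candidates):
            sim = cosine(left_vec, right_vecs[j])
            if sim >= threshold:
                results.append((left_text, right[j], sim))
    return results
```

For example, `text_similarity_join(["efficient similarity join"], ["similarity join processing", "stream integration"])` computes one cosine similarity instead of two, because the second right-hand string shares no term with the left-hand string.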

Bilkent University Institutional Repository

Clustering-Based Pre-Processing Approaches To Improve Similarity Join Techniques

Author: Tan Yufen
Publication venue: DigitalCommons@WayneState
Publication date: 01/01/2010

Research on similarity join techniques is a growing area of practical study, especially given the increasing electronic availability of vast amounts of digital data from more and more source systems. This research focuses on clustering-based pre-processing techniques to improve existing similarity join approaches. Identifying and extracting the same real-world entities from different data sources is still a big challenge and a significant task in the digital information era. Dissimilar extracts may indeed represent the same real-world entity because of inconsistent values and naming conventions, incorrect or missing data values, or incomplete information. Therefore, discovering efficient and accurate approaches to determine the similarity of data objects or values is of theoretical as well as practical significance. Even the concept of similarity raises semantic problems regarding its usage and foundation. Existing similarity join approaches often have a very specific view of similarity measures and pre-defined predicates that represent a narrow focus on the context of similarity for a given scenario. The predicates have been assumed to be a group of clustering-related [MSW 72] attributes on the join. Identifying those entities for data integration purposes requires a broader view of similarity; for instance, a number of generic similarity measures are useful in a given data integration system. This study focused on string similarity joins, specifically those based on the Levenshtein (edit) distance and q-grams. The study proposes effective and efficient clustering-based pre-processing techniques that identify clustering-related predicates, based on either attribute values or data values, to improve existing similarity join techniques in enterprise data integration scenarios.
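The two string measures named in the abstract can be sketched as follows. This is a minimal Python illustration, not the thesis's method: the padding characters, the Jaccard filter, and the `max_dist` and `min_overlap` parameters are assumptions chosen for the example, and the clustering-based pre-processing the thesis proposes is not shown.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def qgrams(s: str, q: int = 2) -> set:
    """Set of overlapping q-grams of s, padded so that string edges count."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def similarity_join(left, right, max_dist=2, q=2, min_overlap=0.3):
    """Naive nested-loop join: q-gram Jaccard filter, then edit distance."""
    matches = []
    for left_value in left:
        left_grams = qgrams(left_value.lower(), q)
        for right_value in right:
            right_grams = qgrams(right_value.lower(), q)
            jaccard = len(left_grams & right_grams) / len(left_grams | right_grams)
            if jaccard < min_overlap:
                continue  # heuristic filter: cheap, but may drop true matches
            if levenshtein(left_value.lower(), right_value.lower()) <= max_dist:
                matches.append((left_value, right_value))
    return matches
```

Calling `similarity_join(["Jon Smith"], ["John Smith", "Jane Doe"])` pairs "Jon Smith" with "John Smith" (edit distance 1) and filters out "Jane Doe" before any edit distance is computed. The nested loop still compares every pair, which is the cost that pre-processing steps such as clustering aim to reduce.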

Digital Commons@Wayne State University

Integrating distributed data streams

Author: Gray Alasdair John Graham
Publication venue: Mathematical and Computer Sciences
Publication date: 01/01/2007

Abstract unavailable; please refer to PD

CiteSeerX

ROS: The Research Output Service. Heriot-Watt University Edinburgh

OpenGrey Repository