Search CORE

2,384 research outputs found

Using Element Clustering to Increase the Efficiency of XML Schema Matching

Author: Jonker Willem
Keulen Maurice van
Smiljanic Marko
Publication venue
Publication date: 01/01/2006
Field of study

Schema matching attempts to discover semantic mappings between elements of two schemas. Elements are cross compared using various heuristics (e.g., name, data-type, and structure similarity). Seen from a broader perspective, the schema matching problem is a combinatorial problem with an exponential complexity. This makes the naive matching algorithms for large schemas prohibitively inefficient. In this paper we propose a clustering based technique for improving the efficiency of large scale schema matching. The technique inserts clustering as an intermediate step into existing schema matching algorithms. Clustering partitions schemas and reduces the overall matching load, and creates a possibility to trade between the efficiency and effectiveness. The technique can be used in addition to other optimization techniques. In the paper we describe the technique, validate the performance of one implementation of the technique, and open directions for future research

Crossref

University of Twente Research Information

The Family of MapReduce and Large Scale Data Processing Systems

Author: Anna Liu
Ayman G. Fayoumi
King Abdulaziz
See Profile
Sherif Sakr
Sherif Sakr
South Wales
South Wales
Publication venue
Publication date: 12/02/2013
Field of study

In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program such as issues on data distribution, scheduling and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several followup works after its introduction. This article provides a comprehensive survey for a family of approaches and mechanisms of large scale data processing mechanisms that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both research and industrial communities. We also cover a set of introduced systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions.Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author

arXiv.org e-Print Archive

CiteSeerX

AsterixDB: A Scalable, Open Source BDMS

Author: Alsubaiee Sattam
Altowim Yasser
Altwaijry Hotham
Behm Alexander
Borkar Vinayak
Bu Yingyi
Carey Michael
Cetindil Inci
Cheelangi Madhusudan
Faraaz Khurram
Gabrielova Eugenia
Grover Raman
Heilbron Zachary
Kim Young-Seok
Li Chen
Li Guangqiang
Ok Ji Mahn
Onose Nicola
Pirzadeh Pouria
Tsotras Vassilis
Vernica Rares
Wen Jian
Westmann Till
Publication venue
Publication date: 02/07/2014
Field of study

AsterixDB is a new, full-function BDMS (Big Data Management System) with a feature set that distinguishes it from other platforms in today's open source Big Data ecosystem. Its features make it well-suited to applications like web data warehousing, social data storage and analysis, and other use cases related to Big Data. AsterixDB has a flexible NoSQL style data model; a query language that supports a wide range of queries; a scalable runtime; partitioned, LSM-based data storage and indexing (including B+-tree, R-tree, and text indexes); support for external as well as natively stored data; a rich set of built-in types; support for fuzzy, spatial, and temporal types and queries; a built-in notion of data feeds for ingestion of data; and transaction support akin to that of a NoSQL store. Development of AsterixDB began in 2009 and led to a mid-2013 initial open source release. This paper is the first complete description of the resulting open source AsterixDB system. Covered herein are the system's data model, its query language, and its software architecture. Also included are a summary of the current status of the project and a first glimpse into how AsterixDB performs when compared to alternative technologies, including a parallel relational DBMS, a popular NoSQL store, and a popular Hadoop-based SQL data analytics platform, for things that both technologies can do. Also included is a brief description of some initial trials that the system has undergone and the lessons learned (and plans laid) based on those early "customer" engagements

arXiv.org e-Print Archive

CiteSeerX

Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order Terms

Author: Flach PA
Price S
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/09/2008
Field of study

Explore Bristol Research

An efficient and scalable algorithm for clustering XML documents by structure

Author: D.W.-l. Cheung
N. Mamoulis
Siu-Ming Yiu
Wang Lian
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref

Expressiveness improvements of OutSystems DSL query primitives

Author: Simões André Brás
Publication venue: Faculdade de Ciências e Tecnologia
Publication date: 01/01/2013
Field of study

Dissertação para obtenção do Grau de Mestre em Engenharia InformáticaIn the ever more competitive market, companies are forced to reduce their operational costs and innovate. In order to do that, some companies successfully adopted new approaches, some of them using domain specific languages (DSL), building their entire system and all the respective layers in less time and more focused in their business. Frequently, the application business layer interacts with the data layer through SQL queries, in order to obtain or modify data. There are some products in the market that try to make life easier for developers, allowing them to get the data using the features of visual query builders, also available in standard SQL. However, it is not expectable that every possible query can be written through these visual query builders, which leads us to the following questions "What should and what can easily be supported by visual query builders?". These questions are relevant in order to help improving the experience of developers and save them time. This work aims to study and analyse techniques that help detecting patterns in structured data and, afterwards, propose a suitable way to view and manage the visualization of the occurrence of such detected patterns. In order to help identify the most frequent patterns and thus contribute to solve the above questions, with this conjunction of topics we expect to provide a way to improve the experience of understanding a large amount of data in a particular context. Once understood some patterns that could be present in the data and their importance, we are ready to propose a new model in the context of OutSystems Agile PlatformTM, in terms of their visual query builder, aiming to increase its value, improve its expressiveness and offer a powerful visual way to build queries

Repositório da Universidade Nova de Lisboa

XML Matchers: approaches and challenges

Author: Agreste Santa
De Meo Pasquale
Ferrara Emilio
Ursino Domenico
Publication venue: 'Elsevier BV'
Publication date: 10/07/2014
Field of study

Schema Matching, i.e. the process of discovering semantic correspondences between concepts adopted in different data source schemas, has been a key topic in Database and Artificial Intelligence research areas for many years. In the past, it was largely investigated especially for classical database models (e.g., E/R schemas, relational databases, etc.). However, in the latest years, the widespread adoption of XML in the most disparate application fields pushed a growing number of researchers to design XML-specific Schema Matching approaches, called XML Matchers, aiming at finding semantic matchings between concepts defined in DTDs and XSDs. XML Matchers do not just take well-known techniques originally designed for other data models and apply them on DTDs/XSDs, but they exploit specific XML features (e.g., the hierarchical structure of a DTD/XSD) to improve the performance of the Schema Matching process. The design of XML Matchers is currently a well-established research area. The main goal of this paper is to provide a detailed description and classification of XML Matchers. We first describe to what extent the specificities of DTDs/XSDs impact on the Schema Matching task. Then we introduce a template, called XML Matcher Template, that describes the main components of an XML Matcher, their role and behavior. We illustrate how each of these components has been implemented in some popular XML Matchers. We consider our XML Matcher Template as the baseline for objectively comparing approaches that, at first glance, might appear as unrelated. The introduction of this template can be useful in the design of future XML Matchers. Finally, we analyze commercial tools implementing XML Matchers and introduce two challenging issues strictly related to this topic, namely XML source clustering and uncertainty management in XML Matchers.Comment: 34 pages, 8 tables, 7 figure

arXiv.org e-Print Archive

IRIS UniversitÃ Politecnica delle Marche