3,549 research outputs found
REALIZATION OF A SYSTEM OF EFFICIENT QUERYING OF HIERARCHICAL DATA TRANSFORMED INTO A QUASI-RELATIONAL MODEL
Extensible Markup Language was mainly designed to easily represent documents; however, it has evolved and is now widely used for the representation of arbitrary data structures. There are many Application Programming Interfaces (APIs) to aid software developers with processing XML data. There are also many languages for querying and transforming XML, such as XPath or XQuery, which are widely used in this field. However, because of the great flexibility of XML documents, there are no unified data storing and processing standards, tools, or systems.On the other hand, a relational model is still the most-commonly and widely used standard for storing and querying data. Many Database Management Systems consist of components for loading and transforming hierarchical data. DB2 pureXML or Oracle SQLX are some of the most-recognized examples. Unfortunately, all of them require knowledge of additional tools, standards, and languages dedicated to accessing hierarchical data (for example, XPath or XQuery). Transforming XML documents into a (quasi)relational model and then querying (transformed) documents with SQL or SQL–like queries would significantly simplify the development of data-oriented systems and applications.In this paper, an implementation of the SQLxD query system is proposed. The XML documents are converted into a quasi-relational model (preserving their hierarchical structure), and the SQL–like language based on SQL-92 allows for efficient data querying
Diamond multidimensional model and aggregation operators for document OLAP
International audienceOn-Line Analytical Processing (OLAP) has generated methodologies for the analysis of structured data. However, they are not appropriate to handle document content analysis. Because of the fast growing of this type of data, there is a need for new approaches abling to manage textual content of data. Generally, these data exist in XML format. In this context, we propose an approach of construction of our Diamond multidimensional model, which includes semantic dimension to better consider the semantics of textual data In addition, we propose new aggregation operators for textual data in OLAP environment
Unity in diversity : integrating differing linguistic data in TUSNELDA
This paper describes the creation and preparation of TUSNELDA, a collection of corpus data built for linguistic research. This collection contains a number of linguistically annotated corpora which differ in various aspects such as language, text sorts / data types, encoded annotation levels, and linguistic theories underlying the annotation. The paper focuses on this variation on the one hand and the way how these heterogeneous data are integrated into one resource on the other hand
Hyperset Approach to Semi-structured Databases and the Experimental Implementation of the Query Language Delta
This thesis presents practical suggestions towards the implementation of the
hyperset approach to semi-structured databases and the associated query
language Delta. This work can be characterised as part of a top-down approach
to semi-structured databases, from theory to practice. The main original part
of this work consisted in implementation of the hyperset Delta query language
to semi-structured databases, including worked example queries. In fact, the
goal was to demonstrate the practical details of this approach and language.
The required development of an extended, practical version of the language
based on the existing theoretical version, and the corresponding operational
semantics. Here we present detailed description of the most essential steps of
the implementation. Another crucial problem for this approach was to
demonstrate how to deal in reality with the concept of the equality relation
between (hyper)sets, which is computationally realised by the bisimulation
relation. In fact, this expensive procedure, especially in the case of
distributed semi-structured data, required some additional theoretical
considerations and practical suggestions for efficient implementation. To this
end the 'local/global' strategy for computing the bisimulation relation over
distributed semi-structured data was developed and its efficiency was
experimentally confirmed.Comment: Technical Report (PhD thesis), University of Liverpool, Englan
LinkedScales : bases de dados em multiescala
Orientador: André SantanchèTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: As ciências biológicas e médicas precisam cada vez mais de abordagens unificadas para a análise de dados, permitindo a exploração da rede de relacionamentos e interações entre elementos. No entanto, dados essenciais estão frequentemente espalhados por um conjunto cada vez maior de fontes com múltiplos níveis de heterogeneidade entre si, tornando a integração cada vez mais complexa. Abordagens de integração existentes geralmente adotam estratégias especializadas e custosas, exigindo a produção de soluções monolíticas para lidar com formatos e esquemas específicos. Para resolver questões de complexidade, essas abordagens adotam soluções pontuais que combinam ferramentas e algoritmos, exigindo adaptações manuais. Abordagens não sistemáticas dificultam a reutilização de tarefas comuns e resultados intermediários, mesmo que esses possam ser úteis em análises futuras. Além disso, é difícil o rastreamento de transformações e demais informações de proveniência, que costumam ser negligenciadas. Este trabalho propõe LinkedScales, um dataspace baseado em múltiplos níveis, projetado para suportar a construção progressiva de visões unificadas de fontes heterogêneas. LinkedScales sistematiza as múltiplas etapas de integração em escalas, partindo de representações brutas (escalas mais baixas), indo gradualmente para estruturas semelhantes a ontologias (escalas mais altas). LinkedScales define um modelo de dados e um processo de integração sistemático e sob demanda, através de transformações em um banco de dados de grafos. Resultados intermediários são encapsulados em escalas reutilizáveis e transformações entre escalas são rastreadas em um grafo de proveniência ortogonal, que conecta objetos entre escalas. Posteriormente, consultas ao dataspace podem considerar objetos nas escalas e o grafo de proveniência ortogonal. Aplicações práticas de LinkedScales são tratadas através de dois estudos de caso, um no domínio da biologia -- abordando um cenário de análise centrada em organismos -- e outro no domínio médico -- com foco em dados de medicina baseada em evidênciasAbstract: Biological and medical sciences increasingly need a unified, network-driven approach for exploring relationships and interactions among data elements. Nevertheless, essential data is frequently scattered across sources with multiple levels of heterogeneity. Existing data integration approaches usually adopt specialized, heavyweight strategies, requiring a costly upfront effort to produce monolithic solutions for handling specific formats and schemas. Furthermore, such ad-hoc strategies hamper the reuse of intermediary integration tasks and outcomes. This work proposes LinkedScales, a multiscale-based dataspace designed to support the progressive construction of a unified view of heterogeneous sources. It departs from raw representations (lower scales) and goes towards ontology-like structures (higher scales). LinkedScales defines a data model and a systematic, gradual integration process via operations over a graph database. Intermediary outcomes are encapsulated as reusable scales, tracking the provenance of inter-scale operations. Later, queries can combine both scale data and orthogonal provenance information. Practical applications of LinkedScales are discussed through two case studies on the biology domain -- addressing an organism-centric analysis scenario -- and the medical domain -- focusing on evidence-based medicine dataDoutoradoCiência da ComputaçãoDoutor em Ciência da Computação141353/2015-5CAPESCNP
Assessing the Flexibility of a Service Oriented Architecture to that of the Classic Data Warehouse
The flexibility of a service oriented architecture (SOA) is compared to that of the classic data warehouse across three categories: (1) source system access, (2) integration and transformation, and (3) end user access. The findings suggest that an SOA allows better upgrade and migration flexibility if back-end systems expose their source data via adapters. However, the providers of such adapters must deal with the complexity of maintaining consistent interfaces. An SOA also appears to provide more flexibility at the integration tier due to its ability to merge batch with real-time source system data. This has the potential to retain source system data semantics (e.g., code translations and business rules) without having to reproduce such logic in a transformation tier. Additionally, the tight coupling of operational metadata and source system data within XML in an SOA allows more flexibility in downstream analysis and auditing of output . SOA does lag behind the classic data warehouse at the end user level, mainly due to the latter\u27s use of mature SQL and relational database technology. Users of all technical levels can easily work with these technologies in the classic data warehouse environment to query data in a number of ways. The SOA end user likely requires developer support for such activities
- …