103,613 research outputs found
Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources
Apache Calcite is a foundational software framework that provides query
processing, optimization, and query language support to many popular
open-source data processing systems such as Apache Hive, Apache Storm, Apache
Flink, Druid, and MapD. Calcite's architecture consists of a modular and
extensible query optimizer with hundreds of built-in optimization rules, a
query processor capable of processing a variety of query languages, an adapter
architecture designed for extensibility, and support for heterogeneous data
models and stores (relational, semi-structured, streaming, and geospatial).
This flexible, embeddable, and extensible architecture is what makes Calcite an
attractive choice for adoption in big-data frameworks. It is an active project
that continues to introduce support for the new types of data sources, query
languages, and approaches to query processing and optimization.Comment: SIGMOD'1
A management architecture for active networks
In this paper we present an architecture for network and applications management, which is based on the Active Networks paradigm and shows the advantages of network programmability. The stimulus to develop this architecture arises from an actual need to manage a cluster of active nodes, where it is often required to redeploy network assets and modify nodes connectivity. In our architecture, a remote front-end of the managing entity allows the operator to design new network topologies, to check the status of the nodes and to configure them. Moreover, the proposed framework allows to explore an active network, to monitor the active applications, to query each node and to install programmable traps. In order to take advantage of the Active Networks technology, we introduce active SNMP-like MIBs and agents, which are dynamic and programmable. The programmable management agents make tracing distributed applications a feasible task. We propose a general framework that can inter-operate with any active execution environment. In this framework, both the manager and the monitor front-ends communicate with an active node (the Active Network Access Point) through the XML language. A gateway service performs the translation of the queries from XML to an active packet language and injects the code in the network. We demonstrate the implementation of an active network gateway for PLAN (Packet Language for Active Networks) in a forty active nodes testbed. Finally, we discuss an application of the active management architecture to detect the causes of network failures by tracing network events in time
A management architecture for active networks
In this paper we present an architecture for network and applications management, which is based on the Active Networks paradigm and shows the advantages of network programmability. The stimulus to develop this architecture arises from an actual need to manage a cluster of active nodes, where it is often required to redeploy network assets and modify nodes connectivity. In our architecture, a remote front-end of the managing entity allows the operator to design new network topologies, to check the status of the nodes and to configure them. Moreover, the proposed framework allows to explore an active network, to monitor the active applications, to query each node and to install programmable traps. In order to take advantage of the Active Networks technology, we introduce active SNMP-like MIBs and agents, which are dynamic and programmable. The programmable management agents make tracing distributed applications a feasible task. We propose a general framework that can inter-operate with any active execution environment. In this framework, both the manager and the monitor front-ends communicate with an active node (the Active Network Access Point) through the XML language. A gateway service performs the translation of the queries from XML to an active packet language and injects the code in the network. We demonstrate the implementation of an active network gateway for PLAN (Packet Language for Active Networks) in a forty active nodes testbed. Finally, we discuss an application of the active management architecture to detect the causes of network failures by tracing network events in time
Cross-lingual document retrieval categorisation and navigation based on distributed services
The widespread use of the Internet across countries has increased the need for access to document collections
that are often written in languages different from a userâs native language. In this paper we describe Clarity, a
Cross Language Information Retrieval (CLIR) system for English, Finnish, Swedish, Latvian and Lithuanian.
Clarity is a fully-fledged retrieval system that supports the user during the whole process of query formulation,
text retrieval and document browsing. We address four of the major aspects of Clarity: (i) the user-driven
methodology that formed the basis for the iterative design cycle and framework in the project, (ii) the system
architecture that was developed to support the interaction and coordination of Clarityâs distributed services, (iii)
the data resources and methods for query translation, and (iv) the support for Baltic languages. Clarity is an
example of a distributed CLIR system built with minimal translation resources and, to our knowledge, the only
such system that currently supports Baltic languages
Recommended from our members
Query Health: standards-based, cross-platform population health surveillance
Objective: Understanding population-level health trends is essential to effectively monitor and improve public health. The Office of the National Coordinator for Health Information Technology (ONC) Query Health initiative is a collaboration to develop a national architecture for distributed, population-level health queries across diverse clinical systems with disparate data models. Here we review Query Health activities, including a standards-based methodology, an open-source reference implementation, and three pilot projects. Materials and methods Query Health defined a standards-based approach for distributed population health queries, using an ontology based on the Quality Data Model and Consolidated Clinical Document Architecture, Health Quality Measures Format (HQMF) as the query language, the Query Envelope as the secure transport layer, and the Quality Reporting Document Architecture as the result language. Results: We implemented this approach using Informatics for Integrating Biology and the Bedside (i2b2) and hQuery for data analytics and PopMedNet for access control, secure query distribution, and response. We deployed the reference implementation at three pilot sites: two public health departments (New York City and Massachusetts) and one pilot designed to support Food and Drug Administration post-market safety surveillance activities. The pilots were successful, although improved cross-platform data normalization is needed. Discussions This initiative resulted in a standards-based methodology for population health queries, a reference implementation, and revision of the HQMF standard. It also informed future directions regarding interoperability and data access for ONC's Data Access Framework initiative. Conclusions: Query Health was a test of the learning health system that supplied a functional methodology and reference implementation for distributed population health queries that has been validated at three sites
Use of OCL in a Model Assessment Framework: An experience report
In a model assessment framework different quality aspects can be examined. In our approach we consider consistency and perceived semantic quality. The former can be supported by constraints and the later by queries. Consistency can be checked automatically, while for the semantic quality the human judgement is necessary. For constraint and query definitions the utilisation of a query language was necessary. We present a case study that evaluates the expressiveness of the Object Constraint Language (OCL) in the context of our approach. We focus on typical queries required by our methodology and we showed how they can be formulated in OCL. To take full advantage of the languageĂąs expressiveness, we utilise new features of OCL 2.0. Based on our examination we decided to use OCL in our analysis tool and we designed an architecture based on Eclipse Modeling Framework Technology
Data integration through service-based mediation for web-enabled information systems
The Web and its underlying platform technologies have often been used to integrate existing software and information systems. Traditional techniques for data representation and transformations between documents are not sufficient to support a flexible and maintainable data integration solution that meets the requirements of modern complex Web-enabled software and information systems. The difficulty
arises from the high degree of complexity of data structures, for example in business and technology applications, and from the constant change of data and its
representation. In the Web context, where the Web platform is used to integrate different organisations or software systems, additionally the problem of heterogeneity
arises. We introduce a specific data integration solution for Web applications such as Web-enabled information systems. Our contribution is an integration technology
framework for Web-enabled information systems comprising, firstly, a data integration technique based on the declarative specification of transformation rules and the construction of connectors that handle the integration and, secondly, a mediator architecture based on information services and the constructed connectors to handle the integration process
- âŠ