684 research outputs found

    A Comprehensive and Modularized Platform for Time Series Forecast and Analytics

    Users who work with time series data typically disaggregate time series problems into various isolated tasks and use specific libraries, packages, tools, and services for each individual task. However, the tools used are often fragmented: analysts have to load different packages for common tasks such as data preprocessing, clustering, feature extraction, forecasting, hierarchical reconciliation, evaluation, and visualization. This disclosure describes a reliable, scalable infrastructure that meets the varied needs of time series practitioners without adding engineering overhead. The infrastructure is modularized, and the modules are connected via a flow-type declarative language, which makes the infrastructure extensible and future-proof. Practitioners can use the entire infrastructure or only certain modules, while performing other operations using first- or third-party libraries or pipelines.
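    The modular, declaratively connected pipeline the abstract describes can be illustrated with a minimal sketch. All module names and the `FLOW` declaration below are hypothetical illustrations, not the platform's actual API:

```python
# Minimal sketch of a declarative, modular time-series pipeline.
# Module names and the FLOW declaration are hypothetical, not the
# disclosed platform's actual API.

def preprocess(series):
    # Fill gaps by carrying the last observation forward.
    filled, last = [], None
    for x in series:
        last = x if x is not None else last
        filled.append(last)
    return filled

def moving_average_forecast(series, window=3, horizon=2):
    # Forecast each future step as the mean of the trailing window.
    history = list(series)
    out = []
    for _ in range(horizon):
        pred = sum(history[-window:]) / window
        history.append(pred)
        out.append(pred)
    return out

# "Flow"-style declaration: modules are connected simply by listing
# them in order, so any step can be swapped for a first- or
# third-party implementation.
FLOW = [preprocess,
        lambda s: moving_average_forecast(s, window=3, horizon=2)]

def run(flow, data):
    for step in flow:
        data = step(data)
    return data

print(run(FLOW, [1.0, None, 3.0, 4.0, 5.0]))
```

    Because each step is an ordinary callable, the same `run` driver works whether a module comes from the platform or from an external library.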

    A Service-Oriented Approach for Network-Centric Data Integration and Its Application to Maritime Surveillance

    Maritime-surveillance operators still demand an integrated maritime picture that better supports international coordination of their operations, as sought in the European area. In this area, many past data-integration efforts were framed as the problem of designing, building, and maintaining huge centralized repositories. Current research activities instead leverage service-oriented principles to achieve more flexible and network-centric solutions to systems and data integration. In this direction, this article reports on the design of a SOA platform, the Service and Application Integration (SAI) system, targeting novel approaches for legacy data and systems integration in the maritime surveillance domain. We have developed a proof-of-concept of the main system capabilities to assess the feasibility of our approach and to evaluate how the SAI middleware architecture can fit application requirements for dynamic data search, aggregation, and delivery in the distributed maritime domain.

    Advanced grouping and aggregation for data integration


    Genomic data integration and user-defined sample-set extraction for population variant analysis

    Population variant analysis is of great importance for gathering insights into the links between human genotype and phenotype. The 1000 Genomes Project established a valuable reference for human genetic variation; however, the integrative use of the corresponding data with other datasets within existing repositories and pipelines is not fully supported. In particular, there is a pressing need for flexible and fast selection of population partitions based on their variant- and metadata-related characteristics.
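    The "selection of population partitions based on metadata" that the abstract calls for can be sketched as a simple metadata filter. The record fields and sample entries below are illustrative stand-ins, not the actual 1000 Genomes metadata schema:

```python
# Hedged sketch: selecting a population partition from sample metadata.
# Field names and sample records are illustrative, not the actual
# 1000 Genomes metadata schema.

samples = [
    {"id": "HG00096", "population": "GBR", "super_pop": "EUR", "sex": "male"},
    {"id": "HG00171", "population": "FIN", "super_pop": "EUR", "sex": "female"},
    {"id": "NA18525", "population": "CHB", "super_pop": "EAS", "sex": "female"},
]

def select_partition(samples, **criteria):
    # Keep only samples whose metadata match every requested criterion.
    return [s for s in samples
            if all(s.get(k) == v for k, v in criteria.items())]

print([s["id"] for s in select_partition(samples, super_pop="EUR")])
# → ['HG00096', 'HG00171']
```

    Arbitrary metadata criteria compose by keyword argument, which is the kind of flexible partition selection the abstract identifies as missing from existing pipelines.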

    Dynamic Integration of Evolving Distributed Databases using Services

    This thesis investigates the integration of many separate, heterogeneous, and distributed existing databases which, due to organizational changes, must be merged and appear as one database. A solution to some database evolution problems is presented. The thesis presents an Evolution Adaptive Service-Oriented Data Integration Architecture (EA-SODIA) to dynamically integrate heterogeneous and distributed source databases, aiming to minimize the maintenance cost caused by database evolution. An algorithm, named Relational Schema Mapping by Views (RSMV), is designed to integrate source databases that are exposed as services into a pre-designed global schema held in a data-integrator service. Instead of producing hard-coded programs, views are built using relational algebra operations to eliminate the heterogeneities among the source databases. More importantly, the definitions of those views are represented and stored in a meta-database, together with constraints to test their validity. Consequently, a method called Evolution Detection can identify in the meta-database the views affected by an evolution and modify them automatically. An evaluation is presented using a case study. Firstly, it shows that most types of heterogeneity defined in the thesis can be eliminated by RSMV, with the exception of semantic conflicts. Secondly, it shows that little manual modification of the system is required as long as the evolutions follow the rules; human intervention is required for only three types of database evolution, in which case some existing views are discarded. Thirdly, the computational cost of the automatic modification shows slow linear growth in the number of source databases. Other characteristics addressed include EA-SODIA's scalability, domain independence, autonomy of source databases, and potential to involve other data sources (e.g. XML). Finally, a descriptive comparison with other data integration approaches shows that, although other approaches may provide better query-processing performance in some circumstances, the service-oriented architecture provides better autonomy, flexibility, and capability of evolution.
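    The core RSMV idea, i.e. resolving source heterogeneity through view definitions rather than hard-coded programs, can be sketched with two dissimilar source tables unified behind one view. The table names, column names, and data below are hypothetical, not the thesis's actual schemas:

```python
# Hedged sketch of view-based schema mapping: two heterogeneous source
# tables are exposed through a view conforming to a global schema, so
# when a source evolves only the view definition needs to change.
# Table/column names and rows are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE src_a_customers (cust_id INTEGER, full_name TEXT);
    CREATE TABLE src_b_clients   (id INTEGER, fname TEXT, lname TEXT);
    INSERT INTO src_a_customers VALUES (1, 'Ada Lovelace');
    INSERT INTO src_b_clients   VALUES (2, 'Alan', 'Turing');

    -- The global schema sees one uniform relation; the heterogeneity
    -- (different key names, split vs. combined names) is resolved
    -- entirely inside the view definition.
    CREATE VIEW global_customer AS
        SELECT cust_id AS id, full_name AS name FROM src_a_customers
        UNION ALL
        SELECT id, fname || ' ' || lname FROM src_b_clients;
""")
print(con.execute("SELECT id, name FROM global_customer ORDER BY id").fetchall())
# → [(1, 'Ada Lovelace'), (2, 'Alan Turing')]
```

    If `src_b_clients` later renames or splits a column, only the view's `SELECT` clause is rewritten; queries against `global_customer` are untouched, which is the maintenance saving the thesis attributes to storing view definitions in a meta-database.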

    GEM: requirement-driven generation of ETL and multidimensional conceptual designs

    Technical report. At the early stages of a data warehouse design project, the main objective is to collect the business requirements and needs and translate them into an appropriate conceptual, multidimensional design. Typically, this task is performed manually through a series of interviews involving two different parties: the business analysts and the technical designers. Producing an appropriate conceptual design is an error-prone task that undergoes several rounds of reconciliation and redesign until the business needs are satisfied. It is of great importance for the business of an enterprise to facilitate and automate such a process. The goal of our research is to provide designers with a semi-automatic means for producing conceptual multidimensional designs and also conceptual representations of the extract-transform-load (ETL) processes that orchestrate the data flow from the operational sources to the data warehouse constructs. In particular, we describe a method that combines information about the data sources with the business requirements to validate and complete, if necessary, these requirements, produce a multidimensional design, and identify the ETL operations needed. We present our method in terms of the TPC-DS benchmark and show its applicability and usefulness.

    Synthesizing System Integration Requirements Model Fragments

    Systems integration is an enduring issue in organizations. Many organizations have been faced with the predicament of managing large and complex IT infrastructures accumulated over the years. Before proposing a suitable integration architecture and selecting appropriate implementation solutions, a holistic and clear understanding of the enterprise-wide integration requirements among various internal and external systems is needed. This paper builds on prior literature on conceptual modelling of integration requirements to present an algorithm that synthesizes model fragments, i.e., piecemeal sections of the integration requirements. The details of the algorithm, which synthesizes two or more model fragments into a single integration requirements model, are presented in this paper. An empirical assessment of the algorithm's generated integration solution is made by comparing it against one produced manually.