39 research outputs found

Data queries over heterogeneous sources

Author: Grade Nuno Daniel Gouveia de Sousa
Publication venue: Faculdade de Ciências e Tecnologia
Publication date: 01/01/2013

Dissertation for the degree of Master in Informatics Engineering. Enterprises typically have their data spread over many software systems, such as custom-made applications, CRM systems like SalesForce, CMS systems, or ERP systems like SAP. In these settings, it is often desirable to integrate information from many data sources to accomplish some business goal in an application. Data may be stored locally or in the cloud in a wide variety of ways, which requires explicit transformation processes to be defined and makes integration hard for developers. Moreover, the amount of external data can be large, and the difference in efficiency between a smart and a naive way of retrieving and filtering data from different locations can be great. Hence, developers would benefit greatly from language abstractions that help them build queries over heterogeneous data sources and from an optimization process that avoids large and unnecessary data transfers during query execution. This project was developed at OutSystems and aims at extending a real product, which makes it even more challenging. We followed a generic approach that can be implemented in any framework, not only in the OutSystems product

Repositório da Universidade Nova de Lisboa

Personalizing education with algorithmic course selection

Author: Morrow Tyler
Publication venue: Scholars' Mine
Publication date: 01/01/2017

The work presented in this thesis utilizes context-aware recommendation to facilitate personalized education and assist students in selecting courses (or in non-traditional curricula, topics or modules) that meet curricular requirements, leverage their skills and background, and are relevant to their interests. The original research contribution of this thesis is an algorithm that can generate a schedule of courses with consideration of a student's profile, minimization of cost, and complete adherence to institution requirements. The research problem at hand - a constrained optimization problem with potentially conflicting objectives - is solved by first identifying a minimal set of courses a student can take to graduate and then intelligently placing the selected courses into available semesters. The distinction between the proposed approach and related studies is in its simultaneous achievement of the following: guaranteed fulfillment of curricular requirements; applicability to both traditional and non-traditional curricula; and flexibility in nomenclature - semantics are extracted from syntax to allow the identification of relevant content, despite differences in course or topic titles from one institution to the next. The course selection algorithm presented is developed for the Pervasive Cyberinfrastructure for Personalized eLearning and Instructional Support (PERCEPOLIS), which can assist or supplement the degree planning actions of an academic advisor, with the assurance that recommended selections are always valid. With this algorithm, PERCEPOLIS can recommend the entire trajectory that a student could take to graduation, as opposed to just the next semester, and it does so with consideration of course or topic availability --Abstract, page iii

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

Towards a next generation of open scientific data repositories and services

Author: Christophides V.
Houstis C.
Kapidakis S.
Lalis S.
Nikolaou C. (Charalampos)
Simon E. (Eric)
Tomasic A.
Publication venue: Stichting Mathematisch Centrum
Publication date: 01/06/1999

CWI's Institutional Repository

Query optimizers based on machine learning techniques

Author: Souto Rui Pedro Sousa Rodrigues do
Publication date: 27/10/2021

Integrated master's dissertation in Informatics Engineering. Query optimizers are considered one of the most relevant and sophisticated components in a database management system. However, despite currently producing nearly optimal results, optimizers rely on statistical estimates and heuristics to reduce the search space of alternative execution plans for a single query. As a result, for more complex queries, errors may grow exponentially, often translating into sub-optimal plans resulting in less than ideal performance. Recent advances in machine learning techniques have opened new opportunities for many of the existing problems related to system optimization. This document proposes a solution built on top of PostgreSQL that learns to select the most efficient set of optimizer strategy settings for a particular query. Instead of depending entirely on the optimizer’s estimates to compare different plans under different configurations, it relies on a greedy selection algorithm that supports several types of predictive modeling techniques, from more traditional modeling techniques to a deep learning approach. The system is evaluated experimentally with the standard TPC-H and Join Ordering Benchmark workloads to measure the cost and benefits of adding machine learning capabilities to traditional query optimizers. This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project UIDB/50014/2020

Universidade do Minho: RepositoriUM
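
The greedy selection over optimizer settings mentioned in the abstract above might look roughly like the following sketch. The three `enable_*` names are real PostgreSQL planner settings, but the choice of these three, the `toy_predict` cost function, and the search loop itself are illustrative assumptions, not the method of the dissertation; a trained predictive model would take the place of `toy_predict`.

```python
# Minimal sketch (assumed, not taken from the dissertation): greedily switch
# off one planner setting at a time while a predicted cost keeps improving.

FLAGS = ["enable_nestloop", "enable_hashjoin", "enable_mergejoin"]


def toy_predict(config):
    """Hypothetical stand-in for a learned latency model (arbitrary numbers)."""
    cost = 100.0
    if not config["enable_nestloop"]:
        cost -= 30.0
    if not config["enable_mergejoin"]:
        cost += 10.0
    if not config["enable_nestloop"] and not config["enable_hashjoin"]:
        cost += 50.0
    return cost


def greedy_settings(flags, predict_cost):
    """Return (config, predicted_cost) found by greedy one-flag-at-a-time search."""
    config = {flag: True for flag in flags}  # start from the default: all enabled
    best = predict_cost(config)
    improved = True
    while improved:
        improved = False
        best_flag = None
        for flag in flags:
            if not config[flag]:
                continue  # already disabled in an earlier round
            trial = {**config, flag: False}
            cost = predict_cost(trial)
            if cost < best:  # remember the best single change of this round
                best, best_flag = cost, flag
                improved = True
        if improved:
            config[best_flag] = False
    return config, best


chosen, predicted = greedy_settings(FLAGS, toy_predict)
print(chosen, predicted)
```

Each round evaluates every remaining single change and keeps only the best one, so the number of model calls grows quadratically with the number of settings rather than exponentially.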

Segmenting and labeling query sequences in a multidatabase environment

Author: A.C. Acar
D. Liu
J. Cardiff
L.R. Rabiner
M.-S. Chen
Q. Yao
R. Cooley
R. Kindermann
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

When gathering information from multiple independent data sources, users will generally pose a sequence of queries to each source, combine (union) or cross-reference (join) the results in order to obtain the information they need. Furthermore, when gathering information, there is a fair bit of trial and error involved, where queries are recursively refined according to the results of a previous query in the sequence. From the point of view of an outside observer, the aim of such a sequence of queries may not be immediately obvious. We investigate the problem of isolating and characterizing subsequences representing coherent information retrieval goals out of a sequence of queries sent by a user to different data sources over a period of time. The problem has two sub-problems: segmenting the sequence into subsequences, each representing a discrete goal; and labeling each query in these subsequences according to how they contribute to the goal. We propose a method in which a discriminative probabilistic model (a Conditional Random Field) is trained with pre-labeled sequences. We have tested the accuracy with which such a model can infer labels and segmentation on novel sequences. Results show that the approach is very accurate (> 95% accuracy) when there are no spurious queries in the sequence and moderately accurate even in the presence of substantial noise (∼70% accuracy when 15% of queries in the sequence are spurious). © 2011 Springer-Verlag

Crossref

Bilkent University Institutional Repository

OpenMETU (Middle East Technical University)

An extensible view system for supporting the integration and interoperation of heterogeneous, autonomous, and distributed database management systems

Author: Yen Cheng-Huang
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/1994

In this thesis the problem of integrating heterogeneous, autonomous and distributed database management systems (DBMSs) is addressed. To provide a solution, we have developed an approach, a design method, and a view system. Our approach is based on the invention of the abstract view constructs that have uniform and stable representations for supporting semantic relativism and distributed abstraction modeling. Our design method applies object-oriented techniques and software engineering concepts to manage the system complexity. Our view system has been constructed upon established experience with the development of large-scale distributed systems in a distributed object infrastructure provided by the Common Object Request Broker Architecture (CORBA). The scope of our research identifies the goals of Project Zeus in which we have created the Zeus View Mechanism (ZVM) as the theoretical foundation of our approach. The notion of frameworks has been introduced as part of our design methodology to promote code/design reuse and enhance the portability/extensibility of the architectural design. A multidatabase system, the Zeus Multidatabase System (ZMS), has provided a test bed for our concept. Project Zeus has exciting prospects. The foundation established in this research has created new directions in multidatabase research and will have a significant impact on future integration and interoperation technologies

Digital Repository @ Iowa State University (ISU)

Inferring user goals from sets of independent queries in a multidatabase environment

Author: A. Motro
A.Y. Halevy
D. Maier
E. Rahm
H. Garcia-Molina
J. Berlin
J.A. Wald
S. Abiteboul
S. Kapoor
Y. Arens
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Loosely coupled data integration among networked sources has become so ubiquitous in recent years that many of the services and applications used daily are actually not monolithic information systems but rather collections of sources tied together. Instead of building centralized and large data sources (i.e., the Extract-Transform-Load method), many organizations and individuals are opting for a virtual database approach. Especially with the advent of service-oriented architectures, it has become very easy to leave data in its original source and to instead recruit the service provided by that source as needed. This structure is seen in a variety of scenarios such as hybrid web applications (mash-ups), enterprise information integration models, aggregation services and federated information retrieval systems. Furthermore, individual users are often forced to procure and assemble the information they need from sources distributed across a network. © 2010 Springer-Verlag Berlin Heidelberg

Crossref

Bilkent University Institutional Repository

OpenMETU (Middle East Technical University)

University of Helsinki Department of Computer Science Annual Report 1998

Publication venue: University of Helsinki, Department of Computer Science
Publication date: 01/01/1999

Helsingin yliopiston digitaalinen arkisto

A Generalization of Band Joins and the Merge-Purge Problem

Author: Hernandez Mauricio A.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/1995

The problem of merging multiple databases of information about common entities is frequently encountered in large commercial and government organizations. The problem we study is often called the Merge/Purge problem and is difficult to solve both in scale and accuracy. Large repositories of data always have numerous duplicate information entries about the same entities that are difficult to cull together without an intelligent "equational theory" that identifies equivalent items by a complex, domain dependent matching process. We have developed a system for accomplishing this task for lists of names of potential customers in a direct marketing-type application. Our results for statistically generated data are shown to be accurate and effective when processing the data multiple times using different keys for sorting. The system provides a rule programming module that is easy to program and quite good at finding duplicates especially in an environment with massive amounts of data

Columbia University Academic Commons
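
One way to picture the multi-pass idea in the Merge/Purge abstract above (sorting the data several times with different keys and applying a matching rule) is the sketch below. It is only an illustration under stated assumptions: the record fields, the `same_entity` rule standing in for an "equational theory", the sliding-window comparison, and the window size are all hypothetical and are not taken from the paper.

```python
# Minimal sketch (assumed): several sorted passes, each comparing only records
# that fall close together in sort order, with matches merged transitively.


def same_entity(a, b):
    """Hypothetical matching rule standing in for a real 'equational theory'."""
    return (
        a["zip"] == b["zip"]
        and a["last"].lower() == b["last"].lower()
        and a["first"][:1].lower() == b["first"][:1].lower()
    )


def find(parent, i):
    """Union-find lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def merge_purge(records, sort_keys, window=3):
    """Group record indices judged to be duplicates across all passes."""
    parent = list(range(len(records)))
    for key in sort_keys:  # one pass per sort key
        order = sorted(range(len(records)), key=lambda i: key(records[i]))
        for pos, i in enumerate(order):
            # compare each record only with the next (window - 1) records
            for j in order[pos + 1 : pos + window]:
                if same_entity(records[i], records[j]):
                    parent[find(parent, i)] = find(parent, j)
    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())


records = [
    {"first": "Mauricio", "last": "Hernandez", "zip": "10027"},
    {"first": "M.", "last": "hernandez", "zip": "10027"},
    {"first": "Ann", "last": "Lee", "zip": "10027"},
    {"first": "Maurice", "last": "Hernandez", "zip": "94305"},
]
groups = merge_purge(
    records,
    sort_keys=[lambda r: r["last"].lower(), lambda r: r["zip"]],
    window=2,
)
print(groups)
```

Merging the matches from all passes through a union-find structure means a duplicate pair missed under one sort key can still be caught under another.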

A survey of approaches to automatic schema matching

Author: Bernstein Philip A.
Rahm Erhard
Publication date: 19/10/2018

Schema matching is a basic problem in many database application domains, such as data integration, E-business, data warehousing, and semantic query processing. In current implementations, schema matching is typically performed manually, which has significant limitations. On the other hand, previous research papers have proposed many techniques to achieve a partial automation of the match operation for specific application domains. We present a taxonomy that covers many of these existing approaches, and we describe the approaches in some detail. In particular, we distinguish between schema-level and instance-level, element-level and structure-level, and language-based and constraint-based matchers. Based on our classification we review some previous match implementations thereby indicating which part of the solution space they cover. We intend our taxonomy and review of past work to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component

Qucosa - Publikationsserver der Universität Leipzig