Search CORE

8,633 research outputs found

A bi-objective cost model for optimizing database queries in a multi-cloud environment

Author: Gounaris Anastasios
Karampaglis Zisis
Manolopoulos Yannis
Naskos Athanasios
Publication venue: Qassim University. Published by Elsevier B.V.
Publication date: 31/12/2014
Field of study

AbstractCost models are broadly used in query processing to drive the query optimization process, accurately predict the query execution time, schedule database query tasks, apply admission control and derive resource requirements to name a few applications. The main role of cost models is to estimate the time needed to run the query on a specific machine. In a multi-cloud environment, cost models should be easily calibrated for a wide range of different physical machines, and time estimates need to be complemented with monetary cost information, since both the economic cost and the performance are of primary importance. This work aims to serve as the first proposal for a bi-objective query cost model suitable for queries executed over resources provided by potentially multiple cloud providers. We leverage existing calibrating modeling techniques for time estimates and we couple such estimates with monetary cost information covering the main charging options for using cloud resources. Moreover, we explain how the cost model can become part of an optimizer. Our approach is applicable to more generic data flow graphs, the execution plans of which do not necessarily comprise relational operators. Finally, we give a concrete example about the usage of our proposal and we validate its accuracy through real case studies

Elsevier - Publisher Connector

Privacy-Preserving Secret Shared Computations using MapReduce

Author: Dolev Shlomi
Gupta Peeyush
Li Yin
Mehrotra Sharad
Sharma Shantanu
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019
Field of study

Data outsourcing allows data owners to keep their data at \emph{untrusted} clouds that do not ensure the privacy of data and/or computations. One useful framework for fault-tolerant data processing in a distributed fashion is MapReduce, which was developed for \emph{trusted} private clouds. This paper presents algorithms for data outsourcing based on Shamir's secret-sharing scheme and for executing privacy-preserving SQL queries such as count, selection including range selection, projection, and join while using MapReduce as an underlying programming model. Our proposed algorithms prevent an adversary from knowing the database or the query while also preventing output-size and access-pattern attacks. Interestingly, our algorithms do not involve the database owner, which only creates and distributes secret-shares once, in answering any query, and hence, the database owner also cannot learn the query. Logically and experimentally, we evaluate the efficiency of the algorithms on the following parameters: (\textit{i}) the number of communication rounds (between a user and a server), (\textit{ii}) the total amount of bit flow (between a user and a server), and (\textit{iii}) the computational load at the user and the server.\BComment: IEEE Transactions on Dependable and Secure Computing, Accepted 01 Aug. 201

arXiv.org e-Print Archive

eScholarship - University of California

Comparative Study Of Implementing The On-Premises and Cloud Business Intelligence On Business Problems In a Multi-National Software Development Company

Author: Vassilenko Alexandre Nicolau Jdanok
Publication venue
Publication date: 27/01/2023
Field of study

Internship Report presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business IntelligenceNowadays every enterprise wants to be competitive. In the last decade, the data volumes are increased dramatically. As each year data in the market increases, the ability to extract, analyze and manage the data become the backbone condition for the organization to be competitive. In this condition, organizations need to adapt their technologies to the new business reality in order to be competitive and provide new solutions that meet new requests. Business Intelligence by the main definition is the ability to extract analyze and manage the data through which an organization gain a competitive advantage. Before using this approach, it’s important to decide on which computing system it will base on, considering the volume of data, business context of the organization and technologies requirements of the market. In the last 10 years, the popularity of cloud computing increased and divided the computing Systems into On-Premises and cloud. The cloud benefits are based on providing scalability, availability and fewer costs. On another hand, traditional On-Premises provides independence of software configuration, control over data and high security. The final decision as to which computing paradigm to follow in the organization it’s not an easy task as well as depends on the business context of the organization, and the characteristics of the performance of the current On-Premises systems in business processes. In this case, Business Intelligence functions and requires in-depth analysis in order to understand if cloud computing technologies could better perform in those processes than traditional systems. The objective of this internship is to conduct a comparative study between 2 computing systems in Business Intelligence routine functions. The study will compare the On-Premises Business Intelligence Based on Oracle Architecture with Cloud Business Intelligence based on Google Cloud Services. A comparative study will be conducted through participation in activities and projects in the Business Intelligence department, of a company that develops software digital solutions to serve the telecommunications market for 12 months, as an internship student in the 2nd year of a master’s degree in Information Management, with a specialization in Knowledge Management and Business Intelligence at Nova Information Management School (NOVA IMS)

Repositório da Universidade Nova de Lisboa

Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

Author: Alam Mansaf
Ali Syed Arshad
Khan Samiya
Liu Xiufeng
Publication venue
Publication date: 01/01/2019
Field of study

Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Typically, fundamental design decisions involved in big data systems design include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies for optimized solution to a specific real world problem, big data system are not an exception to any such rule. As far as the storage aspect of any big data system is concerned, the primary facet in this regard is a storage infrastructure and NoSQL seems to be the right technology that fulfills its requirements. However, every big data application has variable data characteristics and thus, the corresponding data fits into a different data model. This paper presents feature and use case analysis and comparison of the four main data models namely document oriented, key value, graph and wide column. Moreover, a feature analysis of 80 NoSQL solutions has been provided, elaborating on the criteria and points that a developer must consider while making a possible choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution. This brings forth second facet of big data storage, big data file formats, into picture. The second half of the research paper compares the advantages, shortcomings and possible use cases of available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage and its challenges and future prospects have also been discussed

arXiv.org e-Print Archive

Online Research Database In Technology

Query-driven learning for predictive analytics of data subspace cardinality

Author: Anagnostopoulos Christos
Triantafillou Peter
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 29/06/2017
Field of study

Fundamental to many predictive analytics tasks is the ability to estimate the cardinality (number of data items) of multi-dimensional data subspaces, defined by query selections over datasets. This is crucial for data analysts dealing with, e.g., interactive data subspace explorations, data subspace visualizations, and in query processing optimization. However, in many modern data systems, predictive analytics may be (i) too costly money-wise, e.g., in clouds, (ii) unreliable, e.g., in modern Big Data query engines, where accurate statistics are difficult to obtain/maintain, or (iii) infeasible, e.g., for privacy issues. We contribute a novel, query-driven, function estimation model of analyst-defined data subspace cardinality. The proposed estimation model is highly accurate in terms of prediction and accommodating the well-known selection queries: multi-dimensional range and distance-nearest neighbors (radius) queries. Our function estimation model: (i) quantizes the vectorial query space, by learning the analysts’ access patterns over a data space, (ii) associates query vectors with their corresponding cardinalities of the analyst-defined data subspaces, (iii) abstracts and employs query vectorial similarity to predict the cardinality of an unseen/unexplored data subspace, and (iv) identifies and adapts to possible changes of the query subspaces based on the theory of optimal stopping. The proposed model is decentralized, facilitating the scaling-out of such predictive analytics queries. The research significance of the model lies in that (i) it is an attractive solution when data-driven statistical techniques are undesirable or infeasible, (ii) it offers a scale-out, decentralized training solution, (iii) it is applicable to different selection query types, and (iv) it offers a performance that is superior to that of data-driven approaches

Warwick Research Archives Portal Repository

Enlighten