
    PERFORMANCE ANALYSIS OF TWO BIG DATA TECHNOLOGIES ON A CLOUD DISTRIBUTED ARCHITECTURE. RESULTS FOR NON-AGGREGATE QUERIES ON MEDIUM-SIZED DATA

    Big Data systems manage and process huge volumes of data constantly generated by various technologies in a myriad of formats. Big Data advocates (and preachers) have claimed that, relative to classical relational/SQL Database Management Systems, Big Data technologies such as NoSQL, Hadoop and in-memory data stores perform better. This paper compares the data processing performance of two systems, one from the SQL camp (PostgreSQL/Postgres XL) and one from the Big Data camp (Hadoop/Hive), on a distributed five-node cluster deployed in the cloud. Unlike the benchmarks in use (YCSB, TPC), a series of R modules was devised for generating random non-aggregate queries on different subschemas (of increasing data size) of the TPC-H database. The overall performance of the two systems was compared. Subsequently, a number of models were developed relating performance to the system and to various query parameters, such as the number of attributes in the SELECT and WHERE clauses, the number of joins, the number of rows processed, etc. JEL Codes: M1

    Data Processing Languages for Business Intelligence. SQL vs. R

    As a data-centric approach, Business Intelligence (BI) deals with the storage, integration, processing, exploration and analysis of information gathered from multiple sources in various formats and volumes. BI systems are generally synonymous with costly, complex platforms that require vast organizational resources. But there is also another face of BI: a pool of data sources, applications and services developed at different times using different technologies. This is “democratic” BI or, in some cases, “fragmented”, “patched” (or “chaotic”) BI. Fragmentation not only creates integration problems but also supports BI agility, as new modules can be developed quickly. Among the various languages and tools that cover large parts of BI activities, SQL and R are instrumental for both BI platform developers and BI users. SQL and R address both monolithic and democratic BI. This paper compares the essential data processing features of the two languages, identifying their similarities and differences as well as their strengths and limits.

    On the Performance of Three In-Memory Data Systems for On Line Analytical Processing

    In-memory database systems are among the most recent and most promising Big Data technologies, being developed and released either as brand new distributed systems or as extensions of old monolithic (centralized) database systems. As the name suggests, in-memory systems cache all the data in special memory structures. Many are part of the NewSQL strand and aim to bridge the gap between OLTP and OLAP in so-called Hybrid Transactional/Analytical Processing (HTAP) systems. This paper tests the performance of such systems for TPC-H analytical workloads. Performance is analyzed in terms of data loading, memory footprint and execution time of the TPC-H query set for three in-memory data systems: Oracle, SQL Server and MemSQL. The tests are subsequently deployed on classical on-disk architectures and the results are compared with those of the in-memory solutions. As in-memory is an enterprise edition feature, associated costs are also considered.

    A Few Insights Into Romanian Information Systems Analysts and Designers Toolbox

    Information Systems (IS) analysts and designers have been key members of software development teams. From waterfall to the Rational Unified Process, from UML to agile development, IS modelers have faced many trends and buzzwords. Although models and modeling tools are an important topic in software development, few detailed studies identify why developers, customers and managers decide to use modeling and specific tools. Despite the popularity of the subject, studies showing what tools IS modelers prefer are scarce, and almost non-existent for the Romanian market. As Romania is an important IT outsourcing market, this paper investigated what methods and tools Romanian IS analysts and designers apply. In this context, the starting question of our research concerns developers' preference between agile and non-agile methods in IT projects. Accordingly, the research questions targeted the main drivers in choosing specific methods and tools for IT projects deployed in Romanian companies. Another main objective of this paper was to examine the relationship between the methodologies (agile or non-agile), diagrams and other tools (in our study, the CASE features) and other variables/metrics of the system/software development project. The observational study was based on a survey completed by IS modelers in Romanian IT companies. The data collected were processed and analyzed using Exploratory Data Analysis. The platform for data visualization and analysis was R.