2 research outputs found

    A Comparison of Query Execution Speeds for Large Amounts of Data Using Various DBMS Engines Executing on Selected RAM and CPU Configurations

    In modern economies, most important business decisions are based on detailed analysis of available data. To obtain rapid responses from analytical tools, data should be pre-aggregated over the dimensions that are of most interest to each business. Sometimes, however, important decisions require analysis of business data over seemingly less important dimensions that were not pre-aggregated during the ETL process. In such cases, ad hoc ("online") aggregation is performed, and its execution time depends on overall DBMS performance. This paper describes how the performance of several commercial and non-commercial DBMSs was tested by running data-analysis queries that perform ad hoc aggregations over large volumes of data. Each DBMS was installed on a separate virtual machine and run on several computers, with two different amounts of RAM allocated for each test. The recorded query execution times showed that, as expected, column-oriented databases outperformed classical row-oriented database systems.
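    The contrast between row-oriented and column-oriented storage for an ad hoc aggregation can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory fact table (the `ROWS` data, the field names and the helper functions are illustrative and do not come from the paper) and counts how many stored values each layout must read to answer a `SUM(amount) ... GROUP BY region` query. It is a simplification: real DBMS engines read disk pages, compress columns and vectorize execution, none of which is modelled here.

```python
from collections import defaultdict

# Hypothetical fact table (region, product, day, amount); illustrative data only.
FIELDS = ("region", "product", "day", "amount")
ROWS = [
    ("north", "tea", 1, 10.0),
    ("south", "tea", 1, 7.5),
    ("north", "coffee", 2, 4.0),
    ("south", "coffee", 2, 3.0),
    ("north", "tea", 3, 6.0),
]


def to_columns(rows):
    """Convert a row-oriented table into a column-oriented one (one list per field)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(FIELDS)}


def aggregate_rows(rows, key, value):
    """Ad hoc SUM(value) GROUP BY key over a row layout.

    Every field of every row is counted as read, because a row store
    fetches whole rows even when the query needs only two fields.
    """
    ki, vi = FIELDS.index(key), FIELDS.index(value)
    totals = defaultdict(float)
    values_read = 0
    for row in rows:
        values_read += len(row)
        totals[row[ki]] += row[vi]
    return dict(totals), values_read


def aggregate_columns(columns, key, value):
    """The same aggregation over a column layout: only the two needed columns are read."""
    keys, vals = columns[key], columns[value]
    totals = defaultdict(float)
    for k, v in zip(keys, vals):
        totals[k] += v
    return dict(totals), len(keys) + len(vals)


row_result, row_reads = aggregate_rows(ROWS, "region", "amount")
col_result, col_reads = aggregate_columns(to_columns(ROWS), "region", "amount")
print(row_result == col_result, row_reads, col_reads)  # prints: True 20 10
```

    Both layouts return the same totals; the difference is that the row layout touches every field of every row, while the column layout reads only the grouping column and the measured column. On wide analytical tables this gap grows with the number of unused columns, which is one reason column stores tend to perform well on such queries.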

    Big Data Applied to the Analysis of Opinions about Films (Big data aplicado al análisis de opiniones sobre películas)

    The proliferation of social networks, websites and their users means that huge amounts of data, in the form of messages, videos and images, are transferred over the internet every day. In the first decades of the internet's spread, this kind of data had no relevance beyond entertaining its users, but since the start of the 21st century many studies have been carried out in this field, motivated by the potential of knowing how to classify and evaluate data in order to extract relevant information from it, in other words, data mining and analysis. This led to increased use of Big Data technologies, which offer many programs designed to handle large amounts of information. Today there are many alternatives, among which Hadoop stands out above all in terms of popularity and the amount of software in its ecosystem. In this project, all of these concepts are brought together in a single framework to offer a solution that gives its user a global view of the audience's opinion about different films, with the aim of predicting their success and/or obtaining information that would be difficult to infer by conventional means. To this end, the project uses Python scripts, Flume, Hadoop, Hive and Apache Zeppelin.