266 research outputs found

Big Data and Large-scale Data Analytics: Efficiency of Sustainable Scalability and Security of Centralized Clouds and Edge Deployment Architectures

Author: Awaysheh Feras Mahmoud Naji
Publication date: 01/01/2020

One of the significant shifts in next-generation computing technologies will certainly be in the development of Big Data (BD) deployment architectures. Apache Hadoop, the BD landmark, has evolved into a widely deployed BD operating system. Its new features include a federation structure and many associated frameworks, which give Hadoop 3.x the maturity to serve different markets. This dissertation addresses two leading issues in exploiting BD and large-scale data analytics on the Hadoop platform: (i) scalability, which directly affects system performance and overall throughput, addressed using portable Docker containers; and (ii) security, addressed by spreading the adoption of data protection practices among practitioners using access controls. The main contributions of this thesis are an Enhanced MapReduce Environment (EME), the OPportunistic and Elastic Resource Allocation (OPERA) scheduler, the BD Federation Access Broker (BDFAB), and a Secure Intelligent Transportation System (SITS) with a multi-tier architecture for streaming data to the cloud.

Repositorio Institucional da Universidade de Santiago de Compostela

Big Data Processing Attribute Based Access Control Security

Author: Tall Anne
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2022

The purpose of this research is to analyze the security of next-generation big data processing (BDP) and examine the feasibility of applying advanced security features to meet the needs of modern multi-tenant, multi-level data analysis. The research methodology was to survey the status of security mechanisms in BDP systems and identify areas that require further improvement. Access control (AC) security services were identified as a priority area, specifically Attribute Based Access Control (ABAC). The exemplar BDP system analyzed is the Apache Hadoop ecosystem. We created data generation software and analysis programs, and posted the detailed experiment configuration on GitHub. Overall, our research indicates that before a BDP system such as Hadoop can be used in an operational environment, significant security configuration is required. We believe that the tools are available to achieve a secure system with ABAC, using Apache Ranger and Apache Atlas. However, these systems are immature and require verification by an independent third party. We identified the following specific actions for overall improvement: consistent provisioning of security services through a data analyst workstation, a common backplane of security services, and a management console. These areas are partially satisfied in the current Hadoop ecosystem; continued AC improvements through the open source community and rigorous independent testing should further address the remaining security challenges. Robust security will enable further use of distributed, cluster-based BDP, such as Apache Hadoop and Hadoop-like systems, to meet future government and business requirements.

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

MERRA/AS: The MERRA Analytic Services Project Interim Report

Author: Duffy Dan
Grieg Cristina
Luczak Ed
McInerney Mark
Nadeau Denis
Schnase John
Tamkin Glenn
Thompson Hoot

MERRA/AS is a cyberinfrastructure resource that will combine iRODS-based Climate Data Server (CDS) capabilities with Cloudera MapReduce to serve MERRA analytic products; store the MERRA reanalysis data collection in HDFS to enable parallel, high-performance, storage-side data reductions; manage storage-side driver, mapper, and reducer code sets and realized objects for users; and provide a library of commonly used spatiotemporal operations that can be composed to enable higher-order analyses.

NASA Technical Reports Server

A Business Intelligence Solution, based on a Big Data Architecture, for processing and analyzing the World Bank data

Author: Damus Ros Nicolas
Publication date: 24/07/2023

The rapid growth in data volume and complexity has necessitated the adoption of advanced technologies to extract valuable insights for decision-making. This project addresses that need by developing a comprehensive framework that combines Big Data processing, analytics, and visualization techniques to enable effective analysis of World Bank data. The problem addressed in this study is the need for a scalable and efficient Business Intelligence solution that can handle the vast amounts of data generated by the World Bank. To that end, a Big Data architecture is implemented on a real use case for the International Bank for Reconstruction and Development. The findings of this project demonstrate the effectiveness of the proposed solution. Through the integration of Apache Spark and Apache Hive, data is processed using Extract, Transform and Load techniques, allowing for efficient data preparation. The use of Apache Kylin enables the construction of a multidimensional model, facilitating fast and interactive queries on the data. Moreover, data visualization techniques are employed to create intuitive and informative visual representations of the analysed data. The key conclusions drawn from this project highlight the advantages of a Big Data-driven Business Intelligence solution in processing and analysing World Bank data. The implemented framework showcases improved scalability, performance, and flexibility compared to traditional approaches. In conclusion, this bachelor thesis presents a Business Intelligence solution based on a Big Data architecture for processing and analysing the World Bank data. The project findings emphasize the importance of scalable and efficient data processing techniques, multidimensional modelling, and data visualization for deriving valuable insights. The application of these techniques contributes to the field by demonstrating the potential of Big Data Business Intelligence solutions in addressing the challenges associated with large-scale data analysis.

Repositorio Institucional de la Universidad de Alicante

Deep Learning para BigData

Author: Correia Filipe José Ribeiro
Publication date: 01/01/2021

We live in a world where data is becoming increasingly valuable and increasingly abundant. Every company produces data, be it from sales, sensors, or various other sources. Since the dawn of the smartphone, virtually every person in the world is connected to the internet and contributes to data generation. Social networks are big contributors to this Big Data boom. How do we extract insight from such a rich data environment? Is Deep Learning capable of circumventing Big Data’s challenges? This is what we intend to understand. To reach a conclusion, Social Network data is used as a case study for predicting sentiment changes in the Stock Market. The objective of this dissertation is to develop a computational study and analyse its performance. The outputs will contribute to understanding Deep Learning’s usage with Big Data and how it performs in sentiment analysis.

Repositório Científico do Instituto Politécnico do Porto