
    Review of modern business intelligence and analytics in 2015: How to tame the big data in practice?: Case study - What kind of modern business intelligence and analytics strategy to choose?

    The objective of this study was to identify the state-of-the-art architecture of modern business intelligence and analytics. In addition, the status quo of the business intelligence and analytics architecture in an anonymous case company was examined. Based on these findings, a future strategy was designed to guide the case company towards a better business intelligence and analytics environment. This objective was selected because of growing interest in big data; understanding how to move from traditional business intelligence practices to modern ones, and what options are available, was seen as a key question any company must answer to gain competitive advantage in the near future. The study was conducted as a qualitative single-case study with two parts: an analytics maturity assessment and an analysis of the business intelligence and analytics architecture. The survey comprised over 30 questions and was sent to 25 analysts and other individuals, such as managers, who spend significant time preparing or reading financial reports. The architecture analysis was conducted by gathering relevant information at a high level, and a big-picture diagram was drawn to illustrate the architecture. The two parts combined were used to establish the current maturity level of business intelligence and analytics in the case company. Three theoretical frameworks were used: one for the architecture, one for the maturity level, and one for reporting tools. The first, higher-level framework consisted of the modern data warehouse architecture and Hadoop solution from D'Antoni and Lopez (2014). The second framework was the analytics maturity assessment from the Data Warehouse Institute (2015). The third framework analyzed advanced analytics tools following Sallam et al. (2015). The findings suggest that a modern business intelligence and analytics solution can include both data warehouse and Hadoop components; the two are not mutually exclusive. Rather, Hadoop augments the data warehouse and takes it to another level. This thesis shows how companies can evaluate their current maturity level and design a future strategy by benchmarking their own practices against the state-of-the-art solution. To keep up with the fast pace of development, research must be continuous; a future study detailing a path to implementing Hadoop would be a valuable addition to the field.
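    The "data warehouse plus Hadoop" pattern the thesis describes can be illustrated with a minimal Apache Spark sketch: curated dimensions stay in the relational warehouse, high-volume raw data lands on the Hadoop side, and a query engine joins the two. Connection details, table names, and column names below are illustrative assumptions, not the case company's actual environment.

```python
# Minimal sketch (assumed names and connection details): Hadoop augmenting,
# not replacing, an existing data warehouse.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dw-plus-hadoop").getOrCreate()

# Curated dimension kept in the existing data warehouse, read over JDBC
customers = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://dw-host:5432/warehouse")
    .option("dbtable", "dim_customer")
    .option("user", "analyst").option("password", "***")
    .load()
)

# High-volume raw events stored cheaply on the Hadoop side as Parquet
events = spark.read.parquet("hdfs:///datalake/events/clickstream/")

# Hadoop augments the warehouse: enrich raw events with warehouse attributes
enriched = events.join(customers, on="customer_id", how="left")
enriched.groupBy("customer_segment").count().show()
```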

    A Business Intelligence Solution, based on a Big Data Architecture, for processing and analyzing the World Bank data

    The rapid growth in data volume and complexity has driven the adoption of advanced technologies to extract valuable insights for decision-making. This project addresses this need by developing a comprehensive framework that combines Big Data processing, analytics, and visualization techniques to enable effective analysis of World Bank data. The problem addressed in this study is the need for a scalable and efficient Business Intelligence solution that can handle the vast amounts of data published by the World Bank. To this end, a Big Data architecture is implemented on a real use case for the International Bank for Reconstruction and Development. The findings of this project demonstrate the effectiveness of the proposed solution. Through the integration of Apache Spark and Apache Hive, data is processed using Extract, Transform and Load (ETL) techniques, allowing for efficient data preparation. The use of Apache Kylin enables the construction of a multidimensional model, facilitating fast and interactive queries on the data. Moreover, data visualization techniques are employed to create intuitive and informative visual representations of the analysed data. The key conclusions drawn from this project highlight the advantages of a Big Data-driven Business Intelligence solution for processing and analysing World Bank data: the implemented framework shows improved scalability, performance, and flexibility compared to traditional approaches. In conclusion, this bachelor thesis presents a Business Intelligence solution based on a Big Data architecture for processing and analysing World Bank data. The findings emphasize the importance of scalable and efficient data processing techniques, multidimensional modelling, and data visualization for deriving valuable insights, and demonstrate the potential of Big Data Business Intelligence solutions in addressing the challenges associated with large-scale data analysis.
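    A hedged sketch of the Spark-plus-Hive ETL step described above: read a raw World Bank indicator extract, clean it, and persist it as a Hive table on which Apache Kylin cubes could later be defined. The file path, column names, and table names are assumptions for illustration, not the project's actual schema.

```python
# Assumed landing path and schema; the real World Bank extract may differ.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("worldbank-etl")
    .enableHiveSupport()   # lets Spark write managed Hive tables
    .getOrCreate()
)

# Extract: raw CSV dropped into the landing zone
raw = spark.read.csv("/datalake/raw/worldbank_indicators.csv",
                     header=True, inferSchema=True)

# Transform: basic cleaning so the fact table is query-friendly
clean = (
    raw.withColumnRenamed("Country Name", "country")
       .withColumnRenamed("Indicator Code", "indicator")
       .withColumn("year", F.col("year").cast("int"))
       .withColumn("value", F.col("value").cast("double"))
       .dropna(subset=["country", "indicator", "year", "value"])
)

# Load: persist to Hive; Kylin's multidimensional model sits on top of this
clean.write.mode("overwrite").saveAsTable("worldbank.indicator_facts")
```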

    Enterprise Data Mining & Machine Learning Framework on Cloud Computing for Investment Platforms

    Machine Learning and Data Mining are two key components of decision-making systems that can quickly provide valuable insights into huge data sets. Turning raw data into meaningful information and converting it into actionable tasks makes organizations profitable and able to withstand intense competition. In the past decade we have seen an increase in Data Mining algorithms and tools for financial market analysis, consumer products, manufacturing, insurance, social networks, scientific discovery and warehousing. With the vast amount of data available for analysis, traditional tools and techniques are outdated for data analysis and decision support. Organizations are investing considerable resources in Data Mining frameworks in order to emerge as market leaders. Machine Learning is a natural evolution of Data Mining: existing Machine Learning techniques rely heavily on underlying Data Mining techniques, in which pattern recognition is an essential component. Building an efficient Data Mining framework is expensive and usually culminates in a multi-year project, and the organization pays a heavy price for any delay or an inefficient Data Mining foundation. In this research, we propose to build a cost-effective and efficient Data Mining (DM) and Machine Learning (ML) framework in a cloud computing environment to overcome the inherent limitations of existing design methodologies. The elasticity of the cloud architecture removes the hardware constraints on businesses. Our research focuses on refining and enhancing current Data Mining frameworks to build an enterprise data mining and machine learning framework. Our initial studies and techniques produced very promising results, reducing the existing build time considerably. Our technique of dividing the DM and ML frameworks into several individual components (five sub-components) that can be reused at several phases of the final enterprise build is efficient and saves operational costs for the organization. Effective aggregation using selective cuboids and parallel computation using Azure Cloud Services are two of the many techniques proposed in our research. Our research produced a nimble, scalable, portable architecture for enterprise-wide implementation of DM and ML frameworks.
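    The "selective cuboid" idea can be illustrated with a small sketch: instead of materialising every possible group-by combination of a data cube (2^n cuboids for n dimensions), only the combinations the reports actually need are computed. The column names and sample data below are assumptions made for the example; the research runs the equivalent aggregation in parallel on Azure Cloud Services.

```python
# Hypothetical investment data and dimension choices, for illustration only.
import pandas as pd

trades = pd.DataFrame({
    "region":  ["EU", "EU", "US", "US"],
    "product": ["bond", "equity", "bond", "equity"],
    "quarter": ["Q1", "Q1", "Q2", "Q2"],
    "value":   [120.0, 95.5, 210.0, 80.25],
})

# Materialise only the cuboids the reports need, not the full lattice
selected_cuboids = [
    ("region",),
    ("region", "product"),
    ("product", "quarter"),
]

materialised = {
    dims: trades.groupby(list(dims), as_index=False)["value"].sum()
    for dims in selected_cuboids
}

for dims, cube in materialised.items():
    print(dims)
    print(cube)
```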

    CERN openlab Whitepaper on Future IT Challenges in Scientific Research

    This whitepaper describes the major IT challenges in scientific research at CERN and several other European and international research laboratories and projects. Each challenge is exemplified through a set of concrete use cases drawn from the requirements of large-scale scientific programs. The paper is based on contributions from many researchers and IT experts of the participating laboratories, as well as input from the existing CERN openlab industrial sponsors. The views expressed in this document are those of the individual contributors and do not necessarily reflect the views of their organisations and/or affiliates.

    LEAN DATA ENGINEERING. COMBINING STATE OF THE ART PRINCIPLES TO PROCESS DATA EFFICIENTLY

    The present work was developed during an internship, under the Erasmus+ Traineeship program, at Fieldwork Robotics, a Cambridge-based company that develops robots to operate in agricultural fields. The robots collect data from commercial greenhouses with sensors and RealSense cameras, as well as with gripper cameras placed on the robotic arms. These data are recorded mainly in bag files, consisting of unstructured data, such as images, and semi-structured data, such as metadata associated with both the conditions under which the images were taken and information about the robot itself. The data were uploaded, extracted, cleaned and labelled manually before being used to train Artificial Intelligence (AI) algorithms to identify raspberries during the harvesting process. The amount of available data quickly escalates with every trip to the fields, which creates an ever-growing need for an automated process. This problem was addressed by creating a data engineering platform encompassing a data lake, a data warehouse and the processing capabilities they require. The platform was created following a series of principles entitled Lean Data Engineering Principles (LDEP); systems that follow them are called Lean Data Engineering Systems (LDES). These principles urge one to start with the end in mind: process incoming batch or real-time data without wasting resources, limiting costs to what is absolutely necessary for job completion, in other words to be as lean as possible. The LDEP are a combination of state-of-the-art ideas stemming from several fields, such as data engineering, software engineering and DevOps, with cloud technologies at their core. The proposed custom-made solution enabled the company to scale its data operations, labelling images almost ten times faster while cutting over 99.9% of the associated costs in comparison to the previous process. In addition, the data lifecycle time was reduced from weeks to hours while maintaining comparable data quality, correctly identifying, for instance, 94% of the labels in comparison to a human counterpart.
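    A minimal sketch of the kind of lean batch step such a platform automates: pick up newly landed metadata files from the data lake, keep only the fields the labelling workflow needs, and append them to a warehouse table. The paths, field names, and the SQLite stand-in for the warehouse are illustrative assumptions, not Fieldwork Robotics' actual stack.

```python
# Assumed landing zone, metadata fields, and warehouse stand-in.
import json
import sqlite3
from pathlib import Path

LAKE_DIR = Path("/datalake/landing/metadata")   # assumed landing zone
WAREHOUSE = sqlite3.connect("warehouse.db")     # stand-in for the warehouse

WAREHOUSE.execute(
    "CREATE TABLE IF NOT EXISTS frame_metadata ("
    "  file_name TEXT, captured_at TEXT, camera TEXT, labelled INTEGER)"
)

# Batch step: process every JSON metadata file found in the landing zone
for meta_file in LAKE_DIR.glob("*.json"):
    record = json.loads(meta_file.read_text())
    WAREHOUSE.execute(
        "INSERT INTO frame_metadata VALUES (?, ?, ?, 0)",
        (record.get("file_name"),
         record.get("captured_at"),
         record.get("camera")),
    )

WAREHOUSE.commit()
WAREHOUSE.close()
```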