
    WARP Business Intelligence System

    Continuous delivery (CD) facilitates the software release process: continuous integration and deployment pipelines allow software to be tested several times before going into production. In Business Intelligence (BI), software releases tend to be manual and lack pipelines, and version control is often deficient or absent because of the nature of the projects, which involve data that is impossible to version. How can CD concepts be applied to a large, existing BI project with extensive legacy code and no version control over project objects? Only a few organizations have an automated release process for their BI projects, because the data-centric nature of these projects makes it difficult to implement CD to the full extent. The problem was therefore tackled in stages: first, the implementation of version control suited to the organization's needs; then, the establishment of the environments required for the pipelines; and finally, the creation of a test pipeline for one of the BI projects, proving the success of this approach. To evaluate the solution, the main beneficiaries (stakeholders and engineers) were asked to answer questionnaires about their experience with the data warehouse before and after the use of CD. Because each release is tested before going into production, ensuring that possible errors have already been found, the use of CD will improve software quality in the long run and allow software to be released more frequently.

    Extraction transformation load (ETL) solution for data integration: a case study of rubber import and export information

    Data integration is important for consolidating data from inside and outside an organization to provide a unified view of the organization's information. An Extraction Transformation Load (ETL) solution is the back-end process of data integration: it collects data from various data sources, prepares and transforms the data according to business requirements, and loads it into a Data Warehouse (DW). This paper explains the integration of rubber import and export data between the Malaysian Rubber Board (MRB) and the Royal Malaysian Customs Department (Customs) using an ETL solution. Microsoft SQL Server Integration Services (SSIS) was used as the ETL tool and Microsoft SQL Server Agent Jobs for ETL scheduling.

    Extract, Transform, and Load data from Legacy Systems to Azure Cloud

    Internship report presented as partial requirement for obtaining the Master’s degree in Information Management, with a specialization in Knowledge Management and Business Intelligence. In a world of continuously evolving technologies and increasingly competitive markets, organisations need to stay alert to cutting-edge technology and tools that will help them surpass any competition that arises. Modern data platforms (MDPs) that incorporate cloud technologies help organisations get ahead of their competitors by providing solutions for capturing and making optimal use of untapped data, scalable storage that adapts to ever-growing data quantities, and data processing and visualisation tools that improve the decision-making process. With many cloud providers available in the market, from small players to major technology corporations, organisations have considerable flexibility to choose the cloud technology that best aligns with their use cases and their overall products and services strategy. This internship took place when one of Accenture’s significant clients in the financial industry decided to migrate from legacy systems to a cloud-based data infrastructure on Microsoft Azure. During the internship, the data lake, which is a core part of the MDP, was developed in order to better understand the challenges that can arise when migrating data from on-premise legacy systems to a cloud-based infrastructure. This work also provides the main recommendations and guidelines for performing a large-scale data migration.

    Evaluation Real-Time Data Warehousing Challenges From A Theoretical And Practical Perspective

    The concept of real-time data warehousing has grown in popularity in recent years as organizations demand access to critical pieces of data in real time to produce analytics and make business decisions that yield competitive advantage. Real-time data warehousing systems differ substantially from traditional data warehousing systems and thus present a unique set of organizational and operational challenges. The research investigated whether adequate information is available regarding the organizational and operational challenges of real-time data warehousing and whether that information is available to the database community. This exploration was done by gathering primary research, conducting a case study, and comparing, analyzing, and drawing conclusions from the two types of research. The information for the primary research was gathered from scholarly, peer-reviewed articles and books on Google Scholar and ACM, computer science and database journals, various textbooks, and the Internet. The case study was conducted on an organization utilizing a real-time data warehousing system.

    Adaptive Big Data Pipeline

    Over the past three decades, data has evolved from a simple software by-product into one of a company's most important assets, used to understand customers and foresee trends. Deep learning has demonstrated that large volumes of clean data generally provide more flexibility and accuracy when modeling a phenomenon. However, handling ever-increasing data volumes entails new challenges: a lack of expertise in selecting the appropriate big data tools for the processing pipelines, and the speed at which engineers can take such pipelines into production reliably, leveraging the cloud. We introduce a system called Adaptive Big Data Pipelines: a platform that automates the creation of data pipelines. It provides an interface to capture the data sources, transformations, destinations and execution schedule. The system builds the cloud infrastructure, schedules and fine-tunes the transformations, and creates the data lineage graph. The system has been tested on data sets of 50 gigabytes, processing them in just a few minutes without user intervention. ITESO, A. C.

    A High-Performance Data Accessing and Processing System for Campus Real-time Power Usage

    With the flourishing of Internet of Things (IoT) technology, ubiquitous power data can be linked to the Internet and analyzed to meet real-time monitoring requirements. Over time, the accumulated power data can reach the terabyte level. To build a real-time power-monitoring platform on this data, an efficient and novel implementation technique was developed, and it forms the core of this thesis. The proposed power-monitoring platform integrates multiple software subsystems in layers and is composed, from bottom to top, of Ubuntu (as operating system), Hadoop (as storage subsystem), Hive (as data warehouse), and Spark MLlib (as data analytics). The power data comes from smart meters installed in factories within an enterprise. Data collection and storage are handled by the Hadoop subsystem, and data ingestion into the Hive data warehouse is performed by the Spark unit. For system verification, HiveQL and Impala SQL were tested for query-response efficiency under single-record queries, and the same software modules were also tested for full-table query performance. The core contributions of this research are twofold: the details of building an efficient real-time power-monitoring platform, and the corresponding query-response efficiency results for reference.

    A review of software project testing

    This article presents a review of software projects based on a project taxonomy, which allows the development team or testing personnel to identify the tests a project must undergo for validation. The taxonomy focuses on classifying software projects according to their technology. To establish the taxonomy, a development method comprising five phases was applied.

    Framework for Interoperable and Distributed Extraction-Transformation-Loading (ETL) Based on Service Oriented Architecture

    Extraction, Transformation and Loading (ETL) are the major functionalities in data warehouse (DW) solutions. Lack of component distribution and interoperability is a gap that leads to many problems in the ETL domain, and it stems from the tightly coupled components in the current ETL framework. This research discusses how to distribute the Extraction, Transformation and Loading components so as to achieve distribution and interoperability of these ETL components. In addition, it shows how the ETL framework can be extended. To that end, Service Oriented Architecture (SOA) is adopted to supply the missing features of distribution and interoperability by restructuring the current ETL framework. This research contributes to the field of ETL by adding the distribution and interoperability concepts to the ETL framework. This in turn contributes to the area of data warehousing and business intelligence, because ETL is a core concept in this area. The Design Science Approach (DSA) and Scrum methodologies were adopted, and their integration provides suitable methods for achieving the research objectives. The new ETL framework is realized by developing and testing a prototype based on it. This prototype is successfully evaluated through three case studies conducted using the data and tools of three different organizations. These organizations use data warehouse solutions to generate statistical reports that help their top management make decisions. Results of the case studies show that distribution and interoperability can be achieved by using the new ETL framework.

    Comparative Study Of Implementing The On-Premises and Cloud Business Intelligence On Business Problems In a Multi-National Software Development Company

    Internship Report presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. Nowadays every enterprise wants to be competitive. In the last decade, data volumes have increased dramatically. As the amount of data in the market grows each year, the ability to extract, analyze and manage that data becomes a core condition for an organization to be competitive. Organizations therefore need to adapt their technologies to the new business reality and provide new solutions that meet new demands. Business Intelligence, by its main definition, is the ability to extract, analyze and manage the data through which an organization gains a competitive advantage. Before adopting this approach, it is important to decide which computing system it will be based on, considering the volume of data, the business context of the organization and the technology requirements of the market. In the last 10 years, the popularity of cloud computing has increased, dividing computing systems into On-Premises and cloud. The benefits of the cloud are scalability, availability and lower costs. On the other hand, traditional On-Premises systems provide independence of software configuration, control over data and high security. Deciding which computing paradigm to follow is not an easy task: it depends on the business context of the organization and on the performance characteristics of the current On-Premises systems in business processes. In this case, Business Intelligence functions require in-depth analysis in order to understand whether cloud computing technologies could perform better in those processes than traditional systems. The objective of this internship is to conduct a comparative study of the two computing systems in routine Business Intelligence functions.
The study will compare On-Premises Business Intelligence based on Oracle architecture with Cloud Business Intelligence based on Google Cloud services. The comparative study will be conducted over 12 months through participation in activities and projects in the Business Intelligence department of a company that develops digital software solutions for the telecommunications market, as an internship student in the 2nd year of a master’s degree in Information Management, with a specialization in Knowledge Management and Business Intelligence, at Nova Information Management School (NOVA IMS).