
    Data Quality in Analytics: Key Problems Arising from the Repurposing of Manufacturing Data

    This is the author accepted manuscript. It was first presented at the 20th Annual MIT International Conference on Information Quality. Repurposing data means using data for a decision or task completely different from the one it was originally intended for. This is often the case in data analytics, where data captured by the business as part of its normal operations is used by data scientists to derive business insight. However, when data is collected for its primary purpose, some consideration is given to ensuring that the level of data quality is “fit for purpose”. Data repurposing, by definition, uses data for a different purpose, and the original quality levels may therefore not be suitable for the secondary purpose. Using interviews with various manufacturers, this paper describes examples of repurposing in manufacturing, how manufacturing organisations repurpose data, the data quality problems that arise specifically from repurposing, and how these problems are currently addressed. From these results we present a framework which manufacturers can use to identify and mitigate the issues caused when attempting to repurpose data.
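
    As a concrete illustration of the “fit for purpose” idea, the following sketch (not from the paper; the column name, thresholds, and pandas-based checks are assumptions for illustration) tests whether sensor data captured for routine shop-floor monitoring also meets the stricter completeness and validity requirements of a secondary analytics task.

        import pandas as pd

        # Hypothetical quality requirements: the secondary (analytics) purpose is
        # stricter than the primary (operational monitoring) purpose.
        REQUIREMENTS = {
            "primary":   {"max_missing": 0.10, "temp_range": (-50, 200), "max_out_of_range": 0.05},
            "secondary": {"max_missing": 0.01, "temp_range": (0, 120),   "max_out_of_range": 0.00},
        }

        def fit_for_purpose(df: pd.DataFrame, purpose: str) -> dict:
            """Return simple completeness and validity indicators for a given purpose."""
            req = REQUIREMENTS[purpose]
            temps = df["temperature_c"]
            missing_rate = temps.isna().mean()
            lo, hi = req["temp_range"]
            out_of_range_rate = (~temps.dropna().between(lo, hi)).mean()
            return {
                "missing_ok": missing_rate <= req["max_missing"],
                "range_ok": out_of_range_rate <= req["max_out_of_range"],
            }

        # Synthetic example: the same data set is fit for the primary purpose
        # but fails the stricter secondary (analytics) purpose.
        df = pd.DataFrame({"temperature_c": [21.5, 23.0, 180.0, 22.1] * 24 + [None] * 4})
        print("primary:  ", fit_for_purpose(df, "primary"))
        print("secondary:", fit_for_purpose(df, "secondary"))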

    Risk Mitigation Strategy for Spend Management in Strategic Procurement Through Automation of Processes

    The digital revolution is currently shaking every sector of industry, especially supply chain and procurement. Concepts like “IoT”, “Industry 4.0” and “Procurement 4.0” are circulating widely, making the industry more competitive. The digital revolution is compelling businesses to work in collaboration to achieve their targets, because it is becoming almost impossible for companies to survive independently; on the other hand, these collaborations add complexity to processes, which can result in a lack of transparency in process flows and spend management. Proper, clear and transparent spend management is key to the success of any organization, while uncontrolled, unclear spend management can lead companies to bankruptcy. The purpose of this research paper is to show how strategic procurement performs spend management with conventional methods, how digitization and automation concepts can be used to improve the whole spend management process, and how to mitigate the risks associated with spend management. Emerging literature, case studies, blogs, expert opinions, market knowledge, practical business experience and citations are used to fulfill this task.

    Integrating Data Cleansing With Popular Culture: A Novel SQL Character Data Tutorial

    Big data and data science have experienced unprecedented growth in recent years. The big data market continues to exhibit strong momentum as countless businesses transform into data-driven companies. From salary surges to incredible growth in the number of positions, data science is one of the hottest areas in the job market. Significant demand and a limited supply of professionals with data competencies have greatly affected the hiring market, and this demand/supply imbalance will likely continue in the future. A major key to supplying the market with qualified big data professionals is bridging the gap from traditional Information Systems (IS) learning outcomes to the outcomes requisite in this emerging field. The purpose of this paper is to share an SQL Character Data Tutorial. Utilizing the 5E Instructional Model, this tutorial helps students (a) become familiar with SQL code, (b) learn when and how to use SQL string functions, (c) understand and apply the concept of data cleansing, (d) gain problem-solving skills in the context of typical string manipulations, and (e) gain an understanding of typical needs related to string queries. The tutorial utilizes common, recognizable quotes from popular culture to engage students in the learning process and enhance understanding. This tutorial should prove helpful to educators who seek to provide a rigorous, practical, and relevant big data experience in their courses.
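
    To illustrate the kind of string-function data cleansing such a tutorial targets, here is a minimal sketch (not taken from the paper; the table, column names, and quotes are invented for illustration) using Python's built-in sqlite3 module and standard SQL string functions such as TRIM, LOWER, REPLACE and LENGTH.

        import sqlite3

        # In-memory database with deliberately messy character data (hypothetical example).
        con = sqlite3.connect(":memory:")
        con.execute("CREATE TABLE quotes (id INTEGER PRIMARY KEY, line TEXT)")
        con.executemany(
            "INSERT INTO quotes (line) VALUES (?)",
            [("  May the Force be with you!!  ",),
             ("HOUSTON, we have a problem ",),
             ("elementary,   my dear Watson",)],
        )

        # Cleanse with SQL string functions: trim whitespace, normalize case,
        # and collapse repeated punctuation.
        cleaned = con.execute(
            """
            SELECT id,
                   LOWER(TRIM(REPLACE(line, '!!', '!'))) AS cleaned_line,
                   LENGTH(TRIM(line))                    AS char_count
            FROM quotes
            ORDER BY id
            """
        ).fetchall()

        for row in cleaned:
            print(row)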

    Exploration of Big Data in Procurement - Benefits and Challenges

    The emergence of Big Data has had positive implications for various industries and businesses. Big Data analytics provides the ability to harness massive amounts of data for decision-making purposes. One of the important use cases of Big Data analytics is in supply chain management. Increased visibility, an enhanced bargaining position in negotiations, better risk management and informed decision making are examples of benefits gained from Big Data analytics in the supply chain. Although there are advances in the application of analytics throughout supply chain management, sourcing applications lag behind other supply chain functions. The purpose of this study is to analyse use cases of exploiting Big Data for purchasing and supply purposes, in order to help companies gain more visibility over the supply market. Data collection in this study was carried out through semi-structured interviews, which were then coded and categorized for comparison. The results show that Big Data aids in identifying new suppliers. Additionally, transparency over n-tier suppliers for managing risks was important for companies. Most of the companies use descriptive analytics; however, they expect predictive analytics to make them aware of the market situation and to gain a better position in negotiations. Furthermore, this research showed that, to prevent supply disruptions, Big Data analytics should send timely warnings to managers. The main expectations from Big Data analytics are gaining transparency, automation of data collection and analysis, prediction, availability of new data sources, more efficient KPIs and better representation of data. The main hurdle in a Big Data initiative is unintegrated and non-homogeneous internal data.
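
    As a rough sketch of what such a timely warning might look like in practice (the data, KPI definition and threshold below are assumptions for illustration, not from the study), a descriptive-analytics job could flag suppliers whose on-time delivery rate drops below a tolerance.

        import pandas as pd

        # Hypothetical delivery records per supplier.
        deliveries = pd.DataFrame({
            "supplier": ["A", "A", "A", "A", "B", "B", "B"],
            "on_time":  [True, True, True, True, True, False, False],
        })

        # Descriptive KPI: on-time delivery rate per supplier.
        kpi = deliveries.groupby("supplier")["on_time"].mean().rename("on_time_rate")

        # Simple warning rule for managers (threshold is an assumed tolerance).
        THRESHOLD = 0.80
        for supplier, rate in kpi[kpi < THRESHOLD].items():
            print(f"WARNING: supplier {supplier} on-time rate {rate:.0%} is below {THRESHOLD:.0%}")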

    Towards information profiling: data lake content metadata management

    There is currently a burst of Big Data (BD) processed and stored in huge raw data repositories, commonly called Data Lakes (DL). These BD require new techniques of data integration and schema alignment in order to make the data usable by its consumers and to discover the relationships linking their content. This can be provided by metadata services which discover and describe their content. However, there is currently no systematic approach for this kind of metadata discovery and management. Thus, we propose a framework for profiling the informational content stored in the DL, which we call information profiling. The profiles are stored as metadata to support data analysis. We formally define a metadata management process which identifies the key activities required to handle this effectively. We demonstrate the alternative techniques and performance of our process using a prototype implementation handling a real-life case study from the OpenML DL, which showcases the value and feasibility of our approach.
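
    A minimal sketch of what an information profile stored as metadata could look like (the per-column statistics, JSON layout and file naming are assumptions for illustration, not the paper's actual process):

        import json
        import pandas as pd

        def profile_dataset(df: pd.DataFrame, name: str) -> dict:
            """Build a simple content profile: per-column type, cardinality and null fraction."""
            return {
                "dataset": name,
                "rows": len(df),
                "columns": {
                    col: {
                        "dtype": str(df[col].dtype),
                        "distinct": int(df[col].nunique()),
                        "null_fraction": float(df[col].isna().mean()),
                    }
                    for col in df.columns
                },
            }

        # Hypothetical raw file landed in the data lake.
        df = pd.DataFrame({"sensor_id": [1, 1, 2], "reading": [0.4, None, 0.9]})
        profile = profile_dataset(df, "sensor_readings_batch_01")

        # Persist the profile alongside the data as lightweight metadata.
        with open("sensor_readings_batch_01.profile.json", "w") as fh:
            json.dump(profile, fh, indent=2)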

    LEAN DATA ENGINEERING. COMBINING STATE OF THE ART PRINCIPLES TO PROCESS DATA EFFICIENTLY

    The present work was developed during an internship, under the Erasmus+ Traineeship programme, at Fieldwork Robotics, a Cambridge-based company that develops robots to operate in agricultural fields. The robots collect data from commercial greenhouses with sensors and RealSense cameras, as well as with gripper cameras placed on the robotic arms. This data is recorded mainly in bag files, consisting of unstructured data, such as images, and semi-structured data, such as metadata describing both the conditions in which the images were taken and the robot itself. Data was uploaded, extracted, cleaned and labelled manually before being used to train Artificial Intelligence (AI) algorithms to identify raspberries during the harvesting process. The amount of available data quickly escalates with every trip to the fields, which creates an ever-growing need for an automated process. This problem was addressed by creating a data engineering platform encompassing a data lake, a data warehouse and the processing capabilities needed to move data through the different stages of the process. The platform was created following a series of principles entitled Lean Data Engineering Principles (LDEP), and systems that follow them are called Lean Data Engineering Systems (LDES). These principles urge practitioners to start with the end in mind: process incoming batch or real-time data with no wasted resources, limiting costs to what is absolutely necessary for job completion, in other words being as lean as possible. The LDEP are a combination of state-of-the-art ideas stemming from several fields, such as data engineering, software engineering and DevOps, with cloud technologies at their core. The proposed custom-made solution enabled the company to scale its data operations, labelling images almost ten times faster while reducing the associated costs by over 99.9% compared to the previous process. In addition, the data lifecycle time was reduced from weeks to hours while maintaining coherent data quality results, correctly identifying, for instance, 94% of the labels in comparison to a human counterpart.
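
    A highly simplified sketch of one batch step such a platform might run (the paths, file layout and the idea of copying cleaned per-frame metadata from a lake landing zone into a warehouse table are assumptions for illustration, not the company's actual LDES implementation):

        import json
        from pathlib import Path

        import pandas as pd

        LAKE_DIR = Path("lake/raw/greenhouse_runs")           # hypothetical landing zone
        WAREHOUSE_PARQUET = Path("warehouse/frames.parquet")   # hypothetical curated table

        def extract(run_dir: Path) -> pd.DataFrame:
            """Read the per-frame metadata JSON files extracted from a bag file."""
            records = [json.loads(p.read_text()) for p in run_dir.glob("*.json")]
            return pd.DataFrame(records)

        def clean(df: pd.DataFrame) -> pd.DataFrame:
            """Drop frames without an image reference and normalise timestamps."""
            df = df.dropna(subset=["image_path"])
            df["captured_at"] = pd.to_datetime(df["captured_at"], utc=True)
            return df

        def load(df: pd.DataFrame) -> None:
            """Append the cleaned batch to the warehouse table."""
            if WAREHOUSE_PARQUET.exists():
                df = pd.concat([pd.read_parquet(WAREHOUSE_PARQUET), df], ignore_index=True)
            WAREHOUSE_PARQUET.parent.mkdir(parents=True, exist_ok=True)
            df.to_parquet(WAREHOUSE_PARQUET, index=False)

        if __name__ == "__main__":
            for run in LAKE_DIR.iterdir():
                load(clean(extract(run)))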

    Legal Analytics, Social Science, and Legal Fees: Reimagining Legal Spend Decisions in an Evolving Industry

    This article discusses how legal analytics can help law firms and clients understand, monitor, and improve the components that comprise bills for legal fees and expenses.