8,693 research outputs found

    AIOps for a Cloud Object Storage Service

    With the growing reliance on the ubiquitous availability of IT systems and services, these systems have become more global, scaled, and complex to operate. To maintain business viability, IT service providers must put in place reliable and cost-efficient operations support. Artificial Intelligence for IT Operations (AIOps) is a promising technology for alleviating the operational complexity of IT systems and services. AIOps platforms utilize big data, machine learning, and other advanced analytics technologies to enhance IT operations with proactive, actionable, dynamic insight. In this paper we share our experience applying the AIOps approach to a production cloud object storage service to get actionable insights into the system's behavior and health. We describe a real-life production cloud-scale service and its operational data, present the AIOps platform we have created, and show how it has helped us resolve operational pain points.
    Comment: 5 pages
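The core of such proactive insight is typically anomaly detection over operational metrics. The following is a minimal sketch of that idea, not the paper's platform; the rolling z-score rule, window size, and the latency metric are all assumptions for illustration:

```python
from collections import deque
from statistics import mean, stdev

def detect_anomalies(samples, window=20, threshold=3.0):
    """Flag points deviating more than `threshold` standard
    deviations from the trailing window's mean."""
    history = deque(maxlen=window)
    anomalies = []
    for i, x in enumerate(samples):
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(x - mu) > threshold * sigma:
                anomalies.append(i)
        history.append(x)
    return anomalies

# Steady request latency with one spike: only the spike is flagged.
latencies = [10.0 + (i % 3) * 0.5 for i in range(40)]
latencies[30] = 95.0
print(detect_anomalies(latencies))  # → [30]
```

A production AIOps pipeline would replace this rule with learned models and feed the flagged indices into alerting, but the shape (stream in, actionable indices out) is the same.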

    Incremental Predictive Process Monitoring: How to Deal with the Variability of Real Environments

    A characteristic of existing predictive process monitoring techniques is to first construct a predictive model based on past process executions, and then use it to predict the future of new ongoing cases, without the possibility of updating it with new cases when they complete their execution. This can make predictive process monitoring too rigid to deal with the variability of processes working in real environments that continuously evolve and/or exhibit new variant behaviors over time. As a solution to this problem, we propose the use of algorithms that allow the incremental construction of the predictive model. These incremental learning algorithms update the model whenever new cases become available, so that the predictive model evolves over time to fit the current circumstances. The algorithms have been implemented using different case encoding strategies and evaluated on a number of real and synthetic datasets. The results provide first evidence of the potential of incremental learning strategies for predictive process monitoring in real environments, and of the impact of different case encoding strategies in this setting.
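The incremental idea (update on each completed case rather than retrain from scratch) can be sketched with a deliberately simple next-activity predictor; the activity names and the frequency-based model below are assumptions for illustration, not the paper's algorithms or encodings:

```python
from collections import defaultdict, Counter

class IncrementalNextActivityModel:
    """Toy incremental predictor: learns successor frequencies per
    activity and updates them whenever a completed case arrives."""

    def __init__(self):
        self.successors = defaultdict(Counter)

    def update(self, trace):
        # Incorporate one completed case without rebuilding the model.
        for current, nxt in zip(trace, trace[1:]):
            self.successors[current][nxt] += 1

    def predict_next(self, activity):
        counts = self.successors[activity]
        return counts.most_common(1)[0][0] if counts else None

model = IncrementalNextActivityModel()
model.update(["register", "check", "approve"])
print(model.predict_next("check"))  # → approve
model.update(["register", "check", "reject"])
model.update(["register", "check", "reject"])
print(model.predict_next("check"))  # → reject: the model followed the drift
```

The last call shows the point of the abstract: as new variant behavior accumulates, the prediction shifts without any offline retraining step.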

    RTLS-enabled clinical workflow predictive analysis


    Sequence Mining and Pattern Analysis in Drilling Reports with Deep Natural Language Processing

    Drilling activities in the oil and gas industry have been reported daily over decades for thousands of wells, yet analyzing this text at large scale for information retrieval, sequence mining, and pattern analysis is very challenging. Drilling reports contain interpretations written by drillers based on measurements from downhole sensors and surface equipment, and can be used for operation optimization and accident mitigation. In this initial work, a methodology is proposed for the automatic classification of sentences written in drilling reports into three relevant labels (EVENT, SYMPTOM, and ACTION) for hundreds of wells in an actual field. Some of the main challenges in the text corpus were overcome, including the high frequency of technical symbols, mistyping/abbreviation of technical terms, and the presence of incomplete sentences in the drilling reports. We obtain state-of-the-art classification accuracy within this technical language and illustrate advanced queries enabled by the tool.
    Comment: 7 pages, 14 figures, technical report
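The task shape (short noisy sentences mapped to EVENT/SYMPTOM/ACTION) can be illustrated with a tiny bag-of-words Naive Bayes classifier. This is only a stand-in for the deep NLP models the abstract describes, and the training sentences are invented examples of drilling vocabulary:

```python
import math
from collections import Counter, defaultdict

class TinyNaiveBayes:
    """Bag-of-words Naive Bayes with add-one smoothing."""

    def fit(self, sentences, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(sentences, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, sentence):
        words = sentence.lower().split()
        def log_score(label):
            total = sum(self.word_counts[label].values()) + len(self.vocab)
            return math.log(self.label_counts[label]) + sum(
                math.log((self.word_counts[label][w] + 1) / total)
                for w in words)
        return max(self.label_counts, key=log_score)

train = [("stuck pipe observed while tripping", "EVENT"),
         ("high torque and drag readings", "SYMPTOM"),
         ("circulated and worked pipe free", "ACTION")]
clf = TinyNaiveBayes().fit([t for t, _ in train], [l for _, l in train])
print(clf.predict("worked the pipe free"))  # → ACTION
```

A real system would additionally normalize the symbols and abbreviations the abstract mentions before tokenization; that preprocessing is where most of the corpus-specific effort goes.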

    Data Science for Internal Audit in Banking: Refinement of an Internal Audit Alarmistic System with Machine Learning

    Internship report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science. This report presents the work developed during the academic internship required for obtaining the Master's Degree in Data Science and Advanced Analytics. The internship took place in the Data & Analytics area of the Department for Internal Audit of Caixa Geral de Depósitos (Portugal), from the 14th of September 2020 to the 13th of June 2021. The internship's goal was the introduction of machine learning to the Department of Internal Audit, in particular the implementation of three machine learning pipelines to aid the institution's audit activities by systematically analyzing operations that stand out from the implemented alarm system. The alarm system triggers alerts when an event disobeys a predefined methodology. Each triggering event is reviewed and processed individually by the auditors, being classified either as a confirmed error or as a false positive. Confirmed errors frequently lead to recommendations to rectify the operations, while false positives are closed without a recommendation. The alerts' triggers are defined by sets of arguably general, manually implemented rules, resulting in high trigger frequencies and low precision. Trigger frequency, precision, and cost of miss rate differ for each alert. Based on the alerts' trigger history data, three types of alerts were selected for improvement. The deployment of machine learning pipelines with classification models optimized the triggers' specificity while maintaining high sensitivity, which reduced the number of daily events that have to be reviewed by the auditors. This optimization maximizes the efficiency and productivity of the general alarm system and decreases the auditors' workload.
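The trade-off described (maximize specificity while keeping sensitivity on confirmed errors high) is commonly realized by tuning a classifier's decision threshold. A minimal sketch of that step, with invented scores and labels rather than the report's actual models or data:

```python
def pick_threshold(scores, labels, min_sensitivity=0.95):
    """Choose the score cutoff that maximizes specificity subject to
    a sensitivity floor; labels use 1 = confirmed error, 0 = false
    positive. Returns (threshold, specificity) or None."""
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        if sens >= min_sensitivity and (best is None or spec > best[1]):
            best = (t, spec)
    return best

scores = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]  # hypothetical model scores
labels = [0,   0,   0,   1,   1,   1]    # 1 = confirmed error
print(pick_threshold(scores, labels, 1.0))  # → (0.6, 1.0)
```

Everything above the chosen threshold is routed to the auditors; everything below is suppressed, which is exactly how a higher-specificity trigger reduces the daily review load.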

    Data engineering and best practices

    Bologna Master's degree in Data Analytics for Business. This report presents the results of a study on the current state of data engineering at the company LGG Advisors. Analyzing existing data, we identified several key trends and challenges facing data engineers in this field. Our study's key findings include a lack of standardization and best practices for data engineering processes, a growing need for more sophisticated data management, analysis, and security tools, and a shortage of trained and experienced data engineers to meet the increasing demand for data-driven solutions. Based on these findings, we recommend several steps that LGG Advisors can take to improve its data engineering capabilities, including investing in training and education programs, adopting best practices for data management and analysis, and collaborating with other organizations to share knowledge and resources. Data security is also an essential concern for data engineers, as data breaches can have significant consequences for organizations, including financial losses, reputational damage, and regulatory penalties. In this thesis, we review and evaluate some of the best software tools for securing data in data engineering environments. We discuss these tools' key features, capabilities, strengths, and limitations to help data engineers choose the best software for protecting their data. The tools considered include encryption software, access control systems, network security tools, and data backup and recovery solutions. We also discuss best practices for implementing and managing these tools to ensure data security in data engineering environments. We engineer data using intuition and rules of thumb; many of these rules are folklore, and given the rapid pace of technological change, they must be constantly reevaluated.
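Of the tool categories listed, access control is the easiest to sketch in a few lines. The roles and permissions below are hypothetical, chosen only to show the pattern such systems enforce; the report surveys commercial tools rather than prescribing an implementation:

```python
# Minimal role-based access control sketch with hypothetical roles.
PERMISSIONS = {
    "data_engineer": {"read", "write"},
    "auditor": {"read"},
}

def can(role, action):
    """Return True if the role is granted the action; unknown roles
    get no permissions (deny by default)."""
    return action in PERMISSIONS.get(role, set())

print(can("auditor", "read"))    # → True
print(can("auditor", "write"))   # → False
print(can("intern", "read"))     # → False
```

Real access control systems layer authentication, grants, and audit logging on top of this check, but deny-by-default lookup is the core of the model.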

    From dirty data to multiple versions of truth: How different choices in data cleaning lead to different learning analytics outcomes

    Learning analytics is the analysis of student data with the purpose of improving learning. However, the process of data cleaning remains underexposed in the learning analytics literature. In this paper, we elaborate on choices made in the cleaning process of student data and their consequences. We illustrate this with a case where data was gathered during six courses taught via Moodle. In this data set, only 21% of the logged activities were linked to a specific course. We illustrate possible choices in dealing with missing data by applying the cleaning process twelve times, with different choices, on copies of the raw data. Consequently, the analysis of the data shows varying outcomes. As the purpose of learning analytics is to intervene based on analysis and visualizations, it is of utmost importance to be aware of the choices made during data cleaning. This paper's main goal is to make stakeholders of (learning) analytics activities aware that choices made during data cleaning have consequences for the outcomes. We believe that there should be transparency towards the users of these outcomes, who should be given a detailed report of the decisions made.
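The effect can be shown concretely: two defensible treatments of activities with a missing course link yield different per-course counts from the same raw log. The rows and the imputation rules below are invented for illustration, not the paper's twelve cleaning variants:

```python
# Hypothetical log rows: (student, course, action); course is None
# when the LMS did not link the activity to a specific course.
rows = [
    ("s1", "math", "view"), ("s1", None, "view"),
    ("s2", None, "post"), ("s2", "math", "view"),
    ("s3", "bio", "view"),
]

# Choice A: drop rows with a missing course link.
per_course_a = {}
for student, course, _ in rows:
    if course is not None:
        per_course_a[course] = per_course_a.get(course, 0) + 1

# Choice B: attribute unlinked rows to the student's most recently
# seen course, dropping them only when no prior course exists.
last_course = {}
per_course_b = {}
for student, course, _ in rows:
    if course is None:
        course = last_course.get(student)
    if course is not None:
        last_course[student] = course
        per_course_b[course] = per_course_b.get(course, 0) + 1

print(per_course_a)  # → {'math': 2, 'bio': 1}
print(per_course_b)  # → {'math': 3, 'bio': 1}
```

Both results are "the truth" under their own cleaning rule, which is precisely why the paper argues the chosen rule must be reported alongside the outcomes.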