915 research outputs found

    A Survey on Actionable Knowledge

    Actionable Knowledge Discovery (AKD) is a branch of data mining that is gaining popularity across a wide range of domains because it can extract valuable insights, or knowledge, from large datasets. AKD identifies actionable patterns and trends in data that can be used to make informed decisions and improve business outcomes, with applications such as customer relationship management, marketing, and fraud detection. This paper reviews research studies that apply AKD in domains such as healthcare, finance, and telecommunications, examining their objectives and discussing their methods in detail. It provides a thorough analysis of the current state of AKD, evaluates the advantages and disadvantages of each method, and discusses any novel solutions presented in the field. Overall, the paper aims to provide a comprehensive overview of the methods and techniques used in AKD and their impact on different domains.

    On the Design, Implementation and Application of Novel Multi-disciplinary Techniques for Explaining Artificial Intelligence Models

    284 p. Artificial Intelligence is a non-stopping field of research that has experienced incredible growth over the last decades. Among the reasons for this apparently exponential growth are the improvements in computational power, sensing capabilities, and data storage, which have resulted in a huge increase in data availability. However, this growth has been led mostly by a performance-based mindset that has pushed models towards a black-box nature. The performance prowess of these methods, along with the rising demand for their implementation, has triggered the birth of a new research field: Explainable Artificial Intelligence (XAI). Like any new field, XAI falls short in cohesiveness, and given that it deals with concepts that do not come from the natural sciences (explanations), the tumultuous scene is palpable. This thesis contributes to the field from two different perspectives, a theoretical one and a practical one. The former is based on a profound literature review that resulted in two main contributions: 1) the proposition of a new definition for Explainable Artificial Intelligence and 2) the creation of a new taxonomy for the field. The latter is composed of two XAI frameworks that address some of the raging gaps found in the field, namely: 1) an XAI framework for Echo State Networks and 2) an XAI framework for the generation of counterfactuals. The first accounts for the gap concerning randomized neural networks, since they had never been considered within the field of XAI. Unfortunately, choosing the right parameters to initialize these reservoirs rests largely on luck and the past experience of the scientist, and less on sound reasoning. The current approach for assessing whether a reservoir is suited for a particular task is to observe whether it yields accurate results, either by handcrafting the values of the reservoir parameters or by automating their configuration via an external optimizer.
All in all, this poses tough questions when developing an ESN for a certain application, since knowing whether the created structure is optimal for the problem at hand is not possible without actually training it. Moreover, one of the main concerns discouraging their application is the mistrust generated by their "black-box" nature. The second framework presents a new paradigm for counterfactual generation. Among the alternatives for reaching a universal understanding of model explanations, counterfactual examples are arguably the one that best conforms to human understanding principles when faced with unknown phenomena. Indeed, discerning what would happen should the initial conditions differ in a plausible fashion is a mechanism often adopted by humans when attempting to understand any unknown. The search for counterfactuals proposed in this thesis is governed by three different objectives. As opposed to the classical approach, in which counterfactuals are generated simply by minimizing some distance measure, this framework allows for an in-depth analysis of a target model by means of counterfactuals responding to adversarial power, plausibility, and change intensity.
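The three counterfactual objectives above can be sketched as a simple scoring function over candidate counterfactuals. This is an illustrative reconstruction, not the thesis' actual framework: the toy model, the candidates, and the plausibility proxy (L2 distance) are all assumptions made for the example.

```python
import numpy as np

def counterfactual_score(model, x, x_cf, lam_plaus=1.0, lam_intensity=0.5):
    """Score a candidate counterfactual x_cf against instance x.

    Hypothetical proxies for the thesis' three objectives:
      - adversarial power: how strongly the model's output shifts,
      - plausibility: L2 distance to the original (smaller = more plausible),
      - change intensity: number of features actually changed.
    """
    power = abs(model(x_cf) - model(x))        # adversarial power (maximize)
    plaus = np.linalg.norm(x_cf - x)           # plausibility proxy (penalize)
    intensity = np.count_nonzero(x_cf != x)    # change intensity (penalize)
    return power - lam_plaus * plaus - lam_intensity * intensity

# Toy linear scorer standing in for an arbitrary black-box model.
model = lambda v: float(v @ np.array([1.0, -2.0, 0.5]))

x = np.array([1.0, 1.0, 1.0])
candidates = [np.array([1.0, 0.0, 1.0]), np.array([0.0, 0.0, 0.0])]
best = max(candidates, key=lambda c: counterfactual_score(model, x, c))
```

Collapsing the objectives with scalar penalties is only one option; treating them as genuinely separate objectives, as the thesis does, would instead call for a multi-objective (e.g. Pareto-front) search.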

    Leveraging Explanations in Interactive Machine Learning: An Overview

    Explanations have gained an increasing level of interest in the AI and Machine Learning (ML) communities as a way to improve model transparency and allow users to form a mental model of a trained ML model. However, explanations can go beyond this one-way communication and act as a mechanism to elicit user control, because once users understand, they can provide feedback. The goal of this paper is to present an overview of research where explanations are combined with interactive capabilities as a means to learn new models from scratch and to edit and debug existing ones. To this end, we draw a conceptual map of the state of the art, grouping relevant approaches based on their intended purpose and on how they structure the interaction, highlighting similarities and differences between them. We also discuss open research issues and outline possible directions forward, with the hope of spurring further research on this blooming topic.
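The understand-then-feedback loop described above can be sketched minimally: a user inspects a linear model's weights (the explanation), flags a spurious feature, and the model is retrained with that feature masked. The data, feature roles, and masking strategy here are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)                    # only feature 0 is truly causal
X[:, 2] = y + rng.normal(scale=0.1, size=200)    # a leaky, spurious feature

clf = LogisticRegression().fit(X, y)
explanation = clf.coef_[0]    # per-feature weights shown to the user

# User feedback: "feature 2 is spurious" -> mask it and retrain.
flagged = [2]
X_fixed = X.copy()
X_fixed[:, flagged] = 0.0
clf_fixed = LogisticRegression().fit(X_fixed, y)
```

Systems in the surveyed literature elicit much richer feedback (rules, corrected explanations, relabeled instances), but the loop structure is the same.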

    Harmonization of Spatial Data and Machine Learning for Water Quality Modeling (Ruumiandmete harmoniseerimine ja masinõpe veekvaliteedi modelleerimiseks)

    The electronic version of this thesis does not include the publications. The state of freshwater quality continues to deteriorate worldwide due to agricultural pollution. To combat these issues effectively, water quality modeling can be used to better manage water resources. However, large-scale water quality models depend on input datasets with good spatial coverage.
The aim of the thesis was to improve and harmonize datasets for water quality modeling purposes and to create a machine learning framework for national-scale modeling. We created EstSoil-EH, a new numerical soil database for Estonia, by converting the text-based soil properties in the Estonian Soil Map to machine-readable values. We used it to predict soil organic carbon content with the random forest machine learning method and found that the conditions of the sampling locations affected prediction accuracy. We improved the global coverage of water quality data by producing the Global River Water Quality Archive (GRQA), compiled from five existing large-scale datasets. The compilation involved harmonizing the corresponding metadata, flagging outliers, calculating time series characteristics, and detecting duplicate observations. Based on lessons learnt from predicting soil carbon content, we developed a framework suitable for national-scale water quality modeling. We used 82 environmental variables, including soil properties from EstSoil-EH, as features to predict nutrient concentrations in 242 river catchments. The resulting models achieved accuracy comparable to those used previously in the Baltic region. We found that catchment size influenced accuracy, since predictions were generally less accurate in smaller catchments. The models maintained reasonable accuracy even when the number of features was reduced by half, which shows that the relevance of features is more important than their number. This flexibility makes our models applicable in areas that otherwise lack the input data needed for extracting features.
https://www.ester.ee/record=b552067
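The feature-halving experiment mentioned above can be sketched with scikit-learn: fit a random forest, rank features by impurity importance, then refit on the better half. The synthetic data and variable names below are stand-ins, not the thesis' catchment features.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 10))
# Only two of the ten features carry signal, mimicking redundant predictors.
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=300)

full = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
ranking = np.argsort(full.feature_importances_)[::-1]   # most important first

top_half = ranking[:5]   # keep the better half of the features
reduced = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[:, top_half], y)

r2_full = full.score(X, y)
r2_reduced = reduced.score(X[:, top_half], y)
```

When the dropped half is mostly noise, as here, the reduced model's fit barely suffers, which is the behavior the thesis reports for its nutrient models.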

    Detecting Pain Points from User-Generated Social Media Posts Using Machine Learning

    Artificial intelligence, particularly machine learning, carries high potential to automatically detect customers’ pain points, i.e., particular concerns a customer expresses that the company can address. However, unstructured data scattered across social media make detection a nontrivial task. Thus, to help firms gain deeper insights into customers’ pain points, the authors experiment with and evaluate the performance of various machine learning models to automatically detect pain points and pain point types for enhanced customer insights. The data consist of 4.2 million user-generated tweets targeting 20 global brands from five separate industries. Among the models trained, neural networks show the best performance at overall pain point detection, with an accuracy of 85% (F1 score = .80). The best model for detecting five specific pain points was RoBERTa using 100 samples with synonym augmentation. This study adds another foundational building block of machine learning research in marketing academia through the application and comparative evaluation of machine learning models for natural language–based content identification and classification. In addition, the authors suggest that firms use pain point profiling, a technique for applying subclasses to the identified pain point messages to gain a deeper understanding of their customers’ concerns. ©2022 SAGE Publications.
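As a rough, library-light stand-in for the classifiers the article compares, the binary "pain point vs. not" framing can be illustrated with a TF-IDF plus logistic regression pipeline; the tweets, labels, and pipeline choice are invented for this example (the paper's best models were neural networks and RoBERTa).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for user-generated tweets: 1 = pain point, 0 = not.
tweets = [
    "my order arrived broken again",
    "app crashes every time I log in",
    "waited 40 minutes on the support line",
    "love the new update, works great",
    "great coffee and friendly staff",
    "the new design looks fantastic",
]
labels = [1, 1, 1, 0, 0, 0]

detector = make_pipeline(TfidfVectorizer(), LogisticRegression())
detector.fit(tweets, labels)
```

Pain point profiling, as the authors describe it, would then add a second classification stage over the detected messages to assign subclass (pain point type) labels.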

    An Interpretable Machine Learning Model to Explore Relationships between Drought Indices and Ecological Drought Impacts in the Cheyenne River Basin, USA

    Rangeland ecosystems across the United States have significant biological, economic, and cultural value. However, the increasing frequency and severity of droughts across the country may lead to unforeseen impacts on these ecosystems. To address this challenge, this study aimed to identify relationships between drought indices and vegetation health in the Cheyenne River Basin, USA, using machine learning (ML) and explainable artificial intelligence (XAI) methods. Using Terra Moderate Resolution Imaging Spectroradiometer (MODIS), University of Idaho Gridded Surface Meteorological Dataset (gridMET), and Daymet data, the study employed XGBoost Regressor and Extra Trees Regressor models together with SHapley Additive exPlanations (SHAP) to evaluate predictive performance and the connections between drought indices, environmental variables, and the Normalized Difference Vegetation Index (NDVI). Tests of model performance demonstrated that the XGBoost model performed moderately well at predicting NDVI and was therefore useful for further XAI analysis with SHAP. SHAP explainer results showed that the Palmer Drought Severity Index (PDSI), the 90-day Standardized Precipitation Index (SPI), and snow water equivalent (SWE) were the most important predictors of NDVI values and are therefore closely associated with vegetation health in the study area. The findings of this study first demonstrate the feasibility and usefulness of applying XAI, an underutilized method in the drought space, to study ecological drought indicators. Secondly, the results provide an understanding of which commonly used drought indices correlate with effects on vegetation health in the study area, as well as the specific directionality of these relationships. These results can be used to inform drought research and monitoring practices and to anticipate ecological drought impacts in the Cheyenne River Basin.
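The study's model-then-explain workflow can be sketched without the xgboost or shap packages by substituting scikit-learn's gradient boosting and permutation importance; the synthetic "drought index" features below are assumptions for the example, not the study's data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(7)
n = 400
pdsi = rng.normal(size=n)     # stand-in for the Palmer Drought Severity Index
spi90 = rng.normal(size=n)    # stand-in for the 90-day SPI
noise = rng.normal(size=n)    # an uninformative distractor
X = np.column_stack([pdsi, spi90, noise])
ndvi = 0.6 * pdsi + 0.3 * spi90 + rng.normal(scale=0.05, size=n)

model = GradientBoostingRegressor(random_state=0).fit(X, ndvi)
imp = permutation_importance(model, X, ndvi, n_repeats=10, random_state=0)
ranking = np.argsort(imp.importances_mean)[::-1]   # most important first
```

SHAP additionally provides per-prediction attributions and directionality (e.g. via dependence plots), which permutation importance cannot; the substitution preserves only the global feature-ranking step.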