63 research outputs found

    Association Rule Mining Meets Regression Analysis: An Automated Approach to Unveil Systematic Biases in Decision-Making Processes

    Decisional processes are at the basis of most businesses in several application domains. However, they are often not fully transparent and can be affected by human or algorithmic biases that may lead to systematically incorrect or unfair outcomes. In this work, we propose an approach for unveiling biases in decisional processes, which leverages association rule mining for systematic hypothesis generation and regression analysis for model selection and recommendation extraction. In particular, we use rule mining to elicit candidate hypotheses of bias from the observational data of the process. From these hypotheses, we build regression models to determine the impact of variables on the process outcome. We show how the coefficients of the (selected) model can be used to extract recommendations, upon which the decision maker can operate. We evaluated our approach using both synthetic and real-life datasets in the context of discrimination discovery. The results show that our approach provides more reliable evidence than that obtained using rule mining alone, and how the obtained recommendations can be used to guide analysts in the investigation of biases affecting the decisional process at hand.

    Encoding High-Level Control-Flow Construct Information for Process Outcome Prediction

    Outcome-oriented predictive process monitoring aims at classifying a running process execution according to a given set of categorical outcomes, leveraging data on past process executions. Most previous studies employ Recurrent Neural Networks to encode the sequence of events, without taking the structure of the process into account. However, process executions typically involve complex control-flow constructs, like parallelism and loops. Different executions of these constructs can be recorded as different event sequences in the event log. This makes it challenging for a recurrent classifier to detect potential relations between a high-level control-flow construct and the prediction target. This is especially true in the presence of high variability in process executions and lack of data. In this paper, we propose a novel approach which encodes the control-flow construct each event belongs to. First, we exploit Local Process Model mining techniques to extract frequently occurring control-flow patterns from the event log. Then, we employ different encoding techniques to enrich an ongoing process execution with information related to the extracted control-flow patterns. We tested the proposed method on nine real-life event logs. The obtained results show consistent improvements in the prediction performance.

    Isolation of Methicillin-Resistant Coagulase-Negative Staphylococcus (MRCoNS) from a fecal-contaminated stream in the Shenandoah Valley of Virginia

    Staphylococcus comprises 41 known species, of which 18 can colonize humans. Despite the prevalence of infectious Staphylococcus within hospital settings and agriculture, there are few reports of Staphylococcus in natural bodies of water. A recent study by the US Food and Drug Administration found substantial contamination of poultry and other meats with Staphylococcus. We hypothesized that intensive farming of poultry adjacent to streams would result in contaminated runoff, resulting in at least transient occurrence of Staphylococcus spp. in stream waters and sediments. In this study, we sought to determine whether Staphylococcus occurs and persists within Muddy Creek, a stream located in Hinton, Virginia that originates at the Appalachian Mountains of Virginia and runs through various agricultural fields and adjacent to a poultry processing plant in the central Shenandoah Valley. Five different Staphylococcus spp. were detected in water and sediment from Muddy Creek. Mannitol Salt Agar (MSA) was used to isolate eleven Staphylococcus isolates from both water and sediment. These isolates were Gram-positive, catalase-positive, and oxidase-negative cocci that were capable of fermenting mannitol. In addition, a method for screening putative staphylococci species from stream water and sediment was developed. Ten of the eleven tested isolates were oxacillin resistant (oxacillin is now used to identify phenotypic methicillin resistance) in a Kirby-Bauer disc diffusion test. Furthermore, the isolates were susceptible to trimethoprim/sulfamethoxazole, tetracycline, and gentamicin, while two of the isolates were resistant to erythromycin. Additionally, the BOX-PCR repetitive sequence fingerprinting method verified the presence of nine different strains among the isolates. Sequencing of the 16S rRNA gene identified five of the isolates as Staphylococcus equorum. The Biolog identification protocol further identified the remaining isolates as Staphylococcus xylosus, Staphylococcus lentus, Staphylococcus succinus, and Staphylococcus sciuri. Finally, polymerase chain reaction (PCR) amplification confirmed that ten of the eleven isolates harbored the mecA gene known to confer methicillin resistance. Overall, the occurrence of methicillin-resistant coagulase-negative staphylococci (MRCoNS) in stream water and sediment represents a potential environmental and human health concern.

    Towards Multi-perspective conformance checking with fuzzy sets

    Conformance checking techniques are widely adopted to pinpoint possible discrepancies between process models and the execution of the process in reality. However, state-of-the-art approaches adopt a crisp evaluation of deviations, with the result that small violations are treated at the same level as significant ones. This affects the quality of the provided diagnostics, especially when there exists some tolerance with respect to reasonably small violations, and hampers the flexibility of the process. In this work, we propose a novel approach which makes it possible to represent actors' tolerance with respect to violations and to account for the severity of deviations when assessing the compliance of executions. We argue that besides improving the quality of the provided diagnostics, allowing some tolerance in the assessment of deviations also enhances the flexibility of conformance checking techniques and, indirectly, paves the way for improving the resilience of the overall process management system. Comment: 15 pages, 5 figures

    Towards Multi-perspective Conformance Checking with Fuzzy Sets

    Nowadays organizations often need to employ data-driven techniques to audit their business processes and ensure they comply with laws and internal/external regulations. Failing to comply with the expected process behavior can indeed pave the way to inefficiencies or, worse, to fraud or abuse. An increasingly popular approach to automatically assess the compliance of the executions of organization processes is alignment-based conformance checking. These techniques compare real process executions with models representing the expected behaviors, providing diagnostics that pinpoint possible discrepancies. However, the diagnostics generated by state-of-the-art techniques still suffer from some limitations. They perform a crisp evaluation of process compliance, marking process behavior either as compliant or deviant, without taking into account the severity of the identified deviation. This hampers the accuracy of the obtained diagnostics and can lead to misleading results, especially in contexts where there is some tolerance with respect to violations of the process guidelines. In the present work, we discuss the impact and the drawbacks of a crisp deviation assessment approach. Then, we propose a novel conformance checking approach aimed at representing actors' tolerance with respect to process deviations, taking it into account when assessing the severity of the deviations. As a proof of concept, we performed a set of synthetic experiments to assess the approach. The obtained results point out the potential of a more flexible evaluation of process deviations, and its impact on the quality and the interpretation of the obtained diagnostics.
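
    The contrast between a crisp and a tolerance-aware assessment of deviations can be illustrated with a minimal sketch. It is not the authors' method: the linear membership function, the function names, and the example numbers (a limit of 1000 with a tolerance of 200) are all assumptions chosen purely for illustration.

```python
def crisp_severity(observed: float, limit: float) -> float:
    """Crisp check: any excess over the limit counts as a full deviation."""
    return 1.0 if observed > limit else 0.0


def fuzzy_severity(observed: float, limit: float, tolerance: float) -> float:
    """Hypothetical linear membership function (an assumption, not taken
    from the paper): severity rises from 0 at the limit to 1 once the
    excess over the limit reaches the tolerance."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    excess = observed - limit
    if excess <= 0:
        return 0.0
    return min(1.0, excess / tolerance)


if __name__ == "__main__":
    # Illustrative values only: one compliant, one slightly over, one far over.
    for observed in (990, 1050, 1300):
        print(observed,
              crisp_severity(observed, 1000),
              fuzzy_severity(observed, 1000, 200))
```

    Under these assumed values, the crisp check rates 1050 and 1300 identically, whereas the graded function separates a small excess from a large one.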

    Comparison of the K-Nearest Neighbor and Support Vector Machine Algorithms for Recommending the Choice of Further Schooling (Case Study of Grade IX Students at MTs Nurul Anwar)

    Education is the most important field in the development of a nation. To achieve the goals of national education optimally, every student needs to pursue formal education at least through senior secondary school (Sekolah Lanjutan Tingkat Atas, SLTA). Accordingly, after finishing junior secondary school (SLTP), every grade IX student should continue to an SLTA (SMK/SMA/MA). Grade IX students at the SLTP level will inevitably face the problem of choosing a further school, whether a general or a vocational secondary school. Choosing a further school is an important factor because it bears on the student's future. One modelling approach that can be used to produce recommendations for choosing a further school is data mining. Data mining techniques are expected to help in determining such recommendations. This study compares the classification performance of the K-Nearest Neighbor and Support Vector Machine methods. The attributes used are the UNBK score, student interest (Minat Siswa), and the guidance counsellor's advice (Saran BK). The training data and the testing data each comprise 35 records. Based on accuracy, Support Vector Machine scores higher, at 97.1%, compared with 85.7% for K-Nearest Neighbor. The study concludes that the Support Vector Machine method is better suited to this task than K-Nearest Neighbor.
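
    As a quick sanity check on the two accuracy figures quoted in this abstract, the sketch below converts each percentage back into a count of correct predictions. It assumes that accuracy was measured on the 35 testing records, which the abstract does not state explicitly; the function name is hypothetical.

```python
def correct_from_accuracy(accuracy_pct: float, n_test: int) -> int:
    """Nearest whole number of correct predictions for a reported accuracy."""
    return round(accuracy_pct / 100 * n_test)


if __name__ == "__main__":
    n_test = 35  # assumption: accuracy was computed on the 35 testing records
    for name, reported in (("Support Vector Machine", 97.1),
                           ("K-Nearest Neighbor", 85.7)):
        correct = correct_from_accuracy(reported, n_test)
        recomputed = round(100 * correct / n_test, 1)
        print(f"{name}: {correct}/{n_test} correct, {recomputed}%")
```

    Under that assumption, the reported percentages are consistent with 34 and 30 correct predictions out of 35, so the gap between the two methods amounts to four test records.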

    Allosteric inhibition of a stem cell RNA-binding protein by an intermediary metabolite

    Gene expression and metabolism are coupled at numerous levels. Cells must sense and respond to nutrients in their environment, and specialized cells must synthesize metabolic products required for their function. Pluripotent stem cells have the ability to differentiate into a wide variety of specialized cells. How metabolic state contributes to stem cell differentiation is not understood. In this study, we show that RNA-binding by the stem cell translation regulator Musashi-1 (MSI1) is allosterically inhibited by 18-22 carbon omega-9 monounsaturated fatty acids. The fatty acid binds to the N-terminal RNA Recognition Motif (RRM) and induces a conformational change that prevents RNA association. Musashi proteins are critical for development of the brain, blood, and epithelium. We identify stearoyl-CoA desaturase-1 as an MSI1 target, revealing a feedback loop between omega-9 fatty acid biosynthesis and MSI1 activity. We propose that other RRM proteins could act as metabolite sensors to couple gene expression changes to physiological state.

    A historical perspective of biomedical explainable AI research

    The black-box nature of most artificial intelligence (AI) models encourages the development of explainability methods to engender trust in the AI decision-making process. Such methods can be broadly categorized into two main types: post hoc explanations and inherently interpretable algorithms. We aimed at analyzing the possible associations between COVID-19 and the push of explainable AI (XAI) to the forefront of biomedical research. We automatically extracted from the PubMed database biomedical XAI studies related to concepts of causality or explainability and manually labeled 1,603 papers with respect to XAI categories. To compare the trends pre- and post-COVID-19, we fit a change point detection model and evaluated significant changes in publication rates. We show that the advent of COVID-19 at the beginning of 2020 could be the driving factor behind an increased focus on XAI, playing a crucial role in accelerating an already evolving trend. Finally, we discuss the future societal use and impact of XAI technologies and potential future directions for those who seek to foster clinical trust with interpretable machine learning models.

    A PM10 chemically characterised nation-wide dataset for Italy. Geographical influence on urban air pollution and source apportionment

    Urban textures of the Italian cities are peculiarly shaped by the local geography, generating similarities among cities placed in different regions but comparable topographical districts. This suggested the following scientific question: can such different topographies generate significant differences in the PM10 chemical composition at Italian urban sites that share similar geography despite being in different regions? To investigate whether such commonalities can be found and are applicable at Country-scale, we propose here a novel methodological approach. A dataset comprising season-averages of PM10 mass concentration and chemical composition data was built, covering the decade 2005-2016 and referring to urban sites only (21 cities). Statistical analyses, estimation of missing data, identification of latent clusters and source apportionment modelling by Positive Matrix Factorization (PMF) were performed on this unique dataset. The first original result is the demonstration that a dataset with atypical time resolution can be successfully exploited as an input matrix for PMF, obtaining Country-scale representative chemical profiles, whose physical consistency has been assessed by different tests of modelling performance. Secondly, this dataset can be considered a reference repository of season averages of chemical species over the Italian territory, and the chemical profiles obtained by PMF for urban Italian agglomerations could contribute to emission repositories. These findings indicate that our approach is powerful, and it could be further employed with datasets typically available in the air pollution monitoring networks.

    Truncating FLNC Mutations Are Associated With High-Risk Dilated and Arrhythmogenic Cardiomyopathies

    BACKGROUND: Filamin C (encoded by the FLNC gene) is essential for sarcomere attachment to the plasmatic membrane. FLNC mutations have been associated with myofibrillar myopathies, and cardiac involvement has been reported in some carriers. Accordingly, since 2012, the authors have included FLNC in the genetic screening of patients with inherited cardiomyopathies and sudden death. OBJECTIVES: The aim of this study was to demonstrate the association between truncating mutations in FLNC and the development of high-risk dilated and arrhythmogenic cardiomyopathies. METHODS: FLNC was studied using next-generation sequencing in 2,877 patients with inherited cardiovascular diseases. A characteristic phenotype was identified in probands with truncating mutations in FLNC. Clinical and genetic evaluation of 28 affected families was performed. Localization of filamin C in cardiac tissue was analyzed in patients with truncating FLNC mutations using immunohistochemistry. RESULTS: Twenty-three truncating mutations were identified in 28 probands previously diagnosed with dilated, arrhythmogenic, or restrictive cardiomyopathies. Truncating FLNC mutations were absent in patients with other phenotypes, including 1,078 patients with hypertrophic cardiomyopathy. Fifty-four mutation carriers were identified among 121 screened relatives. The phenotype consisted of left ventricular dilation (68%), systolic dysfunction (46%), and myocardial fibrosis (67%); inferolateral negative T waves and low QRS voltages on electrocardiography (33%); ventricular arrhythmias (82%); and frequent sudden cardiac death (40 cases in 21 of 28 families). Clinical skeletal myopathy was not observed. Penetrance was >97% in carriers older than 40 years. Truncating mutations in FLNC cosegregated with this phenotype with a dominant inheritance pattern (combined logarithm of the odds score: 9.5). Immunohistochemical staining of myocardial tissue showed no abnormal filamin C aggregates in patients with truncating FLNC mutations. CONCLUSIONS: Truncating mutations in FLNC caused an overlapping phenotype of dilated and left-dominant arrhythmogenic cardiomyopathies complicated by frequent premature sudden death. Prompt implantation of a cardiac defibrillator should be considered in affected patients harboring truncating mutations in FLNC. Funding: Instituto de Salud Carlos III [PI11/0699, PI14/0967, PI14/01477, RD012/0042/0029, RD012/0042/0049, RD012/0042/0066, RD12/0042/0069]; Spanish Ministry of Economy and Competitiveness [SAF2015-71863-REDT]; Plan Nacional de I+D+I; Plan Estatal de I+D+I, European Regional Development Fund; Health in Code SLS