18 research outputs found

    Toward Behavioral Modeling of a Grid System: Mining the Logging and Bookkeeping files

    Grid systems are complex heterogeneous systems, and their modeling constitutes a highly challenging goal. This paper focuses on modeling the jobs handled by the EGEE grid by mining its Logging and Bookkeeping files. The goal is to discover meaningful job clusters, going beyond the coarse categories of “successfully terminated jobs” and “other jobs”. The presented approach is a three-step process: i) data slicing is used to alleviate job heterogeneity and enable discriminant learning; ii) constructive induction proceeds by learning discriminant hypotheses from each data slice; iii) finally, double clustering is applied to the representation built by constructive induction, and the clusters are validated using the stability criteria proposed by Meila (2006). Lastly, the job clusters are submitted to the experts, and some meaningful interpretations are found.
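    Below is a minimal sketch of the kind of stability check the abstract alludes to, using a Meila-style variation of information between clusterings of random subsamples. The feature matrix X (assumed to be a NumPy array produced by the constructive-induction step), the choice of k-means, and all parameter values are illustrative assumptions, not the authors' actual pipeline.

```python
# Sketch: stability check for a clustering, in the spirit of Meila's (2006)
# variation-of-information criterion. X is assumed to be the representation
# built by constructive induction; all names and settings are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import mutual_info_score

def variation_of_information(labels_a, labels_b):
    """VI(A, B) = H(A) + H(B) - 2 I(A; B), in nats."""
    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log(p))
    return entropy(labels_a) + entropy(labels_b) - 2 * mutual_info_score(labels_a, labels_b)

def clustering_stability(X, n_clusters, n_runs=20, subsample=0.8, seed=0):
    """Average VI between clusterings of random subsamples; lower is more stable."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    scores = []
    for _ in range(n_runs):
        idx = rng.choice(n, size=int(subsample * n), replace=False)
        a = KMeans(n_clusters, n_init=1, random_state=int(rng.integers(1_000_000))).fit_predict(X[idx])
        b = KMeans(n_clusters, n_init=1, random_state=int(rng.integers(1_000_000))).fit_predict(X[idx])
        scores.append(variation_of_information(a, b))
    return float(np.mean(scores))
```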

    Recent Results on "Approximations to Optimal Alarm Systems for Anomaly Detection"

    An optimal alarm system and its approximations may use Kalman filtering for univariate linear dynamic systems driven by Gaussian noise to provide a layer of predictive capability. Kalman filter predictions of future process values and a fixed critical threshold can be used to construct a candidate level-crossing event over a predetermined prediction window. An optimal alarm system can be designed to elicit the fewest false alarms for a fixed detection probability in this particular scenario.
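    As a rough illustration of the level-crossing idea, the sketch below propagates a scalar Kalman filter forward over a prediction window and approximates the probability of exceeding a fixed critical threshold. The AR(1)-style model, its parameters, and the step-wise independence approximation are simplifying assumptions, not the exact optimal-alarm computation.

```python
# Sketch of the level-crossing idea: a scalar Kalman filter predicts future
# process values, and an alarm is raised when the approximate probability of
# exceeding a fixed critical threshold within the prediction window is high.
# Model, parameters, and the independence approximation are illustrative.
import numpy as np
from scipy.stats import norm

def kalman_update(x_pred, p_pred, y, a, q, r):
    """One scalar Kalman step for x_t = a*x_{t-1} + w (var q), y_t = x_t + v (var r)."""
    k = p_pred / (p_pred + r)              # Kalman gain
    x_filt = x_pred + k * (y - x_pred)     # filtered mean
    p_filt = (1.0 - k) * p_pred            # filtered variance
    return x_filt, p_filt

def alarm_probability(x_filt, p_filt, a, q, threshold, window):
    """Approximate P(level crossing within `window` steps), treating steps independently."""
    mean, var = x_filt, p_filt
    p_no_cross = 1.0
    for _ in range(window):
        mean = a * mean                    # predicted state mean
        var = a * var * a + q              # predicted state variance
        p_exceed = 1.0 - norm.cdf(threshold, loc=mean, scale=np.sqrt(var))
        p_no_cross *= (1.0 - p_exceed)
    return 1.0 - p_no_cross
```

    In practice the alarm would fire when this probability exceeds a design level chosen to meet a false-alarm budget at the desired detection probability.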

    Early Stopping of a Neural Network via the Receiver Operating Curve.

    This thesis presents the area under the ROC (Receiver Operating Characteristic) curve, abbreviated AUC, as an alternative measure for evaluating the predictive performance of ANN (Artificial Neural Network) classifiers. Conventionally, neural networks are trained until the total error converges to zero, which may give rise to overfitting. To ensure that they do not overfit the training data and then fail to generalize to new data, it appears effective to stop training as early as possible once the AUC is sufficiently large, by integrating ROC/AUC analysis into the training process. To reduce the learning cost on imbalanced data sets with uneven class distributions, random sampling and k-means clustering are used to draw a smaller subset of representatives from the original training data set. Finally, a confidence interval for the AUC is estimated using a non-parametric approach.
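    A minimal sketch of AUC-based early stopping, assuming a generic training loop: validation AUC is computed after each epoch and training halts once it stops improving. The helpers train_one_epoch and predict_scores are hypothetical placeholders for whatever network and training code are actually used.

```python
# Sketch of AUC-based early stopping: after each epoch the AUC on a held-out
# validation set is computed, and training stops once the AUC has not improved
# for `patience` epochs. `train_one_epoch` and `predict_scores` are placeholders.
from sklearn.metrics import roc_auc_score

def train_with_auc_stopping(model, X_train, y_train, X_val, y_val,
                            train_one_epoch, predict_scores,
                            max_epochs=200, patience=10):
    best_auc, best_epoch = 0.0, 0
    for epoch in range(max_epochs):
        train_one_epoch(model, X_train, y_train)
        auc = roc_auc_score(y_val, predict_scores(model, X_val))
        if auc > best_auc:
            best_auc, best_epoch = auc, epoch
        elif epoch - best_epoch >= patience:
            break  # AUC has stopped improving; stop early
    return model, best_auc
```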

    Applying machine learning techniques to identify companies at higher risk of ESG controversy

    ESG controversies may have enormous consequences for an individual company, its customers, investors, and other stakeholders. The objective of this work is to identify companies at high risk of ESG controversy based on public ESG data. Machine learning can surface early indicators in ESG data that provide insight into how likely a company is to face an ESG controversy. Using Random Forest models, the proportion of companies with a controversy among the flagged companies can be increased by 93, 5.6, and 4.3 times for the Environmental, Social, and Governance pillars, respectively.
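    A hedged sketch of how such a lift figure could be computed with a Random Forest, assuming a NumPy feature matrix of public ESG indicators and binary controversy labels; the features, the 0.5 flagging threshold, and the train/test split are illustrative, not the paper's actual setup.

```python
# Sketch of flagging companies at higher controversy risk with a Random Forest
# and measuring the lift among flagged companies. X, y, and the threshold are
# placeholders for the paper's actual ESG features, labels, and settings.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def controversy_lift(X, y, threshold=0.5, seed=0):
    """Ratio of controversy rate among flagged companies to the overall rate."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              stratify=y, random_state=seed)
    clf = RandomForestClassifier(n_estimators=500, random_state=seed).fit(X_tr, y_tr)
    p = clf.predict_proba(X_te)[:, 1]        # predicted controversy probability
    flagged = p >= threshold
    base_rate = y_te.mean()
    flagged_rate = y_te[flagged].mean() if flagged.any() else 0.0
    return flagged_rate / base_rate
```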

    Comparison of feature representations in MRI-based MCI-to-AD conversion prediction

    Alzheimer's disease (AD) is a progressive neurological disorder in which the death of brain cells causes memory loss and cognitive decline. The identification of at-risk subjects who show no dementia symptoms yet but will later convert to AD can be crucial for the effective treatment of AD. For this, Magnetic Resonance Imaging (MRI) is expected to play a crucial role. During recent years, several Machine Learning (ML) approaches to AD-conversion prediction have been proposed using different types of MRI features. However, few studies comparing these different feature representations exist, and the existing ones do not allow definite conclusions to be drawn. We evaluated the performance of various types of MRI features for conversion prediction: voxel-based features extracted via voxel-based morphometry, hippocampus volumes, volumes of the entorhinal cortex, and a set of regional volumetric, surface area, and cortical thickness measures across the brain. Regional features consistently yielded the best performance across the two classifiers (Support Vector Machines and Regularized Logistic Regression) and the two datasets studied. However, the performance difference to other features was not statistically significant. There was a consistent trend of age correction improving the classification performance, but the improvement reached statistical significance only rarely.

    Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. J. Tohka's work was supported by the Academy of Finland and V. Gómez-Verdejo's work has been partly funded by the Spanish MINECO grants TEC2014-52289R, TEC2016-81900-REDT/AEI, and TEC2017-83838-R.
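    A minimal sketch of the comparison set-up described above, assuming each feature representation is supplied as its own feature matrix: every representation is scored with both classifiers via cross-validated AUC. The feature_sets dictionary and labels y are placeholders for the actual MRI-derived features and conversion labels.

```python
# Sketch: each MRI feature representation is evaluated with both an SVM and a
# regularized logistic regression using cross-validated AUC. feature_sets maps
# a representation name to a feature matrix; y holds binary conversion labels.
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

def compare_representations(feature_sets, y, cv=10):
    classifiers = {
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),
        "RLR": make_pipeline(StandardScaler(), LogisticRegression(penalty="l2", max_iter=5000)),
    }
    results = {}
    for feat_name, X in feature_sets.items():
        for clf_name, clf in classifiers.items():
            aucs = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
            results[(feat_name, clf_name)] = (aucs.mean(), aucs.std())
    return results
```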

    The Metabolomic Profile in Amyotrophic Lateral Sclerosis Changes According to the Progression of the Disease: An Exploratory Study

    Amyotrophic lateral sclerosis (ALS) is a multifactorial neurodegenerative pathology of the upper or lower motor neuron. Evaluation of ALS progression is based on clinical outcomes reflecting the impairment of body sites. The pathogenetic mechanisms and clinical profile of ALS have been extensively investigated; however, no molecular biomarkers are used as diagnostic criteria to establish the ALS pathological staging. Using the source-reconstructed magnetoencephalography (MEG) approach, we demonstrated that global brain hyperconnectivity is associated with early and advanced clinical ALS stages. Using nuclear magnetic resonance (1H-NMR) and high-resolution mass spectrometry (HRMS), here we studied the metabolomic profile of sera from ALS patients at different stages of disease progression, namely early and advanced. Multivariate statistical analysis of the data, integrated with network analysis, indicates that metabolites related to energy deficit, abnormal concentrations of neurotoxic metabolites, and metabolites related to neurotransmitter production are pathognomonic of ALS in the advanced stage. Furthermore, analysis of the lipidomic profile indicates that advanced ALS patients show significant alterations of phosphocholine (PC), lysophosphatidylcholine (LPC), and sphingomyelin (SM) metabolism, consistent with the need for lipid remodeling to repair advanced neuronal degeneration and inflammation.
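    As a hedged illustration of the multivariate side of such an analysis (not the authors' actual workflow), the sketch below standardizes a metabolite-concentration matrix, projects it with PCA, and reports the metabolites with the largest loadings on the first component; the input matrix and names are placeholders.

```python
# Illustrative multivariate analysis of a metabolomic profile: standardize the
# metabolite-concentration matrix, project with PCA, and rank metabolites by
# their absolute loading on PC1. Inputs are placeholders, not the study's data.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

def metabolite_drivers(concentrations, metabolite_names, n_components=2, top_k=10):
    """Return the metabolites with the largest absolute loading on the first PC."""
    X = StandardScaler().fit_transform(concentrations)
    pca = PCA(n_components=n_components).fit(X)
    loadings = pca.components_[0]
    order = np.argsort(np.abs(loadings))[::-1]
    return [(metabolite_names[i], float(loadings[i])) for i in order[:top_k]]
```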

    Opening the Black-Box of AI: Challenging Pattern Robustness and Improving Theorizing through Explainable AI Methods

    Machine Learning (ML) algorithms, as an approach to Artificial Intelligence (AI), show unprecedented analytical capabilities and tremendous potential for pattern detection in large data sets. Despite great interest from researchers, ML remains largely underutilized because the algorithms are a black box, preventing the interpretation of learned models. Recent research on explainable artificial intelligence (XAI) sheds light on these models by allowing researchers to identify the main determinants of a prediction through post-hoc analyses. XAI thereby affords the opportunity to critically reflect on identified patterns, and to enhance decision making and theorizing based on them. Based on two large, publicly available data sets, we show that different variables within the same data set can generate models with similar predictive accuracy. In exploring this issue, we develop guidelines and recommendations for the effective use of XAI in research, and particularly for theorizing from identified patterns.
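    One concrete post-hoc technique in this spirit is permutation importance, sketched below; it is an illustrative choice, not necessarily the XAI method used in the paper, and the model and data are placeholders.

```python
# Sketch of one post-hoc XAI technique, permutation importance: the drop in
# held-out performance when a feature is shuffled indicates how much the model
# relies on it. Model choice, data, and settings are illustrative placeholders.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

def main_determinants(X, y, feature_names, top_k=10, seed=0):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    model = GradientBoostingClassifier(random_state=seed).fit(X_tr, y_tr)
    result = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=seed)
    ranked = sorted(zip(feature_names, result.importances_mean),
                    key=lambda t: t[1], reverse=True)
    return ranked[:top_k]  # the features the model leans on most
```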

    The Misuse of AUC: What High Impact Risk Assessment Gets Wrong

    When determining which machine learning model best performs some high impact risk assessment task, practitioners commonly use the Area under the Curve (AUC) to defend and validate their model choices. In this paper, we argue that the current use and understanding of AUC as a model performance metric misunderstands the way the metric was intended to be used. To this end, we characterize the misuse of AUC and illustrate how this misuse negatively manifests in the real world across several risk assessment domains. We locate this disconnect in the way the original interpretation of AUC has shifted over time to the point where issues pertaining to decision thresholds, class balance, statistical uncertainty, and protected groups remain unaddressed by AUC-based model comparisons, and where model choices that should be the purview of policymakers are hidden behind the veil of mathematical rigor. We conclude that current model validation practices involving AUC are not robust, and often invalid.
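    To illustrate one of the paper's points, the hedged sketch below contrasts AUC with precision at a fixed operating threshold on a heavily imbalanced synthetic problem: two models with comparable AUC can behave quite differently at the threshold actually used. The data, models, and threshold are purely illustrative.

```python
# Illustration: AUC alone says nothing about behavior at the decision threshold
# actually deployed. On an imbalanced synthetic problem, compare AUC with
# precision at a fixed 0.5 threshold for two models. Everything here is synthetic.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, precision_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, n_features=20, weights=[0.97, 0.03], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for name, model in [("logreg", LogisticRegression(max_iter=2000)),
                    ("forest", RandomForestClassifier(random_state=0))]:
    p = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    auc = roc_auc_score(y_te, p)
    prec = precision_score(y_te, p >= 0.5, zero_division=0)  # fixed operating threshold
    print(f"{name}: AUC={auc:.3f}, precision@0.5={prec:.3f}")
```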

    Risk-Based Comparison of Classification Systems

    Performance measures for families of classification systems that rely on the analysis of receiver operating characteristics (ROCs), such as the area under the ROC curve (AUC), often fail to fully address the issue of risk, especially for classification systems involving more than two classes. For the general case, we define matrices of class prevalences, costs, and class-conditional probabilities, and assume that costs are subjectively fixed, that acceptable estimates of the expected values of the class-conditional probabilities exist, and that a variable in any one of these matrices is independent of those in any other matrix. The ROC Risk Functional (RRF), valid for any finite number of classes, takes a parameter argument that specifies a member of a family of classification systems; the argument that minimizes the RRF identifies the system that minimizes Bayes risk over the family. We typify joint distributions of class prevalences over standard simplices by means of uniform and beta distributions, create a family of classification systems from actual data, and test the independence assumptions under two such class-prevalence distributions. We minimize risk under two different sets of costs.
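    A minimal sketch of the risk-based comparison, under stated assumptions: each family member is described by a matrix of class-conditional decision probabilities, prevalences are drawn from a Dirichlet distribution over the simplex (uniform when all concentration parameters equal 1), and the family member with the lowest average Bayes risk is selected. The cost matrix, probabilities, and sampling scheme are illustrative, not the paper's.

```python
# Sketch of risk-based comparison: score each classifier family member by its
# expected Bayes risk given a prevalence vector pi, a cost matrix C, and a matrix
# P of class-conditional decision probabilities (rows: true class, cols: decision).
# Prevalences are sampled from a Dirichlet over the simplex. Numbers are illustrative.
import numpy as np

def bayes_risk(prevalence, cost, cond_prob):
    """Expected risk = sum_i sum_j pi_i * C[i,j] * P(decide j | class i)."""
    return float(np.einsum("i,ij,ij->", prevalence, cost, cond_prob))

def best_system(cost, cond_probs, alpha=None, n_draws=1000, seed=0):
    """Pick the family member with the lowest average risk over sampled prevalences."""
    rng = np.random.default_rng(seed)
    n_classes = cost.shape[0]
    alpha = np.ones(n_classes) if alpha is None else alpha
    prevalences = rng.dirichlet(alpha, size=n_draws)
    mean_risks = [np.mean([bayes_risk(p, cost, cp) for p in prevalences])
                  for cp in cond_probs]
    return int(np.argmin(mean_risks)), mean_risks
```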