
    SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary

    The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered the "de facto" standard in the framework of learning from imbalanced data. This is due to the simplicity of its design, as well as its robustness when applied to different types of problems. Since its publication in 2002, SMOTE has proven successful in a variety of applications from several different domains. SMOTE has also inspired several approaches to counter the issue of class imbalance, and has significantly contributed to new supervised learning paradigms, including multilabel classification, incremental learning, semi-supervised learning, and multi-instance learning, among others. It is a standard benchmark for learning from imbalanced data. It is also featured in a number of different software packages, from open source to commercial. In this paper, marking the fifteen-year anniversary of SMOTE, we reflect on the SMOTE journey, discuss the current state of affairs with SMOTE and its applications, and identify the next set of challenges to extend SMOTE for Big Data problems. This work has been partially supported by the Spanish Ministry of Science and Technology under projects TIN2014-57251-P, TIN2015-68454-R and TIN2017-89517-P; the Project 887 BigDaP-TOOLS - Ayudas Fundación BBVA a Equipos de Investigación Científica 2016; and the National Science Foundation (NSF) Grant IIS-1447795.
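For readers unfamiliar with the technique, the core idea of SMOTE can be sketched in a few lines: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. The function name `smote_sketch` and its parameters are illustrative assumptions, not taken from the paper, and the sketch picks base samples at random instead of generating a fixed number of synthetic points per minority sample as the original algorithm does.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic interpolated points from a list of minority samples.

    Hypothetical helper for illustration only; each sample is a list of floats.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")

    synthetic = []
    for _ in range(n_synthetic):
        # Simplification: choose the base sample at random.
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        # Pick one of the k nearest neighbours and interpolate towards it.
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote_sketch(minority_points, n_synthetic=3, k=2):
        print(point)
```

Because every synthetic point is a convex combination of two existing minority samples, the generated data stays inside the region already spanned by the minority class.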

    An Empirical Study of AML Approach for Credit Card Fraud Detection—Financial Transactions

    Credit card fraud is one of the downsides of the digital world, where transactions are made without the knowledge of the genuine user. Based on a study of various papers published between 1994 and 2018 on credit card fraud, the following objectives are achieved: the various types of credit card fraud are identified; the adaptive machine learning techniques (AMLTs) used to detect these frauds automatically are studied; and their pros and cons are summarized. The various datasets used in the literature are studied and categorized into real and synthesized datasets. The performance metrics and evaluation criteria used to evaluate fraud detection systems are summarized. This study also covers an in-depth analysis and comparison of the performance (i.e., sensitivity, specificity, and accuracy) of existing machine learning techniques in the credit card fraud detection area. The findings of this study clearly show that supervised learning, card-not-present fraud, skimming fraud, and the website cloning method have been used more frequently. This study helps new researchers by discussing the limitations of existing fraud detection techniques and providing helpful directions for research in the credit card fraud detection field.
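The three performance measures named in this abstract follow directly from a binary confusion matrix. The sketch below uses their standard definitions; the function name `classification_metrics` and the example counts are hypothetical and do not come from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, and accuracy from confusion-matrix counts.

    tp/fn count fraudulent transactions that were caught/missed;
    tn/fp count genuine transactions that were passed/flagged.
    """
    total = tp + fp + tn + fn
    return {
        # Share of actual frauds that were detected (true positive rate).
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of genuine transactions that were correctly passed.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # Share of all transactions that were classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
    }


if __name__ == "__main__":
    # Made-up counts for an imbalanced dataset: 100 frauds in 10,000 transactions.
    print(classification_metrics(tp=80, fp=100, tn=9800, fn=20))
```

With counts this imbalanced, accuracy stays close to 0.99 even though one fraud in five is missed, which is why sensitivity and specificity are usually reported alongside it.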

    A Cost-Sensitive Sparse Representation Based Classification for Class-Imbalance Problem


    Model-based and Model-free Approaches for Power System Security Assessment

    Continuous security assessment of a power system is necessary to ensure a reliable, stable, and continuous supply of electrical power to customers. To this end, this dissertation identifies and explores some of the various challenges encountered in the field of power system security assessment. Accordingly, several model-based and/or model-free approaches were developed to overcome these challenges. First, a voltage stability index, named TAVSI, is proposed. This index has three important features: TAVSI applies to general load models including ZIP, exponential, and induction motor loads; TAVSI can be used for both measurement-based and model-based voltage stability assessment; and finally, TAVSI is calculated based on normalized sensitivities, which enables identification of weak buses and the definition of a global instability threshold. TAVSI was tested on both the IEEE 14-bus and the 181-bus WECC systems. Results show that TAVSI gives a reliable assessment of system stability. Second, a data-driven and model-based hybrid reinforcement learning approach is proposed for training a control agent to re-dispatch generators' output power in order to relieve stressed branches. For large power systems, the agent's action space is high-dimensional, which makes it difficult to train data-driven agents successfully. Therefore, we propose a hybrid approach where model-based actions are utilized to help the agent learn an optimal control policy. The proposed approach was tested and compared to the generic data-driven DDPG-based approach on the IEEE 118-bus system and a larger 2749-bus real-world system. Results show that the hybrid approach performs well for large power systems and that it is superior to the DDPG-based approach. Finally, a Convolutional Neural Network (CNN) based approach is proposed as a faster alternative to the classical AC power flow-based contingency screening. The proposed approach is investigated on both the IEEE 118-bus system and the Texas 2000-bus synthetic system. For such large systems, the implementation of the proposed approach came with several challenges, such as computational burden, learning from imbalanced datasets, and performance evaluation of trained models. Accordingly, this work contributes a set of novel techniques and best practices that enable both efficient and successful implementation of CNN-based multi-contingency classifiers for large power systems.

    Event program

    UNLV undergraduates from all departments, programs and colleges participated in a campus-wide symposium on April 16, 2011. Undergraduate posters from all disciplines, as well as oral presentations of research activities, readings and other creative endeavors, were exhibited throughout the festival.

    Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision

    Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback (RLHF) to align the output of large language models (LLMs) with human intentions, ensuring they are helpful, ethical, and reliable. However, this dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision and the related issues on quality, reliability, diversity, self-consistency, and undesirable biases. To address these challenges, we propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision. Our approach encompasses four stages: first, we use an LLM to generate synthetic prompts, and a topic-guided method to augment the prompt diversity; second, we use a small set of human-written principles for AI models to follow, and guide the LLM through in-context learning from demonstrations (of principles application) to produce helpful, ethical, and reliable responses to user's queries; third, we fine-tune the original LLM with the high-quality self-aligned responses so that the resulting model can generate desirable responses for each query directly without the principle set and the demonstrations anymore; and finally, we offer a refinement step to address the issues of overly-brief or indirect responses. Applying SELF-ALIGN to the LLaMA-65b base language model, we develop an AI assistant named Dromedary. With fewer than 300 lines of human annotations (including < 200 seed prompts, 16 generic principles, and 5 exemplars for in-context learning), Dromedary significantly surpasses the performance of several state-of-the-art AI systems, including Text-Davinci-003 and Alpaca, on benchmark datasets with various settings. Comment: Accepted at NeurIPS 2023 (Spotlight). Project page: https://github.com/IBM/Dromedar