1,420 research outputs found

    GraphFC: Customs Fraud Detection with Label Scarcity

    Full text link
    Custom officials across the world encounter huge volumes of transactions. With increased connectivity and globalization, the customs transactions continue to grow every year. Associated with customs transactions is the customs fraud - the intentional manipulation of goods declarations to avoid the taxes and duties. With limited manpower, the custom offices can only undertake manual inspection of a limited number of declarations. This necessitates the need for automating the customs fraud detection by machine learning (ML) techniques. Due the limited manual inspection for labeling the new-incoming declarations, the ML approach should have robust performance subject to the scarcity of labeled data. However, current approaches for customs fraud detection are not well suited and designed for this real-world setting. In this work, we propose GraphFC\textbf{GraphFC} (Graph\textbf{Graph} neural networks for C\textbf{C}ustoms F\textbf{F}raud), a model-agnostic, domain-specific, semi-supervised graph neural network based customs fraud detection algorithm that has strong semi-supervised and inductive capabilities. With upto 252% relative increase in recall over the present state-of-the-art, extensive experimentation on real customs data from customs administrations of three different countries demonstrate that GraphFC consistently outperforms various baselines and the present state-of-art by a large margin

    Semi-supervised multi-layered clustering model for intrusion detection

    Get PDF
    A Machine Learning (ML) -based Intrusion Detection and Prevention System (IDPS) requires a large amount of labeled up-to-date training data, to effectively detect intrusions and generalize well to novel attacks. However, labeling of data is costly and becomes infeasible when dealing with big data, such as those generated by IoT (Internet of Things) -based applications. To this effect, building a ML model that learns from non- or partially-labeled data is of critical importance. This paper proposes a novel Semi-supervised Multi-Layered Clustering Model (SMLC) for network intrusion detection and prevention tasks. The SMLC has the capability to learn from partially labeled data while achieving a comparable detection performance to supervised ML-based IDPS. The performance of the SMLC is compared with well-known supervised ensemble ML models, namely, RandomForest, Bagging, and AdaboostM1 and a semi-supervised model (i.e., tri-training) on a benchmark network intrusion dataset, the Kyoto 2006+. Experimental results show that the SMLC outperforms all other models and can achieve better detection accuracy using only 20% labeled instances of the training data

    A comprehensive survey on deep active learning and its applications in medical image analysis

    Full text link
    Deep learning has achieved widespread success in medical image analysis, leading to an increasing demand for large-scale expert-annotated medical image datasets. Yet, the high cost of annotating medical images severely hampers the development of deep learning in this field. To reduce annotation costs, active learning aims to select the most informative samples for annotation and train high-performance models with as few labeled samples as possible. In this survey, we review the core methods of active learning, including the evaluation of informativeness and sampling strategy. For the first time, we provide a detailed summary of the integration of active learning with other label-efficient techniques, such as semi-supervised, self-supervised learning, and so on. Additionally, we also highlight active learning works that are specifically tailored to medical image analysis. In the end, we offer our perspectives on the future trends and challenges of active learning and its applications in medical image analysis.Comment: Paper List on Github: https://github.com/LightersWang/Awesome-Active-Learning-for-Medical-Image-Analysi

    Recent advancement in Disease Diagnostic using machine learning: Systematic survey of decades, comparisons, and challenges

    Full text link
    Computer-aided diagnosis (CAD), a vibrant medical imaging research field, is expanding quickly. Because errors in medical diagnostic systems might lead to seriously misleading medical treatments, major efforts have been made in recent years to improve computer-aided diagnostics applications. The use of machine learning in computer-aided diagnosis is crucial. A simple equation may result in a false indication of items like organs. Therefore, learning from examples is a vital component of pattern recognition. Pattern recognition and machine learning in the biomedical area promise to increase the precision of disease detection and diagnosis. They also support the decision-making process's objectivity. Machine learning provides a practical method for creating elegant and autonomous algorithms to analyze high-dimensional and multimodal bio-medical data. This review article examines machine-learning algorithms for detecting diseases, including hepatitis, diabetes, liver disease, dengue fever, and heart disease. It draws attention to the collection of machine learning techniques and algorithms employed in studying conditions and the ensuing decision-making process

    Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective

    Get PDF
    This paper takes a problem-oriented perspective and presents a comprehensive review of transfer learning methods, both shallow and deep, for cross-dataset visual recognition. Specifically, it categorises the cross-dataset recognition into seventeen problems based on a set of carefully chosen data and label attributes. Such a problem-oriented taxonomy has allowed us to examine how different transfer learning approaches tackle each problem and how well each problem has been researched to date. The comprehensive problem-oriented review of the advances in transfer learning with respect to the problem has not only revealed the challenges in transfer learning for visual recognition, but also the problems (e.g. eight of the seventeen problems) that have been scarcely studied. This survey not only presents an up-to-date technical review for researchers, but also a systematic approach and a reference for a machine learning practitioner to categorise a real problem and to look up for a possible solution accordingly
    • …
    corecore