41 research outputs found

    The Paradox of Noise: An Empirical Study of Noise-Infusion Mechanisms to Improve Generalization, Stability, and Privacy in Federated Learning

    Full text link
    In a data-centric era, concerns regarding privacy and ethical data handling grow as machine learning relies more on personal information. This empirical study investigates the privacy, generalization, and stability of deep learning models in the presence of additive noise in federated learning frameworks. Our main objective is to provide strategies for measuring the generalization, stability, and privacy-preserving capabilities of these models, and to improve them further. To this end, five noise infusion mechanisms at varying noise levels are explored in both centralized and federated learning settings. As model complexity is a key component of the generalization and stability of deep learning models during training and evaluation, a comparative analysis of three Convolutional Neural Network (CNN) architectures is provided. The paper introduces the Signal-to-Noise Ratio (SNR) as a quantitative measure of the trade-off between privacy and training accuracy of noise-infused models, aiming to find the noise level that yields optimal privacy and accuracy. Moreover, the Price of Stability and Price of Anarchy are defined in the context of privacy-preserving deep learning, contributing to a systematic investigation of noise infusion strategies that enhance privacy without compromising performance. Our research sheds light on the delicate balance between these critical factors, fostering a deeper understanding of the implications of noise-based regularization in machine learning. By leveraging noise as a tool for regularization and privacy enhancement, we aim to contribute to the development of robust, privacy-aware algorithms, ensuring that AI-driven solutions prioritize both utility and privacy.
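
    As a rough illustration of the SNR measure described above, the sketch below infuses additive Gaussian noise into a flattened parameter vector and reports signal power over noise power in decibels. This is a minimal sketch under our own assumptions (a plain Gaussian mechanism, SNR as a power ratio); the paper's exact mechanisms and SNR definition may differ.

```python
import numpy as np

def infuse_noise(weights, noise_std, rng):
    """Additively perturb model parameters with Gaussian noise."""
    noise = rng.normal(0.0, noise_std, size=weights.shape)
    return weights + noise, noise

def snr_db(weights, noise):
    """Signal-to-noise ratio: parameter power over injected-noise power, in dB."""
    return 10.0 * np.log10(np.mean(weights ** 2) / np.mean(noise ** 2))

rng = np.random.default_rng(0)
weights = rng.normal(size=1000)  # stand-in for flattened CNN parameters
for noise_std in (0.01, 0.05, 0.1, 0.5):  # sweep noise levels
    _, noise = infuse_noise(weights, noise_std, rng)
    print(f"sigma={noise_std:.2f}  SNR={snr_db(weights, noise):6.1f} dB")
```

    Sweeping the noise level this way traces the privacy/utility curve on which an optimal operating point of the kind the paper seeks would lie.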

    Time-series Analysis for Detecting Structure Changes and Suspicious Accounting Activities in Public Software Companies

    Get PDF
    This paper offers a novel methodology, built on several new ratios and comparison approaches, for investigating the financial activities and condition of public software companies. The methodology focuses on time-series data mining, monitoring, and analysis. The dataset covers 100 U.S. software companies with at least ten years of SEC-verified income statements, balance sheets, and cash flow statements. The contribution of this paper is creating and applying several new financial ratios, combined with traditional approaches, to detect changes in companies' financial structure and account manipulation. For the operating section of the cash flow statement, our proposed major-account-to-operating-net-cash-inflow and -outflow ratios provide a better visualization of cash sources and usage, helping analysts observe major cash flow structure changes and make predictions. For the investing section, our proposed investing cash flow growth contribution ratio is used to identify irregular investment behavior. Combined with traditional financial ratio tests, we believe our approach significantly facilitates the early detection of suspicious financial activities and the evaluation of a company's financial status.
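
    One plausible reading of the ratios described above is sketched below on a hypothetical five-year slice of one company's statements. Column names, numbers, and the two-sigma flag are illustrative assumptions, not the paper's schema or thresholds.

```python
import pandas as pd

# Hypothetical five-year slice of one company's statements; column names are
# illustrative stand-ins, not the paper's schema.
cf = pd.DataFrame({
    "year": [2010, 2011, 2012, 2013, 2014],
    "net_income": [120, 135, 90, 150, 160],
    "operating_cash_inflow": [300, 320, 280, 360, 375],
    "investing_cash_flow": [-80, -95, -200, -100, -110],
})

# Major-account-to-operating-net-cash-inflow ratio: share of operating cash
# inflow explained by one major account (here, net income).
cf["ni_to_ocf_inflow"] = cf["net_income"] / cf["operating_cash_inflow"]

# Investing cash flow growth contribution: year-over-year change scaled by the
# prior year's magnitude, to surface irregular investment jumps (e.g. 2012).
cf["icf_growth"] = cf["investing_cash_flow"].diff() / cf["investing_cash_flow"].abs().shift()

# Flag years whose ratio sits more than two standard deviations from the mean.
z = (cf["ni_to_ocf_inflow"] - cf["ni_to_ocf_inflow"].mean()) / cf["ni_to_ocf_inflow"].std()
cf["suspicious"] = z.abs() > 2
print(cf.round(3))
```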

    Exploring Machine Learning Models for Federated Learning: A Review of Approaches, Performance, and Limitations

    Full text link
    In the growing world of artificial intelligence, federated learning is a distributed learning framework designed to preserve the privacy of individuals' data. Federated learning lays the groundwork for collaborative research in areas where the data is sensitive, and it has several implications for real-world problems. In times of crisis, when real-time decision-making is critical, federated learning allows multiple entities to work collectively without sharing sensitive data. This distributed approach makes it possible to leverage information from multiple sources and gain more diverse insights. This paper is a systematic review of the literature on privacy-preserving machine learning over the last few years, conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Specifically, we present an extensive review of supervised and unsupervised machine learning algorithms, ensemble methods, meta-heuristic approaches, blockchain technology, and reinforcement learning used within the framework of federated learning, in addition to an overview of federated learning applications. The main purpose of this work is to provide researchers and practitioners with a comprehensive overview of federated learning from the machine learning point of view. A discussion of open problems and future research directions in federated learning is also provided.
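
    To make the "collaboration without sharing raw data" idea concrete, here is a minimal federated averaging (FedAvg-style) sketch: each client trains locally on its own data and only model parameters travel to the server. The logistic-regression local model, learning rates, and round counts are our own illustrative assumptions, not drawn from any paper reviewed here.

```python
import numpy as np

def local_update(global_weights, X, y, lr=0.1, epochs=5):
    """One client's local training: a few steps of logistic-regression
    gradient descent from the shared weights; raw (X, y) never leaves the client."""
    w = global_weights.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def fed_avg(client_weights, client_sizes):
    """Server-side aggregation: average client models weighted by data size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

rng = np.random.default_rng(0)
global_w = np.zeros(5)
clients = [(rng.normal(size=(40, 5)), rng.integers(0, 2, 40)) for _ in range(3)]
for _ in range(10):  # communication rounds
    updates = [local_update(global_w, X, y) for X, y in clients]
    global_w = fed_avg(updates, [len(y) for _, y in clients])
```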

    Bayesian Kernel Methods for Non-Gaussian Distributions: Binary and Multi-class Classification Problems

    Get PDF
    Project Objective: The objective of this project is to develop a Bayesian kernel model built around non-Gaussian prior distributions to address binary and multi-class classification problems. Recent advances in data mining have integrated kernel functions with Bayesian probabilistic analysis of Gaussian distributions. These machine learning approaches can incorporate prior information with new data to calculate probabilistic rather than deterministic values for unknown parameters. This paper extensively analyzes a specific Bayesian kernel model that uses a kernel function to calculate a posterior beta distribution that is conjugate to the prior beta distribution. Numerical testing of the beta kernel model on several benchmark data sets reveals that this model's accuracy is comparable with those of the support vector machine and relevance vector machine, and the model runs more quickly than the other algorithms. When one class occurs much more frequently than the other, the beta kernel model often outperforms other strategies for handling imbalanced data sets. If data arrive sequentially over time, the beta kernel model easily and quickly updates the probability distribution, and it is more accurate than an incremental support vector machine algorithm for online learning when fewer than 50 data points are available. U.S. Army Research Office. Sponsor/Monitor's Report Number(s): 61414-MA-II.3; W911NF-12-1-040
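
    One way to realize the conjugate beta update described above is sketched below: each training point contributes a kernel-weighted pseudo-count to the Beta prior, and the predicted class-1 probability is the posterior mean. The RBF kernel, its bandwidth, and the Beta(1, 1) prior are our illustrative assumptions; the paper's exact construction may differ.

```python
import numpy as np

def rbf(x, X, gamma=1.0):
    """RBF kernel similarities between a query point and all training points."""
    return np.exp(-gamma * np.sum((X - x) ** 2, axis=1))

def beta_posterior(x, X, y, alpha0=1.0, beta0=1.0, gamma=1.0):
    """Kernel-weighted conjugate update of a Beta(alpha0, beta0) prior:
    each training point adds pseudo-counts proportional to its similarity."""
    k = rbf(x, X, gamma)
    alpha = alpha0 + k[y == 1].sum()  # weighted positive evidence
    beta = beta0 + k[y == 0].sum()    # weighted negative evidence
    return alpha, beta

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
a, b = beta_posterior(np.array([0.5, 0.5]), X, y)
print("P(y=1) ~", a / (a + b))  # posterior-mean class probability
```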

    Efficient faces of a polytope: Interior methods in multiobjective optimization

    No full text
    This dissertation addresses the problem of computing the set of efficient faces of a bounded polyhedron in R^n that is defined by linear inequalities. It describes two algorithms. One algorithm is an interior point method that generalizes and extends recent path-following techniques in linear programming to multiple objective optimization; it finds an efficient face in polynomial time. The other algorithm is based on an entirely new approach to multiple objective optimization that employs techniques of algebraic geometry related to the parametrization of algebraic varieties in n-dimensional spaces; it approximates a portion of the set of efficient faces by an algebraic surface. Generalizations to nonlinear multiobjective optimization problems, where the feasible region is defined by quadratic constraints, are also examined. The parametrization of hyperellipsoids in n-space is also studied, along with their use in approximating the efficient faces (efficient frontier) of polytopes. Relations with multivalued dynamical systems are also discussed.
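
    For readers outside multiobjective optimization, the standard notion of efficiency underlying this abstract can be stated as follows (notation ours, for a polytope P and a matrix C of linear objectives to be maximized):

```latex
% Standard definition assumed by the abstract; notation is ours.
\[
x^{*} \in P = \{x \in \mathbb{R}^{n} : Ax \le b\}
\ \text{is \emph{efficient}} \iff
\nexists\, x \in P \ \text{with}\ Cx \ge Cx^{*} \ \text{and}\ Cx \ne Cx^{*}.
\]
% An efficient face is a face of P consisting entirely of efficient points;
% the union of the efficient faces is the efficient frontier.
```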

    Foreword

    No full text

    A Bayesian beta kernel model for binary classification and online learning problems

    Get PDF
    Statistical Analysis and Data Mining, 7(6), 434-449. Author's accepted manuscript. The article of record may be found at http://dx.doi.org/10.1002/sam.11241. Recent advances in data mining have integrated kernel functions with Bayesian probabilistic analysis of Gaussian distributions. These machine learning approaches can incorporate prior information with new data to calculate probabilistic rather than deterministic values for unknown parameters. This paper extensively analyzes a specific Bayesian kernel model that uses a kernel function to calculate a posterior beta distribution that is conjugate to the prior beta distribution. Numerical testing of the beta kernel model on several benchmark data sets reveals that this model's accuracy is comparable with those of the support vector machine, relevance vector machine, naive Bayes, and logistic regression, and the model runs more quickly than the other algorithms. When one class occurs much more frequently than the other, the beta kernel model often outperforms other strategies for handling imbalanced data sets, including under-sampling, over-sampling, and the Synthetic Minority Over-Sampling Technique. If data arrive sequentially over time, the beta kernel model easily and quickly updates the probability distribution, and it is more accurate than an incremental support vector machine algorithm for online learning. This work was funded in part by the U.S. Army Research, Development and Engineering Command, Army Research Office, Mathematical Science Division, under proposal no. 61414-MA-II.
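
    The online-learning property claimed above has a natural streaming form: for a fixed query point, the Beta posterior can absorb each arriving example as one kernel-weighted pseudo-count, making updates O(1) per point. The sketch below is our own illustration of that idea, again assuming an RBF kernel; class and parameter names are hypothetical.

```python
import numpy as np

class OnlineBetaKernel:
    """Running Beta posterior at a fixed query point: each arriving labeled
    example adds a kernel-weighted pseudo-count, so updates are O(1)."""

    def __init__(self, x_query, alpha0=1.0, beta0=1.0, gamma=1.0):
        self.xq = np.asarray(x_query, dtype=float)
        self.a, self.b, self.gamma = alpha0, beta0, gamma

    def update(self, x_new, y_new):
        """Fold one new (x, y) pair into the posterior."""
        k = np.exp(-self.gamma * np.sum((np.asarray(x_new) - self.xq) ** 2))
        if y_new == 1:
            self.a += k
        else:
            self.b += k

    def predict(self):
        """Posterior-mean estimate of P(y=1) at the query point."""
        return self.a / (self.a + self.b)

model = OnlineBetaKernel(x_query=[0.0, 0.0])
stream = [([0.2, 0.1], 1), ([-0.3, -0.4], 0), ([0.1, 0.3], 1)]
for x, y in stream:  # data arriving sequentially over time
    model.update(x, y)
print("P(y=1) ~", model.predict())
```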