The Paradox of Noise: An Empirical Study of Noise-Infusion Mechanisms to Improve Generalization, Stability, and Privacy in Federated Learning
In a data-centric era, concerns regarding privacy and ethical data handling
grow as machine learning relies more on personal information. This empirical
study investigates the privacy, generalization, and stability of deep learning
models in the presence of additive noise in federated learning frameworks. Our
main objective is to provide strategies to measure the generalization,
stability, and privacy-preserving capabilities of these models and further
improve them. To this end, five noise infusion mechanisms at varying noise
levels within centralized and federated learning settings are explored. As
model complexity is a key component of the generalization and stability of deep
learning models during training and evaluation, a comparative analysis of three
Convolutional Neural Network (CNN) architectures is provided. The paper
introduces Signal-to-Noise Ratio (SNR) as a quantitative measure of the
trade-off between privacy and training accuracy of noise-infused models, aiming
to find the noise level that yields optimal privacy and accuracy. Moreover, the
Price of Stability and Price of Anarchy are defined in the context of
privacy-preserving deep learning, contributing to the systematic investigation
of the noise infusion strategies to enhance privacy without compromising
performance. Our research sheds light on the delicate balance between these
critical factors, fostering a deeper understanding of the implications of
noise-based regularization in machine learning. By leveraging noise as a tool
for regularization and privacy enhancement, we aim to contribute to the
development of robust, privacy-aware algorithms, ensuring that AI-driven
solutions prioritize both utility and privacy.
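The abstract does not give the paper's exact formulas, but the core idea of infusing additive Gaussian noise into model weights and measuring the resulting Signal-to-Noise Ratio can be sketched as follows. The function names, the choice of Gaussian noise, and the power-based SNR definition are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def infuse_gaussian_noise(weights, sigma, rng=None):
    """Add zero-mean Gaussian noise with standard deviation sigma to weights.

    One of several possible noise-infusion mechanisms (assumed here);
    the paper compares five such mechanisms at varying noise levels.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, size=weights.shape)
    return weights + noise, noise

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    return 10.0 * np.log10(p_signal / p_noise)

# Toy example: unit-power "weights" and sigma = 0.1 give an SNR near 20 dB.
w = np.ones(1000)
noisy_w, n = infuse_gaussian_noise(w, sigma=0.1)
print(snr_db(w, n))  # roughly 20 dB; larger sigma lowers SNR (more privacy)
```

Sweeping `sigma` and tracking both SNR and validation accuracy is one way to locate the noise level that balances privacy against training accuracy, as the abstract describes.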
Time-series Analysis for Detecting Structure Changes and Suspicious Accounting Activities in Public Software Companies
This paper offers a novel methodology using several new ratios and comparison approaches to investigate public software companies' financial activities and condition. The methodology focuses on time-series data mining, monitoring, and analysis. The dataset is based on 100 U.S. software companies with at least ten years of SEC-verified income statements, balance sheets, and cash flow statements. The contribution of this paper is creating and applying several new financial ratios, combined with traditional approaches, to detect companies' financial structure changes and account manipulation. For the operating section of the cash flow statement, our proposed major-account-to-operating-net-cash inflow and outflow ratios provide a better visualization of cash sources and usage, which helps analysts observe major cash flow structure changes and make predictions. For the investing section, our proposed investing cash flow growth contribution ratio is used to identify irregular investment behavior. Combined with traditional financial ratio tests, we believe our approach significantly facilitates early detection of suspicious financial activities and the evaluation of a company's financial status.
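The abstract does not define the ratios precisely, but the general pattern of tracking a major account's share of operating cash inflow over time and flagging abrupt structural shifts can be illustrated as below. The ratio definition, the example figures, and the jump threshold are all hypothetical stand-ins for the paper's actual formulas.

```python
def cash_flow_ratio_series(major_account, operating_inflow):
    """Year-by-year ratio of one major account to total operating cash inflow.

    Hypothetical form of a major-account-to-operating-net-cash-inflow ratio.
    """
    return [a / total for a, total in zip(major_account, operating_inflow)]

def flag_structure_changes(ratios, threshold=0.1):
    """Return 1-based years where the ratio jumps by more than `threshold`.

    A simple illustrative rule; the paper's detection criteria are not
    reproduced in the abstract.
    """
    return [i + 1 for i in range(1, len(ratios))
            if abs(ratios[i] - ratios[i - 1]) > threshold]

receivables = [40, 42, 45, 80, 82]       # toy major-account cash inflows
op_inflow = [200, 210, 220, 230, 240]    # toy total operating cash inflows
ratios = cash_flow_ratio_series(receivables, op_inflow)
print(flag_structure_changes(ratios))    # flags year 4, where the share jumps
```

A jump in one account's share of operating cash, as in year 4 above, is the kind of structure change the proposed ratios are designed to surface for further scrutiny.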
Exploring Machine Learning Models for Federated Learning: A Review of Approaches, Performance, and Limitations
In the growing world of artificial intelligence, federated learning is a distributed learning framework designed to preserve the privacy of individuals'
data. Federated learning lays the groundwork for collaborative research in
areas where the data is sensitive. Federated learning has several implications
for real-world problems. In times of crisis, when real-time decision-making is
critical, federated learning allows multiple entities to work collectively
without sharing sensitive data. This distributed approach enables us to
leverage information from multiple sources and gain more diverse insights. This
paper is a systematic review of the literature on privacy-preserving machine
learning in the last few years based on the Preferred Reporting Items for
Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Specifically, we have
presented an extensive review of supervised/unsupervised machine learning
algorithms, ensemble methods, meta-heuristic approaches, blockchain technology,
and reinforcement learning used in the framework of federated learning, in
addition to an overview of federated learning applications. This paper reviews
the literature on the components of federated learning and its applications in
the last few years. The main purpose of this work is to provide researchers and
practitioners with a comprehensive overview of federated learning from the
machine learning point of view. A discussion of some open problems and future
research directions in federated learning is also provided.
Bayesian Kernel Methods for Non-Gaussian Distributions: Binary and Multi-class Classification Problems
Project Objective: The objective of this project is to develop a Bayesian kernel model built around non-Gaussian prior distributions to address binary and multi-class classification problems. Recent advances in data mining have integrated kernel functions with Bayesian probabilistic analysis of Gaussian distributions. These machine learning approaches can incorporate prior information with new data to calculate probabilistic rather than deterministic values for unknown parameters. This paper analyzes extensively a specific Bayesian kernel model that uses a kernel function to calculate a posterior beta distribution that is conjugate to the prior beta distribution. Numerical testing of the beta kernel model on several benchmark data sets reveals that this model's accuracy is comparable with those of the support vector machine and relevance vector machine, and the model runs more quickly than the other algorithms. When one class occurs much more frequently than the other class, the beta kernel model often outperforms other strategies for handling imbalanced data sets. If data arrive sequentially over time, the beta kernel model easily and quickly updates the probability distribution, and this model is more accurate than an incremental support vector machine algorithm for online learning when fewer than 50 data points are available. U.S. Army Research Office. Sponsor/Monitor's Report Number(s): 61414-MA-II.3; W911NF-12-1-040
Efficient faces of a polytope: Interior methods in multiobjective optimization
This dissertation addresses the problem of computing the set of efficient faces of a bounded polyhedron in R\sp{\rm n} that is defined by linear inequalities. It describes two algorithms. One algorithm is an interior point method that generalizes and extends the recent path following techniques in Linear Programming to multiple objective optimization. It finds an efficient face in polynomial time. The other algorithm is based on an entirely new approach to multiple objective optimization that employs techniques of algebraic geometry related to the parametrization of algebraic varieties in n-dimensional spaces. It approximates a portion of the set of efficient faces by an algebraic surface. Generalizations for nonlinear multiobjective optimization problems where the feasible region is defined by quadratic constraints are also examined. Parametrization of hyperellipsoids in n space are also studied along with their use in approximating efficient faces (efficient frontier) of polytopes. Relations with multivalued dynamical systems are also discussed
A Bayesian beta kernel model for binary classification and online learning problems
Statistical Analysis and Data Mining, 7(6), 434-449. Author's accepted manuscript. The article of record may be found at http://dx.doi.org/10.1002/sam.11241
Recent advances in data mining have integrated kernel functions with Bayesian
probabilistic analysis of Gaussian distributions. These machine learning approaches
can incorporate prior information with new data to calculate probabilistic
rather than deterministic values for unknown parameters. This paper
extensively analyzes a specific Bayesian kernel model that uses a kernel function
to calculate a posterior beta distribution that is conjugate to the prior beta distribution.
Numerical testing of the beta kernel model on several benchmark data
sets reveals that this model's accuracy is comparable with those of the support
vector machine, relevance vector machine, naive Bayes, and logistic regression,
and the model runs more quickly than other algorithms. When one class occurs
much more frequently than the other class, the beta kernel model often
outperforms other strategies to handle imbalanced data sets, including undersampling,
over-sampling, and the Synthetic Minority Over-Sampling Technique.
If data arrive sequentially over time, the beta kernel model easily and quickly
updates the probability distribution, and this model is more accurate than an
incremental support vector machine algorithm for online learning.
This work was funded in part by the U.S. Army Research, Development and Engineering Command, Army Research Office, Mathematical Science Division, under proposal no. 61414-MA-II
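The abstract describes a kernel function producing a posterior beta distribution conjugate to a beta prior, which supports fast sequential updates. A minimal sketch of one plausible kernel-weighted conjugate update is given below; the RBF kernel choice, the update rule (positives add kernel weight to `a`, negatives to `b`), and all names are assumptions for illustration, not the paper's exact model.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel between two feature tuples (assumed kernel choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def beta_posterior(x_new, data, labels, a0=1.0, b0=1.0, gamma=1.0):
    """Kernel-weighted conjugate update of a Beta(a0, b0) prior.

    Hypothetical sketch: each positive example near x_new adds its kernel
    weight to `a`, each negative to `b`. Because the posterior remains a
    beta distribution, new points can be folded in one at a time, which is
    what makes online updating cheap.
    """
    a, b = a0, b0
    for x, y in zip(data, labels):
        k = rbf_kernel(x_new, x, gamma)
        if y == 1:
            a += k
        else:
            b += k
    return a, b  # posterior mean a / (a + b) estimates P(y = 1 | x_new)

X = [(0.0,), (0.1,), (2.0,), (2.1,)]
y = [1, 1, 0, 0]
a, b = beta_posterior((0.05,), X, y)
print(a / (a + b))  # well above 0.5: the query sits in the positive cluster
```

The conjugacy is the key design point: a streaming data point updates two scalars per query rather than requiring a full retraining pass, in contrast to the incremental SVM baseline mentioned in the abstract.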
On-line SVM learning via an incremental primal-dual technique
International audience
An Incremental Interior Point Method for On-line SVM Learning
International audience