A traffic classification method using machine learning algorithm
Applying concepts of attack investigation in the IT industry, this idea has been developed to design a traffic classification method using data mining techniques at the intersection of machine learning algorithms, which will classify normal and malicious traffic. This classification will help in learning about the unknown attacks faced by the IT industry. The notion of traffic classification is not a new concept; plenty of work has been done to classify network traffic for heterogeneous applications. Existing techniques (payload-based, port-based, and statistical) have their own pros and cons, which will be discussed later in this literature, but classification using machine learning techniques is still an open field to explore and has provided very promising results up to now.
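The abstract names no specific model, so as a hedged illustration only, the statistical, ML-based approach can be sketched with scikit-learn on hypothetical per-flow features (the feature names and the random-forest choice are assumptions, not taken from the paper):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical per-flow statistical features:
# [mean packet size (bytes), flow duration (s), packets per second].
normal = rng.normal(loc=[500.0, 2.0, 50.0], scale=[80.0, 0.5, 10.0], size=(200, 3))
malicious = rng.normal(loc=[120.0, 0.2, 400.0], scale=[40.0, 0.1, 60.0], size=(200, 3))

X = np.vstack([normal, malicious])
y = np.array([0] * 200 + [1] * 200)  # 0 = normal, 1 = malicious

# Train a classifier on flow statistics and evaluate on held-out flows.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(acc)
```

The statistical approach avoids inspecting payloads or trusting port numbers, which is why it generalizes to encrypted or port-obfuscated traffic.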
Active Nearest-Neighbor Learning in Metric Spaces
We propose a pool-based non-parametric active learning algorithm for general
metric spaces, called MArgin Regularized Metric Active Nearest Neighbor
(MARMANN), which outputs a nearest-neighbor classifier. We give prediction
error guarantees that depend on the noisy-margin properties of the input
sample, and are competitive with those obtained by previously proposed passive
learners. We prove that the label complexity of MARMANN is significantly lower
than that of any passive learner with similar error guarantees. MARMANN is
based on a generalized sample compression scheme, and a new label-efficient
active model-selection procedure.
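MARMANN's compression scheme and model selection are beyond a short sketch; as a toy stand-in for the pool-based setting, the following labels pool points by farthest-first traversal (a simple, crude proxy for label-efficient selection, not the actual MARMANN procedure) and outputs a nearest-neighbor classifier:

```python
import numpy as np

def nn_predict(X_lab, y_lab, X):
    # 1-nearest-neighbor prediction under the Euclidean metric.
    d = np.linalg.norm(X[:, None, :] - X_lab[None, :, :], axis=2)
    return y_lab[np.argmin(d, axis=1)]

def active_nn(X_pool, oracle, budget):
    # Farthest-first querying: label the pool point farthest from the
    # currently labeled set; a toy stand-in for MARMANN's selection.
    labeled, labels = [0], [oracle(0)]
    while len(labeled) < budget:
        d = np.linalg.norm(X_pool[:, None, :] - X_pool[labeled][None, :, :], axis=2)
        i = int(np.argmax(d.min(axis=1)))  # farthest from all labeled points
        labeled.append(i)
        labels.append(oracle(i))
    return X_pool[labeled], np.array(labels)

# Toy pool: two well-separated clusters, labels revealed only on query.
rng = np.random.default_rng(1)
X_pool = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(3, 0.3, (100, 2))])
y_true = np.array([0] * 100 + [1] * 100)

X_lab, y_lab = active_nn(X_pool, oracle=lambda i: y_true[i], budget=10)
acc = (nn_predict(X_lab, y_lab, X_pool) == y_true).mean()
print(acc)
```

The point of the sketch is the label-complexity gap: 10 queries suffice here where a passive learner would label the whole pool.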
The R Package MitISEM: Mixture of Student-t Distributions using Importance Sampling Weighted Expectation Maximization for Efficient and Robust Simulation
This paper presents the R package MitISEM, which provides an automatic and flexible method to approximate a non-elliptical target density using adaptive mixtures of Student-t densities, where only a kernel of the target density is required. The approximation can be used as a candidate density in importance sampling or Metropolis-Hastings methods for Bayesian inference on model parameters and probabilities. The package also provides an extended MitISEM algorithm, ‘sequential MitISEM’, which substantially decreases the computational time when the target density has to be approximated for increasing data samples. This occurs when the posterior distribution is updated with new observations and/or when one computes model probabilities using predictive likelihoods. We illustrate the MitISEM algorithm using three canonical statistical and econometric models that are characterized by several types of non-elliptical posterior shapes and that describe well-known data patterns in econometrics and finance. We show that the candidate distribution obtained by MitISEM outperforms those obtained by ‘naive’ approximations in terms of numerical efficiency. Further, the MitISEM approach can be used for Bayesian model comparison, using the predictive likelihoods.
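MitISEM itself is an R package; as a rough Python illustration of the underlying idea only, the following uses a single Student-t candidate (MitISEM would adapt a full mixture) for self-normalized importance sampling on an unnormalized target kernel, where the kernel, its skew, and all tuning values are invented for the example:

```python
import numpy as np
from scipy import stats

# Unnormalized target kernel: a skewed, non-elliptical density standing
# in for a posterior known only up to a normalizing constant.
def kernel(x):
    return np.exp(-0.5 * x**2) * (1 + np.tanh(2 * x))

# Student-t candidate density (single component for illustration).
cand = stats.t(df=5, loc=0.5, scale=1.2)

rng = np.random.default_rng(2)
draws = cand.rvs(size=50_000, random_state=rng)
w = kernel(draws) / cand.pdf(draws)        # importance weights
post_mean = np.sum(w * draws) / np.sum(w)  # self-normalized IS estimate
print(round(post_mean, 2))
```

The Student-t candidate's heavy tails keep the importance weights bounded, which is the robustness motivation behind using Student-t mixtures rather than Gaussian ones.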
Robust detection of real-time power quality disturbances under noisy condition using FTDD features
To improve power quality (PQ), detecting the particular type of disturbance is the first step before mitigation, so monitoring is needed to detect PQ disturbances that occur over a short duration of time. Classification of real-time PQ disturbances under noisy conditions is investigated in this method by selecting an appropriate signal processing tool, the fusion of time domain descriptors (FTDD), at the feature extraction stage. It is a method of extracting power spectrum characteristics of various PQ disturbances. Advantages such as algorithmic simplicity and unique local time-based features put the FTDD algorithm ahead of other techniques. PQ events such as voltage sag, voltage swell, interruption, healthy signal, transient, and harmonics mixed with different noise conditions are analysed. Multiclass support vector machine (SVM) and naive Bayes (NB) classifiers are applied to analyse the performance of the proposed method. The NB classifier performs best on the noiseless signal with 99.66% accuracy, while on noise-added signals both NB and SVM show good accuracy at different signal-to-noise ratios. Finally, an Arduino controller-based hardware tool for acquiring real-time signals shows that the proposed system is applicable to industries, making detection simple.
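The exact FTDD feature set is defined in the paper; as a hedged illustration, classical time domain descriptors (definitions assumed here, not taken from the paper) can be computed on a simulated voltage sag like this:

```python
import numpy as np

def time_domain_descriptors(x):
    # Classical time domain descriptors often fused in FTDD-style
    # pipelines; the exact feature set is an assumption for this sketch.
    mav = np.mean(np.abs(x))                         # mean absolute value
    wl = np.sum(np.abs(np.diff(x)))                  # waveform length
    zc = np.sum(np.diff(np.sign(x)) != 0)            # zero crossings
    ssc = np.sum(np.diff(np.sign(np.diff(x))) != 0)  # slope sign changes
    return np.array([mav, wl, zc, ssc], dtype=float)

fs, f = 3200, 50                  # sampling rate and fundamental (Hz)
t = np.arange(0, 0.2, 1 / fs)
healthy = np.sin(2 * np.pi * f * t)

sag = healthy.copy()
sag[(t >= 0.06) & (t < 0.14)] *= 0.4  # 60% voltage sag lasting 4 cycles

f_healthy = time_domain_descriptors(healthy)
f_sag = time_domain_descriptors(sag)
print(f_healthy[0], f_sag[0])  # mean absolute value drops during the sag
```

Feature vectors of this kind are what a multiclass SVM or NB classifier would consume to separate sag, swell, interruption, and the other event classes.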
The Kernel Density Integral Transformation
Feature preprocessing continues to play a critical role when applying machine
learning and statistical methods to tabular data. In this paper, we propose the
use of the kernel density integral transformation as a feature preprocessing
step. Our approach subsumes the two leading feature preprocessing methods as
limiting cases: linear min-max scaling and quantile transformation. We
demonstrate that, without hyperparameter tuning, the kernel density integral
transformation can be used as a simple drop-in replacement for either method,
offering protection from the weaknesses of each. Alternatively, with tuning of
a single continuous hyperparameter, we frequently outperform both of these
methods. Finally, we show that the kernel density transformation can be
profitably applied to statistical data analysis, particularly in correlation
analysis and univariate clustering. (Published in Transactions on Machine Learning Research, 10/2023.)
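The transformation is easy to sketch: each value is mapped to the integral of a kernel density estimate of the feature up to that value, giving a smoothed empirical CDF in (0, 1). A minimal Python sketch (not the authors' implementation; scipy's Gaussian KDE stands in, with the bandwidth as the single tunable hyperparameter):

```python
import numpy as np
from scipy.stats import gaussian_kde

def kdi_transform(x, bw=1.0):
    # Integral of a Gaussian KDE of the sample up to each value.
    # A very small bandwidth approaches the quantile transform; a very
    # large one approaches (rescaled) linear min-max scaling.
    kde = gaussian_kde(x, bw_method=bw)
    return np.array([kde.integrate_box_1d(-np.inf, v) for v in x])

rng = np.random.default_rng(3)
x = np.exp(rng.normal(size=200))  # heavy right-tailed raw feature
z = kdi_transform(x)
print(z.min(), z.max())           # transformed values lie in (0, 1)
```

Because the map is strictly increasing, ranks are preserved exactly, which is what makes it a drop-in replacement for either limiting method.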
Bayesian classification and survival analysis with curve predictors
We propose classification models for binary and multicategory data where the
predictor is a random function. The functional predictor could be irregularly and
sparsely sampled or characterized by high dimension and sharp localized changes. In
the former case, we employ Bayesian modeling utilizing a flexible spline basis that is widely used for functional regression. In the latter case, we use Bayesian modeling with wavelet basis functions, which have good approximation properties over a large class of function spaces and can accommodate a variety of functional forms observed in real-life applications. We develop a unified hierarchical model which accommodates both the adaptive spline- or wavelet-based function estimation model and the logistic classification model. These two models are coupled together to borrow strength from each other in this unified hierarchical framework. The use of Gibbs
sampling with conjugate priors for posterior inference makes the method computationally
feasible. We compare the performance of the proposed models with the naive
models as well as existing alternatives by analyzing simulated as well as real data. We
also propose a Bayesian unified hierarchical model based on a proportional hazards model and generalized linear model for survival analysis with irregular longitudinal
covariates. This relatively simple joint model has two advantages. One is that using a spline basis simplifies the parameterization while still capturing a flexible non-linear pattern of the function. The other is that the joint modeling framework allows sharing of information between the regression of functional predictors and the proportional hazards modeling of survival data, improving the efficiency of estimation. The novel method can be used not only with a single functional predictor but also with multiple functional predictors. Our methods are applied to real data sets and compared with a parameterized regression method.
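The Bayesian hierarchy is beyond a short sketch, but the spline-basis idea for irregularly sampled curves can be illustrated in a simplified, non-Bayesian form: project each curve onto a fixed B-spline basis by least squares, then classify the basis coefficients with logistic regression (the basis size, simulation, and classifier are all assumptions for this sketch):

```python
import numpy as np
from scipy.interpolate import BSpline
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
degree, n_inner = 3, 4
# Clamped cubic B-spline knot vector on [0, 1] (8 basis functions).
knots = np.r_[[0.0] * (degree + 1),
              np.linspace(0, 1, n_inner + 2)[1:-1],
              [1.0] * (degree + 1)]

def curve_coeffs(t, y):
    # Least-squares projection of one irregularly sampled curve
    # onto the common spline basis.
    B = BSpline.design_matrix(t, knots, degree).toarray()
    return np.linalg.lstsq(B, y, rcond=None)[0]

# Simulate irregularly and sparsely sampled curves from two classes
# (class 1 adds a linear trend to the shared sinusoidal shape).
X, labels = [], []
for cls in (0, 1):
    for _ in range(60):
        t = np.sort(rng.uniform(0, 1, size=rng.integers(20, 40)))
        y = np.sin(2 * np.pi * t) + cls * t + rng.normal(0, 0.1, t.size)
        X.append(curve_coeffs(t, y))
        labels.append(cls)
X, labels = np.array(X), np.array(labels)

clf = LogisticRegression(max_iter=1000).fit(X, labels)
acc = clf.score(X, labels)
print(acc)
```

The common basis is what makes curves with different observation times comparable: every curve, however sparsely sampled, is summarized by the same fixed-length coefficient vector.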