A traffic classification method using machine learning algorithm
Applying concepts of attack investigation from the IT industry, this work designs
a traffic classification method that combines data mining techniques with machine
learning algorithms to classify normal and malicious traffic. This classification will
help to learn about the unknown attacks faced by the IT industry. The notion of traffic
classification is not new; plenty of work has been done to classify network traffic for
heterogeneous applications. Existing techniques (payload-based, port-based
and statistical) have their own pros and cons, which are discussed later in this
literature, but classification using machine learning techniques is still an open field to explore and has provided very promising results so far.
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
Population Synthesis via k-Nearest Neighbor Crossover Kernel
The recent development of multi-agent simulations brings about a need for
population synthesis. It is a task of reconstructing the entire population from
a sampling survey of limited size (1% or so), supplying the initial conditions
from which simulations begin. This paper presents a new kernel density
estimator for this task. Our method is an analogue of the classical
Breiman-Meisel-Purcell estimator, but employs novel techniques that harness the
huge degree of freedom which is required to model high-dimensional nonlinearly
correlated datasets: the crossover kernel, the k-nearest neighbor restriction
of the kernel construction set and the bagging of kernels. The performance as a
statistical estimator is examined through real and synthetic datasets. We
provide an "optimization-free" parameter selection rule for our method, a
theory of how our method works and a computational cost analysis. To
demonstrate the usefulness as a population synthesizer, our method is applied
to a household synthesis task for an urban micro-simulator.
Comment: 10 pages, 4 figures, IEEE International Conference on Data Mining
(ICDM) 201
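As a rough illustration of the classical starting point named in the abstract, the sketch below draws synthetic records from a variable-bandwidth Gaussian kernel estimator in which each survey record's bandwidth is proportional to the distance to its k-th nearest neighbor. It does not implement the paper's crossover kernel, neighbor-restricted construction set or bagging; the function names, the `alpha` scale factor, the default values and the toy survey records are all assumptions made for this example.

```python
import math
import random


def kth_neighbor_distance(points, i, k):
    """Euclidean distance from points[i] to its k-th nearest other point."""
    dists = sorted(
        math.dist(points[i], p) for j, p in enumerate(points) if j != i
    )
    return dists[k - 1]


def synthesize(points, n_out, k=2, alpha=0.5, seed=0):
    """Draw n_out synthetic records from a variable-bandwidth Gaussian KDE.

    Each survey record gets its own bandwidth, alpha times the distance to
    its k-th nearest neighbor, so kernels are narrow where the survey is
    dense and wide where it is sparse. (alpha and k are illustrative.)
    """
    rng = random.Random(seed)
    bandwidths = [
        alpha * kth_neighbor_distance(points, i, k) for i in range(len(points))
    ]
    synthetic = []
    for _ in range(n_out):
        i = rng.randrange(len(points))  # pick a survey record uniformly
        h = bandwidths[i]               # its local bandwidth
        synthetic.append(tuple(rng.gauss(x, h) for x in points[i]))
    return synthetic


# Hypothetical survey of (age, household size) records.
survey = [(30.0, 2.0), (32.0, 2.0), (31.0, 3.0), (65.0, 1.0), (67.0, 1.0)]
population = synthesize(survey, n_out=500)
```

Sampling a record and then perturbing it with its own kernel is equivalent to drawing from the mixture density the estimator defines, which is why no explicit density evaluation is needed here.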
From Review to Rating: Exploring Dependency Measures for Text Classification
Various text analysis techniques exist, which attempt to uncover unstructured
information from text. In this work, we explore using statistical dependence
measures for textual classification, representing text as word vectors. Student
satisfaction scores on a 3-point scale and their free text comments written
about university subjects are used as the dataset. We have compared two textual
representations: a frequency word representation and term frequency
relationship to word vectors, and found that word vectors provide greater
accuracy. However, these word vectors have a large number of features, which
adds to the computational burden. Thus, we explored using a
non-linear dependency measure for feature selection by maximizing the
dependence between the text reviews and corresponding scores. Our quantitative
and qualitative analysis on a student satisfaction dataset shows that our
approach achieves comparable accuracy to the full feature vector, while being
an order of magnitude faster in testing. These text analysis and feature
reduction techniques can be used for other textual data applications such as
sentiment analysis.
Comment: 8 page
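The abstract does not name its non-linear dependency measure, so the sketch below assumes one common choice, the biased empirical Hilbert-Schmidt Independence Criterion (HSIC) with Gaussian kernels, and uses it to rank individual features by their dependence on the satisfaction score. The function names, the kernel width `sigma`, and the toy feature columns `dim_a` and `dim_b` are hypothetical and not taken from the paper.

```python
import math


def gaussian_gram(values, sigma=1.0):
    """Gram matrix of a Gaussian kernel over a list of scalars."""
    return [
        [math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in values]
        for a in values
    ]


def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC between two equal-length lists of scalars."""
    n = len(xs)
    k = gaussian_gram(xs, sigma)
    l = gaussian_gram(ys, sigma)
    row_mean = [sum(row) / n for row in k]
    total_mean = sum(row_mean) / n
    score = 0.0
    for i in range(n):
        for j in range(n):
            # Double-center K; it is symmetric, so row and column means match.
            centered = k[i][j] - row_mean[i] - row_mean[j] + total_mean
            score += centered * l[i][j]
    return score / (n * n)


def select_features(columns, scores, top):
    """Return the names of the `top` features most dependent on the scores."""
    ranked = sorted(
        columns, key=lambda name: hsic(columns[name], scores), reverse=True
    )
    return ranked[:top]


# Toy data: 3-point satisfaction scores and two hypothetical feature columns.
scores = [1.0, 2.0, 3.0] * 4
columns = {
    # Non-linear function of the score (no linear correlation with it).
    "dim_a": [(s - 2.0) ** 2 for s in scores],
    # Every value is paired equally often with every score, so it is independent.
    "dim_b": [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0],
}
selected = select_features(columns, scores, top=1)
```

Scoring each feature separately is a simplification chosen to keep the example short; a selection procedure that maximizes dependence over feature subsets, as the abstract describes, would need to evaluate features jointly.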