Search CORE

56,729 research outputs found

A traffic classification method using machine learning algorithm

Author: Chishti Hamayoun Rauf
Publication venue: University of Bedfordshire
Publication date: 01/01/2013
Field of study

Applying concepts of attack investigation in IT industry, this idea has been developed to design a Traffic Classification Method using Data Mining techniques at the intersection of Machine Learning Algorithm, Which will classify the normal and malicious traffic. This classification will help to learn about the unknown attacks faced by IT industry. The notion of traffic classification is not a new concept; plenty of work has been done to classify the network traffic for heterogeneous application nowadays. Existing techniques such as (payload based, port based and statistical based) have their own pros and cons which will be discussed in this literature later, but classification using Machine Learning techniques is still an open field to explore and has provided very promising results up till now

University of Bedfordshire Repository

Machine Learning and Integrative Analysis of Biomedical Big Data.

Author: Choi Howard
Chung Neo Christopher
Mirza Bilal
Ping Peipei
Wang Jie
Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

Multidisciplinary Digital Publishing Institute

Ezid

Directory of Open Access Journals

eScholarship - University of California

Population Synthesis via k-Nearest Neighbor Crossover Kernel

Author: Hamada Naoki
Higuchi Hiroyuki
Homma Katsumi
Kikuchi Hideyuki
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 26/08/2015
Field of study

The recent development of multi-agent simulations brings about a need for population synthesis. It is a task of reconstructing the entire population from a sampling survey of limited size (1% or so), supplying the initial conditions from which simulations begin. This paper presents a new kernel density estimator for this task. Our method is an analogue of the classical Breiman-Meisel-Purcell estimator, but employs novel techniques that harness the huge degree of freedom which is required to model high-dimensional nonlinearly correlated datasets: the crossover kernel, the k-nearest neighbor restriction of the kernel construction set and the bagging of kernels. The performance as a statistical estimator is examined through real and synthetic datasets. We provide an "optimization-free" parameter selection rule for our method, a theory of how our method works and a computational cost analysis. To demonstrate the usefulness as a population synthesizer, our method is applied to a household synthesis task for an urban micro-simulator.Comment: 10 pages, 4 figures, IEEE International Conference on Data Mining (ICDM) 201

arXiv.org e-Print Archive

Crossref

From Review to Rating: Exploring Dependency Measures for Text Classification

Author: Baktashmotlagh Mahsa
Boles Wageeh
Cunningham-Nelson Samuel
Publication venue
Publication date: 01/01/2017
Field of study

Various text analysis techniques exist, which attempt to uncover unstructured information from text. In this work, we explore using statistical dependence measures for textual classification, representing text as word vectors. Student satisfaction scores on a 3-point scale and their free text comments written about university subjects are used as the dataset. We have compared two textual representations: a frequency word representation and term frequency relationship to word vectors, and found that word vectors provide a greater accuracy. However, these word vectors have a large number of features which aggravates the burden of computational complexity. Thus, we explored using a non-linear dependency measure for feature selection by maximizing the dependence between the text reviews and corresponding scores. Our quantitative and qualitative analysis on a student satisfaction dataset shows that our approach achieves comparable accuracy to the full feature vector, while being an order of magnitude faster in testing. These text analysis and feature reduction techniques can be used for other textual data applications such as sentiment analysis.Comment: 8 page

arXiv.org e-Print Archive

Queensland University of Technology ePrints Archive