A Survey of Methods for Encrypted Traffic Classification and Analysis
With the widespread use of encrypted data transport, network traffic encryption is becoming a standard nowadays. This presents a challenge for traffic measurement, especially for analysis and anomaly detection methods that depend on the type of network traffic. In this paper, we survey existing approaches to the classification and analysis of encrypted traffic. First, we describe the most widespread encryption protocols used throughout the Internet and show that the initiation of an encrypted connection and the protocol structure give away a great deal of information for encrypted traffic classification and analysis. Then, we survey payload- and feature-based classification methods for encrypted traffic and categorize them using an established taxonomy. An advantage of some of the described classification methods is the ability to recognize the encrypted application protocol in addition to the encryption protocol. Finally, we make a comprehensive comparison of the surveyed feature-based classification methods and present their strengths and weaknesses.
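To make the feature-based approach concrete, here is a minimal sketch of flow-feature extraction followed by rule-based classification. The flow representation, feature names, thresholds, and class labels are invented for illustration and are not taken from any surveyed method.

```python
# Hypothetical sketch: classify an encrypted flow from statistical features
# only (no payload inspection). Features and thresholds are illustrative.
from statistics import mean, pstdev

def flow_features(packets):
    """Compute simple statistical features from a list of
    (packet_size, direction) tuples, direction being "up" or "down"."""
    sizes = [size for size, _ in packets]
    upstream = [size for size, d in packets if d == "up"]
    return {
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
        "up_ratio": len(upstream) / len(packets),
    }

def classify(features):
    """Toy rule: large, roughly symmetric flows look like bulk transfer;
    everything else is treated as interactive traffic."""
    if features["mean_size"] > 800 and 0.3 < features["up_ratio"] < 0.7:
        return "bulk"
    return "interactive"
```

Real feature-based classifiers replace the hand-written rule with a trained model (decision trees, SVMs, etc.) over many more flow statistics.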
A comprehensive literature classification of simulation optimisation methods
Simulation Optimisation (SO) provides a structured approach to system design and configuration when analytical expressions for input/output relationships are unavailable. Several excellent surveys have been written on this topic, but each concentrates on only a few classification criteria. This paper presents a literature survey of SO techniques organised by the full set of classification criteria, according to problem characteristics such as the shape of the response surface (global as compared to local optimisation), the objective function (single or multiple objectives) and the parameter space (discrete or continuous parameters). The survey focuses specifically on SO problems that involve a single performance measure. Keywords: Simulation Optimisation, classification methods, literature survey
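As a toy illustration of the SO setting the abstract describes, the sketch below searches a discrete parameter space of a noisy stand-in simulator, averaging replications and reusing the same random seeds across candidates (common random numbers). The simulator and its cost terms are invented for illustration.

```python
# Toy simulation optimisation: discrete parameter space, noisy objective.
import random

def simulate(servers, seed):
    """Stand-in stochastic simulation: response time plus a staffing cost
    for a hypothetical queueing system with `servers` servers."""
    noise = random.Random(seed).gauss(0, 0.1)
    return 10.0 / servers + 0.6 * servers + noise

def optimise(candidates, replications=20):
    """Average each candidate over replications that reuse the same seeds
    (common random numbers), then return the best candidate."""
    def avg(c):
        return sum(simulate(c, r) for r in range(replications)) / replications
    return min(candidates, key=avg)
```

Sharing seeds across candidates makes their noise terms cancel in comparisons, a standard variance-reduction trick in simulation optimisation; real methods layer response-surface models or metaheuristics on top of this basic loop.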
A Bayesian approach to star-galaxy classification
Star-galaxy classification is one of the most fundamental data-processing tasks in survey astronomy, and a critical starting point for the scientific exploitation of survey data. For bright sources this classification can be done with almost complete reliability, but for the numerous sources close to a survey's detection limit each image encodes only limited morphological information. In this regime, from which many of the new scientific discoveries are likely to come, it is vital to utilise all the available information about a source, both from multiple measurements and also from prior knowledge about the star and galaxy populations. It is also more useful and realistic to provide classification probabilities than decisive classifications. All these desiderata can be met by adopting a Bayesian approach to star-galaxy classification, and we develop a very general formalism for doing so. An immediate implication of applying Bayes's theorem to this problem is that it is formally impossible to combine morphological measurements in different bands without using colour information as well; however, we develop several approximations that disregard colour information as much as possible. The resultant scheme is applied to data from the UKIRT Infrared Deep Sky Survey (UKIDSS), and tested by comparing the results to deep Sloan Digital Sky Survey (SDSS) Stripe 82 measurements of the same sources. The Bayesian classification probabilities obtained from the UKIDSS data agree well with the deep SDSS classifications both overall (a mismatch rate of 0.022, compared to 0.044 for the UKIDSS pipeline classifier) and close to the UKIDSS detection limit (a mismatch rate of 0.068, compared to 0.075 for the UKIDSS pipeline classifier). The Bayesian formalism developed here can be applied to improve the reliability of any star-galaxy classification scheme based on the measured values of morphology statistics alone.
Comment: Accepted 22 November 2010, 19 pages, 17 figures
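The core of such a Bayesian scheme can be sketched in a few lines: class-conditional likelihoods of a morphology statistic combined with a class prior via Bayes's theorem. The Gaussian likelihoods, their parameters, and the prior below are invented for illustration and are not the paper's actual model.

```python
# Sketch: posterior probability of "star" from one morphology statistic.
# Likelihood shapes, parameters and the prior are illustrative assumptions.
import math

def gaussian(x, mu, sigma):
    """Gaussian probability density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def p_star(stat, prior_star=0.5):
    """Posterior P(star | stat) via Bayes's theorem, assuming point-like
    sources cluster near stat = 0 and extended sources near stat = 3."""
    like_star = gaussian(stat, mu=0.0, sigma=1.0)   # point-like (star) model
    like_gal = gaussian(stat, mu=3.0, sigma=1.5)    # extended (galaxy) model
    numerator = like_star * prior_star
    return numerator / (numerator + like_gal * (1 - prior_star))
```

Multiple independent measurements would be combined by multiplying their likelihoods before normalising, which is where the paper's point about colour information entering multi-band combinations arises.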
Automated Protein Structure Classification: A Survey
Classification of proteins based on their structure provides a valuable resource for studying protein structure, function and evolutionary relationships. With the rapidly increasing number of known protein structures, manual and semi-automatic classification is becoming ever more difficult and prohibitively slow. Therefore, there is a growing need for automated, accurate and efficient classification methods to generate classification databases or increase the speed and accuracy of semi-automatic techniques. Recognizing this need, several automated classification methods have been developed. In this survey, we overview recent developments in this area. We classify different methods based on their characteristics and compare their methodology, accuracy and efficiency. We then present a few open problems and explain future directions.
Comment: 14 pages, Technical Report CSRG-589, University of Toronto
An intelligent assistant for exploratory data analysis
In this paper we present an account of the main features of SNOUT, an intelligent assistant for exploratory data analysis (EDA) of social science survey data that incorporates a range of data mining techniques. EDA has much in common with existing data mining techniques: its main objective is to help an investigator reach an understanding of the important relationships in a data set, rather than simply to develop predictive models for selected variables. Brief descriptions of a number of novel techniques developed for use in SNOUT are presented. These include heuristic variable level inference and classification, automatic category formation, the use of similarity trees to identify groups of related variables, interactive decision tree construction, and model selection using a genetic algorithm.
Using machine learning techniques to automate sky survey catalog generation
We describe the application of machine classification techniques to the development of an automated tool for the reduction of a large scientific data set. The 2nd Palomar Observatory Sky Survey provides comprehensive photographic coverage of the northern celestial hemisphere. The photographic plates are being digitized into images containing on the order of 10^7 galaxies and 10^8 stars. Since the size of this data set precludes manual analysis and classification of objects, our approach is to develop a software system which integrates independently developed techniques for image processing and data classification. Image processing routines are applied to identify and measure features of sky objects. Selected features are used to determine the classification of each object. GID3* and O-BTree, two inductive learning techniques, are used to automatically learn classification decision trees from examples. We describe the techniques used, the details of our specific application, and the initial encouraging results which indicate that our approach is well-suited to the problem. The benefits of the approach are increased data reduction throughput, consistency of classification, and the automated derivation of classification rules that will form an objective, examinable basis for classifying sky objects. Furthermore, astronomers will be freed from the tedium of an intensely visual task to pursue more challenging analysis and interpretation problems given automatically cataloged data.
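GID3* and O-BTree are far richer learners than can be shown here; as a minimal stand-in for the idea of inducing a classifier from labelled feature examples, the sketch below fits a one-level decision stump by minimising training error. The feature values and labels are invented for illustration.

```python
# Minimal inductive learning sketch: a one-level decision stump fitted
# from labelled (feature_value, label) examples. Illustrative only.

def fit_stump(examples):
    """Try every midpoint between sorted feature values and both label
    assignments; return the (threshold, low_label, high_label) with the
    fewest training errors."""
    examples = sorted(examples)
    best = None
    for i in range(len(examples) - 1):
        t = (examples[i][0] + examples[i + 1][0]) / 2
        for low, high in (("star", "galaxy"), ("galaxy", "star")):
            errors = sum(1 for x, y in examples
                         if (low if x <= t else high) != y)
            if best is None or errors < best[0]:
                best = (errors, t, low, high)
    _, t, low, high = best
    return t, low, high

def predict(stump, x):
    """Classify a new feature value with a fitted stump."""
    t, low, high = stump
    return low if x <= t else high
```

A full decision-tree learner applies this kind of split search recursively over many features; the stump is the single-split base case.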
A Survey on Compiler Autotuning using Machine Learning
Since the mid-1990s, researchers have been trying to use machine-learning based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order in which to apply optimizations). The compiler optimization space continues to grow due to the advancement of applications, the increasing number of compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and, therefore, cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for the compiler optimization field, particularly on the two major problems of (1) selecting the best optimizations and (2) the phase-ordering of optimizations. The survey highlights the approaches taken so far, the obtained results, the fine-grain classification among different approaches and, finally, the influential papers of the field.
Comment: version 5.0 (updated September 2018), preprint version of our accepted journal article at ACM CSUR 2018 (42 pages). This survey will be updated quarterly. History: received November 2016; revised August 2017; revised February 2018; accepted March 2018
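The phase-ordering problem can be illustrated with a toy local search over pass orderings. The pass names and the cost model below are invented for illustration; real autotuners evaluate candidate orderings by actually compiling and measuring the program.

```python
# Toy phase-ordering search: find a cheap ordering of compiler passes
# under a made-up cost model (lower is better). Illustrative only.
from itertools import combinations

PASSES = ["inline", "dce", "licm", "vectorize"]

def cost(order):
    """Made-up cost model: certain pass orders are assumed cheaper."""
    c = 10.0
    if order.index("inline") < order.index("dce"):
        c -= 2.0  # assumption: inlining first exposes dead code
    if order.index("licm") < order.index("vectorize"):
        c -= 1.0  # assumption: hoisting invariants first helps vectorization
    return c

def local_search(order):
    """Greedy local search: repeatedly apply the first pairwise swap of
    passes that lowers the cost, until no swap improves it."""
    best = list(order)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(best)), 2):
            cand = best[:]
            cand[i], cand[j] = cand[j], cand[i]
            if cost(cand) < cost(best):
                best, improved = cand, True
                break
    return best
```

Machine-learning approaches surveyed in the paper effectively replace such blind search with learned models that predict good orderings or prune the search space.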