Real valued negative selection for anomaly detection in wireless ad hoc networks
Wireless ad hoc networks are among the network technologies that have gained considerable attention from computer scientists for future telecommunication applications. However, they inherit the major vulnerabilities of their ancestor (i.e., fixed wired networks) but cannot inherit all the conventional intrusion detection capabilities because of their features and characteristics. Wireless ad hoc networks have the potential to become the de facto standard for future wireless networking because of their open medium and dynamic features. Non-infrastructure networks such as wireless ad hoc networks are expected to become an important part of the 4G architecture. In this paper, we study the use of an Artificial Immune System (AIS) as an anomaly detector in a wireless ad hoc network. The main goal of our research is to build a system that can learn and detect new and unknown attacks. To achieve this goal, we studied how the real-valued negative selection algorithm can be applied in wireless ad hoc networks, and we propose enhancements to the real-valued negative selection algorithm for anomaly detection in wireless ad hoc networks.
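As a rough illustration of the basic real-valued negative selection idea named in this abstract, the sketch below generates random detectors in a unit feature space and keeps only those that do not cover any normal ("self") sample. It is a minimal sketch under stated assumptions, not the enhanced algorithm the paper proposes: the function names, the two-dimensional feature space, the fixed detector radius, and the synthetic traffic features are all hypothetical.

```python
import math
import random

def train_detectors(self_samples, n_detectors, radius, dim, rng, max_tries=100_000):
    """Generate random detectors that do not cover any self (normal) sample."""
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = tuple(rng.random() for _ in range(dim))
        # Negative selection: discard candidates that match normal behaviour.
        if all(math.dist(candidate, s) > radius for s in self_samples):
            detectors.append(candidate)
    return detectors

def is_anomalous(sample, detectors, radius):
    """A sample is flagged when it falls within the radius of any detector."""
    return any(math.dist(sample, d) <= radius for d in detectors)

rng = random.Random(0)
radius = 0.1  # illustrative detector radius (assumption)

# Hypothetical normalised traffic features clustered near (0.2, 0.2).
self_samples = [
    (0.2 + rng.uniform(-0.05, 0.05), 0.2 + rng.uniform(-0.05, 0.05))
    for _ in range(50)
]
detectors = train_detectors(self_samples, n_detectors=200, radius=radius, dim=2, rng=rng)

# By construction, no training sample is ever flagged as anomalous.
false_alarms = [s for s in self_samples if is_anomalous(s, detectors, radius)]
```

Because every accepted detector lies farther than the radius from all self samples, the training data can never trigger a detector; how well unseen attacks are covered depends on the number of detectors and the radius.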
Video Mining using LIM Based Clustering and Self Organizing Maps
Video mining has grown into an active research area and has received increasing attention in recent years due to the rapid rise in the volume of digital video databases. The aim of this research work is to find new objects in videos. This work proposes a novel approach to video mining that uses a LIM-based clustering technique and self-organizing maps to recognize novelty in the frames of a video sequence. The proposed work is designed and implemented in MATLAB. It is tested with sample videos and provides promising results. It is suitable for day-to-day video mining applications and object detection systems, including remote video surveillance in defense for national and international border tracking.
Novel Intrusion Detection using Probabilistic Neural Network and Adaptive Boosting
This article applies Machine Learning techniques to solve Intrusion Detection
problems within computer networks. Due to complex and dynamic nature of
computer networks and hacking techniques, detecting malicious activities
remains a challenging task for security experts, that is, currently available
defense systems suffer from low detection capability and high number of false
alarms. To overcome such performance limitations, we propose a novel Machine
Learning algorithm, namely Boosted Subspace Probabilistic Neural Network
(BSPNN), which integrates an adaptive boosting technique and a semi parametric
neural network to obtain a good tradeoff between accuracy and generality. As a
result, learning bias and generalization variance can be significantly
minimized. Substantial experiments on KDD 99 intrusion benchmark indicate that
our model outperforms other state of the art learning algorithms, with
significantly improved detection accuracy, minimal false alarms and relatively
small computational complexity.
Comment: 9 pages IEEE format, International Journal of Computer Science and
Information Security, IJCSIS 2009, ISSN 1947 5500, Impact Factor 0.423,
http://sites.google.com/site/ijcsis
Model checking for data anomaly detection
Data typically evolve according to specific processes, which makes it possible to identify a profile of evolution: the values the data may assume, the frequencies at which they change, their temporal variation in relation to other data, or other constraints directly connected to the reference domain. A violation of these conditions could signal different threats to the system, such as attempted tampering or a cyber attack, a failure in system operation, or a bug in the applications that manage the life cycle of the data. Detecting such violations is not straightforward, as the processes may be unknown or hard to extract. In this paper we propose an approach to detect data anomalies. We represent data user behaviours as labelled transition systems and, through model checking techniques, demonstrate that the proposed modelling can be exploited to detect data anomalies successfully.
Asymptotic normality of plug-in level set estimates
We establish the asymptotic normality of the -measure of the symmetric
difference between the level set and a plug-in-type estimator of it formed by
replacing the density in the definition of the level set by a kernel density
estimator. Our proof will highlight the efficacy of Poissonization methods in
the treatment of large sample theory problems of this kind.
Comment: Published at http://dx.doi.org/10.1214/08-AAP569 in the Annals of
Applied Probability (http://www.imstat.org/aap/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
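For readers unfamiliar with the objects named in this abstract, the standard definitions can be written out as follows. This is a sketch in assumed notation: f is the unknown density, c a fixed level, K a kernel, h_n a bandwidth, and \mu stands in for the measure in which the symmetric difference is evaluated; none of these symbols is taken from the abstract itself.

```latex
% Level set of the density f at level c, and its plug-in estimate
% obtained by replacing f with a kernel density estimator f_n.
\[
  C(c) = \{\, x \in \mathbb{R}^d : f(x) \ge c \,\}, \qquad
  C_n(c) = \{\, x \in \mathbb{R}^d : f_n(x) \ge c \,\},
\]
% Kernel density estimator and the measure of the symmetric difference
% whose asymptotic normality is the subject of the paper.
\[
  f_n(x) = \frac{1}{n h_n^d} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h_n}\right), \qquad
  d_\mu\bigl(C_n(c), C(c)\bigr) = \mu\bigl(C_n(c) \,\triangle\, C(c)\bigr).
\]
```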
Hyperparameter selection of one-class support vector machine by self-adaptive data shifting
With its flexible data description ability, the one-class Support Vector Machine (OCSVM) is one of the most popular and widely used methods for one-class classification (OCC). Nevertheless, the performance of OCSVM relies strongly on its hyperparameter selection, which remains a challenging open problem due to the absence of outlier data. This paper proposes a fully automatic OCSVM hyperparameter selection method, which requires no tuning of additional hyperparameters, based on a novel self-adaptive "data shifting" mechanism. Firstly, by efficient edge pattern detection (EPD) and "negatively" shifting edge patterns along the negative direction of the estimated data density gradient, a constrained number of high-quality pseudo outliers are self-adaptively generated at more desirable locations, which readily avoids two major difficulties in previous outlier generation methods. Secondly, to avoid time-consuming cross-validation and enhance robustness to noise in the given training data, a pseudo target set is generated for model validation by "positively" shifting each given target datum along the positive direction of the data density gradient. Experiments on synthetic and benchmark datasets demonstrate the effectiveness of the proposed method.
This work was sponsored by the National Natural Science Foundation of China (Project nos. 61170287, 61232016)
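The shifting step described in this abstract can be illustrated with a small sketch: estimate the data density with a Gaussian kernel, take its gradient at a point, and move the point against the gradient (toward lower density, giving a pseudo outlier) or along it (toward higher density, giving a pseudo target). This is only an illustration under stated assumptions and not the paper's method: the edge point is picked by hand instead of by edge pattern detection, and the bandwidth `h`, the shift length `step`, the grid data, and all function names are hypothetical.

```python
import math

def kde(x, data, h):
    """Unnormalised Gaussian kernel density estimate at point x."""
    return sum(math.exp(-math.dist(x, p) ** 2 / (2 * h * h)) for p in data) / len(data)

def kde_gradient(x, data, h):
    """Gradient of the unnormalised estimate above with respect to x."""
    grad = [0.0] * len(x)
    for p in data:
        w = math.exp(-math.dist(x, p) ** 2 / (2 * h * h))
        for j in range(len(x)):
            grad[j] += w * (p[j] - x[j]) / (h * h)
    return [g / len(data) for g in grad]

def shift(x, data, h, step, sign):
    """Move x by `step` along (sign=+1) or against (sign=-1) the density gradient."""
    grad = kde_gradient(x, data, h)
    norm = math.sqrt(sum(g * g for g in grad))
    return tuple(xj + sign * step * gj / norm for xj, gj in zip(x, grad))

# Hypothetical target data: a 5 x 5 grid on [-1, 1]^2.
targets = [(-1 + 0.5 * i, -1 + 0.5 * j) for i in range(5) for j in range(5)]
edge_point = (1.0, 0.0)  # chosen by hand (assumption), not detected automatically
h, step = 0.5, 0.3       # illustrative bandwidth and shift length (assumptions)

pseudo_outlier = shift(edge_point, targets, h, step, sign=-1)  # pushed outside the data
pseudo_target = shift(edge_point, targets, h, step, sign=+1)   # pulled toward the data
```

In this toy setting the pseudo outlier lands just outside the grid and the pseudo target just inside it, so the estimated density decreases and increases respectively relative to the edge point.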
A Comprehensive Survey of Data Mining-based Fraud Detection Research
This survey paper categorises, compares, and summarises almost all
published technical and review articles in automated fraud detection within the
last 10 years. It defines the professional fraudster, formalises the main types
and subtypes of known fraud, and presents the nature of data evidence collected
within affected industries. Within the business context of mining the data to
achieve higher cost savings, this research presents methods and techniques
together with their problems. Compared to all related reviews on fraud
detection, this survey covers many more technical articles and is the only one,
to the best of our knowledge, which proposes alternative data and solutions
from related domains.
Comment: 14 pages
Detecting Errors in Korean Corpus based on GMM
In computational linguistics, a corpus is a large and structured set of language samples collected from real-world text for a specific purpose. A corpus may contain various types of errors because most corpora are built manually and/or semi-automatically, and the errors are caused by human intervention. Such errors degrade the performance of corpus-based learning systems. Many studies have therefore been conducted to detect and correct such errors in various ways, and most have worked from pre-built corpora. Human intervention is, however, still required. In addition, error correction is tedious, laborious, and costly.
In this paper, we propose a method for detecting corpus errors using a GMM clustering algorithm. The purpose of this paper is to detect errors in a small corpus, so that the proposed method can be used during corpus development by integrating it into annotation tools. The proposed method consists of three steps. The first step is to make word embedding vectors of some error-prone contexts. The second step is to reduce the dimension of the vectors, because clustering high-dimensional vectors is time-consuming. The third step is to group the reduced vectors and to detect outliers as errors.
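The three steps above can be sketched in a few lines of standard-library Python. This is a toy illustration under stated assumptions, not the authors' system: the context vectors are made up by hand in place of word embeddings, the dimension reduction is a projection onto the first principal axis (the abstract does not say which reduction is used), the mixture has two one-dimensional components, and the single lowest-likelihood point is flagged as the error candidate. All function names are hypothetical.

```python
import math

def first_principal_axis(vectors, iterations=100):
    """Return (mean, unit axis) of the leading principal direction via power iteration."""
    dim = len(vectors[0])
    mean = [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]
    centred = [[v[j] - mean[j] for j in range(dim)] for v in vectors]
    axis = [1.0] * dim
    for _ in range(iterations):
        # Multiply the axis by X^T X without forming the matrix explicitly.
        scores = [sum(c[j] * axis[j] for j in range(dim)) for c in centred]
        new_axis = [sum(s * c[j] for s, c in zip(scores, centred)) for j in range(dim)]
        norm = math.sqrt(sum(a * a for a in new_axis))
        axis = [a / norm for a in new_axis]
    return mean, axis

def project(vectors, mean, axis):
    """Step 2: reduce each vector to one coordinate along the principal axis."""
    return [sum((vj - mj) * aj for vj, mj, aj in zip(v, mean, axis)) for v in vectors]

def component_densities(x, weights, means, variances):
    """Weighted Gaussian density of x under each mixture component."""
    return [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
            for w, m, v in zip(weights, means, variances)]

def fit_gmm_1d(xs, iterations=50, min_var=1e-3):
    """EM for a two-component one-dimensional Gaussian mixture."""
    centre = sum(xs) / len(xs)
    spread = sum((x - centre) ** 2 for x in xs) / len(xs)
    weights, means, variances = [0.5, 0.5], [min(xs), max(xs)], [spread, spread]
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = component_densities(x, weights, means, variances)
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var, min_var)  # floor avoids a collapsing component
    return weights, means, variances

def lowest_likelihood(xs, params, n_flag):
    """Step 3: indices of the n_flag points with the lowest mixture density."""
    density = [sum(component_densities(x, *params)) for x in xs]
    return sorted(range(len(xs)), key=lambda i: density[i])[:n_flag]

# Step 1 stand-in: hypothetical 3-D context vectors in two groups, plus one odd vector.
cluster_a = [(0.1 * i, 0.2 * i, -0.1 * i) for i in range(-3, 4)]
cluster_b = [(6 + 0.2 * i, 6 - 0.1 * i, 6 + 0.1 * i) for i in range(-3, 4)]
vectors = cluster_a + cluster_b + [(3.0, 3.0, 3.0)]  # last vector plays the annotation error

mean, axis = first_principal_axis(vectors)
reduced = project(vectors, mean, axis)
flagged = lowest_likelihood(reduced, fit_gmm_1d(reduced), n_flag=1)
```

Here the odd vector sits between the two groups, so it receives a low density under both components and is the one flagged. In practice the number of components and the flagging threshold would have to be chosen for the corpus at hand.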
For the experiments, we used two kinds of corpora: a Korean dependency corpus and a Korean semantic role labelling (SRL) corpus, each comprising only 1000 sentences. Our results show that the proposed method can serve as an error detector in the early stage of corpus development. Our best results achieve a recall of 65.15% for the Korean dependency corpus and a recall of 69.46% for the Korean SRL corpus.
In the future, we will investigate feature representations for detecting errors and will address error correction as well as error detection. Motivated by the proposed method, we will also investigate error detection in the case where a large tagged corpus is available.
A corpus is a collection of language samples extracted for a specific purpose, and there are various kinds of corpora depending on that purpose. Because most corpora are built manually, they contain various errors, and systems that use a corpus containing errors cannot be expected to perform well. To address this problem, research has been carried out on detecting and correcting errors in various ways. However, most methods detect and correct errors by learning from an already built corpus. This work has to be repeated several times and is costly. To alleviate this problem somewhat, this paper proposes an error detection method based on clustering with a GMM (Gaussian Mixture Model). Clustering is an unsupervised learning method, so it can detect errors even when there is little or no labelled training data. It can therefore be used not only on an already built corpus but also during corpus construction.
To verify the error detection performed by the proposed method, we used a syntactic parsing corpus and a semantic role corpus. Precision and recall, which are widely used in information retrieval, served as the evaluation measures. The method achieved recall of 65.15% on the parsing corpus and 69.46% on the semantic role corpus. These results indicate that the proposed model can be used to detect errors in various corpora.
Further research, such as feature expansion, could improve recall. We also plan to apply the system directly to a corpus construction tool and evaluate how efficient it is.
Chapter 1 Introduction
Chapter 2 Related Work
2.1 Error detection
2.2 The GMM algorithm
2.3 Dimension reduction
2.4 Korean parsing corpus
2.5 Korean semantic role corpus
Chapter 3 Error candidate detection system
3.1 Context representation
3.1.1 Context representation in the parsing corpus
3.1.2 Context representation in the semantic role corpus
3.2 Dimension reduction of the context representation
3.3 Error detection in corpora using GMM
Chapter 4 Experiments and evaluation
4.1 Experimental data
4.2 Experimental results
Chapter 5 Conclusion and future work
References
Acknowledgements
Maste