25 research outputs found

    Real valued negative selection for anomaly detection in wireless ad hoc networks

    Wireless ad hoc networks are among the network technologies that have attracted considerable attention from computer scientists for future telecommunication applications. However, they inherit the major vulnerabilities of their ancestor (i.e., fixed wired networks) but cannot inherit all of the conventional intrusion detection capabilities, owing to their distinct features and characteristics. Wireless ad hoc networks have the potential to become the de facto standard for future wireless networking because of their open medium and dynamic features. Infrastructure-less networks such as wireless ad hoc networks are expected to become an important part of the 4G architecture. In this paper, we study the use of an Artificial Immune System (AIS) as an anomaly detector in a wireless ad hoc network. The main goal of our research is to build a system that can learn and detect new and unknown attacks. To achieve this goal, we studied how the real-valued negative selection algorithm can be applied in wireless ad hoc networks, and we propose enhancements to the real-valued negative selection algorithm for anomaly detection in wireless ad hoc networks.
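
As an illustration of the general idea behind real-valued negative selection (not the enhanced algorithm proposed in the paper), the following minimal sketch generates random detectors in the unit square and keeps only those that do not cover any normal ("self") sample. The feature vectors, the fixed detector radius, and the names `train_detectors` and `is_anomalous` are assumptions made for this example.

```python
import math
import random


def train_detectors(self_samples, n_detectors, radius, dim, rng, max_tries=10000):
    """Generate random detectors in [0, 1]^dim that cover no self sample."""
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = tuple(rng.random() for _ in range(dim))
        # Negative selection: discard any candidate that matches normal data.
        if all(math.dist(candidate, s) > radius for s in self_samples):
            detectors.append(candidate)
    return detectors


def is_anomalous(sample, detectors, radius):
    """A sample is flagged when it falls inside at least one detector."""
    return any(math.dist(sample, d) <= radius for d in detectors)


# Hypothetical normalised features of normal traffic, clustered near (0.2, 0.2).
rng = random.Random(0)
normal_traffic = [
    (0.2 + 0.05 * rng.random(), 0.2 + 0.05 * rng.random()) for _ in range(30)
]
detectors = train_detectors(normal_traffic, n_detectors=300, radius=0.1, dim=2, rng=rng)

# By construction, no training sample is covered by a surviving detector.
flagged_normals = [s for s in normal_traffic if is_anomalous(s, detectors, 0.1)]
```

Because detectors are accepted only when they lie farther than `radius` from every self sample, the training data can never be flagged; how well unseen attacks are covered depends on the number and radius of the detectors.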

    Video Mining using LIM Based Clustering and Self Organizing Maps

    Video mining has grown into an active research area and has received increasing attention in recent years, owing to the rapid rise in the volume of digital video databases. The aim of this research work is to find new objects in videos. This work proposes a novel approach for video mining that uses a LIM-based clustering technique and self-organizing maps to recognize novelty in the frames of a video sequence. The proposed work is designed and implemented in MATLAB. It has been tested with sample videos and provides promising results. It is suitable for day-to-day video mining applications and object detection systems, including remote video surveillance in defense for national and international border tracking.

    Novel Intrusion Detection using Probabilistic Neural Network and Adaptive Boosting

    This article applies Machine Learning techniques to solve Intrusion Detection problems within computer networks. Due to the complex and dynamic nature of computer networks and hacking techniques, detecting malicious activities remains a challenging task for security experts; that is, currently available defense systems suffer from low detection capability and a high number of false alarms. To overcome such performance limitations, we propose a novel Machine Learning algorithm, namely the Boosted Subspace Probabilistic Neural Network (BSPNN), which integrates an adaptive boosting technique and a semi-parametric neural network to obtain a good tradeoff between accuracy and generality. As a result, learning bias and generalization variance can be significantly minimized. Substantial experiments on the KDD 99 intrusion benchmark indicate that our model outperforms other state-of-the-art learning algorithms, with significantly improved detection accuracy, minimal false alarms and relatively small computational complexity. Comment: 9 pages, IEEE format, International Journal of Computer Science and Information Security, IJCSIS 2009, ISSN 1947 5500, Impact Factor 0.423, http://sites.google.com/site/ijcsis

    Model checking for data anomaly detection

    Data typically evolve according to specific processes, which makes it possible to identify a profile of evolution: the values the data may assume, the frequency at which they change, their temporal variation in relation to other data, or other constraints directly connected to the reference domain. A violation of these conditions could signal different threats to the system, such as an attempt at tampering or a cyber attack, a failure in system operation, or a bug in the applications that manage the data life cycle. Detecting such violations is not straightforward, as the processes could be unknown or hard to extract. In this paper we propose an approach to detect data anomalies. We represent data user behaviours as labelled transition systems and, through model checking techniques, demonstrate that the proposed modelling can be exploited to successfully detect data anomalies.
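
To make the idea of checking a labelled transition system concrete, here is a minimal sketch of an explicit-state safety check: a breadth-first search that looks for a reachable "bad" state and returns the action labels leading to it. This is a generic illustration, not the paper's modelling or tooling; the state names, action labels, and the function `find_violation` are hypothetical.

```python
from collections import deque


def find_violation(transitions, initial, is_bad):
    """Breadth-first search over a labelled transition system.

    `transitions` maps a state to a list of (action_label, next_state) pairs.
    Returns the list of action labels on a shortest path to a bad state,
    or None when no bad state is reachable from `initial`.
    """
    queue = deque([(initial, [])])
    seen = {initial}
    while queue:
        state, path = queue.popleft()
        if is_bad(state):
            return path
        for label, nxt in transitions.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [label]))
    return None


# Hypothetical observed behaviour of a data record.
observed = {
    "created": [("validate", "validated"), ("publish", "published_unvalidated")],
    "validated": [("publish", "published")],
}
# Hypothetical expected behaviour: publishing is only possible after validation.
expected = {
    "created": [("validate", "validated")],
    "validated": [("publish", "published")],
}


def is_anomalous_state(state):
    return state == "published_unvalidated"


counterexample = find_violation(observed, "created", is_anomalous_state)
no_counterexample = find_violation(expected, "created", is_anomalous_state)
```

The returned label sequence plays the role of a counterexample trace: it shows which sequence of actions on the data leads to the anomalous state.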

    Asymptotic normality of plug-in level set estimates

    We establish the asymptotic normality of the $G$-measure of the symmetric difference between the level set and a plug-in-type estimator of it formed by replacing the density in the definition of the level set by a kernel density estimator. Our proof will highlight the efficacy of Poissonization methods in the treatment of large sample theory problems of this kind. Comment: Published at http://dx.doi.org/10.1214/08-AAP569 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org)
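
The objects named in this abstract can be written out as follows. The notation is an assumption chosen for illustration (only the measure $G$ appears in the abstract): $f$ is the density, $c$ the level, $f_n$ a kernel density estimator with kernel $K$ and bandwidth $h_n$ built from a sample $X_1, \dots, X_n$ in $\mathbb{R}^d$.

```latex
% Level set and its plug-in estimator
\[
  C(c) = \{\, x \in \mathbb{R}^d : f(x) \ge c \,\}, \qquad
  C_n(c) = \{\, x \in \mathbb{R}^d : f_n(x) \ge c \,\},
\]
% Kernel density estimator replacing f in the definition of the level set
\[
  f_n(x) = \frac{1}{n h_n^{d}} \sum_{i=1}^{n} K\!\left( \frac{x - X_i}{h_n} \right),
\]
% G-measure of the symmetric difference, the quantity studied in the paper
\[
  G\bigl( C_n(c) \,\Delta\, C(c) \bigr)
  = \int_{C_n(c) \,\Delta\, C(c)} \mathrm{d}G .
\]
```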

    Hyperparameter selection of one-class support vector machine by self-adaptive data shifting

    With its flexible data description ability, the one-class Support Vector Machine (OCSVM) is one of the most popular and widely used methods for one-class classification (OCC). Nevertheless, the performance of OCSVM strongly relies on its hyperparameter selection, which is still a challenging open problem due to the absence of outlier data. This paper proposes a fully automatic OCSVM hyperparameter selection method, which requires no tuning of additional hyperparameters, based on a novel self-adaptive “data shifting” mechanism. Firstly, by efficient edge pattern detection (EPD) and “negatively” shifting edge patterns along the negative direction of the estimated data density gradient, a constrained number of high-quality pseudo outliers are self-adaptively generated at more desirable locations, which readily avoids two major difficulties in previous outlier generation methods. Secondly, to avoid time-consuming cross-validation and enhance robustness to noise in the given training data, a pseudo target set is generated for model validation by “positively” shifting each given target datum along the positive direction of the data density gradient. Experiments on synthetic and benchmark datasets demonstrate the effectiveness of the proposed method. This work was sponsored by the National Natural Science Foundation of China (Project no. 61170287, 61232016)
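
The shifting step alone can be sketched as below, assuming a Gaussian kernel density estimate whose gradient gives the shift direction. This is not the paper's full method: edge pattern detection and the OCSVM validation loop are omitted, and the bandwidth, step size, sample points, and function names are hypothetical choices for the example.

```python
import math


def kde_gradient(x, data, bandwidth):
    """Gradient of a Gaussian kernel density estimate at x, up to a positive constant."""
    grad = [0.0] * len(x)
    for p in data:
        sq_dist = sum((pi - xi) ** 2 for pi, xi in zip(p, x))
        weight = math.exp(-sq_dist / (2 * bandwidth ** 2))
        for k in range(len(x)):
            grad[k] += weight * (p[k] - x[k])
    return grad


def shift(x, data, bandwidth, step, sign):
    """Move x by `step` along (+1) or against (-1) the estimated density gradient."""
    grad = kde_gradient(x, data, bandwidth)
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return tuple(x)  # No usable direction; leave the point where it is.
    return tuple(xi + sign * step * g / norm for xi, g in zip(x, grad))


# Hypothetical target data: the corners and centre of the unit square.
targets = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
edge_point = (1.0, 1.0)

# "Negative" shifting pushes an edge point outwards to form a pseudo outlier.
pseudo_outlier = shift(edge_point, targets, bandwidth=0.5, step=0.3, sign=-1)
# "Positive" shifting pulls a target point towards denser regions (pseudo target).
pseudo_target = shift(edge_point, targets, bandwidth=0.5, step=0.3, sign=+1)
```

In this toy configuration the density gradient at the corner points towards the centre of the square, so the pseudo outlier lands outside the data and the pseudo target lands inside it.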

    A Comprehensive Survey of Data Mining-based Fraud Detection Research

    This survey paper categorises, compares, and summarises almost all published technical and review articles in automated fraud detection within the last 10 years. It defines the professional fraudster, formalises the main types and subtypes of known fraud, and presents the nature of data evidence collected within affected industries. Within the business context of mining the data to achieve higher cost savings, this research presents methods and techniques together with their problems. Compared to all related reviews on fraud detection, this survey covers many more technical articles and is the only one, to the best of our knowledge, which proposes alternative data and solutions from related domains. Comment: 14 page

    Detecting Errors in Korean Corpus based on GMM

    In computational linguistics, a corpus is a large, structured set of language samples collected from real-world text for a specific purpose. A corpus may contain various types of errors because most corpora are built manually and/or semi-automatically, and the errors are caused by human intervention. Such errors degrade the performance of corpus-based learning systems. Many studies have therefore been conducted to detect and correct such errors in various ways, and most of them work from a pre-built corpus. Human intervention is, however, still required. In addition, error correction is tedious, laborious, and costly. In this paper, we propose a method for detecting corpus errors using the GMM clustering algorithm. The purpose of this paper is to detect errors when the corpus is small; that is, the proposed method can be used during corpus development by integrating it into annotation tools. The proposed method consists of three steps. The first step is to build word embedding vectors of some error-prone contexts. The second step is to reduce the dimension of the vectors, because clustering high-dimensional vectors is time-consuming. The third step is to cluster the reduced vectors and to detect outliers as errors. For the experiments, we used two corpora, a Korean dependency corpus and a Korean semantic role labelling (SRL) corpus, each comprising only 1000 sentences. Our results show that the proposed method can serve as an error detector in the early stages of corpus development. Our best results achieve a recall of 65.15% for the Korean dependency corpus and 69.46% for the Korean SRL corpus. In future work, we will study feature representations for error detection, and error correction as well as detection. Motivated by the proposed method, we will also investigate error detection when a large tagged corpus is available.

    A corpus is a set of language samples extracted for a specific purpose, and corpora come in various kinds depending on that purpose. Because most corpora are built manually, they contain various errors, and systems that use an error-laden corpus cannot be expected to perform well. To address this problem, research has been carried out on detecting and correcting errors in various ways. Most of these methods, however, learn from an already built corpus in order to detect and correct errors. This work has to be repeated several times and is costly. To alleviate this problem to some extent, this thesis proposes an error detection method based on clustering with a GMM (Gaussian Mixture Model). Clustering is an unsupervised learning method, so error detection can be performed even when labelled training data are scarce or absent. The method can therefore be used during corpus construction and not only on an already built corpus. To validate the error detection performed by the proposed method, a syntactic parsing corpus and a semantic role corpus were used. Precision and recall, which are widely used in information retrieval, served as the evaluation measures. Recall was 65.15% on the syntactic parsing corpus and 69.46% on the semantic role corpus. These results indicate that the proposed model can be used to detect errors in a variety of corpora. Further research, such as feature expansion, could improve recall. We also plan to apply the system directly to a corpus construction tool and evaluate how efficient it is.

    Contents: Chapter 1 Introduction; Chapter 2 Related Work (2.1 Error detection; 2.2 GMM algorithm; 2.3 Dimension reduction; 2.4 Korean syntactic parsing corpus; 2.5 Korean semantic role corpus); Chapter 3 Error Candidate Detection System (3.1 Context representation; 3.1.1 Context representation in the syntactic parsing corpus; 3.1.2 Context representation in the semantic role corpus; 3.2 Dimension reduction of context representations; 3.3 Error detection in corpora using GMM); Chapter 4 Experiments and Evaluation (4.1 Experimental data; 4.2 Experimental results); Chapter 5 Conclusion and Future Work; References; Acknowledgements
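
The third step of the method above (cluster, then treat poorly explained points as error candidates) can be sketched as follows. This is a deliberately simplified illustration, not the thesis implementation: it assumes the embedding and dimension-reduction steps have already produced one-dimensional values, fits a two-component Gaussian mixture with a hand-written EM loop, and flags the point with the lowest mixture log-likelihood. The data and all function names are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(values, n_iter=50, var_floor=1e-3):
    """Fit a two-component 1-D Gaussian mixture with expectation-maximisation."""
    n = len(values)
    overall_mean = sum(values) / n
    overall_var = sum((v - overall_mean) ** 2 for v in values) / n
    means = [min(values), max(values)]
    variances = [max(overall_var, var_floor)] * 2
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            joint = [w * normal_pdf(v, m, s) for w, m, s in zip(weights, means, variances)]
            total = sum(joint) or 1e-300
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk == 0.0:
                continue  # Component received no mass; keep its old parameters.
            weights[k] = nk / n
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            variances[k] = max(var, var_floor)
    return weights, means, variances


def log_likelihoods(values, weights, means, variances):
    """Mixture log-likelihood of each value; low scores suggest outliers."""
    scores = []
    for v in values:
        density = sum(w * normal_pdf(v, m, s) for w, m, s in zip(weights, means, variances))
        scores.append(math.log(density + 1e-300))
    return scores


# Hypothetical reduced context values: two tight groups and one stray point (5.0).
reduced = [0.0, 0.2, -0.1, 0.1, -0.2, 5.0, 10.0, 10.1, 9.9, 10.2, 9.8]
weights, means, variances = fit_gmm_1d(reduced)
scores = log_likelihoods(reduced, weights, means, variances)
error_candidate = min(range(len(reduced)), key=lambda i: scores[i])
```

In practice a threshold on the log-likelihood, rather than a single minimum, would decide how many points are passed to annotators as error candidates; that threshold is a tuning choice not specified here.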