
    An Adaptive Firefly Optimization (AFO) with Multi-Kernel SVM (MKSVM) Classification for Big Data Dimensionality Reduction

    The dimensionality of data has risen sharply over the last several decades. The "Dimensionality Curse" (DC) is a problem for conventional learning techniques when dealing with "Big Data (BD)" of high dimensionality: a learning model's performance degrades when a large number of features is present. "Dimensionality Reduction (DR)" approaches are used to address the DC issue, and "Machine Learning (ML)" research plays a significant role in this regard. "Feature Selection (FS)" is a prominent procedure for reducing dimensions: by selecting an optimal subset of the original features according to relevant assessment criteria, it typically yields improved learning effectiveness, such as greater classification precision, lower processing costs, and better model comprehensibility. An "Adaptive Firefly Optimization (AFO)" technique based on the "MapReduce (MR)" platform is developed in this research. In the initial phase (the mapping stage), the whole large "DataSet (DS)" is first subdivided into blocks, and the AFO technique is then used to select features from each block. In the final phase (the reduction stage), the partial results are combined into a single feature vector. The "Multi-Kernel Support Vector Machine (MKSVM)" classifier then assigns each instance to its appropriate class using the optimal features obtained from AFO for DR. We found that the proposed AFO combined with MKSVM (AFO-MKSVM) scales very well to high-dimensional DSs and outperforms the existing "Linear Discriminant Analysis-Support Vector Machine (LDA-SVM)" approach. Evaluation metrics such as the information ratio for dimension reduction, accuracy, and recall indicate that the AFO-MKSVM method achieves better results than the LDA-SVM method.
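    The flow described above lends itself to a short illustration. The sketch below is not the authors' implementation: a simplified binary firefly search stands in for the paper's AFO, the map/reduce split is emulated by partitioning a synthetic dataset into blocks, and a precomputed sum of RBF and linear kernels stands in for the MKSVM. All data, parameters, and helper names are hypothetical.

```python
# Hypothetical sketch: map/reduce-style feature selection with a simplified
# binary firefly search, followed by a sum-of-kernels SVM stand-in for MKSVM.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def fitness(X, y, mask):
    # Brightness of a firefly = CV accuracy of an SVM on the selected features.
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(kernel="rbf"), X[:, mask.astype(bool)], y, cv=3).mean()

def firefly_select(X, y, n_fireflies=6, n_iter=5, gamma=1.0, beta0=1.0):
    n_feat = X.shape[1]
    pop = rng.integers(0, 2, size=(n_fireflies, n_feat))       # binary feature masks
    bright = np.array([fitness(X, y, m) for m in pop])
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if bright[j] > bright[i]:                        # move i toward brighter j
                    r = np.abs(pop[i] - pop[j]).sum()            # Hamming distance
                    beta = beta0 * np.exp(-gamma * r / n_feat)   # attractiveness decays with distance
                    prob = beta * pop[j] + 0.1 * rng.random(n_feat)
                    pop[i] = (rng.random(n_feat) < prob).astype(int)
                    bright[i] = fitness(X, y, pop[i])
    return pop[bright.argmax()].astype(bool)

# "Map" stage: split the data into blocks and select features per block.
X, y = make_classification(n_samples=600, n_features=40, n_informative=8, random_state=0)
blocks = np.array_split(np.arange(len(X)), 3)
masks = [firefly_select(X[idx], y[idx]) for idx in blocks]

# "Reduce" stage: merge the per-block selections into a single feature vector.
selected = np.any(masks, axis=0)

# Multi-kernel SVM stand-in: precomputed sum of an RBF and a linear kernel.
Xtr, Xte, ytr, yte = train_test_split(X[:, selected], y, random_state=0)
Ktr = rbf_kernel(Xtr, Xtr) + linear_kernel(Xtr, Xtr)
Kte = rbf_kernel(Xte, Xtr) + linear_kernel(Xte, Xtr)
clf = SVC(kernel="precomputed").fit(Ktr, ytr)
print("accuracy:", clf.score(Kte, yte))
```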

    A DDoS Attack Detection using PCA Dimensionality Reduction and Support Vector Machine

    Distributed denial-of-service (DDoS) attacks are among the most frequently occurring network attacks. With the rapid growth of communication and computer technology, DDoS attacks have become increasingly severe, so research on their detection is essential. Because DDoS attacks take different forms, no single method can provide good security. To overcome this, this paper presents a DDoS attack detection technique based on machine learning. The proposed method has two phases: dimensionality reduction and model training for attack detection. The first phase extracts the principal components from the large volume of internet traffic data; these components are then used as input features for model training. A Support Vector Machine (SVM) is used to learn the detection model from these features. Experimental results show that the proposed method detects DDoS attacks with good accuracy.
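    A minimal sketch of this two-phase idea follows, assuming scikit-learn and a synthetic stand-in for the traffic features (the paper's dataset and feature set are not reproduced here): PCA for dimensionality reduction, then an SVM trained on the retained components.

```python
# Minimal sketch: PCA to extract principal components from high-dimensional
# traffic features, then an SVM trained on those components.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder for flow-level traffic features labelled benign (0) / DDoS (1).
X, y = make_classification(n_samples=2000, n_features=60, n_informative=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Phase 1: dimensionality reduction; Phase 2: SVM training on the components.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),   # keep components explaining 95% of the variance
    SVC(kernel="rbf", C=1.0),
)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```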

    Mitigation of Distortions in Radio-Over-Fiber Systems Using Machine Learning

    Introduction: The ever-growing number of users connected to the internet via mobile devices has driven increased research into the hybrid optical network paradigm known as Radio-over-Fiber. These networks take advantage of the bandwidth offered by optical fiber and the mobility offered by wireless transmission, avoiding the bottleneck of optical-to-electrical conversion interfaces. However, the chromatic dispersion of the optical fiber distorts the optically modulated radiofrequency signals, limiting the transmission reach.  Objective: To improve the performance of a Radio-over-Fiber system in terms of bit error rate, using nonsymmetrical demodulation by means of the Support Vector Machine machine learning algorithm.  Methodology: A Radio-over-Fiber system is simulated in the specialized software VPIDesignSuite. Radiofrequency signals are modulated in 16- and 64-QAM formats with different laser linewidths and transmitted over optical fiber. The Support Vector Machine algorithm is applied to carry out nonsymmetrical demodulation.  Results: Using the machine learning algorithm for signal demodulation significantly improves network performance, enabling transmission over up to 30 km of optical fiber and reducing the bit error rate by up to two orders of magnitude compared with conventional demodulation.  Conclusions: Mitigation of distortions, in terms of bit error rate, is demonstrated in a Radio-over-Fiber system using nonsymmetrical demodulation based on the Support Vector Machine algorithm. The proposed technique is therefore attractive for future high-capacity access networks.
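    The nonsymmetrical-demodulation idea can be illustrated with a small sketch. This is an assumption-laden toy: a crude phase-noise/AWGN channel stands in for the VPIDesignSuite fiber simulation, only 16-QAM is shown, and the SVM simply learns decision regions from a labelled training burst, which are then compared against conventional nearest-point (symmetric-threshold) demodulation.

```python
# Hedged sketch: treat each received I/Q sample as a 2-D feature vector and
# let an SVM learn (possibly asymmetric) decision regions, instead of the
# fixed symmetric thresholds of conventional 16-QAM demodulation.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
levels = np.array([-3, -1, 1, 3])
constellation = np.array([i + 1j * q for i in levels for q in levels])  # 16-QAM

def channel(symbols, snr_db=18.0, phase_std=0.08):
    # Placeholder impairment model: phase noise plus additive white Gaussian noise.
    noisy = symbols * np.exp(1j * rng.normal(0, phase_std, symbols.shape))
    sigma = np.sqrt(constellation.var() / (2 * 10 ** (snr_db / 10)))
    return noisy + sigma * (rng.normal(size=symbols.shape) + 1j * rng.normal(size=symbols.shape))

def to_features(x):
    return np.column_stack([x.real, x.imag])

labels_train = rng.integers(0, 16, 5000)
labels_test = rng.integers(0, 16, 5000)
rx_train = channel(constellation[labels_train])
rx_test = channel(constellation[labels_test])

# Conventional demodulation: nearest constellation point (symmetric thresholds).
conv = np.abs(rx_test[:, None] - constellation[None, :]).argmin(axis=1)

# SVM demodulation: decision boundaries learned from the distorted training burst.
svm = SVC(kernel="rbf", C=10.0, gamma="scale").fit(to_features(rx_train), labels_train)
svm_dec = svm.predict(to_features(rx_test))

print("SER conventional:", np.mean(conv != labels_test))
print("SER SVM         :", np.mean(svm_dec != labels_test))
```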

    Machine learning approach for detection of nonTor traffic

    Intrusion detection has attracted considerable interest from researchers and industry. After many years of research, the community still faces the problem of building reliable and efficient intrusion detection systems (IDS) capable of handling large quantities of data with changing patterns in real-time situations. The Tor network is popular for providing privacy and security to end users by anonymizing their identities as they connect through a series of tunnels and nodes. This work addresses two problems on the UNB-CIC Tor Network Traffic dataset: separating Tor from nonTor traffic, to expose activity that weakens users' protection, and classifying the Tor traffic flows in the network. This paper proposes a hybrid classifier: an Artificial Neural Network combined with the Correlation-based Feature Selection (CFS) algorithm for dimensionality reduction and improved classification performance. The reliability and efficiency of the proposed hybrid classifier are compared with Support Vector Machine and naïve Bayes classifiers for detecting nonTor traffic in the UNB-CIC Tor Network Traffic dataset. Experimental results show that the hybrid ANN-CFS classifier performs better at detecting nonTor traffic and classifying Tor traffic flows in the UNB-CIC Tor Network Traffic dataset.
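    As a rough illustration of the hybrid classifier, the sketch below uses a simplified correlation filter as a stand-in for the paper's Correlation-based Feature Selection step and an MLP as the artificial neural network; the feature matrix is a synthetic placeholder for the UNB-CIC flow statistics.

```python
# Rough sketch: correlation-based feature filtering (simplified CFS stand-in)
# followed by an artificial neural network classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=3000, n_features=30, n_informative=10, random_state=0)

# Simplified correlation filter: keep features well correlated with the label
# but not strongly correlated with an already-kept feature.
corr_with_y = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
kept = []
for j in np.argsort(corr_with_y)[::-1]:
    if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < 0.8 for k in kept):
        kept.append(j)
kept = kept[:15]   # retain the top-ranked, mutually non-redundant features

X_train, X_test, y_train, y_test = train_test_split(X[:, kept], y, stratify=y, random_state=0)
scaler = StandardScaler().fit(X_train)
ann = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
ann.fit(scaler.transform(X_train), y_train)
print("nonTor detection accuracy:", ann.score(scaler.transform(X_test), y_test))
```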

    Eavesdropping Hackers: Detecting Software Vulnerability Communication on Social Media Using Text Mining

    Cyber security is striving to find new forms of protection against hacker attacks. An emerging approach is the investigation of security-related messages exchanged on Deep/Dark Web and even Surface Web channels. This approach can be supported by the use of supervised machine learning models and text mining techniques. In our work, we compare a variety of machine learning algorithms, text representations, and dimension reduction approaches with respect to the detection accuracy of software-vulnerability-related communications. Given the imbalanced nature of the three public datasets used, we investigate appropriate sampling approaches to boost the detection accuracy of our models. In addition, we examine how feature reduction techniques such as Document Frequency Reduction, Chi-square and Singular Value Decomposition (SVD) can be used to reduce the number of features of the model without impacting detection performance. We conclude that: (1) a Support Vector Machine (SVM) algorithm used with a traditional Bag of Words representation achieved the highest accuracy; (2) increasing the minority class with the Random Oversampling technique improves the detection performance of the model by 5% on average; and (3) the number of features of the model can be reduced by up to 10% without affecting detection performance. We also provide the labelled dataset used in this work for further research. These findings can be used to support Cyber Security Threat Intelligence (CTI) with respect to the use of text mining techniques for detecting security-related communications.
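    The best-performing pipeline reported above (Bag of Words, chi-square feature reduction, random oversampling, SVM) can be sketched as follows; the messages and labels are made-up placeholders, not drawn from the paper's datasets, and imbalanced-learn is assumed for the oversampling step.

```python
# Illustrative sketch: Bag of Words + chi-square reduction + random
# oversampling of the minority class + a linear SVM.
from imblearn.over_sampling import RandomOverSampler
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Toy, imbalanced corpus: 1 vulnerability-related message per 3 unrelated ones.
texts = ["new remote code execution exploit shared on the forum",
         "lunch plans anyone",
         "great weather for a bike ride today",
         "anyone selling spare concert tickets"] * 50
labels = [1, 0, 0, 0] * 50

X_train, X_test, y_train, y_test = train_test_split(texts, labels, stratify=labels, random_state=0)

bow = CountVectorizer()
Xtr = bow.fit_transform(X_train)
Xte = bow.transform(X_test)

# Chi-square feature reduction: keep the terms most associated with the label.
selector = SelectKBest(chi2, k=min(20, Xtr.shape[1]))
Xtr = selector.fit_transform(Xtr, y_train)
Xte = selector.transform(Xte)

# Random oversampling of the minority class on the training split only.
Xtr, y_train = RandomOverSampler(random_state=0).fit_resample(Xtr, y_train)

clf = LinearSVC().fit(Xtr, y_train)
print("F1:", f1_score(y_test, clf.predict(Xte)))
```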

    Detecting Hacker Threats: Performance of Word and Sentence Embedding Models in Identifying Hacker Communications

    Cyber security is striving to find new forms of protection against hacker attacks. An emerging approach is the investigation of security-related messages exchanged on deep/dark web and even surface web channels. This approach can be supported by the use of supervised machine learning models and text mining techniques. In our work, we compare a variety of machine learning algorithms, text representations, and dimension reduction approaches with respect to the detection accuracy of software-vulnerability-related communications. Given the imbalanced nature of the three public datasets used, we investigate appropriate sampling approaches to boost the detection accuracy of our models. In addition, we examine how feature reduction techniques such as Document Frequency Reduction, Chi-square and Singular Value Decomposition (SVD) can be used to reduce the number of features of the model without impacting detection performance. We conclude that: (1) a Support Vector Machine (SVM) algorithm used with a traditional Bag of Words representation achieved the highest accuracy; (2) increasing the minority class with the Random Oversampling technique improves the detection performance of the model by 5% on average; and (3) the number of features of the model can be reduced by up to 10% without affecting detection performance. We also provide the labelled dataset used in this work for further research. These findings can be used to support Cyber Security Threat Intelligence (CTI) with respect to the use of text mining techniques for detecting security-related communications.
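    The abstract indexed for this entry mirrors the previous one, while the title points to word and sentence embedding representations. Purely as a hypothetical illustration of that idea, the sketch below trains word vectors on a toy corpus, averages them per message as a simple sentence embedding, and feeds the result to an SVM; none of the data, parameters, or model choices come from the paper.

```python
# Hypothetical sketch: averaged word embeddings as message features for a
# hacker-communication classifier.
import numpy as np
from gensim.models import Word2Vec
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

texts = ["new remote code execution exploit shared on the forum",
         "zero day in the mail server actively traded",
         "great weather for a bike ride today",
         "anyone selling spare concert tickets"] * 50
labels = np.array([1, 1, 0, 0] * 50)

tokens = [t.lower().split() for t in texts]
w2v = Word2Vec(tokens, vector_size=50, min_count=1, seed=0)

def embed(sentence):
    # Sentence embedding = mean of its word vectors (a simple baseline).
    return np.mean([w2v.wv[w] for w in sentence], axis=0)

X = np.array([embed(s) for s in tokens])
X_train, X_test, y_train, y_test = train_test_split(X, labels, stratify=labels, random_state=0)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```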

    Nonnegative principal component analysis for mass spectral serum profiles and biomarker discovery

    Background: As a novel cancer diagnostic paradigm, mass spectroscopic serum proteomic pattern diagnostics has been reported to be superior to conventional serologic cancer biomarkers. However, its clinical use is not yet fully validated. An important factor preventing this young technology from becoming a mainstream cancer diagnostic paradigm is that robustly identifying cancer molecular patterns from high-dimensional protein expression data remains a challenge in machine learning and oncology research. As a well-established dimension reduction technique, PCA is widely integrated into pattern recognition analysis to discover cancer molecular patterns. However, its global feature selection mechanism prevents it from capturing local features, which may make high-performance proteomic pattern discovery difficult because only features describing global data behavior are used to train the learning machine.  Methods: In this study, we develop a nonnegative principal component analysis algorithm and present a nonnegative principal component analysis based support vector machine algorithm with sparse coding to conduct high-performance proteomic pattern classification. We also propose a nonnegative principal component analysis based filter-wrapper biomarker capturing algorithm for mass spectral serum profiles.  Results: We demonstrate the superiority of the proposed algorithm by comparison with six peer algorithms on four benchmark datasets, and we illustrate that nonnegative principal component analysis can be effectively used to capture meaningful biomarkers.  Conclusion: Our analysis suggests that nonnegative principal component analysis effectively conducts local feature selection for mass spectral profiles and contributes to improved sensitivities and specificities in the subsequent classification, as well as to meaningful biomarker discovery.
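    The paper's nonnegative PCA algorithm is not available off the shelf; the sketch below uses NMF as a rough stand-in, since it likewise produces nonnegative, locally supported components. It projects synthetic "spectra" onto those components, classifies with an SVM, and reads candidate biomarker bins off the component loadings. Everything here, including the data and the choice of NMF, is an assumption for illustration only.

```python
# Hedged stand-in for NPCA-SVM: nonnegative decomposition (NMF) of mass
# spectra, SVM classification on the reduced representation, and component
# loadings inspected as candidate biomarkers.
import numpy as np
from sklearn.decomposition import NMF
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, n_mz = 200, 500                                   # samples x m/z intensity bins
X = rng.gamma(shape=2.0, scale=1.0, size=(n_samples, n_mz))  # nonnegative intensities
y = rng.integers(0, 2, n_samples)                            # synthetic cancer / control labels
X[y == 1, :20] += 3.0                                        # inject a weak "biomarker" region

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Nonnegative decomposition: X ~ W H with W, H >= 0; rows of H act as local,
# nonnegative "eigen-spectra", W as the reduced sample representation.
nmf = NMF(n_components=10, init="nndsvda", max_iter=500, random_state=0)
W_train = nmf.fit_transform(X_train)
W_test = nmf.transform(X_test)

clf = SVC(kernel="rbf").fit(W_train, y_train)
print("accuracy:", clf.score(W_test, y_test))

# Candidate biomarkers: m/z bins with the largest loadings in each component.
top_bins = np.argsort(nmf.components_, axis=1)[:, -5:]
print("top bins per component:", top_bins)
```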