
    Real-time big data processing for anomaly detection : a survey

    The advent of connected devices and the omnipresence of the Internet have paved the way for intruders to attack networks, leading to cyber-attacks, financial loss, information theft in healthcare, and cyber warfare. Hence, network security analytics has become an important area of concern and has gained intensive attention among researchers of late, specifically in the domain of network anomaly detection, which is considered crucial for network security. However, preliminary investigations have revealed that existing approaches to detecting network anomalies are not effective enough, particularly at detecting them in real time. This inefficacy is mainly due to the massive volumes of data accumulated through connected devices. Therefore, it is crucial to propose a framework that effectively handles real-time big data processing and detects anomalies in networks. To this end, this paper addresses the issue of detecting anomalies in real time by surveying the state-of-the-art real-time big data processing technologies related to anomaly detection and the vital characteristics of the associated machine learning algorithms. The paper begins by explaining the essential context and taxonomy of real-time big data processing, anomaly detection, and machine learning algorithms, followed by a review of big data processing technologies. Finally, the identified research challenges of real-time big data processing in anomaly detection are discussed. © 2018 Elsevier Ltd
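The survey does not prescribe a specific detector. As a minimal sketch of what detecting anomalies in real time over a high-volume stream can mean, the Python class below scores each incoming value against a running mean and standard deviation maintained with Welford's online algorithm, so every event costs constant time and memory and the stream never has to be stored. The threshold, warm-up length, and packets-per-second series are hypothetical choices for illustration, not values from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class StreamingZScoreDetector:
    """Online z-score anomaly detector based on Welford's algorithm (O(1) per event)."""

    threshold: float = 3.0  # hypothetical cut-off, in standard deviations
    warmup: int = 10        # hypothetical number of events seen before any flagging
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0         # running sum of squared deviations from the mean

    def update(self, x: float) -> bool:
        """Score x against the statistics seen so far, then fold it into them."""
        is_anomaly = False
        # At least two prior events are needed for a sample standard deviation.
        if self.count >= max(self.warmup, 2):
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's update of the running mean and m2.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        return is_anomaly


# Hypothetical packets-per-second readings containing one spike.
detector = StreamingZScoreDetector(threshold=3.0, warmup=10)
traffic = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 100, 950, 99]
flags = [detector.update(v) for v in traffic]
print(flags)
```

This sketch folds flagged values into the running statistics, so the spike inflates the standard deviation and a sustained shift will eventually stop being flagged; a production detector might exclude anomalies from the update or use a decaying window instead.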

    A Survey on Big Data for Network Traffic Monitoring and Analysis

    Network Traffic Monitoring and Analysis (NTMA) represents a key component for network management, especially to guarantee the correct operation of large-scale networks such as the Internet. As the complexity of Internet services and the volume of traffic continue to increase, it becomes difficult to design scalable NTMA applications. Applications such as traffic classification and policing require real-time and scalable approaches. Anomaly detection and security mechanisms must quickly identify and react to unpredictable events while processing millions of heterogeneous events. Moreover, the system has to collect, store, and process massive sets of historical data for post-mortem analysis. These are precisely the challenges faced by general big data approaches: Volume, Velocity, Variety, and Veracity. This survey brings together NTMA and big data. We catalog previous work on NTMA that adopts big data approaches, to understand to what extent the potential of big data is being explored in NTMA. This survey mainly focuses on approaches and technologies to manage big NTMA data, while also briefly discussing big data analytics (e.g., machine learning) for NTMA. Finally, we provide guidelines for future work, discussing lessons learned and research directions.
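Traffic policing is one of the real-time tasks the survey names. The sketch below assumes a single-rate token bucket, a standard policing mechanism that is not taken from the survey, to show why such tasks must run in constant time per packet: each decision only refills and debits one counter. The rate, burst size, and packet trace are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Single-rate token-bucket policer: O(1) work per packet."""

    rate: float      # hypothetical refill rate, bytes per second
    capacity: float  # hypothetical maximum burst, bytes
    tokens: float = field(init=False)
    last: float = 0.0  # arrival time of the previous packet, seconds

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start with a full bucket

    def allow(self, size: int, now: float) -> bool:
        """Return True if a packet of `size` bytes arriving at `now` conforms."""
        # Refill in proportion to elapsed time, never beyond the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True   # conforming: forward the packet
        return False      # non-conforming: drop (or mark) the packet


bucket = TokenBucket(rate=1000.0, capacity=1500.0)
trace = [(0.0, 1000), (0.25, 1000), (1.0, 1000), (1.0, 600)]  # (arrival time, bytes)
decisions = [bucket.allow(size, t) for t, size in trace]
print(decisions)
```

The second packet is dropped because only 250 bytes of tokens accumulate in 0.25 s; by t = 1.0 the bucket has refilled to its 1500-byte cap, so the third packet conforms and the fourth does not.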

    Security analytics of large scale streaming data


    Big Data Security (Volume 3)

    After a short description of the key concepts of big data, the book explores the secrecy and security threats posed especially by cloud-based data storage. It delivers conceptual frameworks and models along with case studies of recent technology.

    Big data analytics: a predictive analysis applied to cybersecurity in a financial organization

    Project Work presented as partial requirement for obtaining the Master's degree in Information Management, with a specialization in Knowledge Management and Business Intelligence.
    With the spread of internet access, cyber attacks have grown alarmingly in frequency and in the severity of the damage they cause, alongside growing awareness among organizations, which are investing heavily in cybersecurity, notably in the financial sector. This work focuses on the financial service of an organization that operates on international markets in the payment systems industry. The objective was to develop a predictive solution for threat detection that supports the security team in opening investigations into intrusive server requests, across the exponentially growing log events collected by the SIEM from the Apache Web Servers of the financial service. A Big Data framework, using Hadoop and Spark, was developed to perform classification tasks over the financial service requests using Neural Networks, Logistic Regression, SVM, and Random Forests algorithms, while handling the training on the imbalanced dataset through BEV. The analysis showed that the Random Forests classifier achieved the best scores when using all the available preprocessed features. Using all the available worker nodes with a balanced configuration of the Spark executors, the shortest elapsed times for loading and preprocessing the data were achieved with the column-oriented ORC native format, while the row-oriented CSV format performed best for training the classifiers.
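The abstract names BEV without further detail. Assuming it refers to Bagging Ensemble Variation, in which the majority class is split into disjoint subsets about the size of the minority class and each subset is paired with all minority examples to train one ensemble member, a minimal pure-Python sketch follows. The nearest-centroid base learner, the function names, and the toy data are hypothetical stand-ins for the thesis's Spark classifiers (e.g. Random Forests).

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[List[float], int]   # (feature vector, class label)
Model = Callable[[List[float]], int]


def nearest_centroid(train: Sequence[Example]) -> Model:
    """Hypothetical base learner: predict the label of the closest class centroid."""
    sums: dict = {}
    counts: Counter = Counter()
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def predict(x: List[float]) -> int:
        return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

    return predict


def bev_train(data: Sequence[Example], minority_label: int,
              learner: Callable[[Sequence[Example]], Model] = nearest_centroid,
              seed: int = 0) -> List[Model]:
    """Train one model per disjoint majority subset, each paired with all minority examples."""
    minority = [e for e in data if e[1] == minority_label]
    majority = [e for e in data if e[1] != minority_label]
    if not minority or not majority:
        raise ValueError("both classes must be present")
    random.Random(seed).shuffle(majority)
    k = max(1, len(majority) // len(minority))  # number of roughly balanced subsets
    # majority[i::k] partitions the shuffled majority class into k disjoint slices.
    return [learner(minority + majority[i::k]) for i in range(k)]


def bev_predict(models: List[Model], x: List[float]) -> int:
    """Combine the ensemble members by majority vote."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


# Hypothetical imbalanced toy data: 40 "normal" requests (label 0) vs 4 "intrusive" ones (label 1).
data = [([float(i % 3), float(i % 2)], 0) for i in range(40)]
data += [([5.0 + i % 2, 5.0], 1) for i in range(4)]
models = bev_train(data, minority_label=1)
print(len(models), bev_predict(models, [5.0, 5.0]), bev_predict(models, [0.5, 0.5]))
```

Every majority example is used exactly once, yet each member sees a balanced training set, which is the point of the scheme compared with simply undersampling the majority class once.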

    Network anomalies detection via event analysis and correlation by a smart system

    The multidisciplinary nature of contemporary societies compels us to regard Information Technology (IT) systems as one of the greatest gifts in living memory. However, their growth demands a strong security force for users, in the form of effective and robust tools to combat the cybercrime to which users, individual or collective, are exposed almost daily. Monitoring and detection of this kind of problem must be ensured in real time, allowing companies to intervene fruitfully, quickly, and in unison. The proposed framework is based on an organic symbiosis of credible, affordable, and effective open-source tools for data analysis, relying on Security Information and Event Management (SIEM), Big Data, and Machine Learning (ML) techniques commonly applied in the development of real-time monitoring systems. Dissecting this framework: first, it comprises a system based on the SIEM methodology that monitors data in real time and simultaneously stores the information to assist forensic investigation teams. Second, the Big Data concept is applied to manipulate and organise the flow of data effectively. Lastly, ML techniques are used to create mechanisms that detect possible attacks or anomalies on the network. This framework is intended to provide a real-time analysis application at the institution ISCTE – Instituto Universitário de Lisboa (Iscte), offering more complete, efficient, and secure monitoring of the data from the different devices on the network.
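The abstract does not list the framework's correlation rules. As a minimal sketch of the kind of event correlation a SIEM performs, the function below assumes a hypothetical event schema (`timestamp`, `source_ip`, `action`) and raises an alert when a successful login follows at least `max_failures` failed logins from the same source within a sliding time window. This is a common brute-force rule, not one documented for the Iscte deployment.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical normalized log event, as a SIEM might store it."""
    timestamp: float  # seconds
    source_ip: str
    action: str       # hypothetical values: "login_failed" or "login_ok"


def correlate(events: Iterable[Event], max_failures: int = 5,
              window: float = 60.0) -> List[Tuple[str, float, int]]:
    """Alert when a login success follows >= max_failures failures from one IP within `window` s."""
    failures: Dict[str, Deque[float]] = defaultdict(deque)
    alerts: List[Tuple[str, float, int]] = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        recent = failures[ev.source_ip]
        # Expire failures that have fallen out of the sliding window.
        while recent and ev.timestamp - recent[0] > window:
            recent.popleft()
        if ev.action == "login_failed":
            recent.append(ev.timestamp)
        elif ev.action == "login_ok" and len(recent) >= max_failures:
            alerts.append((ev.source_ip, ev.timestamp, len(recent)))
            recent.clear()  # one alert per burst
    return alerts


events = [Event(float(t), "10.0.0.7", "login_failed") for t in range(0, 50, 10)]
events += [
    Event(45.0, "10.0.0.7", "login_ok"),    # success right after 5 failures
    Event(5.0, "10.0.0.9", "login_failed"),
    Event(200.0, "10.0.0.9", "login_ok"),   # its lone failure has expired
]
print(correlate(events))
```

Correlating two event types (failures, then a success) rather than counting failures alone is what distinguishes a likely compromised account from a user mistyping a password.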

    Data Migration from RDBMS to Hadoop

    Oracle, IBM, Microsoft, and Teradata own a large portion of the world's data. So if we run a query anywhere in the world, it is likely that we are reading data from a database owned by them. Moving ever larger volumes of data from Oracle to DB2 or other systems is a challenging task for businesses. The emergence of Hadoop and NoSQL technology represented a seismic shift that shook the RDBMS market and offered organizations an alternative. The database vendors moved quickly to establish a position in big data, and vice versa. Indeed, everyone now has their own big data technology, such as Oracle NoSQL and MongoDB. There is a huge market for high-performance data migration that can copy data stored in RDBMS databases to Hadoop or NoSQL databases. Current data resides in RDBMS databases such as Oracle, SQL Server, MySQL, and Teradata. We plan to migrate RDBMS data from the existing system to a big data platform that supports NoSQL databases and a variety of data. Migrating petabytes of data takes huge resources and time, and both may be constraints on the current migration process.
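Because migrating petabytes in one pass is impractical, RDBMS-to-Hadoop transfers are usually split into independent chunks. The sketch below assumes keyset pagination on an integer primary key, uses Python's built-in `sqlite3` as a stand-in for the source RDBMS, and writes each chunk as CSV text that could then be copied into HDFS (for example with `hdfs dfs -put`). The `customers` table, the `export_in_chunks` helper, and the chunk size are hypothetical.

```python
import csv
import io
import sqlite3
from typing import Iterator, List, Tuple


def export_in_chunks(conn: sqlite3.Connection, table: str, key: str,
                     chunk_size: int) -> Iterator[List[Tuple]]:
    """Yield rows in primary-key order, chunk_size at a time (keyset pagination).

    Assumes `key` is an integer primary key stored as the first column, and that
    `table`/`key` are trusted identifiers (they are interpolated, not bound).
    """
    last_key = None
    while True:
        if last_key is None:
            cur = conn.execute(
                f"SELECT * FROM {table} ORDER BY {key} LIMIT ?", (chunk_size,))
        else:
            cur = conn.execute(
                f"SELECT * FROM {table} WHERE {key} > ? ORDER BY {key} LIMIT ?",
                (last_key, chunk_size))
        rows = cur.fetchall()
        if not rows:
            return
        yield rows
        last_key = rows[-1][0]  # resume point: a failed run can restart from here


# Hypothetical source table standing in for an Oracle/SQL Server/MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"c{i}") for i in range(1, 11)])

parts = []
for rows in export_in_chunks(conn, "customers", "id", chunk_size=4):
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)  # in practice: write a part file, then load it into HDFS
    parts.append(buf.getvalue())
print(len(parts))
```

Keyset pagination (`WHERE id > last_key`) is preferred over `OFFSET` here because each chunk query stays cheap on an indexed key no matter how far into the table the migration has progressed.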