186 research outputs found
Towards Large-Scale, Heterogeneous Anomaly Detection Systems in Industrial Networks: A Survey of Current Trends
Industrial Networks (INs) are widespread environments where heterogeneous devices collaborate to control and monitor physical
processes. Some of the controlled processes belong to Critical Infrastructures (CIs), and, as such, IN protection is an active research
field. Among different types of security solutions, IN Anomaly Detection Systems (ADSs) have received wide attention from the
scientific community. While INs have grown in size and in complexity, requiring the development of novel Big Data solutions for
data processing, IN ADSs have not evolved at the same pace. In parallel, the development of Big Data frameworks such as Hadoop or
Spark has led the way for applying Big Data Analytics to the field of cyber-security, mainly focusing on the Information Technology
(IT) domain. However, due to the particularities of INs, it is not feasible to directly apply IT security mechanisms in INs, as IN
ADSs face unique characteristics. In this work we introduce three main contributions. First, we survey the area of Big Data ADSs
that could be applicable to INs and compare the surveyed works. Second, we develop a novel taxonomy to classify existing IN-based
ADSs. And, finally, we present a discussion of open problems in the field of Big Data ADSs for INs that can lead to further
development.
A Survey on Big Data for Network Traffic Monitoring and Analysis
Network Traffic Monitoring and Analysis (NTMA) represents a key component for network management, especially to guarantee the correct operation of large-scale networks such as the Internet. As the complexity of Internet services and the volume of traffic continue to increase, it becomes difficult to design scalable NTMA applications. Applications such as traffic classification and policing require real-time and scalable approaches. Anomaly detection and security mechanisms must quickly identify and react to unpredictable events while processing millions of heterogeneous events. Finally, the system has to collect, store, and process massive sets of historical data for post-mortem analysis. Those are precisely the challenges faced by general big data approaches: Volume, Velocity, Variety, and Veracity. This survey brings together NTMA and big data. We catalog previous work on NTMA that adopts big data approaches to understand to what extent the potential of big data is being explored in NTMA. This survey mainly focuses on approaches and technologies to manage big NTMA data, and briefly discusses big data analytics (e.g., machine learning) for the sake of NTMA. Finally, we provide guidelines for future work, discussing lessons learned and research directions.
Data Migration from RDBMS to Hadoop
Oracle, IBM, Microsoft and Teradata hold a large portion of the world's data. If you run a query anywhere in the world, it is likely that you are reading the data from a database owned by one of them. Moving large volumes of data from Oracle to DB2 or another system is a challenging task for a business. The arrival of Hadoop and NoSQL technology represented a seismic shift that shook the RDBMS market and offered an alternative to organizations. The database vendors moved quickly to position themselves in Big Data, and vice versa. Indeed, each now has its own big data technology, such as Oracle NoSQL and MongoDB. There is a huge market for a high-performance data migration tool that can copy the data stored in RDBMS databases to Hadoop or NoSQL databases. Current data resides in RDBMS databases such as Oracle, SQL Server, MySQL and Teradata. We plan to migrate RDBMS data to a big data platform that supports NoSQL databases and contains a variety of data from the existing system; migrating petabytes of data takes huge resources and time. Time and resources may be constraints for the current migration process.
A MapReduce architecture for web site user behaviour monitoring in real time
Monitoring the behaviour of large numbers of web site users in real time poses significant performance challenges, due to the decentralised location and volume of generated data. This paper proposes a MapReduce-style architecture where the processing of event series from the Web users is performed by a number of cascading mappers, reducers and rereducers, local to the event origin. With the use of static analysis and a prototype implementation, we show how this architecture is capable of carrying out time series analysis in real time for very large web data sets, based on the actual events, instead of resorting to sampling or other extrapolation techniques.
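The cascade of mappers, reducers and rereducers described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the event format `(user_id, page)`, the function names and the two example origins are all hypothetical, and page-view counting stands in for whatever analysis the real system performs.

```python
from collections import Counter
from typing import Iterable

# Hypothetical event format (an assumption): (user_id, page) tuples per origin.
Event = tuple[str, str]


def map_event(event: Event) -> tuple[str, int]:
    """Mapper: emit one (page, 1) pair for each page-view event."""
    _user, page = event
    return page, 1


def reduce_local(events: Iterable[Event]) -> Counter:
    """Reducer: aggregate the mapped pairs close to where the events originate."""
    counts: Counter = Counter()
    for key, value in map(map_event, events):
        counts[key] += value
    return counts


def rereduce(partials: Iterable[Counter]) -> Counter:
    """Rereducer: merge partial aggregates produced at several origins."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts key by key
    return total


# Two illustrative origins, each reduced locally before the rereduce step.
origin_a = [("u1", "/home"), ("u2", "/home"), ("u1", "/cart")]
origin_b = [("u3", "/home"), ("u3", "/cart"), ("u4", "/help")]
page_views = rereduce([reduce_local(origin_a), reduce_local(origin_b)])
```

Because only the small per-origin aggregates travel to the rereducer, every event is counted without sampling, which is the property the abstract emphasises.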
Design and evaluation of a cloud native data analysis pipeline for cyber physical production systems
Since the birth of the World Wide Web in 1991, the rate of data growth has kept increasing, reaching record levels in the last couple of years. Big companies
tackled this data growth with enormous, expensive data centres to process this data and extract value from it. With data coming from social media, the Internet of Things (IoT), new business processes, monitoring and multimedia, the capacities of
those data centres started to be a problem and required continuous and expensive expansion. Thus, Big Data was something that only a few were able to access. This changed fast when Amazon launched Amazon Web Services (AWS) around 15 years ago and gave rise to the public cloud.
At that time, the capabilities were still very new and limited, but 10 years later the cloud was a whole new business that changed the Big Data business forever. This not only commoditised computing power, but it was also
accompanied by a pricing model that gave medium and small players the possibility to access it. As a consequence, new problems arose regarding the nature of these distributed systems and the software architectures required
for proper data processing. The present work analyses typical Big Data workloads and proposes an architecture for a cloud native data analysis
pipeline. Lastly, it provides a chapter on tools and services that can be used in the architecture, taking advantage of their open source nature and the cloud
pricing models. Fil: Ferrer Daub, Facundo Javier. Universidad Católica de Córdoba. Instituto de Ciencias de la Administración; Argentin
The Family of MapReduce and Large Scale Data Processing Systems
In the last two decades, the continuous increase of computational power has
produced an overwhelming flow of data which has called for a paradigm shift in
the computing architecture and large scale data processing mechanisms.
MapReduce is a simple and powerful programming model that enables easy
development of scalable parallel applications to process vast amounts of data
on large clusters of commodity machines. It isolates the application from the
details of running a distributed program such as issues on data distribution,
scheduling and fault tolerance. However, the original implementation of the
MapReduce framework had some limitations that have been tackled by many
research efforts in several followup works after its introduction. This article
provides a comprehensive survey of a family of approaches and mechanisms for
large scale data processing that have been implemented based on the
original idea of the MapReduce framework and are currently gaining a lot of
momentum in both research and industrial communities. We also cover a set of
introduced systems that have been implemented to provide declarative
programming interfaces on top of the MapReduce framework. In addition, we
review several large scale data processing systems that resemble some of the
ideas of the MapReduce framework for different purposes and application
scenarios. Finally, we discuss some of the future research directions for
implementing the next generation of MapReduce-like solutions. Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
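The programming model summarised in this abstract can be sketched in a few lines. This is a single-process illustration only, assuming a hypothetical `map_reduce` helper with user-supplied `mapper` and `reducer` functions; a real framework would additionally handle data distribution, scheduling and fault tolerance across a cluster.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[tuple[str, int]]],
    reducer: Callable[[str, list[int]], int],
) -> dict[str, int]:
    """Run the map, shuffle and reduce phases in one process (illustrative only)."""
    groups: defaultdict[str, list[int]] = defaultdict(list)
    for record in records:
        # Map phase: each record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: values are grouped by key.
            groups[key].append(value)
    # Reduce phase: each key's values are folded into a single result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated word in the line."""
    for word in line.lower().split():
        yield word, 1


def sum_reducer(_word: str, counts: list[int]) -> int:
    """Sum the occurrences collected for one word."""
    return sum(counts)


lines = ["map reduce map", "reduce shuffle"]
word_counts = map_reduce(lines, word_mapper, sum_reducer)
```

The application code is limited to `word_mapper` and `sum_reducer`; everything else belongs to the framework, which is the separation of concerns the abstract attributes to MapReduce.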