9 research outputs found
Big Data Dimensional Analysis
The ability to collect and analyze large amounts of data is a growing problem
within the scientific community. The growing gap between data and users calls
for innovative tools that address the challenges faced by big data volume,
velocity and variety. One of the main challenges associated with big data
variety is automatically understanding the underlying structures and patterns
of the data. Such an understanding is required as a pre-requisite to the
application of advanced analytics to the data. Further, big data sets often
contain anomalies and errors that are difficult to know a priori. Current
approaches to understanding data structure are drawn from the traditional
database ontology design. These approaches are effective, but often require too
much human involvement to be effective for the volume, velocity and variety of
data encountered by big data systems. Dimensional Data Analysis (DDA) is a
proposed technique that allows big data analysts to quickly understand the
overall structure of a big dataset, determine anomalies. DDA exploits
structures that exist in a wide class of data to quickly determine the nature
of the data and its statical anomalies. DDA leverages existing schemas that are
employed in big data databases today. This paper presents DDA, applies it to a
number of data sets, and measures its performance. The overhead of DDA is low
and can be applied to existing big data systems without greatly impacting their
computing requirements.Comment: From IEEE HPEC 201
Julia implementation of the Dynamic Distributed Dimensional Data Model
Julia is a new language for writing data analysis programs that are easy to implement and run at high performance. Similarly, the Dynamic Distributed Dimensional Data Model (D4M) aims to clarify data analysis operations while retaining strong performance. D4M accomplishes these goals through a composable, unified data model on associative arrays. In this work, we present an implementation of D4M in Julia and describe how it enables and facilitates data analysis. Several experiments showcase scalable performance in our new Julia version as compared to the original Matlab implementation
Large Scale Enrichment and Statistical Cyber Characterization of Network Traffic
Modern network sensors continuously produce enormous quantities of raw data
that are beyond the capacity of human analysts. Cross-correlation of network
sensors increases this challenge by enriching every network event with
additional metadata. These large volumes of enriched network data present
opportunities to statistically characterize network traffic and quickly answer
a key question: "What are the primary cyber characteristics of my network
data?" The Python GraphBLAS and PyD4M analysis frameworks enable anonymized
statistical analysis to be performed quickly and efficiently on very large
network data sets. This approach is tested using billions of anonymized network
data samples from the largest Internet observatory (CAIDA Telescope) and tens
of millions of anonymized records from the largest commercially available
background enrichment capability (GreyNoise). The analysis confirms that most
of the enriched variables follow expected heavy-tail distributions and that a
large fraction of the network traffic is due to a small number of cyber
activities. This information can simplify the cyber analysts' task by enabling
prioritization of cyber activities based on statistical prevalence.Comment: 8 pages, 8 figures, HPE
Онтологія аналізу Big Data
The object of this research is the Big Data (BD) analysis processes. One of the most problematic places is the lack of a clear classification of BD analysis methods, the presence of which will greatly facilitate the selection of an optimal and efficient algorithm for analyzing these data depending on their structure.In the course of the study, Data Mining methods, Technologies Tech Mining, MapReduce technology, data visualization, other technologies and analysis techniques were used. This allows to determine their main characteristics and features for constructing a formal analysis model for Big Data. The rules for analyzing Big Data in the form of an ontological knowledge base are developed with the aim of using it to process and analyze any data.A classifier for forming a set of Big Data analysis rules has been obtained. Each BD has a set of parameters and criteria that determine the methods and technologies of analysis. The very purpose of BD, its structure and content determine the techniques and technologies for further analysis. Thanks to the developed ontology of the knowledge base of BD analysis with Protégé 3.4.7 and the set of RABD rules built in them, the process of selecting the methodologies and technologies for further analysis is shortened and the analysis of the selected BD is automated. This is due to the fact that the proposed approach to the analysis of Big Data has a number of features, in particular ontological knowledge base based on modern methods of artificial intelligence.Thanks to this, it is possible to obtain a complete set of Big Data analysis rules. This is possible only if the parameters and criteria of a specific Big Data are analyzed clearly.Исследованы процессы анализа Big Data. Используя разработанную формальную модель и проведенный критический анализ методов и технологий анализа Big Data, построена онтология анализа Big Data. Исследованы методы, модели и инструменты для усовершенствования онтологии аналитики Big Data и эффективной поддержки разработки структурных элементов модели системы поддержки принятия решений по управлению Big Data.Досліджені процеси аналізу Big Data. Використовуючи розроблену формальну модель та проведений критичний аналіз методів і технологій аналізу Big Data, побудовано онтологію аналізу Big Data. Досліджено методи, моделі та інструменти для удосконалення онтології аналітики Big Data та ефективнішої підтримки розроблення структурних елементів моделі системи підтримки прийняття рішень з керування Big Data
Онтологія аналізу Big Data
The object of this research is the Big Data (BD) analysis processes. One of the most problematic places is the lack of a clear classification of BD analysis methods, the presence of which will greatly facilitate the selection of an optimal and efficient algorithm for analyzing these data depending on their structure.In the course of the study, Data Mining methods, Technologies Tech Mining, MapReduce technology, data visualization, other technologies and analysis techniques were used. This allows to determine their main characteristics and features for constructing a formal analysis model for Big Data. The rules for analyzing Big Data in the form of an ontological knowledge base are developed with the aim of using it to process and analyze any data.A classifier for forming a set of Big Data analysis rules has been obtained. Each BD has a set of parameters and criteria that determine the methods and technologies of analysis. The very purpose of BD, its structure and content determine the techniques and technologies for further analysis. Thanks to the developed ontology of the knowledge base of BD analysis with Protégé 3.4.7 and the set of RABD rules built in them, the process of selecting the methodologies and technologies for further analysis is shortened and the analysis of the selected BD is automated. This is due to the fact that the proposed approach to the analysis of Big Data has a number of features, in particular ontological knowledge base based on modern methods of artificial intelligence.Thanks to this, it is possible to obtain a complete set of Big Data analysis rules. This is possible only if the parameters and criteria of a specific Big Data are analyzed clearly.Исследованы процессы анализа Big Data. Используя разработанную формальную модель и проведенный критический анализ методов и технологий анализа Big Data, построена онтология анализа Big Data. Исследованы методы, модели и инструменты для усовершенствования онтологии аналитики Big Data и эффективной поддержки разработки структурных элементов модели системы поддержки принятия решений по управлению Big Data.Досліджені процеси аналізу Big Data. Використовуючи розроблену формальну модель та проведений критичний аналіз методів і технологій аналізу Big Data, побудовано онтологію аналізу Big Data. Досліджено методи, моделі та інструменти для удосконалення онтології аналітики Big Data та ефективнішої підтримки розроблення структурних елементів моделі системи підтримки прийняття рішень з керування Big Data