Big Data Dimensional Analysis
The ability to collect and analyze large amounts of data is a growing challenge
within the scientific community. The widening gap between data and users calls
for innovative tools that address the challenges posed by big data volume,
velocity, and variety. One of the main challenges associated with big data
variety is automatically understanding the underlying structures and patterns
of the data. Such an understanding is a prerequisite to applying advanced
analytics to the data. Further, big data sets often contain anomalies and
errors that are difficult to know a priori. Current approaches to understanding
data structure are drawn from traditional database ontology design. These
approaches are effective, but often require too much human involvement to cope
with the volume, velocity, and variety of data encountered by big data systems.
Dimensional Data Analysis (DDA) is a proposed technique that allows big data
analysts to quickly understand the overall structure of a big dataset and
identify anomalies. DDA exploits structures that exist in a wide class of data
to quickly determine the nature of the data and its statistical anomalies, and
it leverages existing schemas that are employed in big data databases today.
This paper presents DDA, applies it to a number of data sets, and measures its
performance. The overhead of DDA is low, and it can be applied to existing big
data systems without greatly impacting their computing requirements.
Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure
Big data research has attracted great attention in science, technology,
industry and society. It is developing alongside the evolving scientific
paradigm, the fourth industrial revolution, and transformational innovation in
technologies. However, its nature and fundamental challenges have not yet been
clearly identified, and a methodology of its own has not yet been established. This paper explores
and answers the following questions: What is big data? What are the basic
methods for representing, managing and analyzing big data? What is the
relationship between big data and knowledge? Can we find a mapping from big
data into knowledge space? What kind of infrastructure is required to support
not only big data management and analysis but also knowledge discovery, sharing
and management? What is the relationship between big data and science paradigm?
What is the nature and fundamental challenge of big data computing? A
multi-dimensional perspective is presented toward a methodology of big data
computing.
Small sample sizes: A big data problem in high-dimensional data analysis
Acknowledgements: The authors are grateful to the Editor, Associate Editor and three anonymous referees for their helpful suggestions, which greatly improved the manuscript. Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: the research is supported by the German Science Foundation award numbers DFG KO 4680/3-2 and PA 2409/3-2.
Analyzing big time series data in solar engineering using features and PCA
In solar engineering, we encounter big time series data such as satellite-derived irradiance data and string-level measurements from a utility-scale photovoltaic (PV) system. While storing and hosting big data are certainly possible with today’s data storage technology, it is challenging to visualize and analyze the data effectively and efficiently. In this work, we consider a data analytics algorithm to mitigate some of these challenges. The algorithm computes a set of generic and/or application-specific features to characterize each time series, and subsequently uses principal component analysis (PCA) to project these features onto a two-dimensional space. Because each time series is represented by its features, it can be treated as a single data point in the feature space, which makes many operations more tractable. Three applications are discussed within the overall framework, namely (1) PV system type identification, (2) monitoring network design, and (3) anomalous string detection. The proposed framework can be easily translated to many other solar engineering applications.
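As a rough sketch of the feature-then-PCA pipeline just described (the particular features, the synthetic data, and the use of scikit-learn are assumptions for illustration, not the paper's implementation):

```python
# Illustrative sketch: characterize each time series by a few generic features,
# then project the feature vectors onto 2D with PCA. The features and toy data
# below are assumptions for this example, not the paper's feature set.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def generic_features(series: np.ndarray) -> np.ndarray:
    """A few simple descriptors of one time series."""
    lag1 = np.corrcoef(series[:-1], series[1:])[0, 1]  # lag-1 autocorrelation
    return np.array([series.mean(), series.std(), series.max() - series.min(), lag1])

# Toy data: 50 synthetic "irradiance-like" series of length 288 (5-min samples).
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 288)
X = np.stack([np.clip(np.sin(np.pi * t) + 0.1 * rng.standard_normal(288), 0, None)
              for _ in range(50)])

features = np.stack([generic_features(s) for s in X])
features = StandardScaler().fit_transform(features)   # put features on a common scale
coords = PCA(n_components=2).fit_transform(features)  # each series becomes one 2D point

print(coords.shape)  # (50, 2): ready for scatter-plot visualization or clustering
```

Once each series is reduced to a 2D point, tasks such as spotting anomalous PV strings reduce to looking for outlying points in the projected space.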
Modeling and replicating statistical topology, and evidence for CMB non-homogeneity
Under the banner of `Big Data', the detection and classification of structure
in extremely large, high-dimensional data sets is one of the central
statistical challenges of our time. Among the most intriguing approaches to
this challenge is `TDA', or `Topological Data Analysis', one of whose primary
aims is to provide non-metric, but topologically informative, pre-analyses of
data sets that make later, more quantitative analyses feasible. While TDA rests
on strong mathematical foundations from Topology, in applications it has faced
challenges due to an inability to handle issues of statistical reliability and
robustness and, most importantly, an inability to make scientific claims with
verifiable levels of statistical confidence. We propose a methodology for the
parametric representation, estimation, and replication of persistence diagrams,
the main diagnostic tool of TDA. The power of the methodology lies in the fact
that even when only one persistence diagram is available for analysis -- the
typical case for big data applications -- replications can be generated to
allow for conventional statistical hypothesis testing. The methodology is
conceptually simple and computationally practical, and provides a broadly
effective statistical procedure for TDA analyses based on persistence diagrams.
We demonstrate the basic ideas on a toy example, and the power of the approach
in a novel and revealing analysis of CMB non-homogeneity.
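For context, the persistence diagram that the methodology represents and replicates can be computed with standard TDA tooling. The sketch below (using the ripser package and a toy point cloud, both assumptions of this illustration) shows only that diagnostic step, not the parametric replication procedure proposed in the paper.

```python
# Illustrative sketch: compute a persistence diagram for a toy point cloud.
# Library choice (ripser) and the data are assumptions for the example; the
# parametric representation/replication step of the paper is not shown here.
import numpy as np
from ripser import ripser

rng = np.random.default_rng(1)

# Toy data: noisy samples from a circle, whose H1 diagram should contain one
# long-lived loop.
theta = rng.uniform(0, 2 * np.pi, 200)
X = np.column_stack([np.cos(theta), np.sin(theta)]) + 0.05 * rng.standard_normal((200, 2))

dgms = ripser(X, maxdim=1)["dgms"]   # dgms[0] = H0 (components), dgms[1] = H1 (loops)
births, deaths = dgms[1][:, 0], dgms[1][:, 1]
lifetimes = deaths - births
print("most persistent H1 feature lifetime:", lifetimes.max())
```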