28,791 research outputs found
Analysis of Call Data Record (CDR) using Hadoop Cluster
The management of big data is one of the most important issues of this decade, since real-world applications generate data at petabyte and zettabyte scale. The most popular big data management solution is a system based on the Hadoop Distributed File System. However, implementing an enterprise-level solution is challenging because of the sheer volume of data produced. In this project, we apply a Hadoop cluster to telecommunication data, since telecom systems produce huge amounts of log data about customer calls as well as network equipment. To keep the solution realistic, we focus on call data details for our Big Data application. For our implementation, we acquired real-time Call Data Record (CDR) data from Banglalink, a telecom operator serving 30 million users in Bangladesh. To narrow the scope, CDR analytics on a Hadoop cluster can identify top callers in order to improve the customer experience. This implementation can also help Banglalink build a similar application that uses a Hadoop cluster for CDR analytics as a backup data warehouse.
Towards a property graph generator for benchmarking
The use of synthetic graph generators is a common practice among
graph-oriented benchmark designers, as it allows obtaining graphs with the
required scale and characteristics. However, finding a graph generator that
accurately fits the needs of a given benchmark is very difficult, thus
practitioners end up creating ad-hoc ones. Such a task is usually
time-consuming, and often leads to reinventing the wheel. In this paper, we
introduce the conceptual design of DataSynth, a framework for property graph
generation with customizable schemas and characteristics. The goal of DataSynth
is to assist benchmark designers in generating graphs efficiently and at scale,
saving them from implementing their own generators. Additionally, DataSynth
introduces novel features barely explored so far, such as modeling the
correlation between properties and the structure of the graph. This is achieved
by a novel property-to-node matching algorithm, for which we present promising
preliminary results.
Predicting Exploitation of Disclosed Software Vulnerabilities Using Open-source Data
Each year, thousands of software vulnerabilities are discovered and reported
to the public. Unpatched known vulnerabilities are a significant security risk.
It is imperative that software vendors quickly provide patches once
vulnerabilities are known and users quickly install those patches as soon as
they are available. However, most vulnerabilities are never actually exploited.
Since writing, testing, and installing software patches can involve
considerable resources, it would be desirable to prioritize the remediation of
vulnerabilities that are likely to be exploited. Several published research
studies have reported moderate success in applying machine learning techniques
to the task of predicting whether a vulnerability will be exploited. These
approaches typically use features derived from vulnerability databases (such as
the summary text describing the vulnerability) or social media posts that
mention the vulnerability by name. However, these prior studies share multiple
methodological shortcomings that inflate the predictive power of their
approaches. We replicate key portions of the prior work, compare their
approaches, and show how the selection of training and test data critically
affects the estimated performance of predictive models. The results of this
study point to important methodological considerations that should be taken
into account so that results reflect real-world utility.
- …