28,791 research outputs found
Analysis of Call Data Record (CDR) using Hadoop Cluster
The management of big data is one of the most important issues of this decade, since real-world applications generate data at petabyte and zettabyte scale. The most popular big data management solution is a system based on the Hadoop Distributed File System. However, implementing an enterprise-level solution is challenging because of the sheer volume of data produced. In this project, we apply a Hadoop cluster to telecommunication data, since telecom systems produce huge amounts of log data about customer calls as well as network equipment. To keep the solution realistic, we focus on call data details for our Big Data application. For our implementation, we acquired real-time Call Data Record (CDR) data from Banglalink, a telecom operator serving 30 million users in Bangladesh. To narrow the scope, CDR analytics on a Hadoop cluster can identify top callers in order to improve the customer experience. This implementation can also help Banglalink build a similar application that uses a Hadoop cluster for CDR analytics as a backup data warehouse.
Towards a property graph generator for benchmarking
The use of synthetic graph generators is a common practice among
graph-oriented benchmark designers, as it allows obtaining graphs with the
required scale and characteristics. However, finding a graph generator that
accurately fits the needs of a given benchmark is very difficult, thus
practitioners end up creating ad-hoc ones. Such a task is usually
time-consuming, and often leads to reinventing the wheel. In this paper, we
introduce the conceptual design of DataSynth, a framework for property graph
generation with customizable schemas and characteristics. The goal of DataSynth
is to assist benchmark designers in generating graphs efficiently and at scale,
saving them from implementing their own generators. Additionally, DataSynth
introduces novel features barely explored so far, such as modeling the
correlation between properties and the structure of the graph. This is achieved
by a novel property-to-node matching algorithm, for which we present promising
preliminary results.
Predicting Exploitation of Disclosed Software Vulnerabilities Using Open-source Data
Each year, thousands of software vulnerabilities are discovered and reported
to the public. Unpatched known vulnerabilities are a significant security risk.
It is imperative that software vendors quickly provide patches once
vulnerabilities are known and users quickly install those patches as soon as
they are available. However, most vulnerabilities are never actually exploited.
Since writing, testing, and installing software patches can involve
considerable resources, it would be desirable to prioritize the remediation of
vulnerabilities that are likely to be exploited. Several published research
studies have reported moderate success in applying machine learning techniques
to the task of predicting whether a vulnerability will be exploited. These
approaches typically use features derived from vulnerability databases (such as
the summary text describing the vulnerability) or social media posts that
mention the vulnerability by name. However, these prior studies share multiple
methodological shortcomings that inflate the predictive power of their
approaches. We replicate key portions of the prior work, compare their
approaches, and show how the selection of training and test data critically
affects the estimated performance of predictive models. The results of this
study point to important methodological considerations that should be taken
into account so that results reflect real-world utility.
- …