Using a Power Law Distribution to describe Big Data
The gap between data production and user ability to access, compute and
produce meaningful results calls for tools that address the challenges
associated with big data volume, velocity and variety. One of the key hurdles
is the inability to methodically remove expected or uninteresting elements from
large data sets. This difficulty often wastes valuable researcher and
computational time by expending resources on uninteresting parts of data.
Social sensors, or sensors which produce data based on human activity, such as
Wikipedia, Twitter, and Facebook, have an underlying structure which can be
thought of as having a Power Law distribution. Such a distribution implies that
a few nodes generate large amounts of data. In this article, we propose a
technique to take an arbitrary dataset and compute a power law distributed
background model that bases its parameters on observed statistics. This model
can be used to determine the suitability of using a power law or automatically
identify high degree nodes for filtering and can be scaled to work with big
data. Comment: 5 page
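As a rough illustration of the idea in this abstract, the following Python sketch estimates a power-law exponent from observed per-node record counts and flags the high-degree nodes. It is not the authors' method: the continuous maximum-likelihood (Hill-type) estimator, the coverage-based cutoff, the toy edit records, and the function names `degree_counts`, `fit_power_law_alpha`, and `high_degree_nodes` are all assumptions made for the example.

```python
import math
from collections import Counter


def degree_counts(edges):
    """Count how many records each source node generates."""
    return Counter(src for src, _ in edges)


def fit_power_law_alpha(degrees, d_min=1):
    """Continuous maximum-likelihood estimate of alpha for p(d) ~ d**(-alpha), d >= d_min."""
    tail = [d for d in degrees if d >= d_min]
    log_sum = sum(math.log(d / d_min) for d in tail)
    if log_sum == 0:
        raise ValueError("all degrees equal d_min; exponent is undefined")
    return 1.0 + len(tail) / log_sum


def high_degree_nodes(counts, fraction=0.5):
    """Return the top nodes, in descending order, until they cover `fraction` of all records."""
    total = sum(counts.values())
    selected, covered = [], 0
    for node, d in counts.most_common():
        if covered >= fraction * total:
            break
        selected.append(node)
        covered += d
    return selected


# Hypothetical toy data: (author, page) edit records, dominated by author "a".
edges = [("a", i) for i in range(8)] + [("b", 0), ("b", 1), ("c", 0), ("d", 0)]
counts = degree_counts(edges)
alpha = fit_power_law_alpha(counts.values())
heavy = high_degree_nodes(counts, fraction=0.5)
```

In this toy data a single node produces 8 of the 12 records, so it alone is flagged for filtering. On real data the choice of `d_min` and `fraction` would need to be justified from the observed statistics.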
Big Data Dimensional Analysis
Collecting and analyzing large amounts of data is a growing challenge
within the scientific community. The growing gap between data and users calls
for innovative tools that address the challenges faced by big data volume,
velocity and variety. One of the main challenges associated with big data
variety is automatically understanding the underlying structures and patterns
of the data. Such an understanding is a prerequisite to the
application of advanced analytics to the data. Further, big data sets often
contain anomalies and errors that are difficult to know a priori. Current
approaches to understanding data structure are drawn from the traditional
database ontology design. These approaches work, but often require too
much human involvement to keep pace with the volume, velocity and variety of
data encountered by big data systems. Dimensional Data Analysis (DDA) is a
proposed technique that allows big data analysts to quickly understand the
overall structure of a big dataset and identify its anomalies. DDA exploits
structures that exist in a wide class of data to quickly determine the nature
of the data and its statistical anomalies. DDA leverages existing schemas that are
employed in big data databases today. This paper presents DDA, applies it to a
number of data sets, and measures its performance. The overhead of DDA is low,
and it can be applied to existing big data systems without greatly impacting their
computing requirements. Comment: From IEEE HPEC 201
RadiX-Net: Structured Sparse Matrices for Deep Neural Networks
The sizes of deep neural networks (DNNs) are rapidly outgrowing the capacity
of hardware to store and train them. Research over the past few decades has
explored the prospect of sparsifying DNNs before, during, and after training by
pruning edges from the underlying topology. The resulting neural network is
known as a sparse neural network. More recent work has demonstrated the
remarkable result that certain sparse DNNs can train to the same precision as
dense DNNs at lower runtime and storage cost. An intriguing class of these
sparse DNNs is the X-Nets, which are initialized and trained upon a sparse
topology with neither reference to a parent dense DNN nor subsequent pruning.
We present an algorithm that deterministically generates RadiX-Nets: sparse DNN
topologies that, as a whole, are much more diverse than X-Net topologies, while
preserving X-Nets' desired characteristics. We further present a
functional-analytic conjecture based on the longstanding observation that
sparse neural network topologies can attain the same expressive power as dense
counterparts. Comment: 7 pages, 8 figures, accepted at IEEE IPDPS 2019 GrAPL workshop. arXiv
admin note: substantial text overlap with arXiv:1809.0524
Linear Systems over Join-Blank Algebras
A central problem of linear algebra is solving linear systems. Regarding
linear systems as equations over general semirings (V, ⊗, ⊕, 0, 1) instead
of rings or fields makes traditional approaches impossible. Earlier work shows
that the solution space X(A;w) of the linear system Av = w over the class of
semirings called join-blank algebras is a union of closed intervals (in the
product order) with a common terminal point. In the smaller class of max-blank
algebras, the additional hypothesis that the solution spaces of the 1x1 systems
Av = w are closed intervals implies that X(A;w) is a finite union of closed
intervals. We examine the general case, proving that without this additional
hypothesis, we can still make X(A;w) into a finite union of quasi-intervals.
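To make the notation concrete: over a semiring, the product Av replaces ordinary multiplication with the semiring's multiplication and ordinary addition with the semiring's addition. The Python sketch below computes Av for caller-supplied operations and finds the solutions of Av = w by brute force. The max-min operations on a four-element chain serve only as an illustrative semiring; whether they satisfy the join-blank axioms is not claimed here, and the helper names `semiring_matvec` and `solutions` are hypothetical.

```python
from functools import reduce
from itertools import product


def semiring_matvec(A, v, oplus, otimes, zero):
    """Compute (Av)_i as the oplus-sum over j of A[i][j] otimes v[j]."""
    return [
        reduce(oplus, (otimes(a, x) for a, x in zip(row, v)), zero)
        for row in A
    ]


def solutions(A, w, values, oplus, otimes, zero):
    """Brute-force every v in values**n with Av == w (only feasible for tiny systems)."""
    n = len(A[0])
    return [
        list(v)
        for v in product(values, repeat=n)
        if semiring_matvec(A, v, oplus, otimes, zero) == w
    ]


# Illustrative semiring: V = {0, 1, 2, 3}, oplus = max, otimes = min, additive identity 0.
V = [0, 1, 2, 3]
A = [[3, 1],
     [2, 3]]
w = [2, 2]
X = solutions(A, w, V, max, min, 0)
```

For this toy system the solutions are (2, 0), (2, 1), and (2, 2), that is, every v between (2, 0) and (2, 2) in the product order, so the solution space here is a single closed interval.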
Percolation Model of Insider Threats to Assess the Optimum Number of Rules
Rules, regulations, and policies are the basis of civilized society and are
used to coordinate the activities of individuals who have a variety of goals
and purposes. History has taught that over-regulation (too many rules) makes it
difficult to compete and under-regulation (too few rules) can lead to crisis.
This implies an optimal number of rules that avoids these two extremes. Rules
create boundaries that define the latitude an individual has to perform their
activities. This paper creates a Toy Model of a work environment and examines
it with respect to the latitude provided to a normal individual and the
latitude provided to an insider threat. Simulations with the Toy Model
illustrate four regimes with respect to an insider threat: under-regulated,
possibly optimal, tipping-point, and over-regulated. These regimes depend upon
the number of rules (N) and the minimum latitude (Lmin) required by a normal
individual to carry out their activities. The Toy Model is then mapped onto the
standard 1D Percolation Model from theoretical physics and the same behavior is
observed. This allows the Toy Model to be generalized to a wide array of more
complex models that have been well studied by the theoretical physics community
and also show the same behavior. Finally, by estimating N and Lmin it should be
possible to determine the regime of any particular environment. Comment: 6 pages, 5 figures, submitted to IEEE HS
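For readers unfamiliar with the standard 1D percolation model mentioned in this abstract: each of N sites on a line is open independently with probability p, and the chain spans end to end only if every site is open, which happens with probability p^N. The Python sketch below compares that closed form with a Monte Carlo estimate. How the paper maps the number of rules and the minimum latitude onto p and N is not stated in the abstract, so the sketch does not attempt that mapping; the function names and parameter values are illustrative assumptions.

```python
import random


def spanning_probability(p, n_sites):
    """Exact probability that all n_sites independent sites are open."""
    return p ** n_sites


def simulate_spanning(p, n_sites, trials=20000, seed=0):
    """Monte Carlo estimate of the same spanning probability."""
    rng = random.Random(seed)
    hits = sum(
        all(rng.random() < p for _ in range(n_sites))
        for _ in range(trials)
    )
    return hits / trials


# Assumed example values: p = 0.9 per site, chain of 10 sites.
exact = spanning_probability(0.9, 10)
estimate = simulate_spanning(0.9, 10)
```

Because the spanning probability decays as p^N, even a high per-site probability gives a low chance of spanning a long chain, which is the kind of sharp dependence on N that a percolation picture captures.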
