Generalising Exponential Distributions Using an Extended Marshall-Olkin Procedure
This paper presents a three-parameter family of distributions that includes the ordinary
exponential and the Marshall–Olkin exponential as special cases. The distribution exhibits a
monotone failure rate function, which makes it appealing for practitioners interested in reliability
and places it alongside the three-parameter gamma and Weibull families in the catalogue of
non-symmetric distributions suited to such modelling problems. Given the lack of symmetry of
this kind of distribution, various statistical and reliability properties of the model are examined.
Numerical examples based on real data illustrate the suitable behaviour of this distribution
for modelling purposes.
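The three-parameter extension itself is not specified in the abstract, but the classical Marshall–Olkin exponential that it generalises can be sketched directly. A minimal Python illustration of its survival and failure-rate functions (parameter names `lam` and `alpha` are ours); note that `alpha = 1` recovers the plain exponential with its constant hazard, while `alpha > 1` gives an increasing (monotone) failure rate:

```python
import math

def mo_exp_sf(x, lam, alpha):
    """Survival function of the Marshall-Olkin exponential:
    S(x) = alpha * exp(-lam*x) / (1 - (1 - alpha) * exp(-lam*x))."""
    s = math.exp(-lam * x)
    return alpha * s / (1.0 - (1.0 - alpha) * s)

def mo_exp_hazard(x, lam, alpha, h=1e-6):
    """Failure (hazard) rate, computed as the numerical derivative
    of the cumulative hazard -log S(x)."""
    return (math.log(mo_exp_sf(x, lam, alpha))
            - math.log(mo_exp_sf(x + h, lam, alpha))) / h

# alpha = 1 reduces to exponential(lam): constant hazard equal to lam.
# alpha > 1 yields a hazard increasing from lam/alpha toward lam.
```

This monotone-hazard behaviour is what the abstract highlights as the family's appeal for reliability work.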
Data Sketches for Disaggregated Subset Sum and Frequent Item Estimation
We introduce and study a new data sketch for processing massive datasets. It
addresses two common problems: 1) computing a sum given arbitrary filter
conditions and 2) identifying the frequent items or heavy hitters in a data
set. For the former, the sketch provides unbiased estimates with state-of-the-art
accuracy. It handles the challenging scenario in which the data is
disaggregated so that computing the per unit metric of interest requires an
expensive aggregation. For example, the metric of interest may be total clicks
per user while the raw data is a click stream with multiple rows per user. Thus
the sketch is suitable for use in a wide range of applications including
computing historical click-through rates for ad prediction, reporting user
metrics from event streams, and measuring network traffic for IP flows.
We prove and empirically show the sketch has good properties for both the
disaggregated subset sum estimation and frequent item problems. On i.i.d. data,
it not only picks out the frequent items but gives strongly consistent
estimates for the proportion of each frequent item. The resulting sketch
asymptotically draws a probability proportional to size sample that is optimal
for estimating sums over the data. For non-i.i.d. data, we show that it
typically does much better than random sampling for the frequent item problem
and never does worse. For subset sum estimation, we show that even for
pathological sequences, the variance is close to that of an optimal sampling
design. Empirically, despite the disadvantage of operating on disaggregated
data, our method matches or bests priority sampling, a state-of-the-art method
for pre-aggregated data, and performs orders of magnitude better on skewed data
compared to uniform sampling. We propose extensions to the sketch that allow it
to be used in combining multiple data sets, in distributed systems, and for
time-decayed aggregation.
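The paper's own sketch is not described in this abstract, but priority sampling, the state-of-the-art baseline it is compared against, is a concrete way to see what an unbiased subset-sum estimator looks like once the data has been aggregated (e.g. a click stream rolled up to per-user totals). A minimal sketch under that assumption (function names are ours):

```python
import random

def priority_sample(weights, k, rng=random):
    """Priority sampling: each item i gets priority q_i = w_i / u_i with
    u_i ~ Uniform(0,1]; keep the k items with the largest priorities and
    record tau, the (k+1)-th largest priority."""
    # 1.0 - rng.random() lies in (0, 1], avoiding division by zero.
    priorities = [(w / (1.0 - rng.random()), i, w)
                  for i, w in enumerate(weights)]
    priorities.sort(reverse=True)
    tau = priorities[k][0] if len(priorities) > k else 0.0
    # A kept item's weight is estimated as max(w_i, tau); dropped items
    # count as 0.  This makes every subset-sum estimate unbiased.
    return {i: max(w, tau) for _, i, w in priorities[:k]}, tau

def estimate_subset_sum(sample, subset):
    """Unbiased estimate of sum(w_i for i in subset) from the sample."""
    return sum(est for i, est in sample.items() if i in subset)
```

For example, with per-user click totals as `weights`, a subset query such as "total clicks from users in a given country" is answered from the small sample alone.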
Blind image separation based on exponentiated transmuted Weibull distribution
Blind image separation has been widely investigated in recent years, and as a
result a number of feature-extraction algorithms for direct application to such
image structures have been developed. For example, a mixture of two or more
fingerprints found at a crime scene must be separated before the individual
prints can be identified. In this paper, we propose a new technique for
separating multiple mixed images based on the exponentiated transmuted Weibull
distribution. To adaptively estimate the parameters of the score functions, an
efficient method based on maximum likelihood and a genetic algorithm is used.
We also evaluate the accuracy of the proposed distribution and compare the
performance of the efficient approach with that of other previously proposed
generalized distributions. The numerical results show that the proposed
distribution is flexible and yields efficient results.
Comment: 14 pages, 12 figures, 4 tables. International Journal of Computer
Science and Information Security (IJCSIS), Vol. 14, No. 3, March 2016 (pp.
423-433).
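The abstract does not give the density, but one parameterization of the exponentiated transmuted Weibull found in the literature raises the transmuted Weibull CDF to a power. A hedged sketch of that form (the paper's exact parameterization may differ; parameter names are ours):

```python
import math

def etw_cdf(x, shape, scale, lam, a):
    """CDF of the exponentiated transmuted Weibull under one common
    parameterization:
        F(x) = [(1 + lam) * G(x) - lam * G(x)**2] ** a
    where G is the Weibull(shape, scale) CDF, |lam| <= 1, and a > 0."""
    if x <= 0:
        return 0.0
    g = 1.0 - math.exp(-((x / scale) ** shape))  # baseline Weibull CDF
    return ((1.0 + lam) * g - lam * g * g) ** a

# lam = 0 and a = 1 recover the plain Weibull CDF, so the family nests
# the Weibull and its transmuted and exponentiated variants.
```

Score functions for the separation step would then follow from the corresponding density, with `lam` and `a` among the parameters fitted by the maximum-likelihood/genetic-algorithm procedure the abstract describes.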