Justification of Logarithmic Loss via the Benefit of Side Information
We consider a natural measure of relevance: the reduction in optimal
prediction risk in the presence of side information. For any given loss
function, this relevance measure captures the benefit of side information for
performing inference on a random variable under this loss function. When such a
measure satisfies a natural data processing property, and the random variable
of interest has alphabet size greater than two, we show that it is uniquely
characterized by the mutual information, and the corresponding loss function
coincides with logarithmic loss. In doing so, our work provides a new
characterization of mutual information, and justifies its use as a measure of
relevance. When the alphabet is binary, we characterize the only admissible
forms the measure of relevance can assume while obeying the specified data
processing property. Our results naturally extend to measuring causal influence
between stochastic processes, where we unify different causal-inference
measures in the literature as instantiations of directed information.
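For intuition, here is the standard identity behind this characterization, stated under log loss (a sketch, not part of the abstract above): without side information the optimal prediction risk is the entropy $H(X)$, and with side information $Y$ it is the conditional entropy $H(X\mid Y)$, so the reduction in risk is exactly the mutual information.

```latex
% Relevance measure under logarithmic loss \ell(x, q) = -\log q(x):
% optimal risk without side information minus optimal risk with it.
\min_{q \in \mathcal{P}(\mathcal{X})} \mathbb{E}\!\left[-\log q(X)\right]
  \;-\; \min_{q(\cdot \mid \cdot)} \mathbb{E}\!\left[-\log q(X \mid Y)\right]
  \;=\; H(X) - H(X \mid Y)
  \;=\; I(X;Y).
```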
Information-Distilling Quantizers
Let $X$ and $Y$ be dependent random variables. This paper considers the problem of designing a scalar quantizer for $Y$ to maximize the mutual information between the quantizer's output and $X$, and develops fundamental properties and bounds for this form of quantization, which is connected to the log-loss distortion criterion. The main focus is the regime of low $I(X;Y)$, where it is shown that, if $X$ is binary, a constant fraction of the mutual information can always be preserved using $\mathcal{O}(\log(1/I(X;Y)))$ quantization levels, and there exist distributions for which this many quantization levels are necessary. Furthermore, for larger finite alphabets $2 < |\mathcal{X}| < \infty$, it is established that an $\eta$-fraction of the mutual information can be preserved using roughly $(\log(|\mathcal{X}|/I(X;Y)))^{\eta\cdot(|\mathcal{X}|-1)}$ quantization levels.
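A minimal sketch (not from the paper) of the optimization the abstract describes, assuming binary $X$ and a small discrete $Y$ with known joint pmf: exhaustively search scalar quantizers of $Y$ for the one maximizing $I(X; Q(Y))$. The function names and the brute-force strategy are illustrative assumptions.

```python
# Illustrative sketch: brute-force search over scalar quantizers of a
# discrete Y (contiguous cells) to maximize I(X; Q(Y)), for binary X.
import itertools
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in nats for a joint pmf given as a 2-D array."""
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float((p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])).sum())

def quantize(p_xy, boundaries):
    """Collapse Y-columns into cells defined by split points (a scalar quantizer)."""
    cells = np.split(np.arange(p_xy.shape[1]), boundaries)
    return np.stack([p_xy[:, c].sum(axis=1) for c in cells], axis=1)

def best_quantizer(p_xy, levels):
    """Exhaustive search over quantizers with contiguous cells.

    For binary X, cells contiguous in the posterior P(X=1 | Y=y) are
    optimal (a known structural result), so we sort Y-columns first.
    """
    order = np.argsort(p_xy[1] / p_xy.sum(axis=0))
    p = p_xy[:, order]
    best_mi, best_b = -np.inf, None
    for b in itertools.combinations(range(1, p.shape[1]), levels - 1):
        mi = mutual_information(quantize(p, list(b)))
        if mi > best_mi:
            best_mi, best_b = mi, b
    return best_mi, best_b

# Example: binary X, Y taking 6 values, random joint pmf.
rng = np.random.default_rng(0)
p = rng.random((2, 6)); p /= p.sum()
print("I(X;Y)       =", mutual_information(p))
print("best 2-level =", best_quantizer(p, levels=2))
```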
Bregman Divergence Bounds and the Universality of the Logarithmic Loss
A loss function measures the discrepancy between the true values and their
estimated fits, for a given instance of data. In classification problems, a
loss function is said to be proper if the minimizer of the expected loss is the
true underlying probability. In this work we show that for binary
classification, the divergence associated with smooth, proper and convex loss
functions is bounded from above by the Kullback-Leibler (KL) divergence, up to
a normalization constant. This implies that by minimizing the log-loss
(associated with the KL divergence), we minimize an upper bound on any loss
from this set. This property suggests that the log-loss is universal in
the sense that it provides performance guarantees to a broad class of accuracy
measures. Importantly, our notion of universality is not restricted to a
specific problem. This allows us to apply our results to many applications,
including predictive modeling, data clustering and sample complexity analysis.
Further, we show that the KL divergence bounds from above any separable Bregman
divergence that is convex in its second argument (up to a normalization
constant). This result introduces a new set of divergence inequalities, similar
to Pinsker's inequality, and extends well-known $f$-divergence inequality
results.
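A quick numerical illustration (a sketch, not taken from the paper): for binary classification, the Bregman divergence associated with the squared loss is $(p-q)^2$, and binary Pinsker gives $(p-q)^2 \le \tfrac{1}{2}\,\mathrm{KL}(p\|q)$, one instance of the KL upper bound described above. The grid check below simply verifies the constant numerically.

```python
# Illustrative check: the squared-loss Bregman divergence (p - q)^2 is
# bounded above by KL(p||q) up to the constant 1/2 (binary Pinsker).
import numpy as np

def kl_binary(p, q):
    """KL divergence in nats between Bernoulli(p) and Bernoulli(q)."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

grid = np.linspace(0.01, 0.99, 99)
worst = 0.0
for p in grid:
    for q in grid:
        if p != q:
            worst = max(worst, (p - q) ** 2 / kl_binary(p, q))
# The ratio approaches but does not exceed 0.5 (near p = q = 1/2).
print("max (p-q)^2 / KL(p||q) over grid:", worst)
```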
Application of an Artificial Neural Network as a Third-Party Database Auditing System
Data auditing is a fundamental challenge for organizations that deal with large databases. Databases are frequently targeted by attacks that grow in number and sophistication every day, one-third of which come from users inside the organization. Database auditing plays a vital role in protecting against these attacks. Native features in database auditing systems monitor and capture activities and incidents that occur within a database and notify the database administrator; however, their administration cost and performance overhead must be considered. Rather than relying on native auditing tools, a more secure alternative is to use third-party products. The primary goal of this thesis is to apply an efficient, optimized deep learning approach to detect suspicious behavior within a database by computing the amount of risk each user poses to the system. This is accomplished by using an Artificial Neural Network as an enhanced analyzer component of a database auditing system, operating as a third-party product. The model has been validated to have low bias and low variance, and a parameter-tuning technique has been used to find the parameters that yield the highest accuracy for the model.
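A hypothetical sketch of the pipeline this abstract describes; the feature set, network sizes, and tuning grid are illustrative assumptions, not details from the thesis. A small neural network scores each audited session with a risk probability, with grid search standing in for the parameter-tuning step.

```python
# Hypothetical risk-scoring sketch: an MLP maps audit-log features to a
# per-session risk score; grid search tunes width and regularization.
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy stand-in for audited activity features, e.g. queries per hour,
# rows touched, off-hours logins, failed authentications.
rng = np.random.default_rng(0)
X = rng.random((1000, 4))
y = (X[:, 0] + X[:, 2] + 0.1 * rng.standard_normal(1000) > 1.2).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Grid search over network width and L2 penalty: the "parameter tuning"
# step that trades off bias against variance.
pipe = make_pipeline(StandardScaler(),
                     MLPClassifier(max_iter=2000, random_state=0))
grid = GridSearchCV(pipe,
                    {"mlpclassifier__hidden_layer_sizes": [(16,), (32, 16)],
                     "mlpclassifier__alpha": [1e-4, 1e-2]},
                    cv=3)
grid.fit(X_tr, y_tr)

risk = grid.predict_proba(X_te)[:, 1]  # risk score in [0, 1] per session
print("held-out accuracy:", grid.score(X_te, y_te))
print("highest-risk sessions:", np.argsort(risk)[-5:])
```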