Caveats for information bottleneck in deterministic scenarios
Information bottleneck (IB) is a method for extracting information from one
random variable X that is relevant for predicting another random variable
Y. To do so, IB identifies an intermediate "bottleneck" variable T that has
low mutual information I(X;T) and high mutual information I(Y;T). The "IB
curve" characterizes the set of bottleneck variables that achieve maximal
I(Y;T) for a given I(X;T), and is typically explored by maximizing the "IB
Lagrangian", I(Y;T) - βI(X;T). In some cases, Y is a deterministic
function of X, including many classification problems in supervised learning
where the output class Y is a deterministic function of the input X. We
demonstrate three caveats when using IB in any situation where Y is a
deterministic function of X: (1) the IB curve cannot be recovered by
maximizing the IB Lagrangian for different values of β; (2) there are
"uninteresting" trivial solutions at all points of the IB curve; and (3) for
multi-layer classifiers that achieve low prediction error, different layers
cannot exhibit a strict trade-off between compression and prediction, contrary
to a recent proposal. We also show that when Y is a small perturbation away
from being a deterministic function of X, these three caveats arise in an
approximate way. To address problem (1), we propose a functional that, unlike
the IB Lagrangian, can recover the IB curve in all cases. We demonstrate the
three caveats on the MNIST dataset.
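The first two caveats can be illustrated on a toy problem. The following is a minimal sketch, not the authors' code: it assumes a discrete input X that is uniform on four values, a deterministic label Y = X mod 2, and a hypothetical one-parameter family of bottleneck variables T that copy Y with probability a and otherwise emit a "null" symbol. All function names are illustrative.

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """Mutual information I(A;B) in bits from a dict {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

def ib_terms(p_x, f, encoder):
    """Return (I(X;T), I(Y;T)) when Y = f(X) and T is drawn from encoder[x]."""
    joint_xt, joint_yt = defaultdict(float), defaultdict(float)
    for x, px in p_x.items():
        for t, pt in encoder[x].items():
            joint_xt[(x, t)] += px * pt
            joint_yt[(f(x), t)] += px * pt
    return mutual_information(joint_xt), mutual_information(joint_yt)

def ib_lagrangian(p_x, f, encoder, beta):
    """IB Lagrangian I(Y;T) - beta * I(X;T)."""
    i_xt, i_yt = ib_terms(p_x, f, encoder)
    return i_yt - beta * i_xt

# Toy setup (an assumption for illustration): X uniform on {0, 1, 2, 3}, Y = X mod 2.
p_x = {x: 0.25 for x in range(4)}

def f(x):
    return x % 2

def noisy_copy_encoder(a):
    """T equals Y with probability a, otherwise a constant 'null' symbol."""
    return {x: {f(x): a, "null": 1.0 - a} for x in p_x}

if __name__ == "__main__":
    for a in (0.0, 0.25, 0.5, 0.75, 1.0):
        i_xt, i_yt = ib_terms(p_x, f, noisy_copy_encoder(a))
        print(f"a={a:.2f}  I(X;T)={i_xt:.3f}  I(Y;T)={i_yt:.3f}")
    for beta in (0.5, 1.5):
        scores = {a: ib_lagrangian(p_x, f, noisy_copy_encoder(a), beta)
                  for a in (0.0, 0.5, 1.0)}
        print(f"beta={beta}: {scores}")
```

In this family I(X;T) and I(Y;T) are equal for every a, so the Lagrangian reduces to (1 - β) times their common value: it is maximized at a = 1 when β < 1 and at a = 0 when β > 1, and no value of β singles out an intermediate a.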
CuisineNet: Food Attributes Classification using Multi-scale Convolution Network
The diversity of food and its attributes reflects the culinary habits of
people from different countries. This paper therefore addresses the problem of
identifying the food culture of people around the world, and its flavor, by
classifying two main food attributes: cuisine and flavor. A deep learning model
based on multi-scale convolutional networks is proposed for extracting more
accurate features from input images. Multi-scale convolution layers with
different kernel sizes are aggregated to weight the features produced at
different scales. In addition, a joint loss function based on negative
log-likelihood (NLL) is used to fit the model probabilities to multi-labeled
classes for the multi-modal classification task. Furthermore, this work
provides a new dataset for food attributes, called Yummly48K, extracted from
the popular food website Yummly. Our model is assessed on the constructed
Yummly48K dataset. The experimental results show that the proposed method
yields average F1 scores of 65% and 62% on the validation and test sets,
respectively, outperforming the state-of-the-art models.
Comment: 8 pages, Submitted in CCIA 201
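The abstract does not give the exact form of the joint NLL loss. As a rough sketch under that caveat, one common construction sums the negative log-likelihoods of two classification heads, one for cuisine and one for flavor. The equal default weights, the function names, and the class counts in the usage lines are all assumptions made for illustration, not details from the paper.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def nll(logits, target):
    """Negative log-likelihood of the target class index under softmax(logits)."""
    return -log_softmax(logits)[target]

def joint_nll(cuisine_logits, cuisine_target,
              flavor_logits, flavor_target,
              w_cuisine=1.0, w_flavor=1.0):
    """Weighted sum of the two per-attribute NLL terms (weights are hypothetical)."""
    return (w_cuisine * nll(cuisine_logits, cuisine_target)
            + w_flavor * nll(flavor_logits, flavor_target))

if __name__ == "__main__":
    # Hypothetical outputs for one image: 3 cuisine classes, 2 flavor classes.
    loss = joint_nll([2.0, 0.5, -1.0], 0, [0.1, 1.2], 1)
    print(f"joint NLL = {loss:.4f}")
```

Summing the two terms lets a shared feature extractor receive gradients from both attributes in a single backward pass; the relative weights would need tuning if one attribute dominates.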