Learning Interesting Categorical Attributes for Refined Data Exploration
This work proposes and evaluates a novel approach to determining interesting
categorical attributes for lists of entities. Once identified, such categories
are highly valuable for constraining (filtering) a user's current view to
subsets of entities. We show how to train a classifier that can tell whether
or not a categorical attribute can act as a constraint, in the sense of
human-perceived interestingness. The training data is harvested from Web
tables, treating the presence or absence of a table as an indication of
whether the attribute used as a filter constraint is reasonable. For learning
the classification model, we review four well-known statistical measures
(features) for categorical attributes---entropy, unalikeability, peculiarity,
and coverage. We additionally propose three new statistical measures to capture
the distribution of data, tailored to our main objective. The learned model is
evaluated through relevance assessments obtained in a user study, which
reflect the applicability of the approach as a whole and, further, demonstrate
the superiority of the proposed diversity measures over existing statistical
measures such as information entropy.

Comment: 13 pages, 9 figures, 6 tables
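The abstract only names the four baseline measures; as a rough illustration of two of them, here is a minimal sketch of Shannon entropy and the coefficient of unalikeability (in the Kader–Perry formulation) for a list of categorical values. The exact definitions used in the paper may differ.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a categorical attribute's
    value distribution: H = -sum(p_i * log2(p_i))."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def unalikeability(values):
    """Coefficient of unalikeability: the fraction of ordered pairs
    of distinct observations whose values differ. 0 means all values
    are identical; values near 1 indicate high diversity."""
    n = len(values)
    # Ordered pairs (including self-pairs) with equal values:
    same = sum(c * c for c in Counter(values).values())
    return (n * n - same) / (n * n - n)
```

For a uniformly split binary attribute such as `['a', 'a', 'b', 'b']`, entropy is 1 bit and unalikeability is 2/3, while a constant attribute scores 0 on both, matching the intuition that a constant column is useless as a filter.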