43,739 research outputs found
Interactive Visual Labelling versus Active Learning: An Experimental Comparison
Methods from supervised machine learning allow the classification of new data automatically and are tremendously helpful for data analysis. The quality of supervised maching learning depends not only on the type of algorithm used, but also on the quality of the labelled dataset used to train the classifier. Labelling instances in a training dataset is often done manually relying on selections and annotations by expert analysts, and is often a tedious and time-consuming process. Active learning algorithms can automatically determine a subset of data instances for which labels would provide useful input to the learning process. Interactive visual labelling techniques are a promising alternative, providing effective visual overviews from which an analyst can simultaneously explore data records and select items to a label. By putting the analyst in the loop, higher accuracy can be achieved in the resulting classifier. While initial results of interactive visual labelling techniques are promising in the sense that user labelling can improve supervised learning, many aspects of these techniques are still largely unexplored. This paper presents a study conducted using the mVis tool to compare three interactive visualisations, similarity map, scatterplot matrix (SPLOM), and parallel coordinates, with each other and with active learning for the purpose of labelling a multivariate dataset. The results show that all three interactive visual labelling techniques surpass active learning algorithms in terms of classifier accuracy, and that users subjectively prefer the similarity map over SPLOM and parallel coordinates for labelling. Users also employ different labelling strategies depending on the visualisation used
Recommended from our members
OAC Explorer: Interactive exploration and comparison of multivariate socioeconomic population characteristics
Recommended from our members
Exploring Uncertainty in Geodemographics with Interactive Graphics
Geodemographic classifiers characterise populations by categorising geographical areas according to the demographic
and lifestyle characteristics of those who live within them. The dimension-reducing quality of such classifiers provides a simple and effective means of characterising population through a manageable set of categories, but inevitably hides heterogeneity, which varies within and between the demographic categories and geographical areas, sometimes systematically. This may have implications for their use, which is widespread in government and commerce for planning, marketing and related activities. We use novel interactive graphics to delve into OAC – a free and open geodemographic classifier that classifies the UK population in over 200,000 small geographical areas into 7 super-groups, 21 groups and 52 sub-groups. Our graphics provide access to the original 41 demographic variables used in the classification and the uncertainty associated with the classification of each geographical area on-demand. It also supports comparison geographically and by category. This serves the dual purpose of helping understand the classifier itself leading to its more informed use and providing a more comprehensive view of population in a comprehensible manner. We assess the impact of these interactive graphics on experienced OAC users who explored the details of the classification, its uncertainty and the nature of between – and within – class variation and then reflect on their experiences. Visualization of the complexities and subtleties of the classification proved to be a thought-provoking exercise both confirming and challenging users’ understanding of population, the OAC classifier and the way it is used in their organisations. Users identified three contexts for which the techniques were deemed useful in the context of local government, confirming the validity of the proposed methods
- …