The nature of correlation perception in scatterplots
For scatterplots with Gaussian distributions of dots, the perception of Pearson correlation r can be
described by two simple laws: a linear one for discrimination, and a logarithmic one for
perceived magnitude (Rensink & Baldridge, 2010). The underlying perceptual mechanisms,
however, remain poorly understood. To cast light on these, four different distributions of
datapoints were examined. The first had 100 points with equal variance in both dimensions.
Consistent with earlier results, just noticeable difference (JND) was a linear function of the
distance away from r = 1, and the magnitude of perceived correlation a logarithmic function of
this quantity. In addition, these laws were linked, with the intercept of the JND line being the
inverse of the bias in perceived magnitude. Three other conditions were also examined: a dot
cloud with 25 points, a horizontal compression of the cloud, and a cloud with a uniform
distribution of dots. Performance was found to be similar in all conditions. The generality and
form of these laws suggest that what underlies correlation perception is not a geometric
structure such as the shape of the dot cloud, but the shape of the probability distribution of
the dots, likely inferred via a form of ensemble coding. It is suggested that this reflects the
ability of observers to perceive the information entropy in an image, with this quantity used as
a proxy for Pearson correlation.
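The two laws named in this abstract can be illustrated with a minimal sketch. The functional forms below are one parametrization consistent with the abstract's description (a JND line that reaches zero at r = 1/b, and a logarithmic magnitude curve sharing the same bias b); the function names and the default values of k and b are assumptions chosen for illustration, not figures taken from the abstract.

```python
import math


def jnd(r, k=0.2, b=0.9):
    """Linear discrimination law (assumed form): JND = k * (1/b - r).

    k (slope) and b (bias) are illustrative values, not reported results.
    """
    return k * (1.0 / b - r)


def perceived(r, b=0.9):
    """Logarithmic magnitude law (assumed form), scaled so g(0) = 0 and g(1) = 1."""
    return math.log(1.0 - b * r) / math.log(1.0 - b)


if __name__ == "__main__":
    # Tabulate both laws for a few objective correlations.
    for r in (0.0, 0.3, 0.6, 0.9):
        print(f"r={r:.1f}  JND={jnd(r):.3f}  perceived={perceived(r):.3f}")
```

Under these assumed forms the same parameter b sets both where the JND line crosses zero and how strongly mid-range correlations are underestimated, which is the kind of link between the two laws the abstract describes.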
A randomized trial in a massive online open course shows people don't know what a statistically significant relationship looks like, but they can learn
Scatterplots are the most common way for statisticians, scientists, and the
public to visually detect relationships between measured variables. At the same
time, and despite widely publicized controversy, P-values remain the most
commonly used measure to statistically justify relationships identified between
variables. Here we measure the ability to detect statistically significant
relationships from scatterplots in a randomized trial of 2,039 students in a
statistics massive open online course (MOOC). Each subject was shown a random
set of scatterplots and asked to visually determine if the underlying
relationships were statistically significant at the P < 0.05 level. Subjects
correctly classified only 47.4% (95% CI: 45.1%-49.7%) of statistically
significant relationships, and 74.6% (95% CI: 72.5%-76.6%) of non-significant
relationships. Adding visual aids such as a best fit line or scatterplot smooth
increased the probability a relationship was called significant, regardless of
whether the relationship was actually significant. Classification of
statistically significant relationships improved on repeat attempts of the
survey, although classification of non-significant relationships did not. Our
results suggest: (1) that evidence-based data analysis can be used to identify
weaknesses in theoretical procedures in the hands of average users, (2) data
analysts can be trained to improve detection of statistically significant
results with practice, but (3) data analysts have incorrect intuition about
what statistically significant relationships look like, particularly for small
effects. We have built a web tool for people to compare scatterplots with their
corresponding p-values, which is available here:
http://glimmer.rstudio.com/afisher/EDA/.
Comment: 7 pages, including 2 figures and 1 table
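To make the classification task concrete, here is a minimal sketch of how one could decide whether the relationship in a scatterplot is significant at the P < 0.05 level. The abstract does not say which test the study used, so this sketch swaps in a two-sided permutation test on Pearson's r; the function names, the number of permutations, and the seed are all assumptions.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the null hypothesis of no association.

    Shuffles ys repeatedly and counts how often |r| is at least as large
    as the observed |r|. The +1 terms keep the estimate away from zero.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def is_significant(xs, ys, alpha=0.05):
    """True when the permutation p-value falls below alpha."""
    return permutation_p_value(xs, ys) < alpha
```

Comparing the output of `is_significant` with a viewer's visual judgement of the same points is the kind of check the study's task implies.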
Norm-based coding of voice identity in human auditory cortex
Listeners exploit small interindividual variations around a generic acoustical structure to discriminate and identify individuals from their voice, a key requirement for social interactions. The human brain contains temporal voice areas (TVA) [1] involved in an acoustic-based representation of voice identity [2, 3, 4, 5 and 6], but the underlying coding mechanisms remain unknown. Indirect evidence suggests that identity representation in these areas could rely on a norm-based coding mechanism [4, 7, 8, 9, 10 and 11]. Here, we show by using fMRI that voice identity is coded in the TVA as a function of acoustical distance to two internal voice prototypes (one male, one female), approximated here by averaging a large number of same-gender voices by using morphing [12]. Voices more distant from their prototype are perceived as more distinctive and elicit greater neuronal activity in voice-sensitive cortex than closer voices, a phenomenon not merely explained by neuronal adaptation [13 and 14]. Moreover, explicit manipulations of distance-to-mean by morphing voices toward (or away from) their prototype elicit reduced (or enhanced) neuronal activity. These results indicate that voice-sensitive cortex integrates relevant acoustical features into a complex representation referenced to idealized male and female voice prototypes. More generally, they shed light on remarkable similarities in cerebral representations of facial and vocal identity.
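The idea of distance-to-mean in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: each voice is reduced to a hypothetical numeric feature vector, the prototype is the plain average of same-gender vectors, distance is Euclidean, and "morphing" is linear interpolation or extrapolation in that feature space. The study itself morphed real voice recordings, and none of these names come from it.

```python
import math


def prototype(voices):
    """Average feature vector of a set of same-gender voices (assumed representation)."""
    n = len(voices)
    return [sum(column) / n for column in zip(*voices)]


def distance_to_mean(voice, proto):
    """Euclidean distance between one voice and its prototype (assumed metric)."""
    return math.dist(voice, proto)


def morph(voice, proto, alpha):
    """Move a voice along the line through its prototype.

    alpha < 1 moves the voice toward the prototype, alpha > 1 moves it
    away, and alpha == 1 returns the voice unchanged.
    """
    return [p + alpha * (v - p) for v, p in zip(voice, proto)]
```

In this toy setting, morphing with alpha = 0.5 halves the distance-to-mean and alpha = 1.5 increases it by half, mirroring the toward and away manipulations the abstract describes.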
Visual and interactive exploration of point data
Point data, such as Unit Postcodes (UPC), can provide very detailed information at fine
scales of resolution. For instance, socio-economic attributes are commonly assigned to
UPC. Hence, they can be represented as points and observed at the postcode level.
Using UPC as a common field allows the concatenation of variables from disparate data
sources that can potentially support sophisticated spatial analysis. However, visualising
UPC in urban areas has at least three limitations. First, at small scales UPC occurrences
can be very dense making their visualisation as points difficult. On the other hand,
patterns in the associated attribute values are often hardly recognisable at large scales.
Secondly, UPC can be used as a common field to allow the concatenation of highly
multivariate data sets with an associated postcode. Finally, socio-economic variables
assigned to UPC (such as the ones used here) can be non-Normal in their distributions
as a result of a large presence of zero values and high variances which constrain their
analysis using traditional statistics.
This paper discusses a Point Visualisation Tool (PVT), a proof-of-concept system
developed to visually explore point data. Various well-known visualisation techniques
were implemented to enable their interactive and dynamic interrogation. PVT provides
multiple representations of point data to facilitate the understanding of the relations
between attributes or variables as well as their spatial characteristics. Brushing between
alternative views is used to link several representations of a single attribute, as well as
to simultaneously explore more than one variable. PVT’s functionality shows how the
use of visual techniques embedded in an interactive environment enables the exploration
of large amounts of multivariate point data.