Ensemble clustering in visual working memory biases location memories and reduces the Weber noise of relative positions

Abstract

People seem to compute the ensemble statistics of objects and use this information to support the recall of individual objects in visual working memory. However, there are many different ways that hierarchical structure might be encoded. We examined the format of structured memories by asking subjects to recall the locations of objects arranged in different spatial clustering structures. Consistent with previous investigations of structured visual memory, subjects recalled objects biased toward the center of their clusters. Subjects also recalled locations more accurately when they were arranged in fewer clusters containing more objects, suggesting that subjects used the clustering structure of objects to aid recall. Furthermore, subjects had more difficulty recalling larger relative distances, consistent with subjects encoding the positions of objects relative to clusters and recalling them with magnitude-proportional (Weber) noise. Our results suggest that clustering improved the fidelity of recall by biasing the recall of locations toward cluster centers to compensate for uncertainty and by reducing the magnitude of encoded relative distances.

Introduction

Our visual working memory is limited in its ability to remember objects.
In addition to remembering the individual elements of scenes, people may also extract the higher-order structure of an image, such as the elements' average size. This contrasts with the traditional assumption that objects in visual working memory are encoded independently. The structure of multiple objects may also constrain the individual constituent objects more rigidly into multiobject "chunks." Additionally, studies of spatial memory suggest that people encode the relative positions of objects: rather than remember the absolute position of a paper, you may remember its position relative to your desk (e.g., the paper is one foot northwest of your desk).

Here we evaluate these dimensions of visual memory structure by asking people to remember and report the locations of objects arranged in different spatial clustering structures. Subjects recalled objects more accurately when they were arranged in fewer clusters that each contained more objects separated by smaller relative distances. To directly evaluate the format of subjects' structured memories, we compared human behavior to that of three cognitive models: a hard chunking model, a hierarchical generative model, and a relative position model. The relative position model best accounted for human performance, followed closely by the hierarchical generative model, with the hard chunking model missing key aspects of human behavior. Our results demonstrate two compatible ways in which hierarchical encoding improves the fidelity of visual working memory. First, objects are biased toward their ensemble statistics to compensate for uncertainty about individual object properties. Second, objects are encoded relative to their parents in the hierarchy, and relative positions are corrupted by Weber noise, such that larger relative distances yield greater errors.
Experiment

To distinguish different hierarchical encoding strategies that people may use, we asked subjects to report the positions of objects arranged in different clustering structures. Different encoding strategies yielded distinct patterns of errors across scenes that varied in the number of objects and the number of clusters in which they were arranged. We then examined whether subjects' responses across different types of environments were consistent with different forms of structured encoding.

Methods

Subjects

Thirty-five students from the University of California, San Diego, participated for course credit.

Stimuli

We generated 70 environments, each containing objects arranged into different clustering structures. We selected 440 images from Brady, Konkle, Alvarez, and Oliva. Each environment had one of seven clustering structures: four clusters each containing one object (4C1), two clusters each containing two objects (2C2), one cluster containing four objects (1C4), eight clusters of one object (8C1), four clusters of two objects (4C2), two clusters of four objects (2C4), or one cluster of eight objects (1C8).

Results

Did subjects encode objects according to their clustering structure?

If subjects encoded and utilized the clustering structure of objects instead of independently encoding objects, the errors for objects in the same cluster should be more similar (in the same direction) than expected by chance. We defined the similarity of the errors (q) in reporting the locations of two objects as

q = (x_i · x_j) / (‖x_i‖ ‖x_j‖),

where x_i and x_j are vectors containing the spatial translational error of the two objects' reported locations. The numerator is the projection of the translational error vectors, with positive values indicating vectors in the same direction and negative values indicating vectors in opposite directions. The denominator normalizes the numerator such that q falls between −1 and 1. Thus, if the recalled locations of two objects were both shifted in exactly the same direction, q would be 1; if they were shifted in orthogonal directions, q would be 0; and if they were shifted in opposite directions, q would be −1.
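The similarity measure q defined above is the cosine of the angle between the two error vectors. A minimal sketch (the function name is ours):

```python
import numpy as np

def error_similarity(x_i, x_j):
    """Cosine similarity q of two translational error vectors:
    1 = shifted in the same direction, 0 = orthogonal directions,
    -1 = opposite directions."""
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    # Projection of one error onto the other, normalized by both magnitudes
    return np.dot(x_i, x_j) / (np.linalg.norm(x_i) * np.linalg.norm(x_j))
```

Note that q depends only on the directions of the two errors, not their sizes, so two small errors in the same direction count as much as two large ones.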
We calculated the translational error similarity (q) of objects in the same cluster for each environment.

How did clustering structure affect recall fidelity?

If subjects encoded objects independently, then clustering structures should not have affected how accurately subjects recalled locations. We assessed the effect of clustering structure upon the fidelity of recall by calculating the root mean square error (RMSE) of subjects' responses.

Error model

Thus far, we have demonstrated that subjects did not encode objects independently. Given that subjects appeared to use the clustering structure of objects, how did that structure constrain the locations of objects? Did subjects encode objects using hard chunking, a hierarchical generative model, and/or a relative position tree? These encoding models predict different levels of reliance on (and bias toward) objects' hierarchical structure and different patterns of noise. To determine what type(s) of structured encoding subjects' errors were consistent with, we constructed an error model that estimates the extent of errors due to misassociations, bias, and noise.

First, subjects may have had difficulty remembering which objects were in which locations. We estimated the probability of correctly matching an object to its location, p_T, and the probability of making a misassociation between an object and another object's location, p_M = 1 − p_T. The probability of misassociating to any particular location was then p_M / (n − 1), where n is the number of locations. To determine exactly to which location each object was misassociated, we assumed a bijective mapping of objects to locations (f), such that only one object could be paired with each location; f⁻¹(i) denotes the inverse mapping from locations to objects.

Second, subjects may have been uncertain about objects' locations but used their memories of cluster locations to inform their responses. This would have resulted in objects being drawn toward their clusters.
We accounted for two types of such "regularization" bias: the degree to which clusters are drawn toward the global centroid of all objects (cluster-to-global bias, b_c) and the degree to which objects are drawn toward their cluster centers (object-to-cluster bias, b_o). Here, a bias of zero indicates the object/cluster is unbiased, and a bias of one indicates the element is drawn completely toward its parent. To parameterize how the locations of objects would be shifted by these sources of bias, we decomposed the true locations of objects, t, into their relative positions and then weighted the relative positions by the bias parameters. The decomposition of the true locations yielded a relative position tree in which the locations of objects were represented relative to their clusters (x), the locations of clusters were represented relative to the global centroid (c), and the global centroid (g) was the mean of the true locations (t). Conditional on the mapping f⁻¹(i) of the true locations t to response locations s, the position of object i's cluster relative to the global center was defined by

c_{M(f⁻¹(i))} = C_{M(f⁻¹(i))} − g,

where M(·) maps objects to the clusters of which they are members, and C is the absolute position of the cluster center, calculated by averaging the locations of all objects in that cluster. Similarly, the positions of objects relative to their clusters were defined by

x_{f⁻¹(i)} = t_{f⁻¹(i)} − C_{M(f⁻¹(i))}.

We then weighted the relative positions of clusters and objects by the cluster-to-global bias (b_c) and the object-to-cluster bias (b_o), respectively. Thus, the biased absolute position of an object, b_i, was

b_i = g + (1 − b_c) · c_{M(f⁻¹(i))} + (1 − b_o) · x_{f⁻¹(i)}.

Finally, subjects may have remembered locations with some imprecision. To account for this, the model includes three levels of spatial noise that might induce correlations in errors across objects: noise shared globally across all object locations (σ_g), noise shared by locations within the same cluster (σ_c), and noise on individual object locations (σ_o).
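The bias decomposition described above can be sketched as follows, assuming the biased position takes the form b_i = g + (1 − b_c)·c + (1 − b_o)·x, where c and x are the cluster-relative and object-relative offsets (the function name is ours):

```python
import numpy as np

def biased_positions(t, cluster_ids, b_c, b_o):
    """Shrink a relative position tree: cluster centers toward the global
    centroid (b_c) and objects toward their cluster centers (b_o).
    A bias of 0 leaves an element unbiased; 1 draws it fully to its parent."""
    t = np.asarray(t, dtype=float)          # (n, 2) true locations
    ids = np.asarray(cluster_ids)
    g = t.mean(axis=0)                      # global centroid
    # Each object's cluster center (mean of all objects in its cluster)
    C = np.array([t[ids == k].mean(axis=0) for k in ids])
    c = C - g                               # cluster relative to global centroid
    x = t - C                               # object relative to its cluster center
    return g + (1.0 - b_c) * c + (1.0 - b_o) * x
```

With b_c = b_o = 0 this returns the true locations unchanged; with b_o = 1 every object collapses onto its cluster center, and with b_c = b_o = 1 everything collapses onto the global centroid.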
This decomposition of object positions induces an expected correlation structure on the errors in reporting individual objects, which can be parameterized with a covariance matrix, Σ, of the form

Σ_ij = σ_g²                 if objects i and j are in different clusters;
Σ_ij = σ_g² + σ_c²          if i ≠ j and objects i and j are in the same cluster;
Σ_ij = σ_g² + σ_c² + σ_o²   if i = j,

where the three conditions reflect (in order) error covariance shared by all objects, error covariance for objects in the same cluster, and error variance for individual objects. Let Θ be the set of parameters {p_M, b_c, b_o, σ_g, σ_c, σ_o}. Altogether, for each environment, the likelihood of a set of responses given the targets and parameters was

LIK(s | t, f, Θ) = p_T^{n_T} · (p_M / (n − 1))^{n_M} · N(s; b, Σ),

where s denotes the response locations, n is the number of objects, n_T is the number of objects correctly mapped to their locations by f, n_M is the number of objects incorrectly mapped to their locations by f, and N(s; b, Σ) is the multivariate normal density of the responses around the biased locations b with covariance Σ. We estimated these parameters (f, p_M, b_c, b_o, σ_g, σ_c, σ_o) for each environment across subjects using a Markov chain Monte Carlo algorithm (see Appendix C for more details concerning our Markov chain Monte Carlo algorithm and Appendix D for all parameter fits).

Did subjects encode objects in addition to their hierarchical structure?

Encoding objects as components of hard chunks or a hierarchical generative model should result in distinct patterns of object-to-cluster bias. If subjects encoded objects as hard chunks, they should have retained minimal information about the objects' locations and recalled the objects with a large bias toward their respective cluster centers. If subjects encoded objects in a hierarchical generative model, then they should have recalled objects with more bias toward their cluster centers when clusters contained more objects. Intuitively, subjects can more precisely estimate the centers of clusters that contain more objects and consequently should rely on those clusters more when they are uncertain about the locations of the individual objects.
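The three-level covariance structure described in the error model can be sketched as follows, assuming the per-coordinate form given above (the function name is ours):

```python
import numpy as np

def error_covariance(cluster_ids, sigma_g, sigma_c, sigma_o):
    """Per-coordinate covariance of location errors: sigma_g^2 is shared
    by every pair of objects, sigma_c^2 is added for pairs in the same
    cluster, and sigma_o^2 is added on the diagonal for each object."""
    ids = np.asarray(cluster_ids)
    n = len(ids)
    same_cluster = (ids[:, None] == ids[None, :]).astype(float)
    return (sigma_g**2 * np.ones((n, n))
            + sigma_c**2 * same_cluster
            + sigma_o**2 * np.eye(n))
```

Larger σ_g inflates all entries equally (a shared shift of the whole scene), while larger σ_c only increases covariance among objects in the same cluster.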
The bias of objects toward clusters was consistently low (b_o: M = .19, SEM = .02, max = .62), suggesting that subjects remembered the locations of individual objects within their clustering structure rather than storing chunks and discarding their internal components. Additionally, contrary to the pattern of bias we expected to find if subjects encoded objects in a hierarchical generative model, as objects were arranged in fewer clusters containing more objects, the objects tended to be recalled with less bias toward their clusters.

Did subjects encode objects in a relative position tree?

Subjects may have encoded objects in a relative position tree, wherein object positions are coded as relative offsets from the cluster centers, and cluster centers are coded as relative offsets from the global center. At first glance, this is no different from encoding the objects according to their absolute positions. However, if relative positions are recalled with Weber noise, the magnitude of the encoded offsets matters: environments that happened to contain more dispersed clusters require larger relative distances to represent positions. Consequently, as the dispersion of clusters in the environment increases, subjects should recall clusters less precisely (that is, σ_c should increase). Indeed, the dispersion of clusters in an environment was significantly correlated with the precision with which subjects recalled cluster centers (r = 0.38, p < 0.01).

Comparing chunking, hierarchical generative, and relative position models

To directly test explicit formulations of different encoding theories, we designed three cognitive models that would encode a display and generate responses according to its biases: a hard chunking model that only remembers clusters, a hierarchical generative model that encodes absolute positions (similar to Orhan & Jacobs, 2013), and a model that encodes objects in a relative position tree and recalls relative positions with Weber noise.
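A minimal sketch of Weber-noise recall of a single relative offset, assuming Gaussian noise on distance with standard deviation proportional to the distance and von Mises (circular) noise on angle (the function and parameter names are ours; the paper's exact noise parameterization may differ):

```python
import numpy as np

def recall_relative(offset, weber_sd, kappa, rng):
    """Recall a relative offset (dx, dy) with magnitude-proportional
    (Weber) noise on distance and von Mises noise on angle."""
    dx, dy = offset
    d = np.hypot(dx, dy)                                  # encoded distance
    theta = np.arctan2(dy, dx)                            # encoded angle
    d_hat = d * (1.0 + weber_sd * rng.standard_normal())  # sd grows with d
    theta_hat = theta + rng.vonmises(0.0, kappa)          # circular noise
    return np.array([d_hat * np.cos(theta_hat), d_hat * np.sin(theta_hat)])
```

Because the distance noise scales with d, small within-cluster offsets are recalled precisely while large offsets (e.g., in dispersed environments) accumulate proportionally larger errors, which is the behavioral signature tested above.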
Each model uses a nonparametric Dirichlet process to determine the clustering of the objects.

Nonparametric Dirichlet process

We used a nonparametric Dirichlet process to determine the clustering structure of the objects.

Chunking model

The hard chunking model uses solely information about the clusters and which objects belong to which clusters to recall the locations of objects. Importantly, the chunking model knows nothing about the locations of the individual objects. Instead, the model recalls the location of an object by randomly sampling from the object's cluster based on the center and standard deviation of the cluster estimated by the Dirichlet process. The model has no free parameters.

Hierarchical generative model

The hierarchical generative model uses knowledge of clusters' locations to compensate for uncertainty in the individual objects' locations. This model is similar to the Dirichlet process mixture model used by Orhan and Jacobs (2013). The hierarchical generative model noisily encodes the absolute locations of all the objects as well as the properties of their clusters. Because the model pools memories of individual objects to determine the mean and dispersion of their respective clusters, each additional object in a cluster allows the model to estimate the position of that cluster more precisely. This model uses the same process to estimate the precision of the global center from the locations of the clusters. During recall, the model first recalls the locations of the clusters by averaging the positions of the clusters and global center, weighted by their precisions. The model then recalls the locations of individual objects by averaging the positions of the objects and their clusters, weighted by the precision of the encoded object locations and the posterior predictive spread of objects within a cluster, respectively. This model has one free parameter: the noise with which objects are encoded.
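The precision-weighted averaging at the heart of the hierarchical generative model's recall step can be sketched as follows (a simplified one-dimensional illustration with a name of our choosing, not the authors' implementation):

```python
def precision_weighted(mu_obj, tau_obj, mu_cluster, tau_cluster):
    """Combine a noisy object-location estimate with its cluster's
    predicted location, each weighted by its precision (1/variance)."""
    return ((tau_obj * mu_obj + tau_cluster * mu_cluster)
            / (tau_obj + tau_cluster))
```

When the two sources are equally precise the recalled location is their midpoint; as the cluster estimate becomes more precise (e.g., because the cluster contains more objects), recall is pulled further toward the cluster.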
We set the noise parameter to the average object location noise (σ_o) estimated by our error model, separately for the four-object and eight-object conditions.

Relative position model

The relative position model remembers the relative positions of objects and clusters with Weber noise and uses clustering to reduce the magnitude of relative positions. Using the clustering structure inferred by the Dirichlet process, the relative position model remembers the positions of objects relative to their clusters and the clusters relative to the global center. The model encodes relative positions via their distance and angle and recalls them with circular Gaussian noise on angle and proportional (Weber) noise on distance. The angular and distance noise are captured by two free parameters. We fit the model separately for the four-object and eight-object conditions.

Can the models predict the difficulty of environments?

We tested whether the models could predict the difficulty, measured in RMSE, of each of the environments across and within clustering structures.

General discussion

People can encode more information about multiple objects if they exploit the objects' shared statistical structure rather than encoding them independently. We considered several ways people might use this structure when encoding objects and found that, in addition to using a hierarchical generative model to infer object properties, people also use the hierarchy to encode object properties as relative offsets from the central tendency of their group. Because relative positions seem to be recalled with Weber noise, hierarchical clustering reduces the number of large distances that subjects encode and thus increases overall accuracy.
