Record-Linkage from a Technical Point of View
Record linkage is used for preparing sampling frames, deduplicating lists and combining information on the same object from two different databases. If the same objects in two different databases share error-free unique identifiers such as personal identification numbers (PID), record linkage is a simple file merge operation. If the identifiers contain errors, record linkage is a challenging task. In many applications, the files have widely different numbers of observations, for example a few thousand records of a sample survey and a few million records of an administrative database of social security numbers. Available software, privacy issues and future research topics are discussed.
Keywords: Record-Linkage, Data-mining, Privacy preserving protocols
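As a minimal sketch of the two cases this abstract distinguishes, the Python below first links two small lists by an exact merge on an error-free identifier, then falls back to approximate name comparison when the identifier cannot be trusted. The record fields, the helper names `exact_link` and `fuzzy_link`, and the 0.85 similarity cutoff are illustrative assumptions and are not taken from the paper; `difflib.SequenceMatcher` merely stands in for the string comparators a dedicated linkage tool would use.

```python
from difflib import SequenceMatcher

def exact_link(survey, register, key="pid"):
    """Link records by an error-free unique identifier (a plain merge)."""
    index = {rec[key]: rec for rec in register}  # one pass over the large file
    return [(s, index[s[key]]) for s in survey if s[key] in index]

def fuzzy_link(survey, register, field="name", cutoff=0.85):
    """Link records by string similarity when identifiers contain errors.

    Every pair is compared, which is quadratic; acceptable only for a sketch.
    """
    links = []
    for s in survey:
        best, best_score = None, 0.0
        for r in register:
            score = SequenceMatcher(None, s[field].lower(), r[field].lower()).ratio()
            if score > best_score:
                best, best_score = r, score
        if best is not None and best_score >= cutoff:
            links.append((s, best, best_score))
    return links

# Hypothetical toy data: a small survey and a larger register.
survey = [{"pid": 1, "name": "Anna Schmidt"}, {"pid": 9, "name": "Jon Meyer"}]
register = [{"pid": 1, "name": "Anna Schmitt"}, {"pid": 2, "name": "John Meyer"},
            {"pid": 3, "name": "Paul Weber"}]

exact_pairs = exact_link(survey, register)   # only pid 1 exists in both files
fuzzy_pairs = fuzzy_link(survey, register)   # both survey names find a close match
```

With files of very different sizes, as in the abstract's example, the all-pairs comparison in `fuzzy_link` would be too slow; restricting candidate pairs first (blocking) is the usual remedy and is left out here.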
The relation between Pearson's correlation coefficient r and Salton's cosine measure
The relation between Pearson's correlation coefficient and Salton's cosine
measure is revealed based on the different possible values of the division of
the L1-norm and the L2-norm of a vector. These different values yield a sheaf
of increasingly straight lines which form together a cloud of points, being the
investigated relation. The theoretical results are tested against the author
co-citation relations among 24 informetricians for whom two matrices can be
constructed, based on co-citations: the asymmetric occurrence matrix and the
symmetric co-citation matrix. Both examples completely confirm the theoretical
results. The results enable us to specify an algorithm which provides a
threshold value for the cosine above which none of the corresponding Pearson
correlations would be negative. Using this threshold value can be expected to
optimize the visualization of the vector space.
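The role of the L1/L2 norm ratio can be checked numerically. The sketch below is not the paper's own algorithm; it only uses the standard definitions, from which it follows that for non-negative vectors of length n, with a = ||x||_1 / ||x||_2 and b = ||y||_1 / ||y||_2, Pearson's r equals (n·cos − a·b) / sqrt((n − a²)(n − b²)). For fixed ratios this is a straight line in the cosine. The function names and the sample vectors are assumptions made for illustration.

```python
import math

def cosine(x, y):
    """Salton's cosine: dot product divided by the product of L2-norms."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def pearson(x, y):
    """Pearson's r computed as the cosine of the mean-centered vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return cosine([a - mx for a in x], [b - my for b in y])

def norm_ratio(x):
    """L1-norm divided by L2-norm of x."""
    return sum(abs(a) for a in x) / math.sqrt(sum(a * a for a in x))

def pearson_from_cosine(cos, ratio_x, ratio_y, n):
    """Pearson's r as a linear function of the cosine for fixed norm ratios.

    Valid for non-negative vectors, where the L1-norm equals the plain sum.
    """
    return (n * cos - ratio_x * ratio_y) / math.sqrt(
        (n - ratio_x ** 2) * (n - ratio_y ** 2))

# Hypothetical non-negative count vectors.
x = [3, 0, 1, 4, 2]
y = [1, 2, 0, 5, 1]

r_direct = pearson(x, y)
r_via_cos = pearson_from_cosine(cosine(x, y), norm_ratio(x), norm_ratio(y), len(x))
```

Under this identity, r is non-negative for a given pair exactly when the cosine is at least a·b / n, which gives an intuition for why a cosine threshold of the kind described in the abstract can exist.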
GraphMaps: Browsing Large Graphs as Interactive Maps
Algorithms for laying out large graphs have seen significant progress in the
past decade. However, browsing large graphs remains a challenge. Rendering
thousands of graphical elements at once often results in a cluttered image, and
navigating these elements naively can cause disorientation. To address this
challenge we propose a method called GraphMaps, mimicking the browsing
experience of online geographic maps.
GraphMaps creates a sequence of layers, where each layer refines the previous
one. During graph browsing, GraphMaps chooses the layer corresponding to the
zoom level, and renders only those entities of the layer that intersect the
current viewport. The result is that, regardless of the graph size, the number
of entities rendered at each view does not exceed a predefined threshold, yet
all graph elements can be explored by the standard zoom and pan operations.
GraphMaps preprocesses a graph in such a way that during browsing, the
geometry of the entities is stable, and the viewer is responsive. Our case
studies indicate that GraphMaps is useful in gaining an overview of a large
graph, and also in exploring a graph at a finer level of detail.
Comment: submitted to GD 201
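The browsing step described in this abstract, choosing the layer for the current zoom level and rendering only the entities that intersect the viewport, can be sketched as follows. This is an illustrative assumption about the data layout, not GraphMaps' actual implementation: the `Box` class, the `visible_entities` function, and the mapping from zoom value to layer index are all hypothetical, and the preprocessing that builds the layers so that the rendering threshold holds is not shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of an entity or of the viewport."""
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other: "Box") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        return not (self.x1 < other.x0 or other.x1 < self.x0 or
                    self.y1 < other.y0 or other.y1 < self.y0)

def visible_entities(layers, zoom, viewport):
    """Pick the layer for the zoom level, keep entities intersecting the viewport."""
    level = min(max(int(zoom), 0), len(layers) - 1)  # clamp zoom to a valid layer
    return [name for name, box in layers[level] if box.intersects(viewport)]

# Each layer refines the previous one: layer 1 holds layer 0 plus more entities.
layer0 = [("hub", Box(0, 0, 1, 1))]
layer1 = layer0 + [("a", Box(2, 2, 3, 3)), ("b", Box(8, 8, 9, 9))]
layers = [layer0, layer1]

overview = visible_entities(layers, zoom=0, viewport=Box(0, 0, 10, 10))
detail = visible_entities(layers, zoom=1, viewport=Box(0, 0, 4, 4))
```

Because entity positions are stored once and reused across layers, zooming in only adds entities and never moves those already shown, which mirrors the stable geometry the abstract mentions.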
Record-linkage from a technical point of view
"Record linkage is used for preparing sampling frames, deduplication of lists and combining information on the same object from two different databases. If the identifiers of the same objects in two different databases have error free unique common identifiers like personal identification numbers (PID), record linkage is a simple file merge operation. If the identifiers contain errors, record linkage is a challenging task. In many applications, the files have widely different numbers of observations, for example a few thousand records of a sample survey and a few million records of an administrative database of social security numbers. Available software, privacy issues and future research topics are discussed." [author's abstract]
Animating the development of Social Networks over time using a dynamic extension of multidimensional scaling
The animation of network visualizations poses technical and theoretical
challenges. Rather stable patterns are required before the mental map enables a
user to make inferences over time. In order to enhance stability, we developed
an extension of stress-minimization with developments over time. This dynamic
layouter is no longer based on linear interpolation between independent static
visualizations, but change over time is used as a parameter in the
optimization. Because of our focus on structural change versus stability the
attention is shifted from the relational graph to the latent eigenvectors of
matrices. The approach is illustrated with animations for the journal citation
environments of Social Networks, the (co-)author networks in the carrying
community of this journal, and the topical development using relations among
its title words. Our results are also compared with animations based on
PajekToSVGAnim and SoNIA.