Overlap Removal of Dimensionality Reduction Scatterplot Layouts
Dimensionality Reduction (DR) scatterplot layouts have become a ubiquitous visualization tool for analyzing multidimensional data items in many different areas. Despite their popularity, scatterplots suffer from occlusion, especially when markers convey information. This makes it difficult for users to estimate the sizes of item groups and, more importantly, can obscure items that are critical to the analysis at hand. Different strategies have been devised to address this issue: either producing overlap-free layouts, which lack the powerful capabilities of contemporary DR techniques in uncovering interesting data patterns, or eliminating overlaps as a post-processing step. Despite the good results of post-processing techniques, the best methods typically expand or distort the scatterplot area, thus reducing the markers' size, sometimes to unreadable dimensions, which defeats the purpose of removing overlaps. This paper presents a novel post-processing strategy for removing overlaps from DR layouts that faithfully preserves the original layout's characteristics and the markers' sizes. Through an extensive comparative evaluation considering multiple metrics, we show that the proposed strategy surpasses the state of the art in overlap removal while being two to three orders of magnitude faster for large datasets.
Comment: 11 pages and 9 figures
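As a rough illustration of the problem (not the method proposed in this paper), the sketch below counts overlapping pairs of equal-radius circular markers and removes overlaps with a naive pairwise push-apart loop. The function names, the equal-radius assumption, the random direction for coincident markers, and the tiny separation margin are all hypothetical choices made for this example.

```python
import math
import random


def count_overlaps(points, radius):
    """Count pairs of equal-radius circular markers whose discs intersect."""
    n = len(points)
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = points[j][0] - points[i][0]
            dy = points[j][1] - points[i][1]
            if math.hypot(dx, dy) < 2 * radius:
                count += 1
    return count


def naive_overlap_removal(points, radius, max_iter=1000, seed=0):
    """Push each overlapping pair apart along the line joining the two centres.

    The marker radius is never changed; only positions move. Each pass is
    O(n^2), and the loop stops when a full pass moves nothing or after
    max_iter passes (so overlaps may remain in very dense layouts).
    """
    rng = random.Random(seed)
    pts = [list(p) for p in points]
    min_dist = 2 * radius
    for _ in range(max_iter):
        moved = False
        for i in range(len(pts)):
            for j in range(i + 1, len(pts)):
                dx = pts[j][0] - pts[i][0]
                dy = pts[j][1] - pts[i][1]
                d = math.hypot(dx, dy)
                if d >= min_dist:
                    continue
                if d == 0.0:
                    # Coincident centres: pick a random separation direction.
                    angle = rng.uniform(0.0, 2.0 * math.pi)
                    ux, uy = math.cos(angle), math.sin(angle)
                else:
                    ux, uy = dx / d, dy / d
                # Each marker moves half the missing distance (plus a tiny margin).
                shift = (min_dist - d) / 2.0 + 1e-9
                pts[i][0] -= ux * shift
                pts[i][1] -= uy * shift
                pts[j][0] += ux * shift
                pts[j][1] += uy * shift
                moved = True
        if not moved:
            break
    return [tuple(p) for p in pts]
```

For example, `naive_overlap_removal([(0.0, 0.0), (1.0, 0.0)], radius=1.0)` moves the two markers apart horizontally until their centres are just over 2 units apart. Because it never shrinks markers, this loop avoids the readability loss described above, but it can enlarge and distort the layout's extent and offers no convergence guarantee for dense layouts, which is the trade-off a dedicated overlap-removal method has to manage.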
Generalized topographic block model
Co-clustering leads to parsimony in data visualisation, with the number of parameters dramatically reduced compared with the dimensions of the data sample. Herein, we propose a new generalized approach for nonlinear mapping by re-parameterizing the latent block mixture model. The densities modeling the blocks belong to the exponential family, so that the Gaussian, Bernoulli and Poisson laws are particular cases. The inference of the parameters is derived from the block expectation–maximization algorithm, with a Newton–Raphson procedure at the maximization step. Empirical experiments with textual data demonstrate the value of our generalized model.
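The abstract combines exponential-family block densities with a Newton–Raphson procedure in the M-step. The sketch below isolates the Newton–Raphson idea for the simplest Poisson case: it fits a single natural parameter θ = log λ to weighted counts, where the weights stand in for E-step posterior probabilities. This is an illustrative assumption, not the paper's update. In this one-parameter case the optimum has the closed form λ = Σ w_i x_i / Σ w_i, which makes the result easy to check; the re-parameterized model in the paper presumably lacks such a closed form, which would explain the need for an iterative solver. The function name and arguments are hypothetical.

```python
import math


def newton_poisson_natural_param(x, w, theta0=0.0, tol=1e-12, max_iter=100):
    """Weighted MLE of a Poisson natural parameter theta = log(lambda) by Newton-Raphson.

    Maximizes L(theta) = sum_i w_i * (x_i * theta - exp(theta)), i.e. the
    weighted Poisson log-likelihood with terms that do not depend on theta
    dropped. In an EM M-step the weights w_i would be posterior probabilities
    from the E-step.
    """
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    if sw <= 0 or swx <= 0:
        raise ValueError("need positive total weight and positive weighted count")
    theta = theta0
    for _ in range(max_iter):
        grad = swx - sw * math.exp(theta)   # dL/dtheta
        hess = -sw * math.exp(theta)        # d2L/dtheta2, always negative (concave)
        step = grad / hess
        theta -= step                       # Newton update: theta - grad / hess
        if abs(step) < tol:
            break
    return theta
```

Because L(θ) is concave, Newton–Raphson converges quickly here (after the first step it approaches the optimum from above); `math.exp(theta)` should match Σ w_i x_i / Σ w_i at convergence.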
Textual data summarization using the Self-Organized Co-Clustering model
Recently, different studies have demonstrated the use of co-clustering, a data mining technique that simultaneously produces row clusters of observations and column clusters of features. The present work introduces a novel co-clustering model to easily summarize textual data in a document-term format. In addition to highlighting homogeneous co-clusters, as other existing algorithms do, we also distinguish noisy co-clusters from significant ones, which is particularly useful for sparse document-term matrices. Furthermore, our model organizes the significant co-clusters into a structure, thus providing improved interpretability to users. The proposed approach competes with state-of-the-art methods for document and term clustering and offers user-friendly results. The model relies on the Poisson distribution and on a constrained version of the Latent Block Model, a probabilistic approach to co-clustering. A Stochastic Expectation-Maximization algorithm is proposed to perform the model's inference, along with a model selection criterion to choose the number of co-clusters. Both simulated and real data sets illustrate the efficiency of this model through its ability to easily identify relevant co-clusters.
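To make the Poisson Latent Block Model and its stochastic inference more concrete, the following sketch shows one stochastic E-step for the row partition in a deliberately simplified model where each count x_ij is Poisson with a block mean λ_kl. The paper's constrained model, its noisy/significant co-cluster distinction, and the column step are not reproduced; the function name, its parameters, and the fixed column partition are assumptions made for illustration.

```python
import math
import random


def sample_row_labels(X, col_labels, lam, pi, rng):
    """One stochastic E-step for rows in a simplified Poisson latent block model.

    Assumed model: x_ij ~ Poisson(lam[k][l]) when row i is in row cluster k
    and column j is in column cluster l. The column partition is held fixed.
    Returns one row-cluster label per row, drawn from its posterior.
    """
    K, L = len(pi), len(lam[0])
    n_cols = [0] * L                    # number of columns in each column cluster
    for l in col_labels:
        n_cols[l] += 1
    labels = []
    for row in X:
        s = [0.0] * L                   # row total within each column cluster
        for j, x in enumerate(row):
            s[col_labels[j]] += x
        # Log posterior up to a constant; the log(x_ij!) terms are the same for every k.
        logp = [
            math.log(pi[k])
            + sum(s[l] * math.log(lam[k][l]) - n_cols[l] * lam[k][l] for l in range(L))
            for k in range(K)
        ]
        m = max(logp)                   # subtract the max before exp for numerical stability
        weights = [math.exp(v - m) for v in logp]
        labels.append(rng.choices(range(K), weights=weights)[0])
    return labels
```

A full SEM loop would alternate this row step with the analogous column step and re-estimate λ and π from the sampled partitions. Drawing labels from the posterior, rather than taking the most probable one, is what distinguishes stochastic EM from the classification EM variant.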
Patient Record Summarization Through Joint Phenotype Learning and Interactive Visualization
Complex patients are becoming more and more of a challenge to the health care system, given the amount of care they require and the amount of documentation needed to keep track of their state of health and treatment. Record keeping using the electronic health record (EHR) makes this easier, but mounting amounts of patient data also mean that clinicians face information overload. Information overload has been shown to have deleterious effects on care, with increased safety concerns due to missed information. Patient record summarization has been a promising way to mitigate information overload, and consequently much research has been dedicated to record summarization since the introduction of EHRs. In this dissertation we examine whether unsupervised inference methods can derive problem-oriented patient summaries that are robust across different patients. By grounding our experiments in HIV patients, we leverage data from a group of patients who are similar in that they share one common disease (HIV) but who also exhibit complex histories of diverse comorbidities. Using a user-centered, iterative design process, we design an interactive, longitudinal patient record summarization tool that leverages automated inferences about the patient's problems. We find that unsupervised, joint learning of problems using correlated topic models, adapted to handle the multiple data types (structured and unstructured) of the EHR, is successful in identifying the salient problems of complex patients. Interactive visualization that exposes inference results to users enables them to make sense of a patient's problems over time and to answer questions about a patient more accurately and faster than with the EHR alone.