Generalized Plot Matrices, Automatic Cognostics, and Efficient Data Exploration
Statistical visualization of large-scale data has become an increasingly essential task in the era of big data. In particular, exploratory data analysis and visualization are the first step towards any in-depth statistical modeling and analysis, so being able to rapidly specify and generate visualizations regardless of data scale is crucial. Trelliscope handles data visualization at scale by attaching cognostics (univariate metrics) to each panel, which helps users organize and locate panels of interest. While Trelliscope provides a general framework for visualizing data at scale, several aspects (such as cognostics and axis scales) can be improved to help users generate displays more rapidly. When visually modeling complex data with Trelliscope, traditional two-grouped plot matrices do not natively support a mixed-scale axis that displays both continuous and discrete data. In addition, web-based visualization systems like Trelliscope, which retrieve information from a back-end service such as R, must maximize performance to provide an engaging user experience. To address the mixed-scale plot matrix axis, a generalized plot matrix is developed for two-grouped data that displays both continuous and discrete data using an appropriate visualization method for each panel. To complement Trelliscope's panel organization, automatic cognostic summaries are established, at no additional user effort, by mapping the context of what is visualized to classes of metrics that are meaningful for each type of visualization layer. Finally, communication from web-based visualization systems to back-end R services is greatly improved by leveraging the GraphQL query language, which minimizes the number of queries required for data extraction. Together, these three contributions curtail the increasing complexity and scale of data visualization.
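The plot-matrix and cognostics ideas in this abstract can be illustrated together with a small sketch. The Python below is hypothetical and is not Trelliscope's or GGally's API: it assumes a column is discrete when it holds strings, picks one layer per (x, y) pair of a two-grouped matrix (scatterplot for two continuous columns, boxplot for mixed scales, cell counts for two discrete columns), and then derives cognostics from the layer type alone. All function names and metric names are invented for illustration.

```python
from collections import Counter
from statistics import mean, median

def is_discrete(values):
    """Assumption: a column counts as discrete if it holds any strings."""
    return any(isinstance(v, str) for v in values)

def choose_layer(x, y):
    """Pick a panel layer from the scale types of the x and y columns."""
    dx, dy = is_discrete(x), is_discrete(y)
    if dx and dy:
        return "counts"    # discrete vs discrete: cell counts
    if dx or dy:
        return "boxplot"   # mixed scales: continuous values split by level
    return "scatter"       # continuous vs continuous

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def cognostics(layer, x, y):
    """Hypothetical metric classes keyed by layer type, not by raw columns."""
    if layer == "scatter":
        return {"n": len(x), "correlation": pearson(x, y)}
    if layer == "boxplot":
        # Split the continuous column by the levels of the discrete one.
        levels, values = (x, y) if is_discrete(x) else (y, x)
        by_level = {}
        for g, v in zip(levels, values):
            by_level.setdefault(g, []).append(v)
        medians = [median(vs) for vs in by_level.values()]
        return {"n_levels": len(by_level),
                "median_spread": max(medians) - min(medians)}
    cells = Counter(zip(x, y))
    return {"n_cells": len(cells),
            "max_cell_share": max(cells.values()) / len(x)}

def generalized_matrix(x_cols, y_cols):
    """One panel per (x, y) pair of a two-grouped matrix, with its cognostics."""
    panels = []
    for x_name, x in x_cols.items():
        for y_name, y in y_cols.items():
            layer = choose_layer(x, y)
            panels.append({"x": x_name, "y": y_name, "layer": layer,
                           "cognostics": cognostics(layer, x, y)})
    return panels

# Toy data: two continuous and two discrete columns.
height = [150.0, 160.0, 170.0, 180.0]
weight = [50.0, 60.0, 70.0, 80.0]
sex = ["f", "f", "m", "m"]
group = ["a", "b", "a", "b"]

panels = generalized_matrix({"height": height, "sex": sex},
                            {"weight": weight, "group": group})
for p in panels:
    print(p["x"], p["y"], p["layer"], p["cognostics"])
```

Keying the metrics on the layer rather than on the raw columns is what lets the summaries follow the plot without extra user input; the specific metrics here (correlation, spread of group medians, largest cell share) are illustrative assumptions.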
Parallel computing in linear mixed models
In this study, we propose a parallel programming method for fitting linear mixed models (LMMs) to big data. The expectation-maximization (EM) algorithm is commonly used to compute maximum likelihood estimates because the resulting estimates are stable and the algorithm is simple. However, EM has a high computational cost. In our proposed method, we use a divide-and-recombine approach that splits the data into smaller subsets, runs the algorithm steps in parallel on multiple local cores, and combines the results. The proposed method is used to fit LMMs with dense and sparse parameters and with a large number of observations. It is faster than the classical approach and generalizes to big data. Supplementary source code for the proposed method is available in the R package lmmpar.
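The divide-and-recombine idea can be sketched for the simplest LMM, a single random intercept per group; that restriction is an assumption made here for brevity, since the abstract covers more general dense and sparse parameterizations. Given the current parameters, each group's E-step is independent, so subsets of groups can be processed in parallel and their partial sums simply added before a closed-form M-step. This Python is a hypothetical illustration, not the lmmpar implementation. It uses threads to stay self-contained, whereas CPU-bound pure-Python code would need processes to see a real speedup.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from functools import partial

def chunk_stats(groups, beta0, sb2, s2):
    """E-step on one subset of groups; returns additive partial sums."""
    n_obs = n_grp = 0
    sum_r = sum_r2 = sum_nv = sum_eb2 = 0.0
    for y in groups:
        n = len(y)
        v = 1.0 / (n / s2 + 1.0 / sb2)            # Var(b_i | y_i)
        m = v * sum(yj - beta0 for yj in y) / s2  # E(b_i | y_i)
        for yj in y:
            r = yj - m
            sum_r += r
            sum_r2 += r * r
        n_obs += n
        n_grp += 1
        sum_nv += n * v
        sum_eb2 += m * m + v                      # E(b_i^2 | y_i)
    return n_obs, n_grp, sum_r, sum_r2, sum_nv, sum_eb2

def fit_random_intercept(groups, n_workers=4, n_iter=100):
    """EM for y_ij = beta0 + b_i + e_ij, b_i ~ N(0, sb2), e_ij ~ N(0, s2)."""
    # Divide: deal whole groups round-robin into n_workers subsets.
    chunks = [groups[k::n_workers] for k in range(n_workers)]
    all_y = [yj for y in groups for yj in y]
    beta0, sb2, s2 = sum(all_y) / len(all_y), 1.0, 1.0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(n_iter):
            step = partial(chunk_stats, beta0=beta0, sb2=sb2, s2=s2)
            parts = list(pool.map(step, chunks))
            # Recombine: the partial sums simply add across subsets.
            n_obs, n_grp, sum_r, sum_r2, sum_nv, sum_eb2 = (
                sum(col) for col in zip(*parts))
            # M-step: closed-form updates from the combined sums.
            beta0 = sum_r / n_obs
            s2 = (sum_r2 - 2 * beta0 * sum_r + n_obs * beta0 ** 2 + sum_nv) / n_obs
            sb2 = sum_eb2 / n_grp
    return beta0, sb2, s2

# Simulated data: 200 groups of 20 observations, beta0 = 5, sigma_b = 2, sigma = 1.
random.seed(1)
groups = []
for _ in range(200):
    b = random.gauss(0.0, 2.0)
    groups.append([5.0 + b + random.gauss(0.0, 1.0) for _ in range(20)])

beta0, sb2, s2 = fit_random_intercept(groups, n_workers=4)
print(f"beta0={beta0:.3f}  sigma_b^2={sb2:.3f}  sigma^2={s2:.3f}")
```

Because the recombined sums are exactly what a serial pass would compute, the estimates match a single-worker fit up to floating-point rounding; only the E-step work is spread across workers.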
GGally: Extension to 'ggplot2'
The R package 'ggplot2' is a plotting system based on the grammar of graphics. 'GGally' extends 'ggplot2' by adding several functions that reduce the complexity of combining geometric objects with transformed data. These functions include a pairwise plot matrix, a two-group pairwise plot matrix, a parallel coordinates plot, a survival plot, and several functions for plotting networks.
ggally: v1.1.0
GGally 1.1.0
ggcoef - New!
plot model coefficients with broom and ggplot2 PR#162
Plotting model coefficients (http://www.r-statistics.com/2010/07/visualization-of-regression-coefficients-in-r/)
gglegend - New!
pull out the legend of a plot; the extracted legend can also be used in ggpairs PR#155, PR#169
ggally_densityDiag
fixed bug where '...' was not respected (d0fe633)
ggally_smooth
added 'method' parameter (411213c)
ggally_ratio
Does not call ggfluctuation2 anymore. PR#165
ggcorr
fixed issue with unnamed correlation matrix used as input PR#146
fixed undesired shifting when layout.exp was > 0 PR#171
ggfluctuation2
is being deprecated. Please use ggally_ratio instead PR#165
ggnetworkmap
fixed issue with overlaying network on a world map PR#157
ggparcoord
Fixed a bug where a list was being coerced to a double PR#162
ggpairs
Fixed improperly rotated axes with ggally_ratio PR#165
ggscatmat
added 'corMethod' parameter for use in upper triangle PR#145
ggsurv
size.est and size.ci parameters added PR#153
ordering changed to reflect survival time PR#147
added a vignette PR#154
wrap
documentation updated PR#152
changes default behavior only; if an argument is supplied, it takes precedence
GitHub chat
https://gitter.im/ggobi/ggally is the place to visit for general questions.
travis-ci
cache packages for faster checking
install covr and lintr from GitHub for testing purposes
ggobi/ggally: v1.3.2
GGally 1.3.2
ggpairs and ggduo
Removed the warning raised when column names were purely numeric (#238, @lepennec)
Fixed ordering issue with horizontal boxplots (#239)
ggparcoord
Fixed missing x aes requirement when shadebox is provided (#237, @treysp)
Package
Made igraph an optional dependency for tests (#240)