7 research outputs found

    Generalized Plot Matrices, Automatic Cognostics, and Efficient Data Exploration

    Statistical visualization of large-scale data has become an increasingly essential task in the era of big data. In particular, exploratory data analysis and visualization are the first step towards any in-depth statistical modeling and analysis. Being able to rapidly specify and generate visualizations regardless of data scale is crucial. Trelliscope handles data visualization at scale by attaching cognostics (univariate metrics) to each panel, aiding in the organization of panels of interest. While Trelliscope provides a general framework for visualizing data at scale, several aspects can be improved to help users generate displays more rapidly (such as cognostics, axis scales, etc.). When visually modeling complex data with Trelliscope, traditional two-grouped plot matrices do not allow for a mixed-scale axis to display both continuous and discrete data natively. Web-based visualization systems like Trelliscope that retrieve information from a back-end service such as R must maximize performance for an engaging user experience. Addressing the mixed-scale plot matrix axis, a generalized plot matrix is developed for two-grouped data which displays both continuous and discrete data using appropriate visualization methods for each panel. To complement Trelliscope’s panel organization, automatic cognostic summaries are established by mapping the context of what is visualized to classes of metrics that are meaningful for each type of visualization layer, at no additional user effort. Finally, communication from web-based visualization systems to back-end R services is greatly improved by leveraging the GraphQL query language, which minimizes the number of data queries needed to perform data extraction. Together, these three contributions curtail the increasing complexity and scale of data visualization.

    Parallel computing in linear mixed models

    In this study, we propose a parallel programming method for linear mixed models (LMM) generated from big data. A commonly used algorithm, expectation maximization (EM), is preferred for its use of maximum likelihood estimation, as the estimates are stable and simple. However, EM has a high computational cost. In our proposed method, we use a divide and recombine approach to split the data into smaller subsets, running the algorithm steps in parallel on multiple local cores and combining the results. The proposed method is used to fit LMM with dense and sparse parameters and for large numbers of observations. It is faster than the classical approach and generalizes to big data. Supplementary sources for the proposed method are available in the R package lmmpar.

    GGally: Extension to 'ggplot2'

    The R package 'ggplot2' is a plotting system based on the grammar of graphics. 'GGally' extends 'ggplot2' by adding several functions to reduce the complexity of combining geometric objects with transformed data. These functions include a pairwise plot matrix, a two-group pairwise plot matrix, a parallel coordinates plot, a survival plot, and several functions to plot networks.

    ggally: v1.1.0

    GGally 1.1.0
    ggcoef: New! plot model coefficients with broom and ggplot2 PR#162; Plotting model coefficients (http://www.r-statistics.com/2010/07/visualization-of-regression-coefficients-in-r/)
    gglegend: New! pull out the legend of a plot which can also be used in ggpairs PR#155, PR#169
    ggally_densityDiag: fixed bug where '...' was not respected (d0fe633)
    ggally_smooth: added 'method' parameter (411213c)
    ggally_ratio: Does not call ggfluctuation2 anymore. PR#165
    ggcorr: fixed issue with unnamed correlation matrix used as input PR#146; fixed issue undesired shifting when layout.exp was > 0 PR#171
    ggfluctuation2: is being deprecated. Please use ggally_ratio instead PR#165
    ggnetworkmap: fixed issue with overlaying network on a world map PR#157
    ggparcoord: Fixed odd bug where a list was trying to be forced as a double PR#162
    ggpairs: Fixed improperly rotated axes with ggally_ratio PR#165
    ggscatmat: added 'corMethod' parameter for use in upper triangle PR#145
    ggsurv: size.est and size.ci parameters added PR#153; ordering changed to reflect survival time PR#147; added a vignette PR#154
    wrap: documentation updated PR#152; changes default behavior only. If an argument is supplied, the argument will take precedence
    github: chat https://gitter.im/ggobi/ggally is the place to visit for general questions.
    travis-ci: cache packages for faster checking; install covr and lintr from github for testing purpose

    ggobi/ggally: v1.3.2

    GGally 1.3.2
    ggpairs and ggduo: Removed warning where pure numeric names gave a warning (#238, @lepennec); Fixed ordering issue with horizontal boxplots (#239)
    ggparcoord: Fixed missing x aes requirement when shadebox is provided (#237, @treysp)
    Package: Made igraph a non required dependency for tests (#240