Search CORE

3 research outputs found

Doctor of Philosophy

Author: Nguyen Hoa Thanh
Publication venue: University of Utah
Publication date: 01/01/2017
Field of study

dissertationCorrelation is a powerful relationship measure used in many fields to estimate trends and make forecasts. When the data are complex, large, and high dimensional, correlation identification is challenging. Several visualization methods have been proposed to solve these problems, but they all have limitations in accuracy, speed, or scalability. In this dissertation, we propose a methodology that provides new visual designs that show details when possible and aggregates when necessary, along with robust interactive mechanisms that together enable quick identification and investigation of meaningful relationships in large and high-dimensional data. We propose four techniques using this methodology. Depending on data size and dimensionality, the most appropriate visualization technique can be provided to optimize the analysis performance. First, to improve correlation identification tasks between two dimensions, we propose a new correlation task-specific visualization method called correlation coordinate plot (CCP). CCP transforms data into a powerful coordinate system for estimating the direction and strength of correlations among dimensions. Next, we propose three visualization designs to optimize correlation identification tasks in large and multidimensional data. The first is snowflake visualization (Snowflake), a focus+context layout for exploring all pairwise correlations. The next proposed design is a new interactive design for representing and exploring data relationships in parallel coordinate plots (PCPs) for large data, called data scalable parallel coordinate plots (DSPCP). Finally, we propose a novel technique for storing and accessing the multiway dependencies through visualization (MultiDepViz). We evaluate these approaches by using various use cases, compare them to prior work, and generate user studies to demonstrate how our proposed approaches help users explore correlation in large data efficiently. Our results confirmed that CCP/Snowflake, DSPCP, and MultiDepViz methods outperform some current visualization techniques such as scatterplots (SCPs), PCPs, SCP matrix, Corrgram, Angular Histogram, and UntangleMap in both accuracy and timing. Finally, these approaches are applied in real-world applications such as a debugging tool, large-scale code performance data, and large-scale climate data

The University of Utah: J. Willard Marriott Digital Library

Efficient query processing on graph databases

Author: Cheng J.
Goldman R.
Guting R. H.
Holder L. B.
James Cheng
Jiang H.
Manku G. S.
Milo T.
Ng W.
Srinivasa S.
Wilfred Ng
Williams D. W.
Yan X.
Yiping Ke
Yu J. X.
Zhang N.
Zhang S.
Zhao P.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date
Field of study

Crossref

Efficient Correlation Search from Graph Databases

Author: Cheng James
Ke Yiping
Ng Wilfred
Publication venue
Publication date: 01/01/2008
Field of study

We propose a new problem of correlation mining from graph databases, called Correlated Graph Search (CGS). CGS adopts Pearson's correlation coefficient as the correlation measure to take into account the occurrence distributions of graphs. However, the CGS problem poses significant challenges, since every subgraph of a graph in the database is a candidate, but the number of subgraphs is exponential. We derive two necessary conditions that set bounds on the occurrence probability of a candidate in the database. With this result, we devise an efficient algorithm that mines the candidate set from a much smaller projected database, and thus, we are able to obtain a significantly smaller set of candidates. Three heuristic rules are further developed to refine the candidate set. We also make use of the bounds to directly answer high-support queries without mining the candidates. Our experimental results demonstrate the efficiency of our algorithm. Finally, we show that our algorithm provides a general solution when most of the commonly used correlation measures are used to generalize the CGS problem

Hong Kong University of Science and Technology Institutional Repository