Representation Independent Analytics Over Structured Data
Database analytics algorithms leverage quantifiable structural properties of
the data to predict interesting concepts and relationships. The same
information, however, can be represented using many different structures and
the structural properties observed over particular representations do not
necessarily hold for alternative structures. Thus, there is no guarantee that
current database analytics algorithms will still provide the correct insights,
no matter what structures are chosen to organize the database. Because these
algorithms tend to be highly effective over some choices of structure, such as
that of the databases used to validate them, but not so effective with others,
database analytics has largely remained the province of experts who can find
the desired forms for these algorithms. We argue that in order to make database
analytics usable, we should use or develop algorithms that are effective over a
wide range of choices of structural organizations. We introduce the notion of
representation independence, study its fundamental properties for a wide range
of data analytics algorithms, and empirically analyze the amount of
representation independence of some popular database analytics algorithms. Our
results indicate that most algorithms are not generally representation
independent, and we identify the characteristics of heuristics that remain more
representation independent under certain representational shifts.
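To make the core observation concrete, here is a minimal illustrative sketch (the data and schemas are hypothetical, not from the paper): the same facts stored under two structural organizations, where a simple structural property an analytics algorithm might exploit differs between them.

```python
# Hypothetical example: identical information under two structural
# organizations. Names and schemas are illustrative only.

# Representation A: one wide record per person.
wide = [
    {"name": "ada", "lang1": "python", "lang2": "sql"},
    {"name": "bob", "lang1": "sql",    "lang2": None},
]

# Representation B: the same facts, normalized into (name, language) pairs.
tall = [
    ("ada", "python"), ("ada", "sql"),
    ("bob", "sql"),
]

# A structural property -- here simply the number of tuples -- differs
# between the two representations even though the information is the same,
# so an algorithm keyed to such properties may behave differently on each.
print(len(wide), len(tall))  # 2 3
```

Both representations encode exactly the same set of facts, which is easy to verify by flattening the wide form, yet any statistic computed over tuples (counts, co-occurrence frequencies, and so on) shifts with the chosen structure.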
Compact representation of one-particle wavefunctions and scalar fields obtained from electronic-structure calculations
We present a code-independent compact representation of one-electron
wavefunctions and other volumetric data (electron density, electrostatic
potential, etc.) produced by electronic-structure calculations. The compactness
of the representation ensures minimization of digital storage requirements for
the computational data, while the code-independence makes the data ready for
"big data" analytics. Our approach allows us to minimize differences between
the original and the new representation, and is in principle information-lossless.
The procedure for obtaining the wavefunction representation is closely related
to construction of natural atomic orbitals, and benefits from the localization
of Wannier functions. Thus, our approach fits naturally into any infrastructure
providing a code-independent tool set for electronic-structure data analysis.
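As a loose analogy for this kind of compact, in-principle-lossless representation of volumetric data (this is a generic truncated-SVD sketch, not the paper's natural-atomic-orbital procedure), one can expand gridded data in an orthonormal basis and store only the dominant expansion coefficients:

```python
import numpy as np

# Toy analogy: compactly represent a 2D "volumetric" grid by keeping the
# k dominant SVD basis vectors and coefficients. Keeping all of them makes
# the representation information-lossless; truncation trades storage for a
# controlled error set by the discarded singular values.
rng = np.random.default_rng(1)
grid = rng.normal(size=(64, 64))           # stand-in for volumetric data

U, s, Vt = np.linalg.svd(grid, full_matrices=False)
k = 16
compact = (U[:, :k], s[:k], Vt[:k])        # far smaller than the full grid
rebuilt = U[:, :k] * s[:k] @ Vt[:k]

err = np.linalg.norm(grid - rebuilt) / np.linalg.norm(grid)
print(err < 1.0)
```

The same storage-versus-fidelity trade-off motivates basis choices (such as localized orbitals) that concentrate the signal in few coefficients.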
parallelMCMCcombine: An R Package for Bayesian Methods for Big Data and Analytics
Recent advances in big data and analytics research have provided a wealth of
large data sets that are too big to be analyzed in their entirety, due to
restrictions on computer memory or storage size. New Bayesian methods have been
developed for large data sets that are only large due to large sample sizes;
these methods partition big data sets into subsets, and perform independent
Bayesian Markov chain Monte Carlo analyses on the subsets. The methods then
combine the independent subset posterior samples to estimate a posterior
density given the full data set. These approaches were shown to be effective
for Bayesian models including logistic regression models, Gaussian mixture
models and hierarchical models. Here, we introduce the R package
parallelMCMCcombine which carries out four of these techniques for combining
independent subset posterior samples. We illustrate each of the methods using a
Bayesian logistic regression model for simulation data and a Bayesian Gamma
model for real data; we also demonstrate features and capabilities of the R
package. The package assumes the user has carried out the Bayesian analysis and
has produced the independent subposterior samples outside of the package. The
methods are primarily suited to models with unknown parameters of fixed
dimension that exist in continuous parameter spaces. We envision this tool will
allow researchers to explore the various methods for their specific
applications, and will assist future progress in this rapidly developing field.
Comment: for published version see:
http://www.plosone.org/article/fetchObject.action?uri=info%3Adoi%2F10.1371%2Fjournal.pone.0108425&representation=PD
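The general partition-then-combine idea can be sketched in a few lines. This is not the package's API or one of its four specific estimators; it is a uniform-weight averaging of aligned subposterior draws, in the spirit of the simplest consensus-style combination, with simulated draws standing in for real MCMC output:

```python
import numpy as np

def combine_subposteriors(subposterior_draws):
    """Combine M independent subposterior sample arrays (each n_draws x d)
    by averaging aligned draws -- a uniform-weight consensus sketch, not
    the parallelMCMCcombine implementation."""
    stacked = np.stack(subposterior_draws)   # (M, n_draws, d)
    return stacked.mean(axis=0)              # (n_draws, d)

rng = np.random.default_rng(0)
# Pretend each of 4 data subsets yielded 1000 MCMC draws of 2 parameters.
subs = [rng.normal(loc=m, scale=1.0, size=(1000, 2))
        for m in (0.1, 0.2, 0.3, 0.4)]
full = combine_subposteriors(subs)
print(full.shape)  # (1000, 2)
```

In practice the combination step is the delicate part (different methods weight and transform the subposterior draws differently), which is exactly the design space the package lets users explore.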
Semantic Modeling of Analytic-based Relationships with Direct Qualification
Successfully modeling the state- and analytics-based semantic relationships of
a document enhances its representation, importance, relevancy, provenance, and
priority. These attributes are the core elements that form the
machine-based knowledge representation for documents. However, modeling
document relationships that can change over time can be inelegant, limited,
complex or overly burdensome for semantic technologies. In this paper, we
present Direct Qualification (DQ), an approach for modeling any semantically
referenced document, concept, or named graph with results from associated
applied analytics. The proposed approach supplements the traditional
subject-object relationships by providing a third leg to the relationship; the
qualification of how and why the relationship exists. To illustrate, we show a
prototype of an event-based system with a realistic use case for applying DQ to
relevancy analytics of PageRank and Hyperlink-Induced Topic Search (HITS).
Comment: Proceedings of the 2015 IEEE 9th International Conference on Semantic
Computing (IEEE ICSC 2015)
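The "third leg" idea can be illustrated with a small data structure: a subject-object relationship carrying an attached qualification that records how and why it holds. This is a plain-Python analogy with hypothetical field names, not DQ's actual semantic-web (RDF/named-graph) modeling:

```python
from dataclasses import dataclass

@dataclass
class Qualification:
    """The 'third leg': how and why the relationship exists (illustrative)."""
    analytic: str      # e.g., "PageRank" or "HITS"
    score: float       # result of the applied analytic
    computed_at: str   # provenance: when the qualification was produced

@dataclass
class QualifiedTriple:
    subject: str
    predicate: str
    object: str
    qualification: Qualification

t = QualifiedTriple(
    subject="doc:42",
    predicate="moreRelevantThan",
    object="doc:17",
    qualification=Qualification(analytic="PageRank", score=0.83,
                                computed_at="2015-01-01T00:00:00Z"),
)
print(t.qualification.analytic)  # PageRank
```

Because the qualification is a first-class object rather than a bare edge label, it can be replaced or versioned as the underlying analytics are re-run, which is what lets the modeled relationships change over time.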
GraphX: Unifying Data-Parallel and Graph-Parallel Analytics
From social networks to language modeling, the growing scale and importance
of graph data has driven the development of numerous new graph-parallel systems
(e.g., Pregel, GraphLab). By restricting the computation that can be expressed
and introducing new techniques to partition and distribute the graph, these
systems can efficiently execute iterative graph algorithms orders of magnitude
faster than more general data-parallel systems. However, the same restrictions
that enable the performance gains also make it difficult to express many of the
important stages in a typical graph-analytics pipeline: constructing the graph,
modifying its structure, or expressing computation that spans multiple graphs.
As a consequence, existing graph analytics pipelines compose graph-parallel and
data-parallel systems using external storage systems, leading to extensive data
movement and a complicated programming model.
To address these challenges we introduce GraphX, a distributed graph
computation framework that unifies graph-parallel and data-parallel
computation. GraphX provides a small, core set of graph-parallel operators
expressive enough to implement the Pregel and PowerGraph abstractions, yet
simple enough to be cast in relational algebra. GraphX uses a collection of
query optimization techniques such as automatic join rewrites to efficiently
implement these graph-parallel operators. We evaluate GraphX on real-world
graphs and workloads and demonstrate that GraphX achieves performance
comparable to specialized graph computation systems, while outperforming them
in end-to-end graph pipelines. Moreover, GraphX achieves a balance between
expressiveness, performance, and ease of use.
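The key idea of casting graph-parallel operators in relational terms can be sketched with one PageRank step written as a join and a group-by over vertex and edge tables. The table layout is illustrative; this is not GraphX's Scala API:

```python
# Toy sketch: one PageRank iteration as relational operations
# (join edges with source ranks, then group-by destination).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
ranks = {"a": 1.0, "b": 1.0, "c": 1.0}
d = 0.85  # damping factor

# "Join" each edge with its source vertex to produce messages.
out_degree = {}
for src, _ in edges:
    out_degree[src] = out_degree.get(src, 0) + 1
messages = [(dst, ranks[src] / out_degree[src]) for src, dst in edges]

# "Group by" destination vertex and aggregate the incoming messages.
contrib = {}
for dst, m in messages:
    contrib[dst] = contrib.get(dst, 0.0) + m

new_ranks = {v: (1 - d) + d * contrib.get(v, 0.0) for v in ranks}
print(round(sum(new_ranks.values()), 6))  # 3.0
```

Once the operator is expressed this way, standard relational optimizations (such as the automatic join rewrites mentioned above) apply directly, which is how a general data-parallel engine can approach specialized graph-system performance.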
Large High Resolution Displays for Co-Located Collaborative Intelligence Analysis
Large, high-resolution vertical displays carry the potential to increase the accuracy of collaborative sensemaking, given correctly designed visual analytics tools. From an exploratory user study using a fictional intelligence analysis task, we investigated how users interact with the display to construct spatial schemas and externalize information, as well as how they establish shared and private territories. We investigated the spatial strategies of users, partitioned by the type of tool used (document- or entity-centric). We classified the types of territorial behavior exhibited in terms of how the users interacted with the display (integrated or independent workspaces). Next, we examined how territorial behavior impacted the common ground between the pairs of users. Finally, we recommend design guidelines for building co-located collaborative visual analytics tools specifically for use on large, high-resolution vertical displays.