408 research outputs found
Analyzing Large Collections of Electronic Text Using OLAP
Computer-assisted reading and analysis of text has various applications in
the humanities and social sciences. The increasing size of many electronic text
archives has the advantage of a more complete analysis but the disadvantage of
taking longer to obtain results. On-Line Analytical Processing is a method used
to store and quickly analyze multidimensional data. By storing text analysis
information in an OLAP system, a user can obtain solutions to inquiries in a
matter of seconds as opposed to minutes, hours, or even days. This analysis is
user-driven allowing various users the freedom to pursue their own direction of
research
Recommended from our members
Semantics-Space-Time Cube. A Conceptual Framework for Systematic Analysis of Texts in Space and Time
We propose an approach to analyzing data in which texts are associated with spatial and temporal references with the aim to understand how the text semantics vary over space and time. To represent the semantics, we apply probabilistic topic modeling. After extracting a set of topics and representing the texts by vectors of topic weights, we aggregate the data into a data cube with the dimensions corresponding to the set of topics, the set of spatial locations (e.g., regions), and the time divided into suitable intervals according to the scale of the planned analysis. Each cube cell corresponds to a combination (topic, location, time interval) and contains aggregate measures characterizing the subset of the texts concerning this topic and having the spatial and temporal references within these location and interval. Based on this structure, we systematically describe the space of analysis tasks on exploring the interrelationships among the three heterogeneous information facets, semantics, space, and time. We introduce the operations of projecting and slicing the cube, which are used to decompose complex tasks into simpler subtasks. We then present a design of a visual analytics system intended to support these subtasks. To reduce the complexity of the user interface, we apply the principles of structural, visual, and operational uniformity while respecting the specific properties of each facet. The aggregated data are represented in three parallel views corresponding to the three facets and providing different complementary perspectives on the data. The views have similar look-and-feel to the extent allowed by the facet specifics. Uniform interactive operations applicable to any view support establishing links between the facets. The uniformity principle is also applied in supporting the projecting and slicing operations on the data cube. We evaluate the feasibility and utility of the approach by applying it in two analysis scenarios using geolocated social media data for studying people's reactions to social and natural events of different spatial and temporal scales
A Descriptive Framework for Temporal Data Visualizations Based on Generalized Space-Time Cubes
International audienceWe present the generalized space-time cube, a descriptive model for visualizations of temporal data. Visualizations are described as operations on the cube, which transform the cube's 3D shape into readable 2D visualizations. Operations include extracting subparts of the cube, flattening it across space or time or transforming the cubes geometry and content. We introduce a taxonomy of elementary space-time cube operations and explain how these operations can be combined and parameterized. The generalized space-time cube has two properties: (1) it is purely conceptual without the need to be implemented, and (2) it applies to all datasets that can be represented in two dimensions plus time (e.g. geo-spatial, videos, networks, multivariate data). The proper choice of space-time cube operations depends on many factors, for example, density or sparsity of a cube. Hence, we propose a characterization of structures within space-time cubes, which allows us to discuss strengths and limitations of operations. We finally review interactive systems that support multiple operations, allowing a user to customize his view on the data. With this framework, we hope to facilitate the description, criticism and comparison of temporal data visualizations, as well as encourage the exploration of new techniques and systems. This paper is an extension of Bach et al.'s (2014) work
Integration of Data Mining and Data Warehousing: a practical methodology
The ever growing repository of data in all fields poses new challenges to the modern analytical
systems. Real-world datasets, with mixed numeric and nominal variables, are difficult to analyze and require effective visual exploration that conveys semantic relationships of data. Traditional data mining techniques such as clustering clusters only the numeric data. Little research has been carried out in tackling the problem of clustering high cardinality nominal variables to get better insight of underlying dataset. Several works in the literature proved the likelihood of integrating data mining with warehousing to discover knowledge from data. For the seamless integration, the mined data has
to be modeled in form of a data warehouse schema. Schema generation process is complex manual
task and requires domain and warehousing familiarity. Automated techniques are required to generate warehouse schema to overcome the existing dependencies. To fulfill the growing analytical needs and to overcome the existing limitations, we propose a novel methodology in this paper that permits efficient analysis of mixed numeric and nominal data, effective visual data exploration, automatic warehouse schema generation and integration of data mining and warehousing. The proposed methodology is evaluated by performing case study on real-world data set. Results show that multidimensional analysis can be performed in an easier and flexible way to discover meaningful
knowledge from large datasets
- …