
    Dwarf: A Complete System for Analyzing High-Dimensional Data Sets

    The need for data analysis in industries such as telecommunications, retail, manufacturing and financial services has generated a flurry of research, highly sophisticated methods and commercial products. However, all current attempts are haunted by the so-called "high-dimensionality curse": space and time complexity increase exponentially with the number of analysis "dimensions". This limits all existing approaches to coarse levels of analysis and/or to approximate answers of reduced precision. As the need for detailed analysis keeps increasing, along with the volume and detail of the stored data, these approaches are quickly rendered unusable. I have developed a unique method for efficiently performing analysis that is not affected by the high dimensionality of data and scales only polynomially, in fact almost linearly, with the number of dimensions, without sacrificing any accuracy in the returned results. I have implemented a complete system (called "Dwarf") and performed an extensive experimental evaluation that demonstrated tremendous improvements over existing methods in all aspects of the analysis: initial computation, storage, querying and updating. I have extended my research to the "data-streaming" model, where updates are performed on-line, which complicates any concurrent analysis but is critical for applications like security, network management/monitoring, router traffic control and sensor networks. I have devised streaming algorithms that provide complex statistics within user-specified relative-error bounds over a data stream. I introduced the class of "distinct implicated statistics", which is much more general than the established class of "distinct count" statistics. The latter has proved invaluable in applications such as analyzing and monitoring the distinct count of species in a population, or even in query optimization. The "distinct implicated statistics" class provides invaluable information about the correlations in the stream and is necessary for applications such as security. My algorithms are designed to use bounded amounts of memory and processing, so that they can even be implemented in hardware for resource-limited environments such as network routers or sensors, and to work in "noisy" environments, where some data may be flawed either implicitly due to the extraction process or explicitly
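    The abstract does not spell out the streaming algorithms themselves, but a minimal sketch of the classical Flajolet-Martin distinct-count estimator illustrates the kind of bounded-memory statistic that the "distinct implicated statistics" class generalizes. The class name, parameter values and hash construction below are illustrative, not the author's design.

    import hashlib

    class FMSketch:
        """Flajolet-Martin style distinct-count estimator (illustrative).

        Memory is fixed up front (num_maps bitmaps of map_bits bits each),
        matching the bounded-memory requirement for routers and sensors."""

        PHI = 0.77351  # standard Flajolet-Martin bias-correction constant

        def __init__(self, num_maps=64, map_bits=32):
            self.num_maps = num_maps
            self.map_bits = map_bits
            self.bitmaps = [0] * num_maps

        def add(self, item):
            for seed in range(self.num_maps):
                data = hashlib.sha256(f"{seed}:{item}".encode()).digest()
                h = int.from_bytes(data[:8], "big")
                # Record the position of the lowest set bit of the hash.
                r = (h & -h).bit_length() - 1 if h else self.map_bits - 1
                self.bitmaps[seed] |= 1 << min(r, self.map_bits - 1)

        def estimate(self):
            # Average the position of the lowest unset bit over all bitmaps.
            total = 0
            for bm in self.bitmaps:
                r = 0
                while bm & (1 << r):
                    r += 1
                total += r
            return 2 ** (total / self.num_maps) / self.PHI

    sketch = FMSketch()
    for i in range(10000):
        sketch.add(f"10.0.0.{i % 500}")  # 500 distinct source addresses
    print(round(sketch.estimate()))      # roughly 500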

    Business Intelligence for Small and Middle-Sized Enterprises

    Data warehouses are the core of decision support systems, which nowadays are used by all kinds of enterprises throughout the world. Although many studies have been conducted on the need for decision support systems (DSSs) in small businesses, most of them adopt existing solutions and approaches that are appropriate for large-scale enterprises but inadequate for small and middle-sized ones. Small enterprises require cheap, lightweight architectures and tools (hardware and software) providing on-line data analysis. In order to ensure these features, we review web-based business intelligence approaches. For real-time analysis, the traditional OLAP architecture is cumbersome and storage-costly; therefore, we also review in-memory processing. Consequently, this paper discusses the existing approaches and tools working in main memory and/or with web interfaces (including freeware tools) that are relevant for small and middle-sized enterprises in decision making.
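    As a toy illustration of the lightweight, in-memory style of analysis the paper argues for, the sketch below aggregates a small fact table on the fly with plain dictionaries instead of a storage-costly precomputed OLAP cube. The table, dimensions and figures are invented for the example.

    from collections import defaultdict

    # Hypothetical fact table: (region, product, month, revenue).
    sales = [
        ("EU", "widget", "2024-01", 120.0),
        ("EU", "gadget", "2024-01",  80.0),
        ("US", "widget", "2024-02", 200.0),
    ]

    DIMS = {"region": 0, "product": 1, "month": 2}

    def rollup(facts, group_by):
        """Aggregate revenue on demand along the requested dimensions,
        avoiding any precomputed cube."""
        totals = defaultdict(float)
        for row in facts:
            key = tuple(row[DIMS[d]] for d in group_by)
            totals[key] += row[3]
        return dict(totals)

    print(rollup(sales, ["region"]))            # {('EU',): 200.0, ('US',): 200.0}
    print(rollup(sales, ["product", "month"]))  # per product-and-month totals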

    Design of petroleum company's metadata and an effective knowledge mapping methodology

    Success of information flow depends on intelligent data storage and its management in a multi-disciplinary environment. Multi-dimensional data entities, data types and ambiguous semantics often pose uncertainty and inconsistency in data retrieval from volumes of petroleum data sources. In our approach, conceptual schemas and sub-schemas have been described based on various operational functions of the petroleum industry. These schemas are integrated to ensure their consistency and validity, so that the information retrieved from an integrated metadata structure (in the form of a data warehouse) derives its authenticity from its implementation. The data integration process validating the petroleum metadata has been demonstrated for one of the Gulf offshore basins, supporting effective knowledge mapping and its successful interpretation for the derivation of useful geological knowledge. Warehoused data are used for mining data patterns, trends and correlations among knowledge-base data attributes, which led to the interpretation of interesting geological features. These technologies appear well suited to the exploration of further petroleum resources in the mature Gulf basins.
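    The abstract describes integrating per-function sub-schemas into a single validated metadata structure; a minimal sketch of such a consistency check appears below. The attribute names and types are invented for illustration and are not the paper's actual schemas.

    def integrate_schemas(sub_schemas):
        """Merge per-function sub-schemas into one metadata schema,
        rejecting attributes whose declared types conflict."""
        merged = {}
        for source, schema in sub_schemas.items():
            for attr, dtype in schema.items():
                if attr in merged and merged[attr] != dtype:
                    raise ValueError(f"inconsistent type for '{attr}': "
                                     f"{merged[attr]} vs {dtype} (from {source})")
                merged[attr] = dtype
        return merged

    # Hypothetical sub-schemas for two operational functions.
    sub_schemas = {
        "drilling":   {"well_id": "str", "depth_m": "float"},
        "production": {"well_id": "str", "output_bbl": "float"},
    }
    print(integrate_schemas(sub_schemas))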

    The Dwarf Data Cube Eliminates the High Dimensionality Curse

    The data cube operator encapsulates all possible groupings of a data set and has proved to be an invaluable tool in analyzing vast amounts of data. However, its apparent exponential complexity has significantly limited its applicability to low-dimensional datasets. Recently, the dwarf data cube model was introduced, showing that high-dimensional "dwarf data cubes" are orders of magnitude smaller than the original data cubes, even when they calculate and store every possible aggregation with 100% precision. In this paper we present a surprising analytical result proving that the size of dwarf cubes grows polynomially with the dimensionality of the data set and, therefore, that a full data cube at 100% precision is not inherently cursed by high dimensionality. This striking result of polynomial complexity reformulates the context of cube management and redefines most of the problems associated with data warehousing and On-Line Analytical Processing. We also develop an efficient algorithm for estimating the size of dwarf data cubes before actually computing them. Finally, we complement our analytical approach with an experimental evaluation using real and synthetic data sets, and demonstrate our results. (UMIACS-TR-2003-12)
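    To make the exponential behavior concrete: the data cube operator materializes one group-by per subset of the dimensions, i.e. 2^d views for d dimensions. The sketch below (with invented toy data) computes exactly that naive cube; the dwarf model avoids the blowup by coalescing redundancy across these views, which this sketch does not attempt.

    from itertools import combinations
    from collections import defaultdict

    def full_cube(rows, dims, measure):
        """Naively materialize every group-by of a fact table:
        2 ** len(dims) aggregate views."""
        cube = {}
        for k in range(len(dims) + 1):
            for subset in combinations(range(len(dims)), k):
                agg = defaultdict(float)
                for row in rows:
                    agg[tuple(row[i] for i in subset)] += row[measure]
                cube[tuple(dims[i] for i in subset)] = dict(agg)
        return cube

    rows = [("EU", "widget", 10.0), ("EU", "gadget", 5.0), ("US", "widget", 7.0)]
    cube = full_cube(rows, ["region", "product"], measure=2)
    print(len(cube))          # 4 views: (), (region,), (product,), (region, product)
    print(cube[("region",)])  # {('EU',): 15.0, ('US',): 7.0}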

    A Sensor Network Data Compression Algorithm Based on Suboptimal Clustering and Virtual Landmark Routing Within Clusters

    A data compression algorithm for sensor networks based on suboptimal clustering and virtual landmark routing within clusters is proposed in this paper. First, the temporal redundancy in data obtained by the same node at sequential instants is eliminated. Second, the sensor network nodes are clustered; virtual node landmarks within clusters are established based on the cluster heads, and routing within clusters is realized by combining a greedy algorithm with a flooding algorithm. Third, a global structure tree based on the cluster heads is established. During data transmission from nodes to cluster heads and from cluster heads to the sink, the spatial redundancy in the data is eliminated. Only part of the raw data needs to be transmitted from the nodes to the sink, and all raw data can be recovered at the sink from a compression code and the transmitted part. Consequently, node energy is saved, largely because the transmission of redundant data is avoided, and the overall performance of the sensor network is markedly improved.
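    The abstract only names the temporal-redundancy step; a minimal lossless delta-encoding sketch of that idea follows, with the readings and units invented for the example. The sink recovers the raw series from the first value plus the per-instant changes.

    def delta_encode(readings):
        """Keep the first reading, then only the change between consecutive
        instants: small values that are cheaper to transmit than raw data."""
        return readings[:1] + [b - a for a, b in zip(readings, readings[1:])]

    def delta_decode(encoded):
        """Recover the raw series at the sink from the compression code."""
        out, total = [], 0
        for i, d in enumerate(encoded):
            total = d if i == 0 else total + d
            out.append(total)
        return out

    temps = [215, 215, 216, 216, 220]   # one node's readings, tenths of a degree
    code = delta_encode(temps)
    print(code)                          # [215, 0, 1, 0, 4], mostly small deltas
    assert delta_decode(code) == temps   # lossless recovery at the sink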

    Processing of an iceberg query on distributed and centralized databases

    Master's thesis (Master of Science)

    Query Rewriting in Itemset Mining

    In recent years, researchers have begun to study inductive databases, a new generation of databases for leveraging decision support applications. In this context, the user interacts with the DBMS using advanced constraint-based languages for data mining, where constraints are introduced to increase the relevance of the results and, at the same time, to reduce their volume. In this paper we study the problem of mining frequent itemsets using an inductive database. We propose a query-answering technique that rewrites a query in terms of unions and intersections of the result sets of other queries that were previously executed and materialized. Unfortunately, the exploitation of past queries is not always applicable. We therefore present sufficient conditions for the optimization to apply, and show that these conditions are strictly connected with the presence of functional dependencies between the attributes involved in the queries. Experiments on an initial prototype of an optimizer demonstrate that this approach to query answering is not only viable but in many practical cases essential, since it drastically reduces execution time.
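    The rewriting idea can be sketched in a few lines: if past queries' result sets are materialized, a query whose constraint is the conjunction (or disjunction) of two cached constraints can be answered by intersecting (or uniting) the cached sets, provided the queries ran on the same data with the same support threshold. The cached queries and itemsets below are invented, and the paper's actual applicability conditions, which rest on functional dependencies, are not modeled here.

    # Hypothetical materialized results: constraint -> frequent itemsets
    # found under that constraint (same dataset, same minimum support).
    cache = {
        "contains:tea":  {frozenset({"tea"}), frozenset({"tea", "milk"})},
        "contains:milk": {frozenset({"milk"}), frozenset({"tea", "milk"})},
    }

    def rewrite_and(c1, c2):
        """Conjunctive constraint: intersect the materialized result sets."""
        return cache[c1] & cache[c2]

    def rewrite_or(c1, c2):
        """Disjunctive constraint: unite the materialized result sets."""
        return cache[c1] | cache[c2]

    # Itemsets containing both tea and milk, answered without re-mining:
    print(rewrite_and("contains:tea", "contains:milk"))      # {frozenset({'tea', 'milk'})}
    print(len(rewrite_or("contains:tea", "contains:milk")))  # 3 itemsets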