
    A Survey of Parallel Data Mining

    With the fast, continuous increase in the number and size of databases, parallel data mining is a natural and cost-effective approach to tackling the problem of scalability in data mining. Recently there has been considerable research on parallel data mining. However, most projects focus on parallelizing a single kind of data mining algorithm/paradigm. This paper surveys parallel data mining from a broader perspective. More precisely, we discuss the parallelization of data mining algorithms from four knowledge discovery paradigms, namely rule induction, instance-based learning, genetic algorithms and neural networks. Using the lessons learned from this discussion, we also derive a set of heuristic principles for designing efficient parallel data mining algorithms.
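The partition/merge pattern underlying most data-parallel mining can be sketched as follows. This is a minimal illustration, not taken from the survey; the function name `parallel_class_counts` and the choice of class-label counting as the "local statistic" are assumptions for the sketch:

```python
# Data-parallel mining sketch: each worker computes local statistics on
# its partition of the records; a cheap reduce step merges the results.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def local_counts(partition):
    # Per-partition statistics (here: class-label counts, as a rule
    # induction algorithm might need when evaluating candidate rules).
    return Counter(label for _, label in partition)

def parallel_class_counts(records, n_workers=2):
    # Split records into roughly equal partitions (data parallelism).
    parts = [records[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as ex:
        partials = ex.map(local_counts, parts)
    total = Counter()
    for c in partials:
        total += c
    return total
```

A thread pool is used only to keep the sketch self-contained; swapping in a process pool gives true CPU parallelism in CPython, while the shape of the computation (local statistics, cheap merge) stays the same.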

    Updating Data Warehouses with Temporal Data

    There has been a growing trend to use temporal data in a data warehouse for making strategic and tactical decisions. The key idea of temporal data management is to make data available at the right time with different time intervals. Temporal data storage enables this by making all the different time slices of the data available to whoever needs them. Users with different data latency needs can all be accommodated. Data can be “frozen” via a view on the proper time slice. Data as of a point in time can be obtained across multiple tables or multiple subject areas, resolving consistency and synchronization issues. This paper discusses implementation issues such as temporal data updates, coexistence of loads and queries against the same table, performance of load and report queries, and maintenance of views against tables with temporal data.
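The "frozen view on a time slice" idea can be illustrated with a minimal sketch. The row layout with `valid_from`/`valid_to` columns is an assumption for the sketch; a production warehouse would typically express this as a SQL view over a period-versioned table:

```python
from datetime import date

# Hypothetical row layout: (key, value, valid_from, valid_to);
# valid_to of None marks the currently valid row.
def as_of(rows, t):
    """Return the time slice of rows valid at date t (a "frozen" view)."""
    return [r for r in rows
            if r[2] <= t and (r[3] is None or t < r[3])]
```

Because every historical time slice is retained, the same table serves users who want yesterday's frozen snapshot and users who want the latest data, simply by varying `t`.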

    XML content warehousing: Improving sociological studies of mailing lists and web data

    In this paper, we present the guidelines for an XML-based approach to the sociological study of Web data such as the analysis of mailing lists or databases available online. The use of an XML warehouse is a flexible solution for storing and processing this kind of data. We propose an implemented solution and show possible applications with our case study of profiles of experts involved in W3C standard-setting activity. We illustrate the sociological use of semi-structured databases by presenting our XML Schema for mailing-list warehousing. An XML Schema allows data sources to be added or cross-referenced without modifying existing data sets, while allowing for structural evolution. We also show that the existence of hidden data implies increased complexity for traditional SQL users. XML content warehousing supports both exhaustive warehousing and recursive queries over content, with far less dependence on the initial storage. We finally present the possibility of exporting the data stored in the warehouse to commonly-used advanced software devoted to sociological analysis.
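A minimal sketch of recursive querying over warehoused mailing-list content follows. The element names (`<archive>`, `<message>`, `<author>`) are hypothetical; the paper's actual XML Schema is not reproduced here:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Toy mailing-list archive: replies are nested inside the message they
# answer, which is exactly what makes recursive queries useful.
DOC = """
<archive>
  <message><author>alice</author><body>hi</body></message>
  <message><author>bob</author><body>re: hi</body>
    <message><author>alice</author><body>re: re: hi</body></message>
  </message>
</archive>
"""

def messages_per_author(xml_text):
    root = ET.fromstring(xml_text)
    # iter() descends recursively, so nested replies are counted too;
    # an equivalent flat-table SQL query would need explicit recursion.
    return Counter(m.findtext("author") for m in root.iter("message"))
```

The point of the sketch is the recursive traversal: one query reaches messages at any nesting depth without the warehouse having to fix a maximum thread depth in advance.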

    Implementation of a land use and spatial interaction model based on random utility choices and social accounting matrices

    Random utility modelling has been established as one of the main paradigms for the implementation of land use and transport interaction (LUTI) models. Despite the widespread application of such models, the literature provides relatively little detail on the theoretical consistency of the overall formal framework of random utility based LUTI models. To address this gap, we present a detailed formal description of a generic land use and spatial interaction model that adheres to the random utility paradigm through an explicit distinction between utility and cost across all processes that involve the behaviour of agents. The model is rooted in an extended input-output table, with the workforce and household accounts disaggregated by socio-economic type. Similarly, the land account is broken down into domestic and non-domestic land use types. The model is developed around two processes. First, the generation of demand for the inputs required by established production; the estimation of the level of demand between sectors, households and land use types is supported by social accounting techniques. Where appropriate, the implicit production functions are assumed to depend on input costs, which gives rise to price-elastic demands. Second, the spatial assignment of the demanded inputs (industrial activity, workforce, land) to locations of production; here, sequences of decisions are used to distribute demand (both spatially and, when necessary, a-spatially) and to propagate the costs and utilities of production and consumption that emerge from imbalances between supply and demand. The implementation of this generic model is discussed for the case of the Greater South East region of the UK, comprising London, the South East and the East of England. We present the calibration process, data requirements, necessary assumptions and resulting implications. We discuss outputs under various land use strategies and economic scenarios, such as regulated versus competing land uses, constrained versus unconstrained densities, and high versus low economic and population growth rates. By adjusting the design constraints of the spatial planning and infrastructure supply strategies, we aim to improve their sustainability.
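The spatial-assignment step in random utility models typically rests on multinomial logit choice probabilities. A minimal sketch under that assumption (the utilities and the dispersion parameter `theta` are illustrative, not taken from the paper):

```python
import math

# Multinomial logit sketch: the share of demand assigned to location i
# is P(i) = exp(theta * V_i) / sum_j exp(theta * V_j), where V_i is the
# (utility minus cost) of producing or consuming at location i.
def logit_shares(utilities, theta=1.0):
    m = max(utilities)  # shift by the max to stabilise the exponentials
    expd = [math.exp(theta * (v - m)) for v in utilities]
    s = sum(expd)
    return [e / s for e in expd]
```

A higher `theta` makes the assignment more deterministic (demand concentrates on the best location); `theta` near zero spreads demand evenly, which is one way such models encode heterogeneity in agent behaviour.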

    Pragmatic Ontology Evolution: Reconciling User Requirements and Application Performance

    Increasingly, organizations are adopting ontologies to describe their large catalogues of items. These ontologies need to evolve regularly in response to changes in the domain and the emergence of new requirements. An important step of this process is the selection of candidate concepts to include in the new version of the ontology. This operation needs to take into account a variety of factors and, in particular, to reconcile user requirements and application performance. Current ontology evolution methods focus either on ranking concepts according to their relevance or on preserving compatibility with existing applications. However, they do not take into consideration the impact of the ontology evolution process on the performance of computational tasks; in this work we focus on instance tagging, similarity computation, generation of recommendations, and data clustering. In this paper, we propose the Pragmatic Ontology Evolution (POE) framework, a novel approach for selecting from a group of candidates a set of concepts able to produce a new version of a given ontology that i) is consistent with a set of user requirements (e.g., a maximum number of concepts in the ontology), ii) is parametrised with respect to a number of dimensions (e.g., topological considerations), and iii) effectively supports relevant computational tasks. Our approach also supports users in navigating the space of possible solutions by showing how certain choices, such as limiting the number of concepts or privileging trendy concepts over historical ones, would affect application performance. An evaluation of POE on the real-world scenario of the evolving Springer Nature taxonomy for editorial classification yielded excellent results, demonstrating a significant improvement over alternative approaches.
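Concept selection under a user constraint such as a maximum ontology size can be sketched as a toy ranking. The scoring weights and the dimension names (`relevance`, `trend`) are hypothetical stand-ins for POE's actual parametrised dimensions:

```python
# Toy concept selection: score each candidate as a weighted sum of two
# illustrative dimensions, then keep at most max_concepts of them.
# The weights w_relevance / w_trend model the user's preference between
# historically established and currently trendy concepts.
def select_concepts(candidates, max_concepts, w_relevance=0.7, w_trend=0.3):
    scored = sorted(
        candidates,
        key=lambda c: w_relevance * c["relevance"] + w_trend * c["trend"],
        reverse=True,
    )
    return [c["name"] for c in scored[:max_concepts]]
```

Re-running the selection with different weights or a different `max_concepts` shows the user how each choice reshapes the resulting ontology, which is the navigation-of-solutions idea in miniature.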

    A Survey On Data Mining Techniques and Applications

    Data Mining refers to the analysis of observational data sets to find relationships and to summarize the data in ways that are both understandable and useful. Compared with other data mining techniques, approaches based on Intelligent Systems (ISs), which include Artificial Neural Networks (ANNs), fuzzy logic, approximate reasoning, and derivative-free optimisation methods such as Genetic Algorithms (GAs), are tolerant of imprecision, uncertainty, partial truth, and approximation. This paper reviews a variety of data mining techniques and their applications.

    Identifying the Challenges in Reducing Latency in GSN using Predictors

    Simulations based on real-time data continuously gathered from sensor networks all over the world have received growing attention due to the increasing availability of measured data. Furthermore, predictive techniques have been employed in such networks to reduce communication for energy efficiency. However, research has focused on the high volume of data transferred rather than on the latency requirements posed by the applications. We propose using predictors to supply data with the low latency required for accurate simulations. This paper investigates the requirements for a successful combination of these concepts and discusses the challenges that arise.
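A common predictor-based scheme for reducing sensor traffic is dual prediction: sensor and sink run the same model, and the sensor transmits only when reality drifts from the prediction. A minimal sketch, assuming last-value prediction and a fixed error bound `eps` (both illustrative choices, not from the paper):

```python
# Dual-prediction sketch: the sink assumes the last transmitted value
# still holds; the sensor sends an update only when the real reading
# deviates from that shared prediction by more than eps.
def simulate(readings, eps):
    sent = []                 # (index, value) pairs actually transmitted
    last = None               # the prediction both sides agree on
    for i, x in enumerate(readings):
        if last is None or abs(x - last) > eps:
            sent.append((i, x))
            last = x          # both sides update the shared model
    return sent
```

The trade-off the paper points at is visible here: a larger `eps` suppresses more transmissions (saving energy) but lets the sink's view lag behind reality, which is exactly the latency/accuracy tension for simulations consuming the data.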

    Compressed Video Action Recognition

    Training robust deep video representations has proven to be much more challenging than learning deep image representations. This is in part due to the enormous size of raw video streams and their high temporal redundancy; the true and interesting signal is often drowned in too much irrelevant data. Motivated by the fact that this superfluous information can be reduced by up to two orders of magnitude by video compression (using H.264, HEVC, etc.), we propose to train a deep network directly on the compressed video. This representation has a higher information density, and we found the training to be easier. In addition, the signals in a compressed video provide free, albeit noisy, motion information. We propose novel techniques to use them effectively. Our approach is about 4.6 times faster than Res3D and 2.7 times faster than ResNet-152. On the task of action recognition, our approach outperforms all other methods on the UCF-101, HMDB-51, and Charades datasets.
    Comment: CVPR 2018 (selected for spotlight presentation).
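One way to use compressed-domain motion signals is to refer each P-frame's motion back to the preceding I-frame by chaining frame-to-frame displacements. The idea can be shown in one dimension (a toy sketch; real codecs store 2-D motion-vector fields per macroblock, and this is not the paper's exact formulation):

```python
# Toy back-tracing sketch: a P-frame stores a displacement relative to
# the previous frame; chaining those displacements expresses every
# frame's motion relative to the I-frame at the start of the GOP.
def accumulate_motion(per_frame_displacements):
    out, total = [], 0
    for d in per_frame_displacements:
        total += d
        out.append(total)
    return out
```

After accumulation, each frame depends only on the I-frame rather than on its immediate predecessor, so frames can be fed to a network independently instead of being decoded as a chain.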