
    A Survey of Parallel Data Mining

    With the fast, continuous increase in the number and size of databases, parallel data mining is a natural and cost-effective approach to tackling the problem of scalability in data mining. Recently there has been considerable research on parallel data mining. However, most projects focus on parallelizing a single kind of data mining algorithm/paradigm. This paper surveys parallel data mining from a broader perspective. More precisely, we discuss the parallelization of data mining algorithms from four knowledge discovery paradigms, namely rule induction, instance-based learning, genetic algorithms and neural networks. Using the lessons learned from this discussion, we also derive a set of heuristic principles for designing efficient parallel data mining algorithms.
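The partition/merge pattern underlying most data-parallel mining can be sketched as follows. This is a minimal illustration, not taken from the survey; the function name `parallel_class_counts` and the choice of class-label counting as the "local statistic" are assumptions for the sketch:

```python
# Data-parallel mining sketch: each worker computes local statistics on
# its partition of the records; a cheap reduce step merges the results.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def local_counts(partition):
    # Per-partition statistics (here: class-label counts, as a rule
    # induction algorithm might need when evaluating candidate rules).
    return Counter(label for _, label in partition)

def parallel_class_counts(records, n_workers=2):
    # Split records into roughly equal partitions (data parallelism).
    parts = [records[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as ex:
        partials = ex.map(local_counts, parts)
    total = Counter()
    for c in partials:
        total += c
    return total
```

A thread pool is used only to keep the sketch self-contained; swapping in a process pool gives true CPU parallelism in CPython, while the shape of the computation (local statistics, cheap merge) stays the same.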

    Updating Data Warehouses with Temporal Data

    There has been a growing trend to use temporal data in a data warehouse for making strategic and tactical decisions. The key idea of temporal data management is to make data available at the right time with different time intervals. Temporal data storage enables this by making all the different time slices of the data available to whoever needs them. Users with different data latency needs can all be accommodated. Data can be “frozen” via a view on the proper time slice. Data as of a point in time can be obtained across multiple tables or multiple subject areas, resolving consistency and synchronization issues. This paper discusses implementation issues such as temporal data updates, coexistence of loads and queries against the same table, performance of load and report queries, and maintenance of views against tables with temporal data.
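The "frozen view on a time slice" idea can be illustrated with a minimal sketch. The row layout with `valid_from`/`valid_to` columns is an assumption for the sketch; a production warehouse would typically express this as a SQL view over a period-versioned table:

```python
from datetime import date

# Hypothetical row layout: (key, value, valid_from, valid_to);
# valid_to of None marks the currently valid row.
def as_of(rows, t):
    """Return the time slice of rows valid at date t (a "frozen" view)."""
    return [r for r in rows
            if r[2] <= t and (r[3] is None or t < r[3])]
```

Because every historical time slice is retained, the same table serves users who want yesterday's frozen snapshot and users who want the latest data, simply by varying `t`.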

    XML content warehousing: Improving sociological studies of mailing lists and web data

    In this paper, we present the guidelines for an XML-based approach to the sociological study of Web data such as the analysis of mailing lists or databases available online. The use of an XML warehouse is a flexible solution for storing and processing this kind of data. We propose an implemented solution and show possible applications with our case study of profiles of experts involved in W3C standard-setting activity. We illustrate the sociological use of semi-structured databases by presenting our XML Schema for mailing-list warehousing. An XML Schema allows data sources to be added or cross-referenced without modifying existing data sets, while allowing for structural evolution. We also show that the existence of hidden data implies increased complexity for traditional SQL users. XML content warehousing supports both exhaustive warehousing and recursive queries over content, with far less dependence on the initial storage. We finally present the possibility of exporting the data stored in the warehouse to commonly-used advanced software devoted to sociological analysis.
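A minimal sketch of recursive querying over warehoused mailing-list content follows. The element names (`<archive>`, `<message>`, `<author>`) are hypothetical; the paper's actual XML Schema is not reproduced here:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Toy mailing-list archive: replies are nested inside the message they
# answer, which is exactly what makes recursive queries useful.
DOC = """
<archive>
  <message><author>alice</author><body>hi</body></message>
  <message><author>bob</author><body>re: hi</body>
    <message><author>alice</author><body>re: re: hi</body></message>
  </message>
</archive>
"""

def messages_per_author(xml_text):
    root = ET.fromstring(xml_text)
    # iter() descends recursively, so nested replies are counted too;
    # an equivalent flat-table SQL query would need explicit recursion.
    return Counter(m.findtext("author") for m in root.iter("message"))
```

The point of the sketch is the recursive traversal: one query reaches messages at any nesting depth without the warehouse having to fix a maximum thread depth in advance.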

    Implementation of a land use and spatial interaction model based on random utility choices and social accounting matrices

    Random utility modelling has been established as one of the main paradigms for the implementation of land use and transport interaction (LUTI) models. Despite the widespread application of such models, the literature provides relatively little detail on the theoretical consistency of the overall formal framework of random utility based LUTI models. To address this gap, we present a detailed formal description of a generic land use and spatial interaction model that adheres to the random utility paradigm through an explicit distinction between utility and cost across all processes that involve the behaviour of agents. The model is rooted in an extended input-output table, with the workforce and household accounts disaggregated by socio-economic type. Similarly, the land account is broken down into domestic and non-domestic land use types. The model is developed around two processes. First, the generation of demand for the inputs required by established production; the estimation of the level of demand between sectors, households and land use types is supported by social accounting techniques. Where appropriate, the implicit production functions are assumed to depend on input costs, which gives rise to price-elastic demands. Second, the spatial assignment of the demanded inputs (industrial activity, workforce, land) to locations of production; here, sequences of decisions are used to distribute demand (both spatially and, when necessary, a-spatially) and to propagate the costs and utilities of production and consumption that emerge from imbalances between supply and demand. The implementation of this generic model is discussed for the case of the Greater South East region of the UK, comprising London, the South East and the East of England. We present the calibration process, data requirements, necessary assumptions and resulting implications. We discuss outputs under various land use strategies and economic scenarios, such as regulated versus competing land uses, constrained versus unconstrained densities, and high versus low economic and population growth rates. By adjusting the design constraints of the spatial planning and infrastructure supply strategies, we aim to improve their sustainability.
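The spatial-assignment step in random utility models typically rests on multinomial logit choice probabilities. A minimal sketch under that assumption (the utilities and the dispersion parameter `theta` are illustrative, not taken from the paper):

```python
import math

# Multinomial logit sketch: the share of demand assigned to location i
# is P(i) = exp(theta * V_i) / sum_j exp(theta * V_j), where V_i is the
# (utility minus cost) of producing or consuming at location i.
def logit_shares(utilities, theta=1.0):
    m = max(utilities)  # shift by the max to stabilise the exponentials
    expd = [math.exp(theta * (v - m)) for v in utilities]
    s = sum(expd)
    return [e / s for e in expd]
```

A higher `theta` makes the assignment more deterministic (demand concentrates on the best location); `theta` near zero spreads demand evenly, which is one way such models encode heterogeneity in agent behaviour.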

    Pragmatic Ontology Evolution: Reconciling User Requirements and Application Performance

    Increasingly, organizations are adopting ontologies to describe their large catalogues of items. These ontologies need to evolve regularly in response to changes in the domain and the emergence of new requirements. An important step of this process is the selection of candidate concepts to include in the new version of the ontology. This operation needs to take into account a variety of factors and, in particular, to reconcile user requirements and application performance. Current ontology evolution methods focus either on ranking concepts according to their relevance or on preserving compatibility with existing applications. However, they do not take into consideration the impact of the ontology evolution process on the performance of computational tasks; in this work we focus on instance tagging, similarity computation, generation of recommendations, and data clustering. In this paper, we propose the Pragmatic Ontology Evolution (POE) framework, a novel approach for selecting from a group of candidates a set of concepts able to produce a new version of a given ontology that i) is consistent with a set of user requirements (e.g., a maximum number of concepts in the ontology), ii) is parametrised with respect to a number of dimensions (e.g., topological considerations), and iii) effectively supports relevant computational tasks. Our approach also supports users in navigating the space of possible solutions by showing how certain choices, such as limiting the number of concepts or privileging trendy concepts over historical ones, would affect application performance. An evaluation of POE on the real-world scenario of the evolving Springer Nature taxonomy for editorial classification yielded excellent results, demonstrating a significant improvement over alternative approaches.
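Concept selection under a user constraint such as a maximum ontology size can be sketched as a toy ranking. The scoring weights and the dimension names (`relevance`, `trend`) are hypothetical stand-ins for POE's actual parametrised dimensions:

```python
# Toy concept selection: score each candidate as a weighted sum of two
# illustrative dimensions, then keep at most max_concepts of them.
# The weights w_relevance / w_trend model the user's preference between
# historically established and currently trendy concepts.
def select_concepts(candidates, max_concepts, w_relevance=0.7, w_trend=0.3):
    scored = sorted(
        candidates,
        key=lambda c: w_relevance * c["relevance"] + w_trend * c["trend"],
        reverse=True,
    )
    return [c["name"] for c in scored[:max_concepts]]
```

Re-running the selection with different weights or a different `max_concepts` shows the user how each choice reshapes the resulting ontology, which is the navigation-of-solutions idea in miniature.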

    A Survey On Data Mining Techniques and Applications

    Data Mining refers to the analysis of observational data sets to find relationships and to summarize the data in ways that are both understandable and useful. Compared with other data mining techniques, approaches based on Intelligent Systems (ISs), which include Artificial Neural Networks (ANNs), fuzzy logic, approximate reasoning, and derivative-free optimisation methods such as Genetic Algorithms (GAs), are tolerant of imprecision, uncertainty, partial truth, and approximation. This paper reviews a variety of data mining techniques and their applications.

    Identifying the Challenges in Reducing Latency in GSN using Predictors

    Simulations based on real-time data continuously gathered from sensor networks all over the world have received growing attention due to the increasing availability of measured data. Furthermore, predictive techniques have been employed in such networks to reduce communication for energy efficiency. However, research has focused on the high volume of data transferred rather than on the latency requirements posed by the applications. We propose using predictors to supply data with the low latency required for accurate simulations. This paper investigates the requirements for a successful combination of these concepts and discusses the challenges that arise.
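A common predictor-based scheme for reducing sensor traffic is dual prediction: sensor and sink run the same model, and the sensor transmits only when reality drifts from the prediction. A minimal sketch, assuming last-value prediction and a fixed error bound `eps` (both illustrative choices, not from the paper):

```python
# Dual-prediction sketch: the sink assumes the last transmitted value
# still holds; the sensor sends an update only when the real reading
# deviates from that shared prediction by more than eps.
def simulate(readings, eps):
    sent = []                 # (index, value) pairs actually transmitted
    last = None               # the prediction both sides agree on
    for i, x in enumerate(readings):
        if last is None or abs(x - last) > eps:
            sent.append((i, x))
            last = x          # both sides update the shared model
    return sent
```

The trade-off the paper points at is visible here: a larger `eps` suppresses more transmissions (saving energy) but lets the sink's view lag behind reality, which is exactly the latency/accuracy tension for simulations consuming the data.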

    Compressed Video Action Recognition

    Training robust deep video representations has proven to be much more challenging than learning deep image representations. This is in part due to the enormous size of raw video streams and their high temporal redundancy; the true and interesting signal is often drowned in too much irrelevant data. Motivated by the fact that this superfluous information can be reduced by up to two orders of magnitude by video compression (using H.264, HEVC, etc.), we propose to train a deep network directly on the compressed video. This representation has a higher information density, and we found the training to be easier. In addition, the signals in a compressed video provide free, albeit noisy, motion information. We propose novel techniques to use them effectively. Our approach is about 4.6 times faster than Res3D and 2.7 times faster than ResNet-152. On the task of action recognition, our approach outperforms all other methods on the UCF-101, HMDB-51, and Charades datasets.
    Comment: CVPR 2018 (selected for spotlight presentation).
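One way to use compressed-domain motion signals is to refer each P-frame's motion back to the preceding I-frame by chaining frame-to-frame displacements. The idea can be shown in one dimension (a toy sketch; real codecs store 2-D motion-vector fields per macroblock, and this is not the paper's exact formulation):

```python
# Toy back-tracing sketch: a P-frame stores a displacement relative to
# the previous frame; chaining those displacements expresses every
# frame's motion relative to the I-frame at the start of the GOP.
def accumulate_motion(per_frame_displacements):
    out, total = [], 0
    for d in per_frame_displacements:
        total += d
        out.append(total)
    return out
```

After accumulation, each frame depends only on the I-frame rather than on its immediate predecessor, so frames can be fed to a network independently instead of being decoded as a chain.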