
    Adaptive grid based localized learning for multidimensional data

    Rapid advances in data-rich domains of science, technology, and business have amplified the computational challenges of Big Data synthesis, which is needed to slow the widening gap between the rate at which data is collected and the rate at which it is analyzed for knowledge. This has renewed the need for efficient and accurate algorithms, frameworks, and algorithmic mechanisms for knowledge discovery, especially in clustering, classification, dimensionality reduction, feature ranking, and feature selection. However, data mining algorithms are frequently challenged by the sparseness that results from the high dimensionality of datasets in such domains, which is particularly detrimental to the performance of unsupervised learning algorithms. The motivation for the research presented in this dissertation is to develop novel data mining algorithms that address the challenges of high dimensionality, sparseness, and large data volumes by using a unique grid-based localized learning paradigm for data movement clustering and classification schema. Grid-based learning is well recognized in data mining because these algorithms are inherently efficient: they reduce the search space by partitioning the feature space into effective partitions. However, these approaches have not been successfully devised for supervised learning algorithms or sparseness reduction algorithms, as they require careful estimation of grid sizes, partitions, and data movement error calculations. Grid-based localized learning algorithms can scale well with increases in both the dimensionality and the size of the datasets. To design and develop learning algorithms that handle data sparseness, high dimensionality, and large data size concurrently, and thereby avoid feature selection biases, a set of novel data mining algorithms based on grid-based localized learning principles is developed and presented.
The first algorithm is a computational framework for feature ranking that employs adaptive grid-based data shrinking. This method addresses the limitations of existing feature ranking methods by using a scoring function that discovers and exploits dependencies among all the features in the data. Data shrinking principles are established and expressed as metrics to capture and exploit dependencies between features. The second core algorithmic contribution is a novel supervised learning algorithm that uses grid-based localized learning to build a nonparametric classification model. In this model, the feature space is divided using uniform or non-uniform partitions, and the data space is subdivided using a grid structure, which is then used to build a classifier through grid-based nearest-neighbor learning. The third algorithm is an unsupervised clustering algorithm augmented with data shrinking to enhance its clustering performance. This algorithm addresses the limitations of existing grid-based data shrinking and clustering algorithms by using adaptive grid-based learning. Multiple experiments on a diverse set of datasets evaluate the effectiveness of dimensionality reduction, feature selection, and unsupervised and supervised learning, as well as the scalability of the proposed methods compared with established methods in the literature.
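
The grid-based nearest-neighbor idea described in this abstract can be illustrated with a minimal sketch. It is not the dissertation's algorithm: the class name `GridNNClassifier`, the helper `cell_of`, the uniform bin count, and the fallback to a global search when a cell is empty are all assumptions made for illustration.

```python
from collections import defaultdict
import math


def cell_of(point, lows, widths, bins):
    """Map a point to a tuple of grid-cell indices, one per feature (hypothetical helper)."""
    return tuple(
        min(bins - 1, max(0, int((x - lo) / w))) if w > 0 else 0
        for x, lo, w in zip(point, lows, widths)
    )


class GridNNClassifier:
    """Illustrative classifier: uniform grid plus nearest neighbor inside the query's cell."""

    def __init__(self, bins=4):
        self.bins = bins  # number of uniform partitions per feature (assumed parameter)

    def fit(self, X, y):
        dims = len(X[0])
        self.lows = [min(p[d] for p in X) for d in range(dims)]
        highs = [max(p[d] for p in X) for d in range(dims)]
        self.widths = [(hi - lo) / self.bins for lo, hi in zip(self.lows, highs)]
        # Bucket every training point by the grid cell it falls into.
        self.cells = defaultdict(list)
        for p, label in zip(X, y):
            self.cells[cell_of(p, self.lows, self.widths, self.bins)].append((p, label))
        self.train = list(zip(X, y))
        return self

    def predict_one(self, point):
        key = cell_of(point, self.lows, self.widths, self.bins)
        # Search only the query's cell; fall back to all training data if the cell is empty.
        candidates = self.cells.get(key) or self.train
        nearest = min(candidates, key=lambda pl: math.dist(pl[0], point))
        return nearest[1]
```

Restricting the search to one cell is what reduces the search space. A fuller version would also inspect neighboring cells, since the true nearest neighbor of a point close to a cell boundary may lie in an adjacent cell.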

    Clustering-Based Materialized View Selection in Data Warehouses

    Materialized view selection is a non-trivial task, so its complexity must be reduced. A judicious choice of views must be cost-driven and influenced by the workload experienced by the system. In this paper, we propose a framework for materialized view selection that exploits a data mining technique (clustering) to determine clusters of similar queries. We also propose a view merging algorithm that builds a set of candidate views, as well as a greedy process for selecting a set of views to materialize. This selection is based on cost models that evaluate the cost of accessing data using views and the cost of storing these views. To validate our strategy, we executed a workload of decision-support queries on a test data warehouse, with and without using our strategy. Our experimental results demonstrate its efficiency, even when storage space is limited.
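
A greedy selection step under a storage limit might look like the sketch below. The paper's actual cost models are not given in this abstract, so the function name `greedy_select`, the fixed per-view `(benefit, size)` pairs, and the benefit-per-unit-of-storage ranking are all assumptions made for illustration.

```python
def greedy_select(candidates, storage_budget):
    """Pick candidate views greedily by benefit per unit of storage.

    candidates: dict mapping a view name to (benefit, size), where benefit is the
    assumed query cost saved by materializing the view and size is its storage
    cost. Sizes are assumed to be strictly positive.
    """
    chosen, used = [], 0.0
    # Rank views by how much query cost they save per unit of storage.
    ranked = sorted(
        candidates.items(),
        key=lambda kv: kv[1][0] / kv[1][1],
        reverse=True,
    )
    for name, (benefit, size) in ranked:
        # Keep a view only if it helps and still fits in the remaining space.
        if benefit > 0 and used + size <= storage_budget:
            chosen.append(name)
            used += size
    return chosen
```

In a realistic setting the benefit of a view depends on which views are already materialized, so the ranking would be recomputed after each pick; the static ranking here keeps the sketch short.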

    Weakening organizational ties? A classification of styles of volunteering in the Flemish Red Cross

    This article presents an initial empirical assessment of a new analytical framework of styles of volunteering (SOV). The framework suggests that volunteering can be categorized in terms of a multidimensional set of cultural and structural indicators that cohere in systematic and varying ways. Using data from a survey of 652 Flemish Red Cross volunteers, a multivariate analysis reveals five SOV categories: episodic contributors, established administrators, reliable coworkers, service-oriented core volunteers, and critical key figures. The findings indicate that the reality of volunteering is far more complex than conventional approaches to the study of volunteering suggest.