1,295 research outputs found

    Forest Tree- An Efficient Proposal Approach for Data Mining

    Data mining (DM) is a way of examining different models, summaries, and derived values from a given collection of data. DM itself is the process of searching for analytical information in large volumes of available databases. An example of a predictive problem is targeted marketing. Many factors influence the performance of mining on large data sets. In this paper we use the forest tree technique to improve the performance of data retrieval; the authors state that, when implemented, it will outperform previous work, including the existing decision tree algorithm.

    Forest Tree Algorithm- An Efficient Approach of Data Mining Over Decision Tree

    Data mining (DM) is a way to present different models, summaries, and values derived from a given collection of data. DM itself works by searching for analytical information across a large number of available databases. An example of a predictive problem is targeted marketing. Many factors affect data mining performance on large data sets. In this article we use the forest tree technique to improve performance in data search and implementation, with the aim of surpassing previous work that uses the existing decision tree algorithm.

    Median evidential c-means algorithm and its application to community detection

    Median clustering is of great value for partitioning relational data. In this paper, a new prototype-based clustering method, called Median Evidential C-Means (MECM), which is an extension of median c-means and median fuzzy c-means on the theoretical framework of belief functions is proposed. The median variant relaxes the restriction of a metric space embedding for the objects but constrains the prototypes to be in the original data set. Due to these properties, MECM could be applied to graph clustering problems. A community detection scheme for social networks based on MECM is investigated and the obtained credal partitions of graphs, which are more refined than crisp and fuzzy ones, enable us to have a better understanding of the graph structures. An initial prototype-selection scheme based on evidential semi-centrality is presented to avoid local premature convergence and an evidential modularity function is defined to choose the optimal number of communities. Finally, experiments in synthetic and real data sets illustrate the performance of MECM and show its difference to other methods
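
The constraint that prototypes must be objects of the original data set can be illustrated with a minimal sketch of the crisp median c-means baseline that MECM extends. This is not the MECM algorithm itself: it omits belief functions and credal partitions entirely. The function name `median_c_means`, its parameters, and the use of a precomputed dissimilarity matrix are assumptions made for illustration only.

```python
def median_c_means(dissim, prototypes, max_iter=100):
    """Crisp median clustering on a square dissimilarity matrix.

    dissim      -- list of lists, dissim[i][j] is the dissimilarity of objects i and j
    prototypes  -- initial prototype indices (each must be an object index)
    Returns (prototype indices, cluster label of each object).
    """
    n = len(dissim)
    prototypes = list(prototypes)

    def assign(protos):
        # Each object joins the cluster of its least dissimilar prototype.
        return [
            min(range(len(protos)), key=lambda j: dissim[i][protos[j]])
            for i in range(n)
        ]

    for _ in range(max_iter):
        labels = assign(prototypes)
        updated = []
        for j, current in enumerate(prototypes):
            members = [i for i in range(n) if labels[i] == j]
            if not members:
                # Keep the old prototype if its cluster is empty.
                updated.append(current)
                continue
            # The new prototype is the data object with the smallest
            # summed dissimilarity to the cluster members, so it always
            # stays inside the original data set.
            updated.append(
                min(range(n), key=lambda cand: sum(dissim[cand][m] for m in members))
            )
        if updated == prototypes:
            break
        prototypes = updated

    return prototypes, assign(prototypes)
```

Because only pairwise dissimilarities are used, the same routine could in principle be run on graph data, for example with shortest-path lengths between nodes as `dissim`; no vector embedding of the objects is required.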

    A comparison of clustering and modification based graph anonymization methods with constraints

    In this paper a comparison is performed on two of the key methods for graph anonymization and their behavior is evaluated when constraints are incorporated into the anonymization process. The two methods tested are node clustering and node modification and are applied to online social network (OSN) graph datasets. The constraints implement user defined utility requirements for the community structure of the graph and major hub nodes. The methods are benchmarked using three real OSN datasets and different levels of k-anonymity. The results show that the constraints reduce the information loss while incurring an acceptable disclosure risk. Overall, it is found that the modification method with constraints gives the best results for information loss and risk of disclosure. This research is partially supported by the Spanish MEC (projects ARES CONSOLIDER INGENIO 2010 CSD2007-00004 -- eAEGIS TSI2007-65406-C03-02 -- and HIPERGRAPH TIN2009-14560-C03-01). Peer Reviewed

    Doctor of Philosophy

    With the tremendous growth of data produced in recent years, it is impossible to identify patterns or test hypotheses without reducing data size. Data mining is an area of science that extracts useful information from data by discovering the patterns and structures present in it. In this dissertation, we largely focus on clustering, which is often the first step in any exploratory data mining task: items that are similar to each other are grouped together, making downstream data analysis robust. Different clustering techniques have different strengths, and the resulting groupings provide different perspectives on the data. Due to the unsupervised nature of the task, i.e., the lack of domain experts who can label the data, validation of results is very difficult. While there are measures that compute "goodness" scores for clustering solutions as a whole, there are few methods that validate the assignment of individual data items to their clusters. To address these challenges we focus on developing a framework that can generate, compare, combine, and evaluate different solutions to make more robust and significant statements about the data. In the first part of this dissertation, we present fast and efficient techniques to generate and combine different clustering solutions. We build on some recent ideas on efficient representations of clusters of partitions to develop a well-founded, spatially aware metric to compare clusterings. With the ability to compare clusterings, we describe a heuristic to combine different solutions to produce a single high-quality clustering. We also introduce a Markov chain Monte Carlo approach to sample different clusterings from the entire landscape to provide users with a variety of choices. In the second part of this dissertation, we build certificates for individual data items and study their influence on effective data reduction. We present a geometric approach by defining regions of influence for data items and clusters and use this to develop adaptive sampling techniques to speed up machine learning algorithms. This dissertation is therefore a systematic approach to studying the landscape of clusterings in an attempt to provide a better understanding of the data.