4,424 research outputs found

    Certainty of outlier and boundary points processing in data mining

    Full text link
    Data certainty is one of the issues in the real-world applications which is caused by unwanted noise in data. Recently, more attentions have been paid to overcome this problem. We proposed a new method based on neutrosophic set (NS) theory to detect boundary and outlier points as challenging points in clustering methods. Generally, firstly, a certainty value is assigned to data points based on the proposed definition in NS. Then, certainty set is presented for the proposed cost function in NS domain by considering a set of main clusters and noise cluster. After that, the proposed cost function is minimized by gradient descent method. Data points are clustered based on their membership degrees. Outlier points are assigned to noise cluster and boundary points are assigned to main clusters with almost same membership degrees. To show the effectiveness of the proposed method, two types of datasets including 3 datasets in Scatter type and 4 datasets in UCI type are used. Results demonstrate that the proposed cost function handles boundary and outlier points with more accurate membership degrees and outperforms existing state of the art clustering methods.Comment: Conference Paper, 6 page

    On the role of pre and post-processing in environmental data mining

    Get PDF
    The quality of discovered knowledge is highly depending on data quality. Unfortunately real data use to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex is the reality to be analyzed, the higher the risk of getting low quality data. Knowledge Discovery from Databases (KDD) offers a global framework to prepare data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results, depend not only on the quality of the results themselves, but on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex and environmental users particularly require clarity in their results. In this paper some details about how this can be achieved are provided. The role of the pre and post processing in the whole process of Knowledge Discovery in environmental systems is discussed

    Adaptive inferential sensors based on evolving fuzzy models

    Get PDF
    A new technique to the design and use of inferential sensors in the process industry is proposed in this paper, which is based on the recently introduced concept of evolving fuzzy models (EFMs). They address the challenge that the modern process industry faces today, namely, to develop such adaptive and self-calibrating online inferential sensors that reduce the maintenance costs while keeping the high precision and interpretability/transparency. The proposed new methodology makes possible inferential sensors to recalibrate automatically, which reduces significantly the life-cycle efforts for their maintenance. This is achieved by the adaptive and flexible open-structure EFM used. The novelty of this paper lies in the following: (1) the overall concept of inferential sensors with evolving and self-developing structure from the data streams; (2) the new methodology for online automatic selection of input variables that are most relevant for the prediction; (3) the technique to detect automatically a shift in the data pattern using the age of the clusters (and fuzzy rules); (4) the online standardization technique used by the learning procedure of the evolving model; and (5) the application of this innovative approach to several real-life industrial processes from the chemical industry (evolving inferential sensors, namely, eSensors, were used for predicting the chemical properties of different products in The Dow Chemical Company, Freeport, TX). It should be noted, however, that the methodology and conclusions of this paper are valid for the broader area of chemical and process industries in general. The results demonstrate that well-interpretable and with-simple-structure inferential sensors can automatically be designed from the data stream in real time, which predict various process variables of interest. The proposed approach can be used as a basis for the development of a new generation of adaptive and evolving inferential sensors that can a- ddress the challenges of the modern advanced process industry

    Fuzzy heterogeneous neurons for imprecise classification problems

    Get PDF
    In the classical neuron model, inputs are continuous real-valued quantities. However, in many important domains from the real world, objects are described by a mixture of continuous and discrete variables, usually containing missing information and uncertainty. In this paper, a general class of neuron models accepting heterogeneous inputs in the form of mixtures of continuous (crisp and/or fuzzy) and discrete quantities admitting missing data is presented. From these, several particular models can be derived as instances and different neural architectures constructed with them. Such models deal in a natural way with problems for which information is imprecise or even missing. Their possibilities in classification and diagnostic problems are here illustrated by experiments with data from a real-world domain in the field of environmental studies. These experiments show that such neurons can both learn and classify complex data very effectively in the presence of uncertain information.Peer ReviewedPostprint (author's final draft

    Growing a Tree in the Forest: Constructing Folksonomies by Integrating Structured Metadata

    Full text link
    Many social Web sites allow users to annotate the content with descriptive metadata, such as tags, and more recently to organize content hierarchically. These types of structured metadata provide valuable evidence for learning how a community organizes knowledge. For instance, we can aggregate many personal hierarchies into a common taxonomy, also known as a folksonomy, that will aid users in visualizing and browsing social content, and also to help them in organizing their own content. However, learning from social metadata presents several challenges, since it is sparse, shallow, ambiguous, noisy, and inconsistent. We describe an approach to folksonomy learning based on relational clustering, which exploits structured metadata contained in personal hierarchies. Our approach clusters similar hierarchies using their structure and tag statistics, then incrementally weaves them into a deeper, bushier tree. We study folksonomy learning using social metadata extracted from the photo-sharing site Flickr, and demonstrate that the proposed approach addresses the challenges. Moreover, comparing to previous work, the approach produces larger, more accurate folksonomies, and in addition, scales better.Comment: 10 pages, To appear in the Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining(KDD) 201
    corecore