48 research outputs found

    Comparative Analysis of Various Data Stream Mining Procedures and Various Dimension Reduction Techniques

    In recent years data mining has become a major research area. Data mining is the process of extracting useful information from a given data set for later use, whether commercial or scientific. Extracting that information requires sound methods with good approximations. This survey reviews data stream clustering techniques and dimension reduction techniques, along with the characteristics that affect clustering quality. We also propose our own approach to clustering streamed data: a dimension reduction technique is applied first, and the Fuzzy C-means algorithm is then run on the reduced data to improve clustering quality. Keywords: Data Stream, Dimension Reduction, Clustering
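The two-stage idea in the abstract above (reduce dimensions, then run Fuzzy C-means) can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not name the reduction technique, so keeping the highest-variance dimensions is an assumption made here, and the function names, toy data, and initial centers are all hypothetical.

```python
import math

def top_variance_dims(points, k):
    """Keep the k dimensions with the largest variance (an assumed stand-in
    for whatever dimension reduction technique the paper actually uses)."""
    n, dims = len(points), len(points[0])
    variances = []
    for d in range(dims):
        mean = sum(p[d] for p in points) / n
        variances.append(sum((p[d] - mean) ** 2 for p in points) / n)
    keep = sorted(sorted(range(dims), key=lambda d: -variances[d])[:k])
    return [tuple(p[d] for d in keep) for p in points]

def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Standard Fuzzy C-means updates with fuzzifier m and given initial centers."""
    memberships = []
    for _ in range(iterations):
        # Membership update: u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1))
        memberships = []
        for p in points:
            d = [max(math.dist(p, c), 1e-12) for c in centers]
            memberships.append([
                1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(len(d)))
                for i in range(len(d))
            ])
        # Center update: mean of the points weighted by u^m
        new_centers = []
        for i in range(len(centers)):
            weights = [u[i] ** m for u in memberships]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * p[dim] for w, p in zip(weights, points)) / total
                for dim in range(len(points[0]))
            ))
        centers = new_centers
    return centers, memberships

# Toy data: the third dimension is nearly constant and carries no cluster signal.
data = [(1.0, 1.1, 5.0), (0.9, 1.0, 5.1), (1.1, 0.9, 5.0),
        (8.0, 8.1, 5.1), (7.9, 8.0, 5.0), (8.1, 7.9, 5.1)]
reduced = top_variance_dims(data, k=2)
centers, memberships = fuzzy_c_means(reduced, [reduced[0], reduced[-1]])
```

Each row of `memberships` sums to 1, so a point near a cluster boundary is shared between clusters rather than forced into one.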

    Software Reuse in Cardiology Related Medical Database Using K-Means Clustering Technique

    Reuse-based software technology is the process of designing software so that it can be reused. Software reuse is the process of using existing software to build new software. A metric is a quantitative indicator of an attribute of an item. Reusability is the likelihood that a segment of source code can be used again to add new functionality with little or no modification. Much research has applied reusability to code, domain knowledge, requirements, design, and so on, but very little work has been reported on software reuse in the medical domain. This paper attempts to bridge that gap by clustering and classifying data based on distance measures. A cardiology database is used for the study. The developed model will help doctors or paramedics determine a patient's level of cardiac disease, deduce the required medicines in seconds, and propose them to the patient. The K-means clustering algorithm is used to measure reusability. Comment: 5 pages. arXiv admin note: text overlap with arXiv:1212.031

    A binary level set method based on k-Means for contour tracking on skin cancer images

    Segmenting skin cancer images has recently emerged as a major challenge in research and development. This paper presents a novel algorithm that improves the segmentation results of the level set algorithm on skin cancer images. Its major contribution is to simplify skin cancer images for computer-aided object analysis without losing significant information, and to reduce the computational cost. The algorithm uses the k-means clustering technique and a primitive segmentation to obtain an initial label estimate for the level set algorithm. The proposed method gives better segmentation results than the standard level set segmentation technique and the modified fuzzy c-means clustering technique
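The initialization step described above, k-means producing an initial label estimate for a level set method, might look roughly like this. It is a sketch under stated assumptions: pixels are grayscale intensities, two clusters are used, and the darker cluster is taken to be the lesion. None of these choices, nor the function names, come from the paper, and the level set evolution itself is not shown.

```python
def kmeans_two_intensity_clusters(values, iterations=20):
    """1-D k-means with k=2; returns (dark_centroid, bright_centroid)."""
    dark_c, bright_c = min(values), max(values)
    for _ in range(iterations):
        # With two centroids in 1-D, nearest-centroid assignment is a
        # threshold at their midpoint.
        midpoint = (dark_c + bright_c) / 2.0
        dark = [v for v in values if v <= midpoint]
        bright = [v for v in values if v > midpoint]
        if not dark or not bright:
            break  # degenerate image, e.g. all pixels equal
        dark_c = sum(dark) / len(dark)
        bright_c = sum(bright) / len(bright)
    return dark_c, bright_c

def initial_binary_labels(image):
    """Label pixels 1 (assumed lesion, darker cluster) or 0 (background)."""
    flat = [v for row in image for v in row]
    dark_c, bright_c = kmeans_two_intensity_clusters(flat)
    threshold = (dark_c + bright_c) / 2.0
    return [[1 if v <= threshold else 0 for v in row] for row in image]

# Hypothetical 3x3 grayscale patch with a dark region in the lower middle.
patch = [[200, 198, 201],
         [199, 60, 55],
         [202, 58, 197]]
labels = initial_binary_labels(patch)
```

The resulting binary mask would then seed the level set function in place of a hand-drawn or arbitrary initial contour.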

    Data Mining for Decision Support of the Quality Improvement Process

    A two-stage methodology is presented for improving how quality problems are assigned to quality improvement teams in industrial firms. The method advances the decision support system of the quality improvement process by grouping related quality problems in two steps. First, a soft grouping is performed using association rules as a data mining technique; then the resulting groups are finalized with a cost minimization model. A mathematical programming language is used to find the optimal groups. Results show that this methodology makes the quality improvement process more efficient and supports managerial decisions on forming quality improvement teams. As a practical illustration, the methodology is applied to an EDM fast hole drilling process

    Adaptive Mining Techniques for Data Streams Using Algorithm Output Granularity Mohamed

    Mining data streams is an emerging area of research given the potentially large number of business and scientific applications. A significant challenge in analyzing and mining data streams is the high data rate of the stream. In this paper, we propose a novel approach to cope with the high data rate of incoming data streams. We term our approach "algorithm output granularity". It is a resource-aware approach that adapts to available memory, time constraints, and data stream rate. The approach is generic and applicable to clustering, classification, and frequent-item counting techniques. We have developed a data stream clustering algorithm based on the algorithm output granularity approach. We present this algorithm and discuss its implementation and empirical evaluation. The experiments show acceptable accuracy together with run-time efficiency. They show that the proposed algorithm outperforms K-means in terms of running time while preserving accuracy
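As a rough illustration of the resource-aware idea in this abstract, the sketch below clusters a one-dimensional stream in a single pass and coarsens its output when a memory budget is reached. It is only loosely inspired by the description above and is not the authors' algorithm: the distance threshold, the `max_clusters` budget, and the `growth` factor are all assumptions introduced here.

```python
def one_pass_stream_clustering(stream, threshold, max_clusters, growth=1.5):
    """Single-pass clustering whose distance threshold grows under memory pressure.

    Each cluster is stored as [mean, count]. `max_clusters` stands in for the
    available memory; `growth` is a hypothetical coarsening factor.
    """
    clusters = []
    for x in stream:
        nearest = min(clusters, key=lambda c: abs(c[0] - x), default=None)
        if nearest is not None and abs(nearest[0] - x) <= threshold:
            # Close enough: fold the point into the running mean.
            nearest[1] += 1
            nearest[0] += (x - nearest[0]) / nearest[1]
        elif len(clusters) < max_clusters:
            # Memory is available: open a new cluster.
            clusters.append([x, 1])
        else:
            # Budget exhausted: coarsen the output by raising the threshold,
            # then absorb the point into the nearest existing cluster.
            threshold *= growth
            nearest[1] += 1
            nearest[0] += (x - nearest[0]) / nearest[1]
    return clusters, threshold

readings = [1.0, 1.2, 5.0, 5.1, 9.0, 9.2, 13.0]
clusters, final_threshold = one_pass_stream_clustering(
    readings, threshold=0.5, max_clusters=3)
```

Each point is examined once and only cluster summaries are kept, which is what lets a procedure of this kind keep up with a fast stream at the cost of coarser output.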

    Supervised clustering of streaming data for email batch detection

    We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When decisions cannot be reconsidered once they have been made, owing to the streaming nature of the data, the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails.

    Privacy-Preserving Clustering of Data Streams

    Most previous studies on privacy-preserving data mining focused on securing massive amounts of data in a static database, and data that undergoes privacy preservation often yields less accurate mining results. Furthermore, with the rapid advancement of Internet and telecommunication technology, data has shifted from traditional static data to data streams that are continuous, rapid, temporal, and unpredictable. As such data becomes more common, traditional privacy-preserving data mining algorithms that require complex calculation are no longer applicable. This paper therefore proposes a method, Privacy-Preserving Clustering of Data Streams (PPCDS), that improves data stream mining procedures while preserving privacy with a high degree of mining accuracy. PPCDS consists of two phases: rotation-based perturbation and cluster mining. In the rotation-based perturbation phase, a rotation transformation matrix is applied to rapidly perturb the data streams in order to preserve data privacy. In the cluster mining phase, the perturbed data first forms micro-clusters through optimization of cluster centers; statistical calculation is then applied to update the micro-clusters, a geometric time frame is used to allocate and store them, and the mining result is finally output through macro-cluster generation. Two simple data structures are added to the macro-cluster generation process to avoid recalculating the distance between the macro-point and the cluster center. This reduces repeated calculation time and so improves mining efficiency without losing mining accuracy.
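A rotation is a natural perturbation for distance-based clustering because it changes the coordinates of every record while leaving all pairwise Euclidean distances unchanged. The sketch below shows this for two-dimensional points. The angle, the sample points, and the function name are illustrative assumptions, and this is not the PPCDS implementation.

```python
import math

def rotate_2d(points, theta):
    """Rotate each (x, y) point about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

original = [(1.0, 2.0), (4.0, 6.0), (-3.0, 0.5)]
perturbed = rotate_2d(original, theta=1.1)  # arbitrary illustrative angle

# Pairwise distances before and after the perturbation.
pairs = [(0, 1), (0, 2), (1, 2)]
before = [math.dist(original[i], original[j]) for i, j in pairs]
after = [math.dist(perturbed[i], perturbed[j]) for i, j in pairs]
```

Here `before` and `after` agree up to floating-point rounding, so a clustering algorithm that depends only on distances produces the same grouping on the perturbed stream as on the original.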

    Enhanced P2P Services Providing Multimedia Content

    The retrieval facilities of most Peer-to-Peer (P2P) systems are limited to queries based on unique identifiers or small sets of keywords. This approach is inadequate and inefficient when a huge number of multimedia resources are shared. To address this limitation, we propose an original image and video sharing system in which a user can interactively search for resources of interest by means of content-based image and video retrieval techniques. To limit the network traffic load while maximizing the usefulness of each peer contacted in the query process, we also propose an adaptive overlay routing algorithm that exploits compact representations of the multimedia resources shared by each peer. Experimental results confirm the validity of the proposed approach, which can dynamically adapt the network topology to peer interests on the basis of query interactions among users