2,773 research outputs found

    Dynamic Thresholding Mechanisms for IR-Based Filtering in Efficient Source Code Plagiarism Detection

    To address time inefficiency, string-matching-based source code plagiarism detection compares only potential pairs, where potentiality is defined through a fast yet order-insensitive similarity measurement (adapted from Information Retrieval) and only pairs whose similarity degrees are higher than or equal to a particular threshold are selected. Defining such a threshold is not a trivial task, since the threshold should lead to high efficiency improvement and low effectiveness reduction (if any reduction is unavoidable). This paper proposes two thresholding mechanisms, namely a range-based and a pair-count-based mechanism, that dynamically tune the threshold based on the distribution of the resulting similarity degrees. According to our evaluation, both mechanisms are more practical than manual threshold assignment since they are more proportional to efficiency improvement and effectiveness reduction. Comment: The 2018 International Conference on Advanced Computer Science and Information Systems (ICACSIS
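The abstract does not define the two mechanisms precisely, so the following is only a minimal sketch under stated assumptions: the range-based threshold is taken to be a fixed fraction of the observed similarity range, and the pair-count-based threshold is taken to be the similarity of the k-th most similar pair. The function names, the `fraction` and `keep` parameters, and the sample scores are hypothetical and do not come from the paper.

```python
def range_based_threshold(similarities, fraction):
    """Assumed reading: threshold at a fraction of the observed similarity range."""
    low, high = min(similarities), max(similarities)
    return low + fraction * (high - low)


def pair_count_based_threshold(similarities, keep):
    """Assumed reading: threshold equal to the similarity of the keep-th best pair."""
    ranked = sorted(similarities, reverse=True)
    keep = max(1, min(keep, len(ranked)))  # clamp to a valid rank
    return ranked[keep - 1]


def select_pairs(scored_pairs, threshold):
    """Keep only pairs whose similarity is higher than or equal to the threshold."""
    return [pair for pair, sim in scored_pairs if sim >= threshold]


# Hypothetical similarity degrees for three submission pairs.
scored_pairs = [(("a", "b"), 0.9), (("a", "c"), 0.4), (("b", "c"), 0.1)]
sims = [sim for _, sim in scored_pairs]

range_selected = select_pairs(scored_pairs, range_based_threshold(sims, 0.5))
count_selected = select_pairs(scored_pairs, pair_count_based_threshold(sims, 2))
```

In both variants the threshold moves with the distribution of the scores, so no fixed cutoff has to be chosen by hand; only the selected pairs would then be passed to the slower string-matching comparison.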

    Robust techniques and applications in fuzzy clustering

    This dissertation addresses issues central to fuzzy classification. The sensitivity to noise and outliers of clustering techniques based on least squares minimization, such as Fuzzy c-Means (FCM) and its variants, is addressed. In this work, two novel and robust clustering schemes are presented and analyzed in detail. They approach the problem of robustness from different perspectives. The first scheme scales down the FCM memberships of data points based on the distance of the points from the cluster centers. Scaling done on outliers reduces their membership in true clusters. This scheme, known as Mega-clustering, defines a conceptual mega-cluster which is a collective cluster of all data points but views outliers and good points differently (as opposed to the concept of Dave's Noise cluster). The scheme is presented and validated with experiments, and its similarities with Noise Clustering (NC) are also presented. The other scheme is based on the feasible solution algorithm that implements the Least Trimmed Squares (LTS) estimator. The LTS estimator is known to be resistant to noise and has a high breakdown point. The feasible solution approach also guarantees convergence of the solution set to a global optimum. Experiments show the practicability of the proposed schemes in terms of computational requirements and the attractiveness of their simple frameworks. The issue of validation of clustering results has often received less attention than clustering itself. Fuzzy and non-fuzzy cluster validation schemes are reviewed, and a novel methodology for cluster validity using a test for the random position hypothesis is developed. The random position hypothesis is tested against an alternative clustered hypothesis on every cluster produced by the partitioning algorithm. The Hopkins statistic is used as a basis to accept or reject the random position hypothesis, which is also the null hypothesis in this case. The Hopkins statistic is known to be a fair estimator of randomness in a data set. The concept is borrowed from the clustering tendency domain, and its applicability to validating clusters is shown here. A unique feature selection procedure for use with large, high-dimensional molecular conformational datasets is also developed. The intelligent feature extraction scheme not only helps in reducing the dimensionality of the feature space but also helps in eliminating contentious issues such as those associated with the labeling of symmetric atoms in the molecule. The feature vector is converted to a proximity matrix and used as an input to the relational fuzzy clustering (FRC) algorithm, with very promising results. Results are also validated using several cluster validity measures from the literature. Another application of fuzzy clustering considered here is image segmentation. Image analysis on extremely noisy images is carried out as a precursor to the development of an automated real-time condition state monitoring system for underground pipelines. A two-stage FCM with intelligent feature selection is implemented as the segmentation procedure, and results on a test image are presented. A conceptual framework for automated condition state assessment is also developed.
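As background for the robustness issue raised above, the sketch below computes standard FCM memberships for fixed cluster centers. It is the textbook membership update, not the dissertation's Mega-clustering or LTS-based scheme; the sample points, centers, and the function name are hypothetical.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return u[i][k], the standard FCM membership of point k in cluster i."""
    exponent = 2.0 / (m - 1.0)
    u = [[0.0] * len(points) for _ in centers]
    for k, point in enumerate(points):
        distances = [math.dist(point, center) for center in centers]
        hits = [i for i, d in enumerate(distances) if d == 0.0]
        if hits:
            # The point coincides with one or more centers: share full membership.
            for i in hits:
                u[i][k] = 1.0 / len(hits)
            continue
        for i, d_i in enumerate(distances):
            u[i][k] = 1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
    return u


# Hypothetical 2-D data: two points near the first center, one on the second
# center, and one far outlier.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (100.0, 0.0)]
centers = [(0.5, 0.0), (10.0, 0.0)]
u = fcm_memberships(points, centers)
```

Because the memberships of each point must sum to one, the outlier at (100, 0) still receives a substantial membership in both clusters, which is the kind of sensitivity that robust schemes aim to reduce.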

    A Hybrid Multi-user Cloud Access Control based Block Chain Framework for Privacy Preserving Distributed Databases

    Most traditional medical applications are insecure and make it difficult to compute data integrity with a variable hash size. Traditional medical data security systems are insecure because they depend on static parameters for data security. Also, distributed cloud storage systems operate independently of integrity computation and data security because of unstructured data and computational memory. As the size of the data and its dimensions increase on public and private cloud servers, it is difficult to provide machine-learning-based privacy preservation in a cloud computing environment. Block-chain technology plays a vital role for large cloud databases. Most conventional block-chain frameworks are based on existing integrity and confidentiality models. Also, these models are based on the data size and file format. In this model, a novel integrity verification and encryption framework is designed and implemented in a cloud environment. To overcome these problems in the cloud computing environment, a hybrid integrity and security-based block-chain framework is designed and implemented on large distributed databases. In this framework, a novel decision tree classifier is used along with a non-linear mathematical hash algorithm and advanced attribute-based encryption models to improve the privacy of multiple users on large cloud datasets. Experimental results showed that the proposed advanced privacy-preserving block-chain technology has better efficiency than traditional block-chain-based privacy-preserving systems on large distributed databases.
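The general idea of block-chain integrity verification can be illustrated with a minimal hash chain. This sketch is not the authors' framework: SHA-256 from Python's standard library stands in for the unspecified non-linear hash algorithm, and the decision tree classifier and attribute-based encryption are omitted entirely. All function names and record values are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first block's predecessor


def block_hash(index, payload, previous_hash):
    """Hash a block's contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "payload": payload, "prev": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def build_chain(payloads):
    """Link each payload to its predecessor through the stored hash."""
    chain, previous_hash = [], GENESIS_HASH
    for index, payload in enumerate(payloads):
        digest = block_hash(index, payload, previous_hash)
        chain.append(
            {"index": index, "payload": payload, "prev": previous_hash, "hash": digest}
        )
        previous_hash = digest
    return chain


def verify_chain(chain):
    """Recompute every hash; any edited block breaks the chain."""
    previous_hash = GENESIS_HASH
    for block in chain:
        expected = block_hash(block["index"], block["payload"], previous_hash)
        if block["prev"] != previous_hash or block["hash"] != expected:
            return False
        previous_hash = block["hash"]
    return True


chain = build_chain(["record-1", "record-2", "record-3"])
intact = verify_chain(chain)
chain[1]["payload"] = "tampered"  # simulate an unauthorized modification
tamper_detected = not verify_chain(chain)
```

Since each hash covers the previous block's hash, changing any stored record invalidates that block and every later one, which is what makes the integrity check possible.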

    Development of a quantitative health index and diagnostic method for efficient asset management of power transformers

    Power transformers play a very important role in electrical power networks and are frequently operated longer than their expected design life. Therefore, to ensure their best operating performance in a transmission network, the fault condition of each transformer must be assessed regularly. For an accurate fault diagnosis, it is important to have maximum information about an individual transformer based on unbiased measurements. This can best be achieved using artificial intelligence (AI) that can systematically analyse the complex features of diagnostic measurements. Clustering techniques are a form of AI that is particularly well suited to fault diagnosis. To provide an assessment of transformers, a hybrid k-means algorithm and probabilistic Parzen window estimation are used in this research. The clusters they form are representative of single or multiple fault categories. The proposed technique computes the maximum probability of transformers in each cluster to determine their fault categories. The main focus of this research is to determine a quantitative health index (HI) to characterize the operating condition of transformers. Condition assessment tries to detect incipient faults before they become too serious, which requires a sensitive and quantified approach. Therefore, the HI needs to come from a proportionate system that can estimate the health condition of transformers over time. To quantify this condition, the General Regression Neural Network (GRNN), a type of AI, has been chosen in this research. The GRNN works well with small sets of training data and avoids the need to estimate large sets of model parameters, following a largely non-parametric approach. The methodology used here regards transformers as a collection of subsystems and summarizes their individual conditions into a quantified HI based on the existing agreed benchmarks drawn from IEEE and CIGRE standards. To better calibrate the HI, it may be mapped to a failure probability estimate for each transformer over the coming year. Experimental results of the research show that the proposed methods are more effective than previously published approaches when diagnosing critical faults. Moreover, this novel HI approach can provide a comprehensive assessment of transformers based on the actual condition of their individual subsystems.
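To make the "largely non-parametric" remark concrete, the sketch below implements the basic GRNN prediction rule, a Gaussian-kernel-weighted average of the training targets with a single smoothing parameter `sigma`. The one-dimensional training data and the health-index-like targets are invented for illustration and are not taken from the thesis; the feature encoding actually used there is not given in the abstract.

```python
import math


def grnn_predict(x, train_x, train_y, sigma=1.0):
    """Basic GRNN output: Gaussian-kernel-weighted mean of the training targets."""
    weights = [
        math.exp(-math.dist(x, xi) ** 2 / (2.0 * sigma ** 2)) for xi in train_x
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from all training samples for this sigma")
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Hypothetical training set: one diagnostic feature mapped to a health index in [0, 1].
train_x = [(0.0,), (1.0,), (2.0,)]
train_y = [0.0, 0.5, 1.0]

midpoint = grnn_predict((1.0,), train_x, train_y, sigma=0.5)
near_first = grnn_predict((0.0,), train_x, train_y, sigma=0.1)
```

The training samples themselves act as the model, so the only quantity to tune is `sigma`, which is consistent with the abstract's point that the GRNN copes with small training sets.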
