
    Clustering Software Components for Program Restructuring and Component Reuse Using Hybrid XNOR Similarity Function

    Get PDF
    Component-based software development has gained considerable practical importance in software engineering, both among academic researchers and in industry. Finding components for efficient software reuse is one of the important problems addressed by researchers. Clustering reduces the search space of components by grouping similar entities together, thereby reducing the search time for component retrieval and hence the overall time complexity. In this research, we introduce a generalized approach for clustering a given set of documents or software components by defining a similarity function, called the hybrid XNOR function, that measures the degree of similarity between two documents or software components. A similarity matrix is obtained for a given set of documents or components by applying the hybrid XNOR function. We define and design a clustering algorithm that takes the similarity matrix as input and produces a set of clusters as output: highly cohesive pattern groups or components.
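    The abstract does not spell out the hybrid XNOR function itself, so the sketch below assumes a simple reading: each component is a binary feature vector, the XNOR similarity of two components is the fraction of feature positions on which they agree, and components are grouped whenever their pairwise similarity exceeds a threshold. The helper names (xnor_similarity, similarity_matrix, threshold_clusters) and the threshold tau are illustrative, not the paper's.

    ```python
    # Minimal sketch of an XNOR-style similarity over binary feature vectors.
    # The paper's exact "hybrid XNOR" definition is not given in the abstract;
    # here similarity is assumed to be the fraction of feature positions on
    # which two components agree (both present or both absent).
    import numpy as np

    def xnor_similarity(a: np.ndarray, b: np.ndarray) -> float:
        """Degree of similarity between two binary feature vectors."""
        agree = np.sum(a == b)          # XNOR: counts positions where both bits match
        return agree / len(a)

    def similarity_matrix(vectors: np.ndarray) -> np.ndarray:
        n = len(vectors)
        sim = np.eye(n)
        for i in range(n):
            for j in range(i + 1, n):
                sim[i, j] = sim[j, i] = xnor_similarity(vectors[i], vectors[j])
        return sim

    def threshold_clusters(sim: np.ndarray, tau: float = 0.8):
        """Group components whose pairwise similarity exceeds tau (illustrative only)."""
        clusters, assigned = [], set()
        for i in range(len(sim)):
            if i in assigned:
                continue
            group = {i} | {j for j in range(len(sim)) if j not in assigned and sim[i, j] >= tau}
            assigned |= group
            clusters.append(group)
        return clusters

    # Example: three components described by five binary features each.
    components = np.array([[1, 0, 1, 1, 0],
                           [1, 0, 1, 0, 0],
                           [0, 1, 0, 0, 1]])
    print(threshold_clusters(similarity_matrix(components), tau=0.6))  # [{0, 1}, {2}]
    ```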

    Learning with Clustering Structure

    Full text link
    We study supervised learning problems using clustering constraints to impose structure on either features or samples, seeking to help both prediction and interpretation. The problem of clustering features arises naturally in text classification, for instance, to reduce dimensionality by grouping words together and identifying synonyms. The sample clustering problem, on the other hand, applies to multiclass problems where we are allowed to make multiple predictions and the performance of the best answer is recorded. We derive a unified optimization formulation highlighting the common structure of these problems and produce algorithms whose core iteration complexity amounts to a k-means clustering step, which can be approximated efficiently. We extend these results to combine sparsity and clustering constraints, and develop a new projection algorithm on the set of clustered sparse vectors. We prove convergence of our algorithms on random instances, based on a union-of-subspaces interpretation of the clustering structure. Finally, we test the robustness of our methods on artificial data sets as well as real data extracted from movie reviews.
    Comment: Completely rewritten. New convergence proofs in the clustered and sparse clustered case. New projection algorithm on sparse clustered vectors.
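    As a rough illustration of the "k-means step as core iteration" idea, the sketch below projects a weight vector onto vectors with at most k distinct coefficient values by clustering the coefficients in one dimension and snapping each to its centroid. This is one plausible reading of a projection onto clustered vectors, not the authors' exact algorithm; project_to_clustered is a hypothetical helper.

    ```python
    # Minimal sketch: map a weight vector onto one with at most k distinct
    # coefficient values via a 1-D k-means step, snapping each coefficient to
    # its cluster centroid. Illustrative reading only, not the paper's method.
    import numpy as np
    from sklearn.cluster import KMeans

    def project_to_clustered(w: np.ndarray, k: int) -> np.ndarray:
        """Replace each coefficient of w by the centroid of its 1-D k-means cluster."""
        km = KMeans(n_clusters=k, n_init=10).fit(w.reshape(-1, 1))
        return km.cluster_centers_[km.labels_].ravel()

    # Example: a noisy weight vector with roughly two underlying levels.
    w = np.array([0.9, 1.1, 1.0, -0.1, 0.1, 0.0])
    print(project_to_clustered(w, k=2))   # approximately [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
    ```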

    Selection of compressible signals from telemetry data

    Get PDF
    Sensors are deployed in all aspects of modern city infrastructure and generate vast amounts of data. Only subsets of this data, however, are relevant to individual organisations. For example, a local council may collect suspension movement from vehicles to detect potholes, but this data is not relevant when assessing traffic flow. Supervised feature selection aims to find the set of signals that best predict a target variable. Typical approaches use either measures of correlation or similarity, as in filter methods, or predictive power in a learned model, as in wrapper methods. In both approaches the selected features often have high entropies and are not suitable for compression. This is a particular issue in the automotive domain, where fast communication and archival of vehicle telemetry data is likely to be prevalent in the near future, especially with technologies such as V2V and V2X. In this paper, we adapt a popular feature selection filter method to consider the compressibility of the signals being selected for use in a predictive model. In particular, we add a compression term to the Minimal Redundancy Maximal Relevance (MRMR) filter and introduce Minimal Redundancy Maximal Relevance And Compression (MRMRAC). Using MRMRAC, we then select features from the Controller Area Network (CAN) and predict instantaneous fuel consumption, engine torque, vehicle speed, and gear position using a Support Vector Machine (SVM). We show that while predictive performance is slightly lower when compression is considered, the compressibility of the selected features is significantly improved.
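    A hedged sketch of what an MRMR-style greedy selection with an added compression term could look like: the score of a candidate feature is its relevance to the target minus its mean redundancy with already-selected features, minus a weighted compressibility penalty. The entropy-based penalty and the weight lambda_c are assumptions for illustration; the paper's exact MRMRAC formulation is not given in the abstract.

    ```python
    # Illustrative MRMR-style greedy selection extended with a compression
    # penalty, in the spirit of MRMRAC. The compression measure (empirical
    # entropy of the discretised signal) and lambda_c are assumptions.
    import numpy as np
    from sklearn.feature_selection import mutual_info_regression
    from scipy.stats import entropy

    def signal_entropy(x: np.ndarray, bins: int = 32) -> float:
        """Entropy of a histogram of the signal, as a proxy for compressibility."""
        hist, _ = np.histogram(x, bins=bins)
        p = hist / hist.sum()
        return entropy(p[p > 0], base=2)

    def mrmrac_select(X: np.ndarray, y: np.ndarray, n_select: int, lambda_c: float = 0.1):
        n_features = X.shape[1]
        relevance = mutual_info_regression(X, y)          # I(f; y) for every feature
        selected, remaining = [], list(range(n_features))
        while len(selected) < n_select and remaining:
            scores = []
            for f in remaining:
                redundancy = (np.mean([mutual_info_regression(X[:, [s]], X[:, f])[0]
                                       for s in selected]) if selected else 0.0)
                scores.append(relevance[f] - redundancy - lambda_c * signal_entropy(X[:, f]))
            best = remaining[int(np.argmax(scores))]
            selected.append(best)
            remaining.remove(best)
        return selected
    ```

    In this sketch lambda_c controls how strongly compressibility is traded against predictive relevance, matching the abstract's observation that performance drops slightly while compressibility improves.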

    A Novel Approach for Text Classification

    Get PDF
    Text Classification (TC) is the process of associating text documents with the classes considered most appropriate, thereby distinguishing topics such as particle physics from optical physics. A lot of research has been done in this field, but there is a need to categorize a collection of text documents into mutually exclusive categories by extracting the concepts or features using a supervised learning paradigm and different classification algorithms. In this paper, a new Fuzzy Similarity Based Concept Mining Model (FSCMM) is proposed to classify a set of text documents into pre-defined Category Groups (CG) by training and preparing them at the sentence, document, and integrated corpora levels, with feature reduction and ambiguity removal at each level to achieve high system performance. A Fuzzy Feature Category Similarity Analyzer (FFCSA) is used to analyze each extracted feature of the Integrated Corpora Feature Vector (ICFV) against the corresponding categories or classes. The model uses a Support Vector Machine Classifier (SVMC) to classify the training data patterns into two groups, i.e., +1 and -1. The proposed model works efficiently and effectively, yielding high-accuracy results.
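    A minimal sketch of the overall pipeline shape described above: extract term features, keep only terms with a strong fuzzy membership to one category (a stand-in for the FFCSA ambiguity-removal step, whose exact definition the abstract does not give), and train a binary +1/-1 SVM on the reduced features. The membership measure and the 0.7 cut-off are illustrative assumptions.

    ```python
    # Sketch of the pipeline shape: term features -> fuzzy feature-category
    # membership for ambiguity removal -> binary (+1 / -1) SVM. The membership
    # used here (a term's normalised weight share per class) is a stand-in for
    # the paper's FFCSA analyser.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import SVC

    docs = ["photons and optical lenses", "quarks and particle colliders",
            "laser optics experiment", "hadron collider particle decay"]
    labels = np.array([+1, -1, +1, -1])     # +1: optical physics, -1: particle physics

    vec = TfidfVectorizer()
    X = vec.fit_transform(docs).toarray()

    # Fuzzy membership of each term in its dominant class: its share of weight there.
    mass_pos = X[labels == +1].sum(axis=0)
    mass_neg = X[labels == -1].sum(axis=0)
    membership = np.maximum(mass_pos, mass_neg) / (mass_pos + mass_neg + 1e-12)

    keep = membership > 0.7                 # ambiguity removal: drop cross-class terms
    clf = SVC(kernel="linear").fit(X[:, keep], labels)
    print(clf.predict(vec.transform(["collider experiment with hadrons"]).toarray()[:, keep]))
    ```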

    Preprocessing Techniques to Support Event Detection Data Fusion on Social Media Data

    Get PDF
    This thesis focuses on the collection and preprocessing of streaming social media feeds, covering metadata as well as the visual and textual information. Today, news media are the main source of immediate coverage of news events, large and small. However, the information conveyed by these news sources is delayed due to a lack of proximity to and general knowledge of the event, so news outlets have started relying on social media sources for initial knowledge of these events. Previous work focused on textual data captured from social media as a data source for detecting events. The preprocessing framework presented here is designed to facilitate the data fusion of images and text for event detection. Results from the preprocessing techniques described in this work show that the collected textual and visual data can be processed into a workable format for further processing. Moreover, the textual and visual data are transformed into bag-of-words vectors for future data fusion and event detection.
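    A small sketch of the bag-of-words step for both modalities, assuming a standard pipeline: a term-count vector for each post's text and a bag-of-visual-words vector built by quantising local image descriptors against a k-means codebook, concatenated into one vector per post for later fusion. The random placeholder descriptors and the codebook size are illustrative; the thesis' actual feature extractors are not specified in the abstract.

    ```python
    # Sketch: turn text and images into bag-of-words vectors and concatenate
    # them for later fusion. The visual side quantises local descriptors against
    # a k-means codebook; random arrays stand in for real SIFT/ORB descriptors.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import CountVectorizer

    posts = ["flooding reported downtown near the bridge",
             "concert tonight at the riverside park"]

    # Textual bag-of-words.
    text_bow = CountVectorizer().fit_transform(posts).toarray()

    # Visual bag-of-words: one descriptor set per attached image (placeholders here).
    rng = np.random.default_rng(0)
    descriptors_per_image = [rng.normal(size=(60, 32)) for _ in posts]
    codebook = KMeans(n_clusters=8, n_init=10).fit(np.vstack(descriptors_per_image))
    visual_bow = np.array([np.bincount(codebook.predict(d), minlength=8)
                           for d in descriptors_per_image])

    fused = np.hstack([text_bow, visual_bow])   # one joint vector per post for fusion
    print(fused.shape)
    ```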