    Evaluation of Principal Components Analysis (PCA) and Data Clustering Techniques (DCT) on Medical Data

    This study evaluates PCA filters and six clustering algorithms on medical data (a Hepatitis dataset), which is high-dimensional and more complex than conventional data. Clustering is used to reduce the data, which shortens processing time and mitigates the curse of dimensionality. In medical diagnosis, the chief guiding symptoms (rubrics), together with clinical tests, usually support an accurate diagnosis of diseases and disorders, so the primary factors have the greatest influence on detecting a specific disorder. The results indicate that the FarthestFirst clustering algorithm is generally the best choice without the PCA filter, while the Cobweb clustering algorithm may be preferable with the PCA filter on some other medical datasets.
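
    The abstract does not describe the algorithm itself. As a minimal sketch, assuming "farthestfirst" refers to the standard farthest-first traversal heuristic, the clustering step could look like the following. The function name, the fixed starting point, and the toy points are illustrative assumptions, not the study's setup, and no PCA step is shown.

```python
import math


def farthest_first(points, k, first=0):
    """Farthest-first traversal; returns (center indices, cluster labels).

    `first` fixes the starting center for reproducibility (an assumption;
    implementations often pick it at random).
    """
    centers = [first]
    # Distance from every point to its nearest chosen center so far.
    nearest = [math.dist(p, points[first]) for p in points]
    while len(centers) < k:
        # The next center is the point farthest from all current centers.
        nxt = max(range(len(points)), key=nearest.__getitem__)
        centers.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    # Assign each point to its closest center.
    labels = [
        min(range(k), key=lambda c: math.dist(p, points[centers[c]]))
        for p in points
    ]
    return centers, labels


# Toy 2-D data with three visually separate groups (hypothetical values).
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (0.0, 9.0), (0.3, 9.1)]
centers, labels = farthest_first(points, k=3)
print(centers, labels)
```

    Only one pass over the data is needed per center, which is why this heuristic is cheap compared with iterative methods such as k-means.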

    Season-Based Occupancy Prediction in Residential Buildings Using Data Mining Techniques

    Considering the continuous increase of global energy consumption and the fact that buildings account for a large part of electricity use, it is essential to reduce energy consumption in buildings to mitigate greenhouse gas emissions and costs for both building owners and tenants. A reliable occupancy prediction model plays a critical role in improving the performance of energy simulation and occupant-centric building operations. In general, occupancy and occupant activities differ by season, and it is important to account for the dynamic nature of occupancy in simulations and to propose energy-efficient strategies. The present work aims to develop a data mining-based framework, including feature selection and the establishment of seasonal-customized occupancy prediction (SCOP) models, to predict the occupancy in buildings considering different seasons. In the proposed framework, recursive feature elimination with cross-validation (RFECV) was first applied to select the variables that yield the highest prediction accuracy. Then, six machine learning (ML) algorithms were considered to establish four SCOP models to predict occupancy presence, and their prediction performances were compared in terms of prediction accuracy and computational cost. To evaluate the effectiveness of the developed data mining framework, it was applied to an apartment in Lyon, France. The results show that the RFECV process reduced the computational time while improving the ML models’ prediction performances. Additionally, the SCOP models could achieve higher prediction accuracy than the conventional prediction model, as measured by the F1 score and the area under the curve. Among the considered ML models, the gradient-boosting decision tree, random forest, and artificial neural network showed better performances, achieving more than 85% accuracy in Summer, Fall, and Winter, and over 80% in Spring. The framework is valuable for developing strategies for building energy consumption estimation and higher-resolution occupancy level prediction, both of which are readily influenced by season.
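
    A rough sketch of the "RFECV, then one model per season" idea, assuming scikit-learn is available. The synthetic data, the choice of a random forest as the ranking estimator, the F1 scoring, and every parameter value are assumptions for illustration; the abstract does not give the study's features or settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

SEASONS = ("Spring", "Summer", "Fall", "Winter")


def fit_seasonal_selectors(n_samples=200, n_features=8):
    """Run RFECV separately on each season's (synthetic) data."""
    selectors = {}
    for seed, season in enumerate(SEASONS):
        # Synthetic stand-in for one season's sensor features and
        # binary occupancy labels (not the Lyon apartment data).
        X, y = make_classification(
            n_samples=n_samples,
            n_features=n_features,
            n_informative=3,
            n_redundant=2,
            random_state=seed,
        )
        selector = RFECV(
            estimator=RandomForestClassifier(n_estimators=25, random_state=0),
            step=1,  # drop one feature per elimination round
            cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
            scoring="f1",
        )
        selectors[season] = selector.fit(X, y)
    return selectors


selectors = fit_seasonal_selectors()
for season, sel in selectors.items():
    print(season, "kept", sel.n_features_, "of", sel.support_.size, "features")
```

    Each fitted selector exposes the chosen feature mask in `support_`, so a per-season classifier can then be trained on `selector.transform(X)` only.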

    A Practical Tool for Visualizing and Data Mining Medical Time Series

    The increasing interest in time series data mining in the last decade has had surprisingly little impact on real-world medical applications. Practitioners who work with time series on a daily basis rarely take advantage of the wealth of tools that the data mining community has made available. In this work, we attempt to address this problem by introducing a simple parameter-light tool that allows users to efficiently navigate through large collections of time series. Our system has the unique advantage that it can be embedded directly into any standard graphical user interface, such as Microsoft Windows, thus making deployment easier. Our approach extracts features from a time series of arbitrary length and uses information about the relative frequency of these features to color a bitmap in a principled way. By visualizing the similarities and differences within a collection of bitmaps, a user can quickly discover clusters, anomalies, and other regularities within their data collection. We demonstrate the utility of our approach with a set of comprehensive experiments on real datasets from a variety of medical domains, including ECGs and EEGs.
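
    One possible reading of "relative frequency of features colors a bitmap" can be sketched as follows. The abstract does not say how features are extracted; the 4-symbol discretization, the quadrant layout, and all names here are assumptions made for illustration.

```python
import math
from collections import Counter
from statistics import mean, pstdev

# Cut points that split a standard normal into 4 equiprobable bins
# (an assumed 4-symbol alphabet).
BREAKPOINTS = (-0.6745, 0.0, 0.6745)


def discretize(series):
    """Z-normalize the series and map each value to a symbol 0..3."""
    mu, sigma = mean(series), pstdev(series)
    sigma = sigma or 1.0  # avoid division by zero on constant series
    return [sum((x - mu) / sigma > b for b in BREAKPOINTS) for x in series]


def bitmap(series, level=2):
    """Return a 2**level square grid of normalized subword frequencies."""
    symbols = discretize(series)
    size = 2 ** level
    grid = [[0.0] * size for _ in range(size)]
    counts = Counter(
        tuple(symbols[i:i + level]) for i in range(len(symbols) - level + 1)
    )
    for word, n in counts.items():
        row = col = 0
        for s in word:
            # Each symbol selects a quadrant: high bit -> row, low bit -> column.
            row = row * 2 + (s >> 1)
            col = col * 2 + (s & 1)
        grid[row][col] = n
    peak = max(counts.values(), default=1)
    return [[v / peak for v in r] for r in grid]


def distance(a, b):
    """Sum of squared cell differences between two bitmaps."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


sine = [math.sin(2 * math.pi * i / 20) for i in range(200)]
sawtooth = [float(i % 20) for i in range(200)]
print(distance(bitmap(sine), bitmap(sawtooth)))
```

    Because every series maps to a grid of the same size regardless of its length, the grids can be compared cell by cell or rendered as small colored thumbnails.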

    A Practical Tool for Visualizing and Data Mining Medical Time Series (Li Wei)

    The increasing interest in time series data mining has had surprisingly little impact on real-world medical applications. Practitioners who work with time series on a daily basis rarely take advantage of the wealth of tools that the data mining community has made available. In this work, we attempt to address this problem by introducing a parameter-light tool that allows users to efficiently navigate through large collections of time series. Our approach extracts features from a time series of arbitrary length and uses information about the relative frequency of these features to color a bitmap in a principled way. By visualizing the similarities and differences within a collection of bitmaps, a user can quickly discover clusters, anomalies, and other regularities within the data collection. We demonstrate the utility of our approach with a set of comprehensive experiments on real datasets from a variety of medical domains.