20 research outputs found

    Tracking the Temporal-Evolution of Supernova Bubbles in Numerical Simulations

    The study of low-dimensional, noisy manifolds embedded in a higher-dimensional space has been extremely useful in many applications, from the chemical analysis of multi-phase flows to simulations of galactic mergers. Building a probabilistic model of the manifolds has helped in describing their essential properties and how they vary in space. However, when the manifold evolves through time, joint spatio-temporal modelling is needed in order to fully comprehend its nature. We propose a first-order Markovian process that propagates the spatial probabilistic model of a manifold at a fixed time to its adjacent temporal stages. The proposed methodology is demonstrated using a particle simulation of an interacting dwarf galaxy to describe the evolution of a cavity generated by a supernova.
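    The first-order Markov propagation described above can be illustrated with a toy one-dimensional analogue, in which a Gaussian belief about a cavity's radius at one snapshot serves as the prior for the next. All quantities and noise levels here are illustrative assumptions standing in for the paper's full spatial probabilistic model.

    ```python
    import numpy as np

    # Toy 1-D analogue of first-order Markov propagation of a spatial model:
    # the belief at snapshot t is carried forward as the prior for t+1.

    def propagate(mean, var, q=0.05):
        """Markov transition: the model at t predicts t+1 up to process noise q."""
        return mean, var + q

    def update(mean, var, obs, r=0.1):
        """Fuse the propagated prior with data observed at the next snapshot."""
        gain = var / (var + r)                 # scalar Kalman-style gain
        return mean + gain * (obs - mean), (1 - gain) * var

    rng = np.random.default_rng(0)
    true_radius, mean, var = 1.0, 0.5, 1.0     # initial model at t = 0
    for t in range(20):
        true_radius += 0.02                    # cavity slowly expands
        obs = true_radius + rng.normal(0, 0.3) # noisy summary of the particles
        mean, var = update(*propagate(mean, var), obs)
    ```

    Because each step conditions only on the previous snapshot's model, the chain is first-order Markov: no older history is needed.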

    Recent Developments in Smart Healthcare

    Medicine is undergoing a sector-wide transformation thanks to advances in computing and networking technologies. Healthcare is changing from reactive and hospital-centered to preventive and personalized, and from disease-focused to well-being-centered. In essence, healthcare systems, as well as fundamental medicine research, are becoming smarter. We anticipate significant improvements in areas ranging from molecular genomics and proteomics, to decision support for healthcare professionals through big data analytics, to support for behavior change through technology-enabled self-management and social and motivational support. Furthermore, with smart technologies, healthcare delivery could also be made more efficient, of higher quality, and less costly. In this special issue, we received a total of 45 submissions and accepted 19 outstanding papers that roughly span several interesting topics in smart healthcare, including public health, health information technology (Health IT), and smart medicine.

    Data balancing approaches in quality, defect, and pattern analysis

    The imbalanced ratio of data is one of the most significant challenges in various industrial domains. Consequently, numerous data-balancing approaches have been proposed over the years. However, most of these methods come with their own limitations that can affect data-driven decision-making models in critical sectors such as product quality assurance, manufacturing defect identification, and pattern recognition in healthcare diagnostics. This dissertation addresses three research questions related to data-balancing approaches: 1) What are the scopes of data-balancing approaches toward the majority and minority samples? 2) What is the effect of traditional Machine Learning (ML) and Synthetic Minority Over-sampling Technique (SMOTE)-based data balancing on imbalanced data analysis? 3) How does imbalanced data affect the performance of Deep Learning (DL)-based models? To achieve these objectives, the dissertation thoroughly analyzes existing reference works and identifies their limitations. It has been observed that most existing data-balancing approaches have several limitations, such as creating noise during oversampling, removing important information during undersampling, and performing poorly on multidimensional data. It has also been observed that SMOTE-based approaches are the most widely used data-balancing approaches, as they create synthetic samples and are easier to implement than other existing techniques. However, SMOTE also has its limitations, so it is necessary to determine whether SMOTE-based oversampling has any significant effect on the performance of ML-based data-driven models. To do so, the study conducts several hypothesis tests considering several popular ML algorithms, with and without hyperparameter tuning.
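    The SMOTE oversampling analyzed here can be sketched in a few lines: each synthetic point is a random interpolation between a minority sample and one of its k nearest minority neighbours. This is a minimal illustration, not the dissertation's implementation; libraries such as imbalanced-learn provide production versions.

    ```python
    import numpy as np

    # Minimal SMOTE sketch: synthesize minority samples by interpolating
    # between a minority point and one of its k nearest minority neighbours.

    def smote(x_min, n_new, k=3, rng=None):
        rng = np.random.default_rng(rng)
        out = []
        for _ in range(n_new):
            i = rng.integers(len(x_min))
            dist = np.linalg.norm(x_min - x_min[i], axis=1)
            nbrs = np.argsort(dist)[1:k + 1]     # skip the point itself
            j = rng.choice(nbrs)
            lam = rng.random()                   # interpolation factor in [0, 1)
            out.append(x_min[i] + lam * (x_min[j] - x_min[i]))
        return np.array(out)

    x_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    x_new = smote(x_min, n_new=6, rng=0)         # six synthetic minority samples
    ```

    Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the minority class's convex hull, which is also the source of the noise-creation limitation the dissertation notes.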
Based on the overall hypotheses, it is found that, in many cases on the reference datasets, there is no significant performance improvement in data-driven ML models once the imbalanced data is balanced using SMOTE approaches. Additionally, the study finds that SMOTE-based synthetic samples often do not follow a Gaussian distribution or match the distribution of the original dataset. Therefore, the study suggests that Generative Adversarial Network (GAN)-based approaches could be a better alternative for developing more realistic samples and might overcome the limitations of SMOTE-based data balancing. However, GANs are difficult and computationally expensive to train, and, because GANs were developed mainly for image generation, very few studies demonstrate promising outcomes for GAN-based tabular data balancing. To overcome these limitations, the present study proposes several data-balancing approaches: GAN-based oversampling (GBO), Support Vector Machine (SVM)-SMOTE-GAN (SSG), and Borderline-SMOTE-GAN (BSGAN). The proposed approaches outperform existing SMOTE-based data-balancing approaches on various highly imbalanced tabular datasets, produce realistic samples, and yield oversampled data that follows the distribution of the original dataset. The dissertation then examines two case scenarios in which data-balancing approaches play crucial roles: healthcare diagnostics and additive manufacturing. For the healthcare diagnostics scenario, the study considers several chest radiography (X-ray) and Computed Tomography (CT) scan image datasets to detect patients with COVID-19 symptoms, employing six Transfer Learning (TL) approaches: Visual Geometry Group (VGG)16, Residual Network (ResNet)50, ResNet101, Inception-ResNet Version 2 (InceptionResNetV2), Mobile Network version 2 (MobileNetV2), and VGG19.
Based on the overall analysis, it has been observed that, except for the ResNet-based model, most of the TL models can detect patients with COVID-19 symptoms with an accuracy of almost 99%. However, one potential drawback of TL approaches is that the models may learn from the wrong regions: for example, instead of focusing on the infected lung regions, the TL-based models focused on non-infected regions. To address this issue, the study updates the TL-based models to reduce incorrect localization. Similarly, the study conducts an additional investigation on an imbalanced dataset containing defect and non-defect images of 3D-printed cylinders. The results show that TL-based models are unable to locate the defect regions, highlighting the challenge of detecting defects using imbalanced data. To address this limitation, the study proposes preprocessing-based approaches, including Region of Interest Net (ROIN), Region of Interest and Histogram Equalizer Net (ROIHEN), and Region of Interest with Histogram Equalization and Details Enhancer Net (ROIHEDEN), to improve the models' performance and accurately identify the defect regions. Furthermore, this dissertation employs various model-interpretation techniques, such as Local Interpretable Model-Agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), and Gradient-weighted Class Activation Mapping (Grad-CAM), to gain insight into the features in numerical, categorical, and image data that characterize the models' predictions. These techniques are used across multiple experiments and contribute significantly to a better understanding of the models' decision-making processes. Lastly, the study considers a small mixed dataset containing numerical, categorical, and image data; such diverse data types are often challenging for developing data-driven ML models.
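The histogram-equalization step implied by the ROIHEN/ROIHEDEN names can be sketched as follows; the function and the tiny example image are illustrative assumptions, not the dissertation's actual preprocessing code.

```python
import numpy as np

# Sketch of histogram equalization: pixel intensities are remapped through
# the image's cumulative distribution function (CDF) so that the output
# uses the full dynamic range, making low-contrast defect regions stand out.

def hist_equalize(img, levels=256):
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = hist.cumsum().astype(float)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # normalise to [0, 1]
    return (cdf * (levels - 1)).astype(np.uint8)[img]  # remap every pixel

img = np.array([[50, 50, 60], [60, 70, 200]], dtype=np.uint8)
eq = hist_equalize(img)   # intensities now spread across 85..255
```

Production pipelines would typically apply this per region of interest (as the ROI- prefix suggests) rather than to the whole frame.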
The study proposes a computationally efficient and simple ML model for these data types by combining a Multilayer Perceptron and a Convolutional Neural Network (MLP-CNN). The proposed MLP-CNN models demonstrate superior accuracy in identifying COVID-19 patients' patterns compared to existing methods. In conclusion, this research proposes various approaches to tackle significant challenges associated with class imbalance, including the sensitivity of ML models to multidimensional imbalanced data, distribution issues arising from data-expansion techniques, and the need for model explainability and interpretability. By addressing these issues, this study can potentially mitigate data-balancing challenges across various industries, particularly those involving quality, defect, and pattern analysis, such as healthcare diagnostics, additive manufacturing, and product quality assurance. By providing valuable insights into the models' decision-making processes, this research could pave the way for developing more accurate and robust ML models, thereby improving their performance in real-world applications.

    Energy Data Analytics for Smart Meter Data

    The principal advantage of smart electricity meters is their ability to transfer digitized electricity consumption data to remote processing systems. The data collected by these devices make the realization of many novel use cases possible, providing benefits to electricity providers and customers alike. This book includes 14 research articles that explore and exploit the information content of smart meter data, and provides insights into the realization of new digital solutions and services that support the transition towards a sustainable energy system. This volume has been edited by Andreas Reinhardt, head of the Energy Informatics research group at Technische Universität Clausthal, Germany, and Lucas Pereira, research fellow at Técnico Lisboa, Portugal.

    AVATAR - Machine Learning Pipeline Evaluation Using Surrogate Model

    © 2020, The Author(s). The evaluation of machine learning (ML) pipelines is essential during automatic ML pipeline composition and optimisation. Previous methods, such as the Bayesian-based and genetic-based optimisation implemented in Auto-WEKA, Auto-sklearn, and TPOT, evaluate pipelines by executing them. Pipeline composition and optimisation with these methods therefore requires a tremendous amount of time, which prevents them from exploring complex pipelines to find better predictive models. To explore this research challenge further, we have conducted experiments showing that many of the generated pipelines are invalid, and that it is unnecessary to execute them to find out whether they are good pipelines. To address this issue, we propose a novel method that evaluates the validity of ML pipelines using a surrogate model (AVATAR). AVATAR accelerates automatic ML pipeline composition and optimisation by quickly discarding invalid pipelines. Our experiments show that AVATAR is more efficient at evaluating complex pipelines than traditional evaluation approaches that require executing them.
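    The core idea behind such a surrogate can be sketched as a compatibility check that propagates data properties through a pipeline instead of executing it. The components and capability sets below are invented for illustration; the paper derives them from component specifications.

    ```python
    # Surrogate validity check: a pipeline is valid if each step's input
    # requirements are satisfied by the properties of the data flowing into
    # it, computed symbolically rather than by running the pipeline.

    COMPONENTS = {
        "Imputer": {"in": {"has_missing"}, "out": {"numeric"}},
        "Scaler":  {"in": {"numeric"},     "out": {"numeric"}},
        "OneHot":  {"in": {"categorical"}, "out": {"numeric"}},
        "SVM":     {"in": {"numeric"},     "out": {"prediction"}},
    }

    def is_valid(pipeline, data_props):
        """Check step-to-step compatibility without running the pipeline."""
        props = set(data_props)
        for name in pipeline:
            spec = COMPONENTS[name]
            if not spec["in"] <= props:    # step's requirements unmet: invalid
                return False
            props = set(spec["out"])       # surrogate of the step's effect
        return True
    ```

    For example, `is_valid(["Imputer", "Scaler", "SVM"], {"has_missing"})` passes, while feeding purely numeric data into `OneHot` fails, and neither case requires training a single model.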

    Big Data mining and machine learning techniques applied to real world scenarios

    Data mining techniques allow the extraction of valuable information from heterogeneous and possibly very large data sources, which can be either structured or unstructured. Unstructured data, such as text files, social media posts, and mobile data, are far more abundant than structured data and grow at a higher rate. Their high volume and the inherent ambiguity of natural language make unstructured data very hard to process and analyze. Appropriate text representations are therefore required in order to capture word semantics as well as to preserve statistical information, e.g. word counts. In Big Data scenarios, scalability is also a primary requirement. Data mining and machine learning approaches should take advantage of large-scale data, exploiting abundant information while avoiding the curse of dimensionality. The goal of this thesis is to enhance text understanding in the analysis of big data sets, introducing novel techniques that can be employed to solve real-world problems. The presented Markov methods temporarily achieved the state of the art on well-known Amazon review corpora for cross-domain sentiment analysis, before being outperformed by deep approaches in the analysis of large data sets. A noise-detection method for identifying relevant tweets leads to 88.9% accuracy in daily prediction of the Dow Jones Industrial Average, the best result in the literature among approaches based on social networks. Dimensionality-reduction approaches are used in combination with LinkedIn users' skills to perform job recommendation. A framework based on deep learning and Markov Decision Processes is designed to model job transitions and recommend pathways towards a given career goal. Finally, parallel primitives for vendor-agnostic implementation of Big Data mining algorithms are introduced to foster multi-platform deployment, code reuse, and optimization.
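    A first-order Markov text classifier of the kind applied here to sentiment analysis can be sketched as one word-transition model per class with add-one smoothing; the tiny corpus and class labels below are illustrative assumptions, not the thesis's data.

    ```python
    import math
    from collections import Counter

    # Toy first-order Markov classifier: a document is assigned the class
    # under which its word-transition sequence is most likely.

    class MarkovClassifier:
        def fit(self, docs, labels):
            self.classes = sorted(set(labels))
            self.bigrams = {c: Counter() for c in self.classes}
            self.unigrams = {c: Counter() for c in self.classes}
            self.vocab = set()
            for words, c in zip(docs, labels):
                self.vocab |= set(words)
                for a, b in zip(words, words[1:]):
                    self.bigrams[c][(a, b)] += 1   # count transition a -> b
                    self.unigrams[c][a] += 1
            return self

        def score(self, words, c):
            v = len(self.vocab)
            # add-one smoothed log-likelihood of the sequence under class c
            return sum(math.log((self.bigrams[c][(a, b)] + 1) /
                                (self.unigrams[c][a] + v))
                       for a, b in zip(words, words[1:]))

        def predict(self, words):
            return max(self.classes, key=lambda c: self.score(words, c))

    docs = [["great", "movie"], ["really", "great", "film"],
            ["bad", "movie"], ["really", "bad", "film"]]
    labels = ["pos", "pos", "neg", "neg"]
    clf = MarkovClassifier().fit(docs, labels)
    ```

    Conditioning each word only on its predecessor is what makes the model first-order Markov, and the per-class transition tables are what such methods transfer across review domains.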

    Advances in knowledge discovery and data mining Part II

    19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II

    A facilities maintenance management process based on degradation prediction using sensed data

    Energy efficiency and user comfort have recently become priorities in the Facility Management (FM) sector. This has resulted in the use of innovative building components, such as thermal solar panels and heat pumps, as they have the potential to provide better performance, energy savings, and increased user comfort. However, as the complexity of components increases, so does the requirement for maintenance management. The standard routine for building maintenance is inspection, which results in repairs or replacement when a fault is found. This routine leads to unnecessary inspections, which incur costs in component downtime and work hours. This research proposes an alternative routine: performing building maintenance at the point in time when a component is degrading and requires maintenance, thus reducing the frequency of unnecessary inspections. This thesis demonstrates that statistical techniques can be used as part of a maintenance management methodology to invoke maintenance before failure occurs. The proposed FM process is presented through a scenario utilising current Building Information Modelling (BIM) technology and innovative contractual and organisational models. This FM scenario supports a Degradation-based Maintenance (DbM) scheduling methodology, implemented using two statistical techniques: Particle Filters (PFs) and Gaussian Processes (GPs). DbM consists of extracting and tracking a degradation metric for a component. Limits for the degradation metric are identified using one of a number of proposed processes, which determine the limits based on the maturity of the historical information available. DbM is implemented for three case study components: a heat exchanger, a heat pump, and a set of bearings. The degradation points identified for each case study by a PF, a GP, and a hybrid (PF and GP combined) DbM implementation are assessed against known degradation points.
The GP implementations are successful for all components. For the PF implementations, the results presented in this thesis find that the extracted metrics and limits identify degradation occurrences accurately for components in continuous operation; for components with seasonal operational periods, the PF may wrongly identify degradation. The GP performs more robustly than the PF, but the PF, on average, results in fewer false positives. The hybrid implementations, which combine GP and PF results, are successful for two of the three case studies and are not affected by seasonal data. Overall, DbM is effectively applied for the three case study components. The accuracy of the implementations is dependent on the relationships modelled by the PF and GP, and on the type and quantity of data available. This novel maintenance process can improve equipment performance and reduce energy wastage from BSCs' operation.
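The particle-filter side of DbM can be sketched as follows: particles carry hypotheses for a component's hidden degradation level, are propagated by a drift model, reweighted against a sensed metric, and resampled, with maintenance flagged when the estimate crosses a limit. The drift rate, noise levels, and limit are illustrative assumptions, not values from the thesis.

```python
import numpy as np

# Minimal particle-filter sketch of degradation-based maintenance (DbM):
# track a hidden degradation metric and raise an alarm at the limit.

rng = np.random.default_rng(1)
n, limit = 500, 0.8
particles = rng.uniform(0.0, 0.1, n)          # initial degradation hypotheses
true_deg, alarm_at = 0.0, None

for t in range(1, 101):
    true_deg = min(1.0, true_deg + 0.01)      # component slowly wears out
    obs = true_deg + rng.normal(0, 0.05)      # noisy sensed degradation metric
    particles += 0.01 + rng.normal(0, 0.005, n)            # transition model
    w = np.exp(-0.5 * ((obs - particles) / 0.05) ** 2)     # Gaussian likelihood
    w /= w.sum()
    particles = particles[rng.choice(n, n, p=w)]           # multinomial resample
    if alarm_at is None and particles.mean() > limit:
        alarm_at = t                          # degradation point identified
```

A GP implementation would instead fit a smooth regression to the sensed metric and extrapolate its crossing of the limit, which is why it behaves more robustly on seasonal data.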