5 research outputs found

    Exploiting synthetically generated data with semi-supervised learning for small and imbalanced datasets

    Data augmentation is rapidly gaining attention in machine learning. Synthetic data can be generated by simple transformations or through the data distribution. In the latter case, the main challenge is to estimate the label associated with new synthetic patterns. This paper studies the effect of generating synthetic data by convex combination of patterns and using these as unsupervised information in a semi-supervised learning framework with support vector machines, thus avoiding the need to label synthetic examples. We perform experiments on a total of 53 binary classification datasets. Our results show that this type of data over-sampling supports the well-known cluster assumption in semi-supervised learning, showing outstanding results for small high-dimensional datasets and imbalanced learning problems.
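The convex-combination step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: the function name `convex_combinations`, the choice of pairing two randomly selected patterns, the uniform mixing weight, and the toy data are all hypothetical. The synthetic patterns are returned without labels, matching the idea of using them as unlabeled input to a semi-supervised learner.

```python
import random


def convex_combinations(patterns, n_new, seed=0):
    """Return n_new unlabeled synthetic patterns.

    Each synthetic pattern is x_new = lam * a + (1 - lam) * b, where a and b
    are two distinct original patterns and lam is drawn uniformly from [0, 1).
    The pairing scheme and the weight distribution are assumptions made for
    this sketch.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(patterns, 2)  # two distinct original patterns
        lam = rng.random()              # mixing weight
        synthetic.append([lam * ai + (1 - lam) * bi for ai, bi in zip(a, b)])
    return synthetic


# Hypothetical toy data: three 2-D patterns.
patterns = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
unlabeled = convex_combinations(patterns, n_new=5)
```

Every generated point lies on a segment between two original patterns, so it stays inside their convex hull. No label is assigned at any point in the sketch.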

    Property-based biomass feedstock grading using k-Nearest Neighbour technique

    Abstract: Energy generation from biomass requires a nexus of different sources irrespective of origin. A detailed and scientific understanding of the class to which a biomass resource belongs is therefore essential for energy generation. An intelligent classification of biomass resources based on properties offers a high prospect in analytical, operational and strategic decision-making. This study proposes the k-Nearest Neighbour (k-NN) classification model to classify biomass based on their properties. The study classified a 214-entry biomass dataset obtained from several articles published in reputable journals. Four different values of k (k = 1, 2, 3, 4) were tested for various self-normalizing distance functions, and their results were compared for effectiveness and efficiency in order to determine the optimal model. The k-NN model based on the Mahalanobis distance function revealed good accuracy at k = 3, with Root Mean Squared Error (RMSE), Accuracy, Error, Sensitivity, Specificity, False positive rate, Kappa statistic and Computation time (in seconds) of 1.42, 0.703, 0.297, 0.580, 0.953, 0.047, 0.622, and 4.7 respectively. The authors concluded that the k-NN based classification model is feasible and reliable for biomass classification. The implementation of this classification model shows that k-NN can serve as a handy tool for biomass resource classification irrespective of sources and origins.
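As a rough illustration of k-NN classification with a Mahalanobis distance, the sketch below classifies a query point by majority vote among its k nearest training points. It assumes NumPy is available. The function name `knn_mahalanobis`, the two made-up features, and the class labels "A" and "B" are hypothetical and do not come from the study, whose feature set and implementation are not given in the abstract.

```python
from collections import Counter

import numpy as np


def knn_mahalanobis(X_train, y_train, x, k=3):
    """Predict the label of x by majority vote among its k nearest
    training points under the Mahalanobis distance."""
    X_train = np.asarray(X_train, dtype=float)
    # Inverse covariance of the training features; pinv tolerates a
    # singular covariance matrix.
    inv_cov = np.linalg.pinv(np.cov(X_train, rowvar=False))
    diff = X_train - np.asarray(x, dtype=float)
    # Squared Mahalanobis distance from x to every training row.
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    nearest = np.argsort(d2)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two features per sample, two classes.
X = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.3],
     [5.0, 1.1], [5.3, 0.9], [4.8, 1.2]]
y = ["A", "A", "A", "B", "B", "B"]

print(knn_mahalanobis(X, y, [1.1, 1.0], k=3))
print(knn_mahalanobis(X, y, [5.0, 1.0], k=3))
```

Because the distance is scaled by the inverse covariance, features measured in different units contribute comparably, which is one reason such a distance is described as self-normalizing.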

    Methods for generating and evaluating synthetic longitudinal patient data: a systematic review

    The proliferation of data in recent years has led to the advancement and utilization of various statistical and deep learning techniques, thus expediting research and development activities. However, not all industries have benefited equally from the surge in data availability, partly due to legal restrictions on data usage and privacy regulations, such as in medicine. To address this issue, various statistical disclosure and privacy-preserving methods have been proposed, including the use of synthetic data generation. Synthetic data are generated based on some existing data, with the aim of replicating them as closely as possible and acting as a proxy for real sensitive data. This paper presents a systematic review of methods for generating and evaluating synthetic longitudinal patient data, a prevalent data type in medicine. The review adheres to the PRISMA guidelines and covers literature from five databases until the end of 2022. The paper describes 17 methods, ranging from traditional simulation techniques to modern deep learning methods. The collected information includes, but is not limited to, method type, source code availability, and approaches used to assess resemblance, utility, and privacy. Furthermore, the paper discusses practical guidelines and key considerations for developing synthetic longitudinal data generation methods

    Exploiting Synthetically Generated Data with Semi-Supervised Learning for Small and Imbalanced Datasets

    No full text

    Experimental investigation and modelling of the heating value and elemental composition of biomass through artificial intelligence

    Abstract: Knowledge advancement in artificial intelligence and blockchain technologies offers new potential for predictive reliability in the biomass energy value chain. However, for prediction to stand against experimental methodology, the prediction accuracy is expected to be high in order to develop high-fidelity, robust software that can serve as a tool in the decision-making process. The global standards related to classification methods and energetic properties of biomass are still evolving, given the differing observations and results reported in the literature. Apart from these, there is a need for a holistic understanding of the effect of particle sizes and geospatial factors on the physicochemical properties of biomass to increase the uptake of bioenergy. Therefore, this research carried out an experimental investigation of some selected bioresources and also developed high-fidelity models built on artificial intelligence capability to accurately classify the biomass feedstocks and predict the main elemental composition (Carbon, Hydrogen, and Oxygen) on a dry basis and the heating value (in MJ/kg) of biomass... Ph.D. (Mechanical Engineering Science)