9,465 research outputs found

    Improving the quality of predictive models in small data GSDOT: A new algorithm for generating synthetic data

    Douzas, G., Lechleitner, M., & Bacao, F. (2022). Improving the quality of predictive models in small data GSDOT: A new algorithm for generating synthetic data. PLoS ONE, 17(4), 1-15, e0265626. https://doi.org/10.1371/journal.pone.0265626

    In the age of the data deluge there are still many domains and applications restricted to the use of small datasets. The ability to harness these small datasets to solve problems through supervised learning methods can have a significant impact in many important areas. The insufficient size of training data usually results in unsatisfactory performance of machine learning algorithms. The current research work aims to help mitigate the small data problem through the creation of artificial instances, which are added to the training process. The proposed algorithm, Geometric Small Data Oversampling Technique (GSDOT), uses geometric regions around existing samples to generate new high-quality instances. Experimental results show a significant improvement in accuracy when compared with the use of the initial small dataset as well as with other popular artificial data generation techniques.
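
    The core geometric idea can be illustrated with a short, self-contained sketch. The function below is an illustration of geometric oversampling rather than the published GSDOT implementation: each synthetic point is drawn uniformly from a hypersphere centered on a randomly chosen sample, with the radius tied to the distance to that sample's nearest neighbor. The radius_scale parameter is hypothetical.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def geometric_oversample(X, y, n_new, radius_scale=1.0, random_state=0):
    # Illustrative sketch only: draw each synthetic point uniformly from a
    # hypersphere centered on an existing sample, with radius set by the
    # distance to that sample's nearest neighbor (radius_scale is hypothetical).
    rng = np.random.default_rng(random_state)
    dist, _ = NearestNeighbors(n_neighbors=2).fit(X).kneighbors(X)
    X_new, y_new = [], []
    for _ in range(n_new):
        i = rng.integers(len(X))
        direction = rng.normal(size=X.shape[1])
        direction /= np.linalg.norm(direction)
        # the exponent makes the draw uniform over the ball, not just its surface
        r = radius_scale * dist[i, 1] * rng.uniform() ** (1.0 / X.shape[1])
        X_new.append(X[i] + r * direction)
        y_new.append(y[i])
    return np.vstack([X, X_new]), np.concatenate([y, y_new])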

    Small data oversampling: improving small data prediction accuracy using the geometric SMOTE algorithm

    Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics.

    In the age of Big Data, many machine learning tasks in numerous industries are still restricted by the use of small datasets. The limited availability of data often results in unsatisfactory prediction performance of supervised learning algorithms and, consequently, poor decision making. The current research work aims to mitigate the small dataset problem through artificial data generation in the pre-processing phase of the data analysis process. The oversampling technique Geometric SMOTE is applied to generate new training instances and enrich the existing data structure. Experimental results show a significant improvement in prediction accuracy when compared with the use of the original small datasets, as well as over other oversampling techniques such as Random Oversampling, SMOTE and Borderline SMOTE. These findings show that artificial data creation is a promising approach to overcoming the problem of small data in classification tasks.
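
    The kind of comparison described above can be sketched with imbalanced-learn's oversamplers. The snippet below is a hedged illustration, not the thesis code: it simulates the small-data setting with a deliberately small training split and enlarges both classes via a sampling_strategy dictionary. Geometric SMOTE itself ships in the separate geometric-smote package, which is assumed (not verified here) to expose the same fit_resample interface.

from imblearn.over_sampling import RandomOverSampler, SMOTE, BorderlineSMOTE
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
# Simulate the small-data setting: keep only 100 training samples.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=100, stratify=y, random_state=0)

# Enlarge both classes to 200 samples each (data augmentation, not just rebalancing).
target = {0: 200, 1: 200}
samplers = {
    "no oversampling": None,
    "random": RandomOverSampler(sampling_strategy=target, random_state=0),
    "smote": SMOTE(sampling_strategy=target, random_state=0),
    "borderline-smote": BorderlineSMOTE(sampling_strategy=target, random_state=0),
}
for name, sampler in samplers.items():
    X_fit, y_fit = (X_tr, y_tr) if sampler is None else sampler.fit_resample(X_tr, y_tr)
    clf = GradientBoostingClassifier(random_state=0).fit(X_fit, y_fit)
    print(f"{name:18s} accuracy = {accuracy_score(y_te, clf.predict(X_te)):.3f}")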

    Diffusion of Emissions Abating Technology

    The environmental Kuznets curve (EKC) has been extensively criticized on econometric and theoretical grounds. Recent econometric results and case studies show that national emissions of important pollutants are monotonic in income but changes in technology can lead over time to reductions in pollution - a lowering of the EKC - and that pollution reducing innovations and standards may be adopted with relatively short time lags in some developing countries. This study combines the recent literature on measuring environmental efficiency and technological change using production frontier methods with the use of the Kalman filter - a time series method for signal extraction - to model the state of abatement technology in a panel of countries over time. The EKC is reformulated as the best practice technology frontier - countries' position relative to the frontier reflects the degree to which they have adopted best practice. The results are used to determine whether countries are converging to best practice over time and how many years it will take each country to achieve current best practice. The model is applied to sulfur dioxide emissions from sixteen mainly developed countries.
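
    As a concrete illustration of the signal-extraction step, here is a minimal local-level Kalman filter in Python, assuming the latent abatement-technology state follows a random walk observed through a noisy (log) emissions-intensity series; the variance values q and r and the data are illustrative, not the parameters or data of the study.

import numpy as np

def local_level_kalman(y, q=1e-3, r=1e-1):
    # Treat each observation (e.g., log emissions intensity of a country) as a
    # noisy measurement of a latent technology state that follows a random walk.
    # q = state-innovation variance, r = measurement-noise variance (illustrative).
    n = len(y)
    x_filt = np.zeros(n)            # filtered state estimates
    P_filt = np.zeros(n)            # filtered state variances
    x_pred, P_pred = y[0], 1.0      # rough diffuse initialization
    for t in range(n):
        K = P_pred / (P_pred + r)                   # Kalman gain
        x_filt[t] = x_pred + K * (y[t] - x_pred)    # update with observation t
        P_filt[t] = (1.0 - K) * P_pred
        x_pred, P_pred = x_filt[t], P_filt[t] + q   # predict next period
    return x_filt, P_filt

# Hypothetical emissions-intensity series for one country (illustrative numbers).
series = np.array([2.1, 2.0, 2.05, 1.9, 1.85, 1.7, 1.72, 1.6, 1.5, 1.45])
state, variance = local_level_kalman(np.log(series))
print(np.round(np.exp(state), 3))   # filtered path back on the original scale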

    Extreme Data Mining: Inference from Small Datasets

    Neural networks have been applied successfully in many fields. However, satisfactory results are usually obtained only under large-sample conditions. With small training sets the performance may be poor, or the learning task may not be accomplished at all. This deficiency severely limits the applications of neural networks. The main reason small datasets cannot provide enough information is that there are gaps between samples, and even the domain of the samples cannot be established with confidence. Several computational intelligence techniques have been proposed to overcome the limits of learning from small datasets. This work has the following goals: i. to discuss the meaning of "small" in the context of inferring from small datasets; ii. to give an overview of computational intelligence solutions to this problem; iii. to illustrate the introduced concepts with a real-life application.

    A Fuzzy Approach to the Synthesis of Cognitive Maps for Modeling Decision Making in Complex Systems

    The object of this study is fuzzy cognitive modeling as a means of studying semistructured socio-economic systems. The features of constructing cognitive maps, which provide the ability to choose management decisions in complex semistructured socio-economic systems, are described. It is shown that further improvement of the technologies needed to develop decision support systems and put them to practical use remains relevant. This work aimed to improve the accuracy of cognitive modeling of semistructured systems based on a fuzzy cognitive map for structuring nonformalized situations (MSNS), with evaluation by the root-mean-square error (RMSE) and mean absolute scaled error (MASE) coefficients. To achieve this goal, the following main methods were used: systems analysis, fuzzy logic and fuzzy set theory, the theory of the integral wavelet transform, and correlation and autocorrelation analyses. As a result, a new methodology for constructing an MSNS was proposed: a map for structuring nonformalized situations that combines the positive properties of earlier fuzzy cognitive maps. Solving modeling problems on the basis of this methodology should increase the reliability and quality of the analysis and modeling of semistructured systems and processes under uncertainty. Analysis using open datasets showed that, compared with classical ARIMA, SVR, MLP, and fuzzy time series models, the proposed model performs better in terms of the MASE and RMSE metrics, which confirms its advantage. It is therefore advisable to use the proposed algorithm as a mathematical basis for developing software tools for the analysis and modeling of problems in semistructured systems and processes. DOI: 10.28991/ESJ-2022-06-02-012
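
    To make the cognitive-map machinery concrete, here is a minimal sketch of the classical fuzzy cognitive map inference step (not the MSNS variant proposed in the paper): concept activations are repeatedly updated by a squashed, weighted sum of the influences of the other concepts. The three-concept map and its weights are hypothetical.

import numpy as np

def fcm_step(state, W, lam=1.0):
    # Classical FCM update: the next activation of concept j is a sigmoid of its
    # own value plus the weighted incoming influence of all other concepts.
    return 1.0 / (1.0 + np.exp(-lam * (state + W.T @ state)))

# Hypothetical 3-concept map: investment -> production -> emissions, emissions -| investment.
# W[i, j] is the causal weight of concept i on concept j.
W = np.array([
    [ 0.0, 0.7, 0.0],
    [ 0.0, 0.0, 0.6],
    [-0.4, 0.0, 0.0],
])
state = np.array([0.5, 0.2, 0.1])
for _ in range(25):                 # iterate until the map settles to a fixed point
    state = fcm_step(state, W)
print(np.round(state, 3))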

    Where is the future of China’s biogas? Review, forecast, and policy implications

    This paper discusses the history and present status of the different categories of biogas production in China, most of which fall into rural household production, agriculture-based engineering production, and industry-based engineering production. To evaluate China's future biogas production, five models are applied to analyze and forecast the biogas production of each province and of the entire country: the Hubbert model, the Weibull model, the generalized Weng model, the H–C–Z model, and the Grey model. It is shown that these models, which originated in oil research, can also be applied to other energy sources. The simulation results reveal that China's total biogas production is unlikely to keep growing quickly over the next few years, mainly because of a recent decrease in rural household production, and this differs greatly from the goal previously set by the responsible government department. In addition, China's biogas production will show an even more uneven pattern among regions in the future. The paper gives a preliminary explanation for the regional differences among the three biogas sectors and proposes recommendations for corresponding policies and strategies to promote the development of China's biogas industry.
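
    As one example of the forecasting machinery, a Hubbert curve can be fitted to a production series and extrapolated in a few lines of Python. The sketch below uses synthetic, illustrative numbers rather than the paper's provincial data, and the parameterization (ultimate resource urr, steepness b, peak time tm) is the standard logistic-derivative form of the Hubbert model.

import numpy as np
from scipy.optimize import curve_fit

def hubbert(t, urr, b, tm):
    # Hubbert production curve: the derivative of a logistic cumulative-production
    # curve with ultimate resource `urr`, steepness `b`, and peak time `tm`.
    e = np.exp(-b * (t - tm))
    return urr * b * e / (1.0 + e) ** 2

# Hypothetical annual biogas production series (illustrative numbers only).
years = np.arange(2000, 2021, dtype=float)
production = hubbert(years, 300.0, 0.35, 2015.0)
production += np.random.default_rng(0).normal(0.0, 0.5, years.size)

params, _ = curve_fit(hubbert, years, production, p0=[200.0, 0.2, 2014.0])
forecast_years = np.arange(2021, 2031, dtype=float)
print(np.round(params, 3))                              # fitted urr, b, tm
print(np.round(hubbert(forecast_years, *params), 1))    # out-of-sample forecast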

    A Probabilistic-Based Approach to Monitoring Tool Wear State and Assessing Its Effect on Workpiece Quality in Nickel-Based Alloys

    The objective of this research is, first, to investigate the applicability and advantages of statistical state estimation methods over deterministic methods for predicting tool wear in machining nickel-based superalloys, and second, to study the effects of cutting tool wear on part quality. Nickel-based superalloys are among the classes of materials known as hard-to-machine alloys. These materials retain their strength at high temperature and have high resistance to corrosion and creep, a combination that makes them ideal candidates for harsh environments such as the combustion chambers of gas turbines. However, the same characteristics that make nickel-based alloys suitable for aggressive conditions introduce difficulties when machining them. High strength and low thermal conductivity accelerate cutting tool wear and increase the possibility of in-process tool breakage. A blunt tool deteriorates surface integrity and damages the quality of the machined part by inducing high tensile residual stresses, generating micro-cracks, altering the microstructure, or leaving a poor roughness profile behind; in such cases the expensive superalloy part would have to be scrapped. The currently dominant industrial solution is to sacrifice productivity by replacing the tool early in its life, or to choose conservative cutting conditions that lower the wear rate and preserve workpiece quality. Monitoring the state of the cutting tool and estimating its effects on part quality is therefore a critical task for increasing productivity and profitability in machining superalloys.

    This work aims, first, to introduce a probabilistic framework for estimating tool wear in milling and turning of superalloys, and second, to study the detrimental effects of the functional state of the cutting tool, in terms of wear and wear rate, on part quality. In the milling operation, the mechanisms of tool failure were first identified and, given the rapid catastrophic failure of the tool, a Bayesian inference method (Markov Chain Monte Carlo, MCMC) was used to calibrate the parameters of a mechanistic power-based tool wear model. The calibrated model was then used within the state-space probabilistic framework of a Kalman filter to estimate tool flank wear, and an on-machine laser measuring system was fused into the filter to improve estimation accuracy. In the turning operation, the behavior of progressive wear was investigated as well: owing to the nonlinear nature of wear in turning, an extended Kalman filter was designed to track progressive wear, and the probabilistic method was compared with a deterministic technique, achieving a significant improvement of more than 60% in estimation accuracy. To fulfill the second objective, understanding the underlying effects of wear on part quality in cutting nickel-based superalloys, a comprehensive study of surface roughness, dimensional integrity, and residual stress was conducted. The estimates produced by the probabilistic filter were used to find correlations between wear, surface roughness, and dimensional integrity, along with a finite element simulation for predicting the residual stress profile under sharp and worn cutting tool conditions.
    The output of this research provides essential information for condition monitoring of the tool and its effects on product quality. The low-cost Hall effect sensor used in this work to capture spindle power, within the context of the stochastic filter, can effectively estimate tool wear in both milling and turning operations, while the estimated wear can be used to infer the state of workpiece surface integrity. Therefore, the true functionality and efficiency of the tool in superalloy machining can be evaluated without additional high-cost sensing.
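
    A compact sketch of the extended Kalman filter idea follows; the wear-growth law, the power-versus-wear observation model, and all numerical values are illustrative stand-ins, not the mechanistic models calibrated in the dissertation.

import numpy as np

# Illustrative EKF for tracking flank wear w from measured spindle power.
# f(): assumed nonlinear progressive wear growth; h(): assumed power model.

C, P_EXP, DT = 0.02, 0.6, 1.0      # hypothetical wear-growth constants
K0, K1 = 1.5, 4.0                  # hypothetical power = K0 + K1 * wear
Q, R = 1e-5, 1e-2                  # process and measurement noise variances

def f(w):                          # state transition: wear grows nonlinearly
    return w + C * (w ** P_EXP) * DT

def F_jac(w):                      # d f / d w (state Jacobian)
    return 1.0 + C * P_EXP * (w ** (P_EXP - 1.0)) * DT

def h(w):                          # observation model: spindle power
    return K0 + K1 * w

H = K1                             # d h / d w (observation Jacobian)

def ekf_step(w_est, P, power_meas):
    w_pred = f(w_est)                                  # predict
    P_pred = F_jac(w_est) ** 2 * P + Q
    K = P_pred * H / (H ** 2 * P_pred + R)             # Kalman gain
    w_new = w_pred + K * (power_meas - h(w_pred))      # update with measurement
    P_new = (1.0 - K * H) * P_pred
    return w_new, P_new

w_est, P = 0.05, 1e-3
for power in [1.72, 1.78, 1.85, 1.93, 2.02]:           # hypothetical power readings
    w_est, P = ekf_step(w_est, P, power)
    print(round(w_est, 4))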

    3D Printing in the Era of the Prosumer: The Role of Technology Readiness, Gender, and Age in User Acceptance of Desktop 3D Printing in American Households

    Technology acceptance of desktop 3D printing for fabrication at home is an emerging field of research in Asia and Europe. This proposal explains how desktop 3D printing provides an innovative manufacturing alternative to traditional manufacturing processes and, as such, facilitates innovation among prosumers. It also explains how such innovations have the potential to sustain economic growth, thus substantiating the need to understand the technology acceptance of desktop 3D printing for fabrication at home. The unified theory of acceptance and use of technology (UTAUT) model (Williams et al., 2015) has been the model most commonly used in previous research to study the adoption of desktop 3D printing for fabrication at home. The current research proposes an extension to the UTAUT model that accounts for the Technology Readiness of the individual. The extended UTAUT model is applied to study the acceptance of desktop 3D printing for fabrication in American households, which will be a new contribution to the literature. Partial Least Squares Structural Equation Modeling (PLS-SEM) is proposed to analyze the extended UTAUT model and determine the key factors that influence the acceptance of desktop 3D printing. A multi-group analysis based on gender is also proposed to identify how significant the differences in the key factors are. This research contributes theoretically to the emerging stream of research that integrates technology acceptance theories with the Technology Readiness concept. Practically, it contributes to the techno-marketing literature of 3D printer manufacturers that seek to increase the adoption rate of desktop 3D printers by women in American households.