Rolling bearing fault diagnosis based on health baseline method
To uncover the relationships among different features of the vibration signal and to provide more useful information for diagnosing rolling bearing faults, this paper develops a new fault diagnosis method, the health baseline method, and describes its workflow in detail. In a case study, a health baseline based on two kinds of linear models was constructed. In testing, the method distinguished the normal state of the rolling bearing from outer ring faults and rolling element faults, which indicates that it is feasible and effective for rolling bearing fault diagnosis.
Statistical Comparisons of the Top 10 Algorithms in Data Mining for Classification Task
This work builds on the study of the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) community in December 2006. We revisit the same study, but apply statistical tests to establish a more appropriate and better-justified ranking of classifiers for classification tasks. Current studies and practice on the theoretical and empirical comparison of several methods advocate more appropriate tests; in particular, recent studies recommend a set of simple and robust non-parametric tests for the statistical comparison of classifiers. In this paper, we propose to perform non-parametric statistical testing using the Friedman test, with the corresponding post-hoc tests, for comparing several classifiers over multiple data sets. The tests provide a better basis for judging the relevance of these algorithms.
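As a rough illustration of the kind of comparison this abstract describes, the sketch below ranks classifiers on each data set, computes the Friedman chi-square statistic from the average ranks, and derives a Nemenyi critical difference as one possible post-hoc step. The accuracy table is hypothetical, the choice of the Nemenyi test is an assumption (the abstract does not name its post-hoc tests), and the value 2.343 is the commonly tabulated critical value for 3 classifiers at the 0.05 level.

```python
from math import sqrt

def rank_row(scores):
    """Rank the scores of one data set: 1 = best (highest); ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def friedman(table):
    """Return (average ranks per classifier, Friedman chi-square statistic).

    `table` has one row per data set and one column per classifier.
    """
    n, k = len(table), len(table[0])
    ranked = [rank_row(row) for row in table]
    avg_ranks = [sum(col) / n for col in zip(*ranked)]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4
    )
    return avg_ranks, chi2

def nemenyi_cd(k, n, q_alpha):
    """Critical difference between average ranks for the Nemenyi post-hoc test."""
    return q_alpha * sqrt(k * (k + 1) / (6 * n))

# Hypothetical accuracies: 4 data sets (rows) x 3 classifiers (columns).
accuracies = [
    [0.91, 0.88, 0.80],
    [0.85, 0.83, 0.79],
    [0.77, 0.74, 0.70],
    [0.95, 0.90, 0.89],
]

avg_ranks, chi2 = friedman(accuracies)
cd = nemenyi_cd(k=3, n=4, q_alpha=2.343)  # assumed tabulated value for k = 3, alpha = 0.05
print(avg_ranks, chi2, cd)
```

Two classifiers would be considered significantly different under this sketch when their average ranks differ by more than `cd`. With so few data sets the statistic is only indicative; in practice the chi-square value is compared against a critical value with k - 1 degrees of freedom.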
An Enhanced Random Linear Oracle Ensemble Method using Feature Selection Approach based on Naïve Bayes Classifier
The Random Linear Oracle (RLO) ensemble replaces each classifier with two mini-ensembles, allowing base classifiers to be trained on different data sets and improving the variety of the trained classifiers. The Naïve Bayes (NB) classifier was chosen as the base classifier for this research because of its simplicity and low computational cost. Different feature selection algorithms are applied to the RLO ensemble to investigate the effect of differently sized data on its performance. Experiments were carried out using 30 data sets from the UCI repository and 6 learning algorithms: the NB classifier, the RLO ensemble, the RLO ensemble trained with Genetic Algorithm (GA) feature selection using the accuracy of the NB classifier as the fitness function, the RLO ensemble trained with GA feature selection using the accuracy of the RLO ensemble as the fitness function, the RLO ensemble trained with t-test feature selection, and the RLO ensemble trained with Kruskal-Wallis test feature selection. The results showed that the RLO ensemble could significantly improve the diversity of the NB classifier in dealing with distinctively selected feature sets through its fusion-selection paradigm. Consequently, feature selection algorithms could greatly benefit the RLO ensemble: with a properly selected number of features from the filter approach, or GA natural selection from the wrapper approach, it achieved a large improvement in classification accuracy as well as growth in diversity.
An Evaluation of Selection Strategies for Active Learning with Regression
While active learning for classification problems has received considerable attention in recent years, studies on problems of regression are rare. This paper provides a systematic review of the most commonly used selection strategies for active learning within the context of linear regression. The recently developed Exploration Guided Active Learning (EGAL) algorithm, previously deployed within a classification context, is explored as a selection strategy for regression problems. Active learning is demonstrated to significantly improve the learning rate of linear regression models. Experimental results show that a purely diversity-based approach t
Anomaly detection and classification in traffic flow data from fluctuations in the flow-density relationship
We describe and validate a novel data-driven approach to the real time
detection and classification of traffic anomalies based on the identification
of atypical fluctuations in the relationship between density and flow. For
aggregated data under stationary conditions, flow and density are related by
the fundamental diagram. However, high resolution data obtained from modern
sensor networks is generally non-stationary and disaggregated. Such data
consequently show significant statistical fluctuations. These fluctuations are
best described using a bivariate probability distribution in the density-flow
plane. By applying kernel density estimation to high-volume data from the UK
National Traffic Information Service (NTIS), we empirically construct these
distributions for London's M25 motorway. Curves in the density-flow plane are
then constructed, analogous to quantiles of univariate distributions. These
curves quantitatively separate atypical fluctuations from typical traffic
states. Although the algorithm identifies anomalies in general rather than
specific events, we find that fluctuations outside the 95% probability curve
correlate strongly with the spikes in travel time associated with significant
congestion events. Moreover, the size of an excursion from the typical region
provides a simple, real-time measure of the severity of detected anomalies. We
validate the algorithm by benchmarking its ability to identify labelled events
in historical NTIS data against some commonly used methods from the literature.
Detection rate, time-to-detect and false alarm rate are used as metrics and
found to be generally comparable except in situations when the speed
distribution is bi-modal. In such situations, the new algorithm achieves a much
lower false alarm rate without suffering significant degradation on the other
metrics. This method has the additional advantage of being self-calibrating.
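The idea of separating typical from atypical points in the density-flow plane can be sketched as follows. This is a simplified stand-in and not the authors' algorithm: it fits a product-Gaussian kernel density estimate to synthetic (density, flow) pairs and flags a point as atypical when its estimated probability density falls below the level that roughly 95% of the sample exceeds. The data, the fixed bandwidths `hx` and `hy`, and all function names are assumptions made for illustration.

```python
import math
import random

def kde2d(points, hx, hy):
    """Return a function estimating the density at (x, y) with a product Gaussian kernel."""
    norm = 1.0 / (len(points) * 2.0 * math.pi * hx * hy)

    def density(x, y):
        total = 0.0
        for px, py in points:
            total += math.exp(-0.5 * (((x - px) / hx) ** 2 + ((y - py) / hy) ** 2))
        return norm * total

    return density

def density_threshold(points, density, coverage=0.95):
    """Density level such that roughly `coverage` of the sample lies at or above it."""
    values = sorted(density(x, y) for x, y in points)
    cut = int((1.0 - coverage) * len(values))
    return values[cut]

random.seed(0)
# Synthetic (traffic density, flow) pairs, purely illustrative:
# flow rises linearly with traffic density, plus Gaussian noise.
sample = []
for _ in range(300):
    d = random.uniform(10.0, 30.0)
    sample.append((d, 60.0 * d + random.gauss(0.0, 80.0)))

density = kde2d(sample, hx=2.0, hy=80.0)  # hand-picked bandwidths (assumption)
level = density_threshold(sample, density)

def is_atypical(d, q):
    """True when the point lies outside the approximate 95% highest-density region."""
    return density(d, q) < level

print(is_atypical(20.0, 1200.0))  # a point near the centre of the cloud
print(is_atypical(20.0, 200.0))   # a point with flow far below the typical relationship
```

In this sketch the ratio of `level` to the estimated density at a point could serve as a crude severity score, loosely echoing the abstract's remark that the size of the excursion measures the severity of an anomaly.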
A reduced-uncertainty hybrid evolutionary algorithm for solving dynamic shortest-path routing problem
Effective packet transmission for advanced performance in wireless networks requires finding shortest network paths efficiently and quickly. This paper presents a Reduced Uncertainty Based Hybrid Evolutionary Algorithm (RUBHEA) to solve the Dynamic Shortest Path Routing Problem (DSPRP) effectively and rapidly. Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) are integrated as a hybrid algorithm to find the best solution within the search space of dynamically changing networks. GA and PSO share the context of individuals to reduce uncertainty in RUBHEA. Various regions of the search space are explored and learned by RUBHEA. By employing a modified priority encoding method, each individual in both GA and PSO is represented as a potential solution for DSPRP. A complete statistical analysis has been performed to compare the performance of RUBHEA with various state-of-the-art algorithms. It shows that RUBHEA is considerably superior to similar approaches (reducing the failure rate by up to 50%) as the number of nodes in the network increases.
Forecasting Workforce Requirement for State Transportation Agencies: A Machine Learning Approach
A decline in the number of construction engineers and inspectors available at State Transportation Agencies (STAs) to manage the ever-increasing lane miles has emphasized the importance of workforce planning in this sector. One of the crucial aspects of workforce planning involves forecasting the required workforce for any industry or agency. This thesis developed machine learning models to estimate the person-hour requirements of STAs at the agency and project levels. The Arkansas Department of Transportation (ARDOT) was used as a case study, using its employee data between 2012 and 2021. At the project level, machine learning regressors ranging from linear, tree ensembles, kernel-based, and neural network-based models were developed. At the agency level, a classic time series modeling approach, as well as neural networks-based models, were developed to forecast the monthly person-hour requirements of the agency. Parametric and non-parametric tests were employed in comparing the models across both levels. The results indicated a high performance from the random forest regressor, a tree ensemble with bagging, which recorded an average R-squared value of 0.91. The one-dimensional convolutional neural network model was the most effective model for forecasting the monthly person requirements at the agency level. It recorded an average RMSE of 4,500 person-hours monthly over short-range forecasting and an average of 5,000 person-hours monthly over long-range forecasting. These findings underscore the capability of machine learning models to provide more accurate workforce demand forecasts for STAs and the construction industry. This enhanced accuracy in workforce planning will contribute to improved resource allocation and management
Research design for comparing machine learning algorithms applied to cryptocurrency price prediction, using statistical hypothesis tests and post hoc tests, to select those with the best performance
Compares machine learning algorithms applied to cryptocurrency price prediction, using statistical hypothesis tests and post hoc tests, to select those with the best performance.