EARLY-WARNING PREDICTION FOR MACHINE FAILURES IN AUTOMATED INDUSTRIES USING ADVANCED MACHINE LEARNING TECHNIQUES
This Culminating Experience Project explores the use of machine learning algorithms to detect machine failure. The research questions are: Q1) How does the quality of input data, including issues such as outliers and noise, impact the accuracy and reliability of machine failure prediction models in industrial settings? Q2) How does the integration of SMOTE with feature engineering techniques influence the overall performance of machine learning models in detecting and preventing machine failures? Q3) What is the performance of different machine learning algorithms in predicting machine failures, and which algorithm is the most effective? The research findings are: Q1) Effective outlier handling is vital for predictive maintenance: the variable distributions initially showed a right-skewed pattern but became more centralized after correction, and correlations between specific sensors show potential for further exploration. Q2) Data balancing through SMOTE and feature engineering is essential due to the rarity of actual failure instances. Substantial challenges are observed when predicting 'Failure' instances, with a lower true positive rate (73%), resulting in low precision (0.02) and recall (0.73) for 'Failure' predictions. This is further reflected in the low F1-Score (0.03) for 'Failure', indicating a trade-off between precision and recall. Despite a commendable overall accuracy of 94%, the class imbalance within the dataset (92,200 'Running' instances vs. 126 'Failure' instances) remains a contributing factor to the model's limitations. Q3) Machine learning algorithm performance varies, with Catboost excelling in accuracy and failure detection. The choice of algorithm and continuous model refinement are critical for enhanced predictive accuracy in industrial contexts. The main conclusions are: Q1) Addressing outliers in data preprocessing significantly enhances the accuracy of machine failure prediction models.
Q2) The failure data are severely imbalanced: only 0.14% of the dataset represents actual failures, while 99.86% represents non-failure data. This extreme class disparity can result in biased models that underperform on underrepresented classes, which is a common problem in machine learning. Q3) Catboost outperforms the other algorithms in predicting machine failures, with a reported accuracy of 92% and a failure detection rate described as correct 99% of the time; further exploration of diverse data and algorithms is needed for tailored industrial applications. Future research areas include advanced outlier handling, sensor relationships, and data balancing for improved model accuracy. Addressing rare failures, enhancing model performance, and exploring diverse machine learning algorithms are critical for advancing predictive maintenance.
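The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not the project's own code: it uses only the rounded values stated above, so the recomputed F1 may differ slightly from the reported 0.03 (presumably a rounding effect in the published precision).

```python
# Counts and rates as quoted in the abstract (rounded values).
running = 92_200
failure = 126
precision = 0.02
recall = 0.73

# Share of 'Failure' instances in the whole dataset.
failure_share = failure / (running + failure)

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Accuracy of a trivial model that always predicts 'Running'.
trivial_accuracy = running / (running + failure)

print(f"failure share: {failure_share:.2%}")
print(f"F1 from rounded precision and recall: {f1:.3f}")
print(f"always-'Running' accuracy: {trivial_accuracy:.2%}")
```

The failure share comes out at roughly 0.14%, matching the figure in the conclusions. On these counts, a model that never predicts 'Failure' would already be right about 99.86% of the time, which is why overall accuracy alone says little about failure detection on data this imbalanced.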
Automated Classification of Airborne Laser Scanning Point Clouds
Making sense of the physical world has always been at the core of mapping. Up
until recently, this has always depended on using the human eye. Using
airborne lasers, it has become possible to quickly "see" more of the world in
many more dimensions. The resulting enormous point clouds serve as data sources
for applications far beyond the original mapping purposes ranging from flooding
protection and forestry to threat mitigation. In order to process these large
quantities of data, novel methods are required. In this contribution, we
develop models to automatically classify ground cover and soil types. Using the
logic of machine learning, we critically review the advantages of supervised
and unsupervised methods. Focusing on decision trees, we improve accuracy by
including beam vector components and using a genetic algorithm. We find that
our approach delivers consistently high quality classifications, surpassing
classical methods.
An assessment of the effectiveness of using data analytics to predict death claim seasonality and protection policy review lapses in a life insurance company
Data analytics tools are becoming increasingly common in the life insurance industry. This research considers two use cases for predictive analytics in a life insurance company based in Ireland. The first case study relates to the use of time series models to forecast the seasonality of death claim notifications. The baseline model predicted no seasonal variation in death claim notifications over a calendar year. This reflects the life insurance company’s current approach, whereby it is assumed that claims are notified linearly over a calendar year. More accurate forecasting of death claims seasonality would enhance the life insurance company’s cashflow planning and analysis of financial results. The performance of five time series models was compared against the baseline model. The time series models included a simple historical average model, a classical SARIMA model, the Random Forest Regressor and Prophet machine learning models and the LSTM deep learning model. The models were trained on both the life insurance company’s historical death claims data and on Irish population deaths data for the 25-74 age cohort over the same observation periods. The results demonstrated that machine learning time series models were generally more effective than the baseline model in forecasting death claim seasonality. It was also demonstrated that models trained on both Irish population deaths and the life insurance company’s historical death claims could outperform the baseline model. The best forecaster was Facebook’s Prophet model, trained on the life insurance company’s claims data. Each of the models trained on Irish population deaths data outperformed the baseline model. The SARIMA and LSTM consistently underperformed the baseline model when both were trained on death claims data. All models performed better when claims directly related to Covid-19 were removed from the testing data. 
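The contrast between the uniform baseline and the simple historical average model can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the monthly counts are made up, the function names are hypothetical, and mean absolute error is used as the comparison metric because the abstract does not name one.

```python
def uniform_baseline(annual_total):
    """Baseline: claims are notified evenly, one twelfth per month."""
    return [annual_total / 12] * 12


def historical_average_profile(history):
    """Average share of annual claims falling in each calendar month.

    `history` maps a year to a list of 12 monthly claim counts.
    """
    shares = [0.0] * 12
    for counts in history.values():
        total = sum(counts)
        for month, count in enumerate(counts):
            shares[month] += count / total
    return [share / len(history) for share in shares]


def seasonal_forecast(annual_total, profile):
    """Spread an annual total across months using a seasonal profile."""
    return [annual_total * share for share in profile]


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up monthly claim counts, for illustration only.
history = {
    2018: [14, 12, 11, 9, 8, 7, 7, 8, 9, 10, 12, 13],
    2019: [15, 13, 10, 9, 8, 8, 7, 7, 9, 11, 11, 12],
}
actual = [16, 13, 11, 10, 8, 7, 7, 8, 9, 10, 12, 14]

profile = historical_average_profile(history)
mae_baseline = mean_absolute_error(actual, uniform_baseline(sum(actual)))
mae_seasonal = mean_absolute_error(actual, seasonal_forecast(sum(actual), profile))

print(f"uniform baseline MAE:   {mae_baseline:.2f}")
print(f"historical average MAE: {mae_seasonal:.2f}")
```

Both forecasts here are given the true annual total, so the comparison isolates the seasonal shape. A real evaluation would also have to forecast the annual level.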
The second case study relates to the use of classification models to predict protection policy lapse behaviour following a policy review. The life insurance company currently has no method of predicting individual policy lapses, hence the baseline model assumed that all policies had an equal probability of lapsing. More accurate prediction of policy review lapse outcomes would enhance the life insurance company’s profit forecasting ability. It would also provide the company with the opportunity to potentially reduce lapse rates at policy review by tailoring alternative options for certain groups of policyholders. The performance of 12 classification models was assessed against the baseline model - KNN, Naïve Bayes, Support Vector Machine, Decision Tree, Random Forest, Extra Trees, XGBoost, LightGBM, AdaBoost and Multi-Layer Perceptron (MLP). To address class imbalance in the data, 11 rebalancing techniques were assessed. These included cost-sensitive algorithms (Class Weight Balancing), oversampling (Random Oversampling, ADASYN, SMOTE, Borderline SMOTE), undersampling (Random Undersampling, and Near Miss versions 1 to 3) as well as a combination of oversampling and undersampling (SMOTETomek and SMOTEENN). When combined with rebalancing methods, the predictive capacity of the classification models outperformed the baseline model in almost every case. However, results varied by train/test split and by evaluation metric. Oversampling models performed best on F1 Score and ROC-AUC while SMOTEENN and the undersampling models generated the highest levels of Recall. The top F1 Score was generated by the Naïve Bayes model when combined with SMOTE. The MLP model generated the highest ROC-AUC when combined with BorderlineSMOTE. The results of both case studies demonstrate that data analytics techniques can enhance a life insurance company’s predictive toolkit. 
It is recommended that further opportunities to enhance the predictive ability of the time series and classification models be explored.
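Of the rebalancing techniques listed in this abstract, random oversampling is the simplest to show without external libraries. The sketch below is an assumption-laden illustration rather than the study's pipeline: the function name, the feature rows, and the labels are all hypothetical, and the SMOTE variants named above generate synthetic points instead of duplicating existing ones.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes
    have as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [row for row, lab in zip(rows, labels) if lab == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical policy features (age, a binary flag) and review outcomes.
rows = [[35, 1], [52, 0], [41, 1], [29, 0], [60, 1], [47, 0]]
labels = ["retained", "retained", "retained", "retained", "retained", "lapsed"]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Rebalancing of this kind is normally applied to the training split only, so that evaluation metrics still reflect the original class distribution.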
Applying Machine Learning to Biological Status (QValues) from Physio-chemical Conditions of Irish Rivers
This thesis evaluates and optimises a variety of predictive models for assessing biological classification status, with an emphasis on water quality monitoring. Grounded in previous pertinent studies, it builds on the findings of Arrighi and Castelli (2023) concerning Tuscany’s river catchments, which highlight a solid correlation between river ecological status and parameters like summer climate and land use. They achieved an 80% prediction precision using the Random Forest algorithm, which was particularly adept at identifying good ecological conditions, leveraging a dataset devoid of chemical data.
Predictive Customer Lifetime value modeling: Improving customer engagement and business performance
CookUnity, a meal subscription service, has witnessed substantial annual revenue growth over the past three years. However, this growth has primarily been driven by the acquisition of new users to expand the customer base, rather than an evident increase in customers' spending levels. If it weren't for the raised subscription prices, the company's customer lifetime value (CLV) would have remained the same as it was three years ago. Consequently, the company's leadership recognizes the need to adopt a holistic approach to unlock an enhancement in CLV.
The objective of this thesis is to develop a comprehensive understanding of CLV, its implications, and how companies leverage it to inform strategic decisions. Throughout the course of this study, our central focus is to deliver a fully functional and efficient machine learning solution to CookUnity. This solution will possess exceptional predictive capabilities, enabling accurate forecasting of each customer's future CLV. By equipping CookUnity with this powerful tool, our aim is to empower the company to strategically leverage CLV for sustained growth.
To achieve this objective, we analyze various methodologies and approaches to CLV analysis, evaluating their applicability and effectiveness within the context of CookUnity. We thoroughly explore available data sources that can serve as predictors of CLV, ensuring the incorporation of the most relevant and meaningful variables in our model. Additionally, we assess different research methodologies to identify the top-performing approach and examine its implications for implementation at CookUnity.
By implementing data-driven strategies based on our predictive CLV model, CookUnity will be able to optimize order levels and maximize the lifetime value of its customer base. The outcome of this thesis will be a robust ML solution with remarkable prediction accuracy and practical usability within the company. Furthermore, the insights gained from our research will contribute to a broader understanding of CLV in the subscription-based business context, stimulating further exploration and advancement in this field of study.
A Comprehensive Survey on Rare Event Prediction
Rare event prediction involves identifying and forecasting events with a low
probability using machine learning and data analysis. Due to the imbalanced
data distributions, where the frequency of common events vastly outweighs that
of rare events, it requires using specialized methods within each step of the
machine learning pipeline, i.e., from data processing to algorithms to
evaluation protocols. Predicting the occurrences of rare events is important
for real-world applications, such as Industry 4.0, and is an active research
area in statistical and machine learning. This paper comprehensively reviews
the current approaches for rare event prediction along four dimensions: rare
event data, data processing, algorithmic approaches, and evaluation approaches.
Specifically, we consider 73 datasets from different modalities (i.e.,
numerical, image, text, and audio), four major categories of data processing,
five major algorithmic groupings, and two broader evaluation approaches. This
paper aims to identify gaps in the current literature and highlight the
challenges of predicting rare events. It also suggests potential research
directions, which can help guide practitioners and researchers. Comment: 44 pages
Revolutionizing Global Food Security: Empowering Resilience through Integrated AI Foundation Models and Data-Driven Solutions
Food security, a global concern, necessitates precise and diverse data-driven
solutions to address its multifaceted challenges. This paper explores the
integration of AI foundation models across various food security applications,
leveraging distinct data types, to overcome the limitations of current deep and
machine learning methods. Specifically, we investigate their utilization in
crop type mapping, cropland mapping, field delineation and crop yield
prediction. By capitalizing on multispectral imagery, meteorological data, soil
properties, historical records, and high-resolution satellite imagery, AI
foundation models offer a versatile approach. The study demonstrates that AI
foundation models enhance food security initiatives by providing accurate
predictions, improving resource allocation, and supporting informed
decision-making. These models serve as a transformative force in addressing
global food security limitations, marking a significant leap toward a
sustainable and secure food future.
Exploring Data Mining Techniques for Tree Species Classification Using Co-Registered LiDAR and Hyperspectral Data
NASA Goddard’s LiDAR, Hyperspectral, and Thermal imager provides co-registered remote sensing data on experimental forests. Data mining methods were used to achieve a final tree species classification accuracy of 68% using a combined LiDAR and hyperspectral dataset, and show promise for addressing deforestation and carbon sequestration on a species-specific level.
Advancements in Multi-temporal Remote Sensing Data Analysis Techniques for Precision Agriculture
The abstract is in the attachment.
Understanding Emojis for Financial Sentiment Analysis
Social media content has been widely used for financial forecasting and sentiment analysis. However, emojis, as a new “lingua franca” on social media, are often omitted during standard data pre-processing; we thus speculate that they may carry additional useful information. In this research, we study the effect of emojis in facilitating financial sentiment analysis and explore the most effective way to handle them during model training. Experiments are conducted on two datasets from stock and crypto markets. Various machine learning models, deep learning models, and the state-of-the-art GPT-based model are used, and we compare their performances across different emoji encodings. Results show a consistent increase in model performance when emojis are converted to their descriptive phrases, and significant enhancements after refining the descriptive terms of the most important emojis before fitting them into the models. Our research shows that emojis are a valuable source for better understanding financial social media texts and should not be omitted.
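One way to convert emojis to descriptive phrases is to look up each character's Unicode name. The sketch below assumes that approach and is not the paper's encoding scheme; the function name `describe_emojis` is hypothetical. It treats every character in the Unicode category "So" (Symbol, other) as an emoji, which also catches signs such as ©, and it does not handle multi-codepoint sequences (skin-tone modifiers, joined emojis).

```python
import unicodedata


def describe_emojis(text):
    """Replace each 'So'-category character with its lowercased Unicode name."""
    pieces = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            # Pad with spaces so adjacent emojis become separate words.
            pieces.append(f" {name.lower()} " if name else ch)
        else:
            pieces.append(ch)
    # Collapse the extra whitespace introduced by the padding.
    return " ".join("".join(pieces).split())


print(describe_emojis("$TSLA to the moon 🚀🚀"))
# prints: $TSLA to the moon rocket rocket
```

The refinement step the abstract mentions, adjusting the descriptive terms of the most important emojis, could be layered on top as a small override dictionary consulted before the Unicode lookup.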