    Automated Classification of Airborne Laser Scanning Point Clouds

    Making sense of the physical world has always been at the core of mapping. Until recently, this always depended on the human eye. With airborne lasers, it has become possible to quickly "see" more of the world in many more dimensions. The resulting enormous point clouds serve as data sources for applications far beyond the original mapping purposes, ranging from flood protection and forestry to threat mitigation. Processing these large quantities of data requires novel methods. In this contribution, we develop models to automatically classify ground cover and soil types. Using the logic of machine learning, we critically review the advantages of supervised and unsupervised methods. Focusing on decision trees, we improve accuracy by including beam vector components and using a genetic algorithm. We find that our approach delivers consistently high-quality classifications, surpassing classical methods.

    An assessment of the effectiveness of using data analytics to predict death claim seasonality and protection policy review lapses in a life insurance company

    Data analytics tools are becoming increasingly common in the life insurance industry. This research considers two use cases for predictive analytics in a life insurance company based in Ireland. The first case study relates to the use of time series models to forecast the seasonality of death claim notifications. The baseline model predicted no seasonal variation in death claim notifications over a calendar year. This reflects the life insurance company’s current approach, whereby it is assumed that claims are notified linearly over a calendar year. More accurate forecasting of death claims seasonality would enhance the life insurance company’s cashflow planning and analysis of financial results. The performance of five time series models was compared against the baseline model. The time series models included a simple historical average model, a classical SARIMA model, the Random Forest Regressor and Prophet machine learning models and the LSTM deep learning model. The models were trained on both the life insurance company’s historical death claims data and on Irish population deaths data for the 25-74 age cohort over the same observation periods. The results demonstrated that machine learning time series models were generally more effective than the baseline model in forecasting death claim seasonality. It was also demonstrated that models trained on both Irish population deaths and the life insurance company’s historical death claims could outperform the baseline model. The best forecaster was Facebook’s Prophet model, trained on the life insurance company’s claims data. Each of the models trained on Irish population deaths data outperformed the baseline model. The SARIMA and LSTM consistently underperformed the baseline model when both were trained on death claims data. All models performed better when claims directly related to Covid-19 were removed from the testing data. 
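As a rough illustration of the two simplest forecasters described above, the sketch below contrasts a uniform baseline (claims spread evenly over the year) with a historical-average seasonal profile. The function names and the toy monthly counts are hypothetical and are not taken from the thesis; the annual total is assumed to be known, so only the monthly split is compared.

```python
from statistics import mean


def uniform_baseline(annual_total):
    """Spread an annual claim count evenly over 12 months (no seasonality)."""
    return [annual_total / 12] * 12


def historical_average_profile(history):
    """Average share of annual claims falling in each calendar month.

    `history` is a list of years, each a list of 12 monthly claim counts.
    """
    shares = [[m / sum(year) for m in year] for year in history]
    return [mean(year[i] for year in shares) for i in range(12)]


def seasonal_forecast(annual_total, profile):
    """Split an annual total across months according to a seasonal profile."""
    return [annual_total * share for share in profile]


def mean_absolute_error(actual, forecast):
    return mean(abs(a - f) for a, f in zip(actual, forecast))


# Toy data, invented for illustration: more claims in the winter months.
history = [
    [12, 11, 10, 9, 8, 7, 7, 8, 9, 10, 11, 12],
    [13, 12, 10, 9, 8, 7, 6, 8, 9, 10, 12, 13],
]
actual = [14, 12, 11, 9, 8, 7, 7, 8, 9, 11, 12, 14]

total = sum(actual)  # assumption: the annual total is known in advance
baseline = uniform_baseline(total)
seasonal = seasonal_forecast(total, historical_average_profile(history))
print("baseline MAE:", round(mean_absolute_error(actual, baseline), 2))
print("seasonal MAE:", round(mean_absolute_error(actual, seasonal), 2))
```

On data with a clear winter peak, the seasonal profile should give a lower error than the uniform split; real claims data may behave differently.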
The second case study relates to the use of classification models to predict protection policy lapse behaviour following a policy review. The life insurance company currently has no method of predicting individual policy lapses, hence the baseline model assumed that all policies had an equal probability of lapsing. More accurate prediction of policy review lapse outcomes would enhance the life insurance company’s profit forecasting ability. It would also give the company the opportunity to reduce lapse rates at policy review by tailoring alternative options for certain groups of policyholders. The performance of 12 classification models was assessed against the baseline model, including KNN, Naïve Bayes, Support Vector Machine, Decision Tree, Random Forest, Extra Trees, XGBoost, LightGBM, AdaBoost and Multi-Layer Perceptron (MLP). To address class imbalance in the data, 11 rebalancing techniques were assessed. These included cost-sensitive algorithms (Class Weight Balancing), oversampling (Random Oversampling, ADASYN, SMOTE, Borderline SMOTE), undersampling (Random Undersampling, and Near Miss versions 1 to 3) as well as combinations of oversampling and undersampling (SMOTETomek and SMOTEENN). When combined with rebalancing methods, the classification models outperformed the baseline model in almost every case. However, results varied by train/test split and by evaluation metric. Oversampling models performed best on F1 Score and ROC-AUC, while SMOTEENN and the undersampling models generated the highest levels of Recall. The top F1 Score was generated by the Naïve Bayes model when combined with SMOTE. The MLP model generated the highest ROC-AUC when combined with Borderline SMOTE. The results of both case studies demonstrate that data analytics techniques can enhance a life insurance company’s predictive toolkit.
It is recommended that further opportunities to enhance the predictive ability of the time series and classification models be explored.
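To make the rebalancing idea concrete, here is a minimal sketch of random oversampling, the simplest of the oversampling techniques named in the abstract. It uses only the standard library; the function name, the toy rows and the label encoding are assumptions for illustration and do not come from the thesis.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are the same size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    # Every class is topped up to the size of the largest class.
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy, invented data: label 1 = lapsed (minority), label 0 = retained.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

Resampling of this kind is normally applied to the training split only, so that the test split keeps the original class proportions.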

    Applying Machine Learning to Biological Status (QValues) from Physio-chemical Conditions of Irish Rivers

    This thesis evaluates and optimises a variety of predictive models for assessing biological classification status, with an emphasis on water quality monitoring. Grounded in previous pertinent studies, it builds on the findings of Arrighi and Castelli (2023) concerning Tuscany’s river catchments, which highlight a strong correlation between river ecological status and parameters such as summer climate and land use. Using the Random Forest algorithm on a dataset without chemical data, they achieved 80% prediction precision and were particularly successful at identifying good ecological conditions.

    Predictive Customer Lifetime value modeling: Improving customer engagement and business performance

    CookUnity, a meal subscription service, has witnessed substantial annual revenue growth over the past three years. However, this growth has primarily been driven by the acquisition of new users to expand the customer base, rather than an evident increase in customers' spending levels. If it weren't for the raised subscription prices, the company's customer lifetime value (CLV) would have remained the same as it was three years ago. Consequently, the company's leadership recognizes the need to adopt a holistic approach to unlock an enhancement in CLV. The objective of this thesis is to develop a comprehensive understanding of CLV, its implications, and how companies leverage it to inform strategic decisions. Throughout the course of this study, our central focus is to deliver a fully functional and efficient machine learning solution to CookUnity. This solution will possess exceptional predictive capabilities, enabling accurate forecasting of each customer's future CLV. By equipping CookUnity with this powerful tool, our aim is to empower the company to strategically leverage CLV for sustained growth. To achieve this objective, we analyze various methodologies and approaches to CLV analysis, evaluating their applicability and effectiveness within the context of CookUnity. We thoroughly explore available data sources that can serve as predictors of CLV, ensuring the incorporation of the most relevant and meaningful variables in our model. Additionally, we assess different research methodologies to identify the top-performing approach and examine its implications for implementation at CookUnity. By implementing data-driven strategies based on our predictive CLV model, CookUnity will be able to optimize order levels and maximize the lifetime value of its customer base. The outcome of this thesis will be a robust ML solution with remarkable prediction accuracy and practical usability within the company. 
Furthermore, the insights gained from our research will contribute to a broader understanding of CLV in the subscription-based business context, stimulating further exploration and advancement in this field of study.

    A Comprehensive Survey on Rare Event Prediction

    Rare event prediction involves identifying and forecasting events with a low probability using machine learning and data analysis. Due to the imbalanced data distributions, where the frequency of common events vastly outweighs that of rare events, it requires using specialized methods within each step of the machine learning pipeline, i.e., from data processing to algorithms to evaluation protocols. Predicting the occurrences of rare events is important for real-world applications, such as Industry 4.0, and is an active research area in statistical and machine learning. This paper comprehensively reviews the current approaches for rare event prediction along four dimensions: rare event data, data processing, algorithmic approaches, and evaluation approaches. Specifically, we consider 73 datasets from different modalities (i.e., numerical, image, text, and audio), four major categories of data processing, five major algorithmic groupings, and two broader evaluation approaches. This paper aims to identify gaps in the current literature and highlight the challenges of predicting rare events. It also suggests potential research directions, which can help guide practitioners and researchers. Comment: 44 pages

    Revolutionizing Global Food Security: Empowering Resilience through Integrated AI Foundation Models and Data-Driven Solutions

    Food security, a global concern, necessitates precise and diverse data-driven solutions to address its multifaceted challenges. This paper explores the integration of AI foundation models across various food security applications, leveraging distinct data types, to overcome the limitations of current deep and machine learning methods. Specifically, we investigate their utilization in crop type mapping, cropland mapping, field delineation and crop yield prediction. By capitalizing on multispectral imagery, meteorological data, soil properties, historical records, and high-resolution satellite imagery, AI foundation models offer a versatile approach. The study demonstrates that AI foundation models enhance food security initiatives by providing accurate predictions, improving resource allocation, and supporting informed decision-making. These models serve as a transformative force in addressing global food security limitations, marking a significant leap toward a sustainable and secure food future.

    Exploring Data Mining Techniques for Tree Species Classification Using Co-Registered LiDAR and Hyperspectral Data

    NASA Goddard’s LiDAR, Hyperspectral, and Thermal imager provides co-registered remote sensing data on experimental forests. Data mining methods applied to a combined LiDAR and hyperspectral dataset achieved a final tree species classification accuracy of 68%, showing promise for addressing deforestation and carbon sequestration at a species-specific level.

    Advancements in Multi-temporal Remote Sensing Data Analysis Techniques for Precision Agriculture

    The abstract is in the attachment.

    Understanding Emojis for Financial Sentiment Analysis

    Social media content has been widely used for financial forecasting and sentiment analysis. However, emojis, a new “lingua franca” on social media, are often dropped during standard data pre-processing, and we speculate that they may carry additional useful information. In this research, we study the effect of emojis in facilitating financial sentiment analysis and explore the most effective way to handle them during model training. Experiments are conducted on two datasets from stock and crypto markets. Various machine learning models, deep learning models, and the state-of-the-art GPT-based model are used, and we compare their performances across different emoji encodings. Results show a consistent increase in model performance when emojis are converted to their descriptive phrases, and significant enhancements after refining the descriptive terms of the most important emojis before feeding them into the models. Our research shows that emojis are a valuable source for better understanding financial social media texts and should not be omitted.
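The emoji-to-phrase conversion mentioned in the abstract can be pictured with a small sketch. The mapping table, the function name and the sample text below are hypothetical and chosen only for illustration; the paper's actual encoding and lexicon are not given here, and a real pipeline would presumably use a complete emoji lexicon.

```python
import re

# Hypothetical mapping for illustration only (phrases follow common emoji short names).
EMOJI_PHRASES = {
    "\U0001F680": "rocket",
    "\U0001F4C8": "chart increasing",
    "\U0001F4C9": "chart decreasing",
    "\U0001F48E": "gem stone",
}


def emojis_to_phrases(text, mapping=EMOJI_PHRASES):
    """Replace each mapped emoji with its descriptive phrase, then tidy whitespace."""
    for emoji, phrase in mapping.items():
        # Pad with spaces so adjacent emojis become separate words.
        text = text.replace(emoji, f" {phrase} ")
    return re.sub(r"\s+", " ", text).strip()


sample = "BTC to the moon \U0001F680\U0001F680 \U0001F4C8"
print(emojis_to_phrases(sample))
```

Emojis that are not in the mapping are left untouched, so the table can be refined incrementally for the emojis that matter most.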