
    Robust Recommender System: A Survey and Future Directions

    With the rapid growth of information, recommender systems have become integral for providing personalized suggestions and overcoming information overload. However, their practical deployment often encounters "dirty" data, where noise or malicious information can lead to abnormal recommendations. Research on improving recommender systems' robustness against such dirty data has thus gained significant attention. This survey provides a comprehensive review of recent work on recommender systems' robustness. We first present a taxonomy to organize current techniques for withstanding malicious attacks and natural noise. We then explore state-of-the-art methods in each category, including fraudster detection, adversarial training, and certifiably robust training against malicious attacks, and regularization, purification, and self-supervised learning against natural noise. Additionally, we summarize the evaluation metrics and common datasets used to assess robustness. We discuss robustness across varying recommendation scenarios and its interplay with other properties such as accuracy, interpretability, privacy, and fairness. Finally, we delve into open issues and future research directions in this emerging field. Our goal is to equip readers with a holistic understanding of robust recommender systems and to spotlight pathways for future research and development.
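    Of the defense categories the survey names, fraudster detection is the easiest to illustrate concretely. The sketch below is a toy deviation-based detector, not a method taken from the survey: it flags users whose ratings deviate, on average, too far from per-item consensus, the kind of signal a shilling attack leaves behind. The threshold and data are invented for illustration.

    ```python
    def flag_suspicious_users(ratings, threshold=1.5):
        """ratings: dict {user: {item: rating}}.
        Flags users whose mean absolute deviation from per-item average
        ratings exceeds `threshold` -- a crude deviation-based fraud signal."""
        # Per-item average rating over all users.
        sums, counts = {}, {}
        for user_ratings in ratings.values():
            for item, r in user_ratings.items():
                sums[item] = sums.get(item, 0.0) + r
                counts[item] = counts.get(item, 0) + 1
        item_avg = {i: sums[i] / counts[i] for i in sums}

        flagged = []
        for user, user_ratings in ratings.items():
            mad = sum(abs(r - item_avg[i]) for i, r in user_ratings.items()) / len(user_ratings)
            if mad > threshold:
                flagged.append(user)
        return flagged

    # Three genuine users roughly agree; the "shill" pushes extreme ratings everywhere.
    ratings = {
        "alice": {"a": 4, "b": 4, "c": 2},
        "bob":   {"a": 5, "b": 4, "c": 1},
        "carol": {"a": 4, "b": 3, "c": 2},
        "shill": {"a": 1, "b": 1, "c": 5},
    }
    flagged = flag_suspicious_users(ratings)
    ```

    Real fraudster-detection methods use far richer features (rating entropy, temporal bursts, graph structure), but they share this core idea of scoring profiles against consensus behavior.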

    Recipe popularity prediction in Finnish social media by machine learning models

    Abstract. In recent times, the internet has emerged as a primary source of cooking inspiration, eating experiences, and food-centered social gatherings, with the majority of individuals turning to online recipes rather than traditional cookbooks. However, there is growing concern about the healthiness of online recipes. This thesis focuses on unraveling the determinants of online recipe popularity by analyzing a dataset comprising more than 5000 recipes from Valio, one of Finland’s leading corporations. Valio’s website serves as a representation of diverse cooking preferences among users in Finland. Through examination of recipe attributes such as nutritional content (energy, fat, salt, etc.), food preparation complexity (cooking time, number of steps, required ingredients, etc.), and user engagement (number of comments, ratings, sentiment of comments, etc.), we aim to pinpoint the critical elements influencing the popularity of online recipes. Our predictive model, Logistic Regression (classification accuracy 0.93 and F1 score 0.90), substantiates the existence of recipe characteristics that significantly influence their ratings. The dataset we employ is notably influenced by user engagement features, particularly the number of received ratings and comments. In other words, recipes that garner more attention in terms of comments and ratings tend to have higher rating values (i.e., to be more popular). Additionally, our findings reveal that a substantial portion of Valio’s recipes falls within the medium Food Standards Agency (FSA) health score range, and, intriguingly, recipes deemed less healthy tend to receive higher average ratings from users.
This study advances our comprehension of the factors contributing to the popularity of online recipes, providing valuable insights into contemporary cooking preferences in Finland as well as guiding future dietary policy shifts.
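    The classifier at the heart of the thesis can be sketched in a few lines. The feature names, synthetic data, and plain gradient-descent training loop below are illustrative assumptions, not Valio's actual pipeline; they only show how engagement features (ratings and comments received) can drive a logistic-regression popularity prediction.

    ```python
    import math
    import random

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def train_logreg(X, y, lr=0.1, epochs=500):
        """Plain stochastic-gradient-descent logistic regression (weights + bias)."""
        w = [0.0] * len(X[0])
        b = 0.0
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
                err = p - yi
                w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
                b -= lr * err
        return w, b

    def predict(w, b, xi):
        return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0

    # Hypothetical features: [n_ratings, n_comments, FSA health score], scaled to ~[0, 1].
    # Popular recipes are given higher engagement by construction; FSA score is noise.
    random.seed(0)
    X, y = [], []
    for _ in range(200):
        popular = random.random() < 0.5
        n_ratings = random.uniform(0.6, 1.0) if popular else random.uniform(0.0, 0.4)
        n_comments = random.uniform(0.5, 1.0) if popular else random.uniform(0.0, 0.5)
        fsa = random.uniform(0.0, 1.0)
        X.append([n_ratings, n_comments, fsa])
        y.append(1 if popular else 0)

    w, b = train_logreg(X, y)
    accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)
    ```

    On this deliberately separable synthetic data the model recovers the engagement signal almost perfectly, mirroring the thesis's finding that ratings and comments dominate the prediction.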

    Using Time Clusters for Following Users’ Shifts in Rating Practices


    Designing a Direct Feedback Loop between Humans and Convolutional Neural Networks through Local Explanations

    Local explanations provide heatmaps on images to explain how Convolutional Neural Networks (CNNs) derive their output. Due to its visual straightforwardness, the method has been one of the most popular explainable AI (XAI) techniques for diagnosing CNNs. Through our formative study (S1), however, we captured ML engineers' ambivalent perspective on local explanations: a valuable and indispensable tool for building CNNs, yet a process that exhausts them due to the heuristic nature of detecting vulnerabilities. Moreover, steering CNNs based on the vulnerabilities learned from the diagnosis seemed highly challenging. To bridge this gap, we designed DeepFuse, the first interactive design that realizes a direct feedback loop between a user and CNNs for diagnosing and revising a CNN's vulnerabilities using local explanations. DeepFuse helps CNN engineers systematically search for "unreasonable" local explanations and annotate new boundaries for those identified as unreasonable in a labor-efficient manner. It then steers the model based on the given annotations so that the model does not repeat similar mistakes. We conducted a two-day study (S2) with 12 experienced CNN engineers. Using DeepFuse, participants built a more accurate and "reasonable" model than the current state of the art. Participants also found that the way DeepFuse guides case-based reasoning can practically improve their current practice. We provide design implications that explain how future HCI-driven design can move practice forward to make XAI-driven insights more actionable. Comment: 32 pages, 6 figures, 5 tables. Accepted for publication in the Proceedings of the ACM on Human-Computer Interaction (PACM HCI), CSCW 202

    Applications of Artificial Intelligence in Healthcare

    Nowadays, artificial intelligence (AI) plays a major role in healthcare. It has many applications in diagnosis, robotic surgery, and research, powered by the growing availability of healthcare data and the rapid improvement of analytical techniques. AI is designed to have knowledge similar to a human's but to apply it more efficiently. A robot has the same expertise as a surgeon; even if a surgery takes longer, the robot's sutures, precision, and uniformity are far better than the surgeon's, leading to fewer chances of failure. To make all of this possible, AI needs sets of algorithms. Artificial Intelligence has two key categories: machine learning (ML) and natural language processing (NLP), both of which are necessary to achieve practically any aim in healthcare. The goal of this study is to track current advancements in science, understand technological availability, recognize the enormous power of AI in healthcare, and encourage scientists to use AI in their related fields of research. Discoveries and advancements will continue to push the AI frontier and expand the scope of its applications, with rapid developments expected in the future.

    An assessment of the effectiveness of using data analytics to predict death claim seasonality and protection policy review lapses in a life insurance company

    Data analytics tools are becoming increasingly common in the life insurance industry. This research considers two use cases for predictive analytics in a life insurance company based in Ireland. The first case study relates to the use of time series models to forecast the seasonality of death claim notifications. The baseline model predicted no seasonal variation in death claim notifications over a calendar year. This reflects the life insurance company’s current approach, whereby it is assumed that claims are notified linearly over a calendar year. More accurate forecasting of death claims seasonality would enhance the life insurance company’s cashflow planning and analysis of financial results. The performance of five time series models was compared against the baseline model. The time series models included a simple historical average model, a classical SARIMA model, the Random Forest Regressor and Prophet machine learning models, and the LSTM deep learning model. The models were trained on both the life insurance company’s historical death claims data and on Irish population deaths data for the 25-74 age cohort over the same observation periods. The results demonstrated that machine learning time series models were generally more effective than the baseline model in forecasting death claim seasonality. It was also demonstrated that models trained on both Irish population deaths and the life insurance company’s historical death claims could outperform the baseline model. The best forecaster was Facebook’s Prophet model, trained on the life insurance company’s claims data. Each of the models trained on Irish population deaths data outperformed the baseline model. The SARIMA and LSTM models consistently underperformed the baseline model when trained on death claims data. All models performed better when claims directly related to Covid-19 were removed from the testing data.
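    The simplest model in the comparison above, the historical average model, can be sketched directly: forecast each calendar month's claim count as the mean of that month across the training years. The monthly claim counts below are invented for the example and do not reflect the company's data.

    ```python
    from collections import defaultdict

    def seasonal_average_forecast(history):
        """history: iterable of (year, month, claim_count) tuples.
        Returns a per-month forecast: the mean claim count observed in each
        calendar month across all training years (a seasonal naive baseline)."""
        totals, counts = defaultdict(float), defaultdict(int)
        for _, month, claims in history:
            totals[month] += claims
            counts[month] += 1
        return {m: totals[m] / counts[m] for m in sorted(totals)}

    # Invented monthly death-claim notifications with a winter peak (Dec-Feb).
    history = [
        (year, month, 100 + (20 if month in (12, 1, 2) else 0))
        for year in (2019, 2020, 2021)
        for month in range(1, 13)
    ]
    forecast = seasonal_average_forecast(history)
    ```

    A flat no-seasonality baseline, as described for the company's current approach, would instead forecast the overall mean for every month; comparing the two against held-out months is exactly the kind of evaluation the case study performs with its more sophisticated SARIMA, Prophet, and LSTM models.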
    The second case study relates to the use of classification models to predict protection policy lapse behaviour following a policy review. The life insurance company currently has no method of predicting individual policy lapses, hence the baseline model assumed that all policies had an equal probability of lapsing. More accurate prediction of policy review lapse outcomes would enhance the life insurance company’s profit forecasting ability. It would also provide the company with the opportunity to potentially reduce lapse rates at policy review by tailoring alternative options for certain groups of policyholders. The performance of 12 classification models was assessed against the baseline model: KNN, Naïve Bayes, Support Vector Machine, Decision Tree, Random Forest, Extra Trees, XGBoost, LightGBM, AdaBoost, and Multi-Layer Perceptron (MLP). To address class imbalance in the data, 11 rebalancing techniques were assessed. These included cost-sensitive algorithms (Class Weight Balancing), oversampling (Random Oversampling, ADASYN, SMOTE, Borderline SMOTE), undersampling (Random Undersampling, and Near Miss versions 1 to 3), as well as combinations of oversampling and undersampling (SMOTETomek and SMOTEENN). When combined with rebalancing methods, the predictive capacity of the classification models outperformed the baseline model in almost every case. However, results varied by train/test split and by evaluation metric. Oversampling models performed best on F1 Score and ROC-AUC, while SMOTEENN and the undersampling models generated the highest levels of Recall. The top F1 Score was generated by the Naïve Bayes model when combined with SMOTE. The MLP model generated the highest ROC-AUC when combined with Borderline SMOTE. The results of both case studies demonstrate that data analytics techniques can enhance a life insurance company’s predictive toolkit.
It is recommended that further opportunities to enhance the predictive ability of the time series and classification models be explored.
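    Of the rebalancing techniques assessed in the second case study, random oversampling is the simplest: minority-class records are duplicated at random until the classes are the same size. A minimal sketch, with invented lapse labels rather than the company's data, might look like this:

    ```python
    import random
    from collections import Counter

    def random_oversample(X, y, seed=42):
        """Duplicate minority-class samples at random until every class
        matches the size of the largest class."""
        rng = random.Random(seed)
        by_class = {}
        for xi, yi in zip(X, y):
            by_class.setdefault(yi, []).append(xi)
        target = max(len(samples) for samples in by_class.values())

        X_out, y_out = [], []
        for label, samples in by_class.items():
            resampled = samples + [rng.choice(samples) for _ in range(target - len(samples))]
            X_out += resampled
            y_out += [label] * target
        return X_out, y_out

    # Invented imbalanced data: 90 "no lapse" (0) vs 10 "lapse" (1) policies.
    X = [[i] for i in range(100)]
    y = [0] * 90 + [1] * 10
    X_bal, y_bal = random_oversample(X, y)
    counts = Counter(y_bal)
    ```

    SMOTE and its variants refine this idea by interpolating new synthetic minority samples between neighbours rather than duplicating existing ones; the oversampling must be applied to the training split only, or the evaluation leaks duplicated test records.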

    Applying Machine Learning to Biological Status (QValues) from Physio-chemical Conditions of Irish Rivers

    This thesis evaluates and optimises a variety of predictive models for assessing biological classification status, with an emphasis on water quality monitoring. Grounded in previous pertinent studies, it builds on the findings of Arrighi and Castelli (2023) concerning Tuscany’s river catchments, which highlight a solid correlation between river ecological status and parameters such as summer climate and land use. They achieved 80% prediction precision using the Random Forest algorithm, which proved particularly adept at identifying good ecological conditions, leveraging a dataset devoid of chemical data.