4 research outputs found

    Popularity prediction of Instagram posts

    Predicting the popularity of posts on social networks has taken on significant importance in recent years, and several social media management tools now offer solutions to improve the quality of published content and to enhance the attractiveness of companies and organizations. Scientific research has recently moved in this direction, aiming to exploit advanced techniques such as machine learning, deep learning, and natural language processing to support such tools. In this work we address the challenge of predicting the popularity of a future post on Instagram by defining the problem as a classification task and by proposing an original approach based on Gradient Boosting and feature engineering, which led to promising experimental results. The proposed approach exploits big data technologies for scalability and efficiency, and it is general enough to be applied to other social media as well.

    A local feature engineering strategy to improve network anomaly detection

    The dramatic increase in devices and services that has characterized modern societies in recent decades, boosted by the exponential growth of ever faster network connections and the predominant use of wireless connection technologies, has created a critical security challenge. Anomaly-based intrusion detection systems, which have long been among the most efficient solutions for detecting intrusion attempts on a network, have to face this new and more complicated scenario. Well-known problems, such as the difficulty of distinguishing legitimate activities from illegitimate ones due to their similar characteristics and their high degree of heterogeneity, have become even more complex as network activity has increased. After providing an extensive overview of the scenario under consideration, this work proposes a Local Feature Engineering (LFE) strategy that addresses such problems through a data preprocessing step that reduces the number of possible network event patterns while increasing their characterization. Unlike canonical feature engineering approaches, which take into account the entire dataset, it operates locally in the feature space of each single event. Experiments conducted on real-world data showed that this strategy, which is based on the introduction of new features and the discretization of their values, improves the performance of canonical state-of-the-art solutions.
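The two ingredients named in this abstract (new features computed from each single event, followed by discretization) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the choice of row-wise statistics as the added features, the value range `[0, 1]`, and the number of bins.

```python
from statistics import mean, pstdev


def discretize(value, lo, hi, bins):
    """Map a value in [lo, hi] to an integer bin index in 0..bins-1."""
    if hi <= lo:
        return 0
    clipped = min(max(value, lo), hi)
    idx = int((clipped - lo) / (hi - lo) * bins)
    return min(idx, bins - 1)  # keep value == hi inside the last bin


def local_feature_engineering(event, lo=0.0, hi=1.0, bins=10):
    """Extend one event with statistics of its own features, then discretize.

    Only the event itself is used (no dataset-wide statistics), which is
    the "local" aspect. The four added features are hypothetical choices.
    """
    extra = [min(event), max(event), mean(event), pstdev(event)]
    return [discretize(v, lo, hi, bins) for v in list(event) + extra]


# Two slightly different events (hypothetical, already scaled to [0, 1]).
events = [[0.10, 0.12, 0.95], [0.11, 0.13, 0.93]]
transformed = [local_feature_engineering(e) for e in events]
```

In this toy input the two events differ only by small amounts, so after discretization they map to the same integer pattern, which is one way of reading the claim that the strategy reduces the number of possible event patterns.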

    Desenvolvimento de frameworks para a modelagem do risco de crédito por meio de algoritmos de classificação

    Granting credit is a vital activity in the financial industry. For the success of financial institutions, as well as the equilibrium of the credit system as a whole, it is important that credit risk management systems efficiently evaluate the probability of default of potential debtors based on their historical data. Classification algorithms are an interesting approach to this problem in the form of Credit Scoring models. Since the emergence of quantitative analytical methods for this purpose, statistical models have remained the most commonly chosen method, given their easier implementation and inherent interpretability. However, advances in Machine Learning (ML) have produced new and more complex algorithms capable of handling larger volumes of data, often with an increase in predictive power. These new approaches, although not always readily transferable to practical applications in the financial industry, present an opportunity for the development of credit risk modeling and have piqued the interest of researchers in the field. Nonetheless, researchers tend to focus on model performance, without setting up guidelines to optimize the modeling process or considering current regulation for model implementation. This dissertation therefore establishes frameworks for consumer credit risk modeling based on classification algorithms, guided by a systematic literature review on the topic. The proposed frameworks incorporate ML techniques, data preprocessing and balancing, feature selection (FS), and hyperparameter optimization (HPO). In addition to the bibliographic research, which introduces the main classification algorithms and appropriate modeling steps, the development of the frameworks is also based on experiments with hundreds of models for credit risk classification, using Logistic Regression (LR), Decision Trees (DT), Support Vector Machines (SVM), Random Forest (RF), as well as boosting and stacking ensembles, to efficiently guide the construction of robust and parsimonious models for credit risk analysis in consumer lending.

    A wavelet-based data analysis to credit scoring

    The dramatic growth in consumer credit has made methods that rely on human intervention ineffective for assessing the potential solvency of loan applicants. For this reason, the development of approaches able to automate this operation is today an active and important research area named Credit Scoring. In such a scenario, designing effective approaches is a hard challenge, due to a series of well-known problems such as data imbalance, data heterogeneity, and cold start. The Centroid wavelet-based approach proposed in this paper faces these issues by moving the data analysis from its canonical domain to a new time-frequency one, where this operation is performed through three different metrics of similarity. Its main objective is to achieve a better characterization of loan applicants on the basis of the information previously gathered by the Credit Scoring system. The experiments performed show that this approach outperforms state-of-the-art solutions.