2 research outputs found

    Do Fraudulent Companies Employ Different Linguistic Features in Their Annual Reports? An Empirical Study Using Logistic Regression and Random Forest Methodologies

    Textual analysis is widely used to uncover fraudulent actions in 10-K filings. Previous studies have examined the Management's Discussion and Analysis (MD&A) section of annual reports to predict illicit behaviour by analysing the tone of executives, and most of those studies date back 10 years or more. The primary goal of this research is to find patterns in the linguistic features of the entire annual reports of convicted public companies, identified using the Corporate Prosecution Registry database, and to compare them with non-fraudulent counterparts in the same industry. Logistic regression and random forest algorithms are implemented to identify important factors and make accurate predictions. Accuracy, ROC-AUC, and 10-fold cross-validation are used to validate each method. The logistic regression results reveal that corrupt organisations use a more negative, uncertain, and litigious tone. Furthermore, these businesses use language with high lexical diversity and minimal complexity. According to the random forest model, the litigious variable is the most important variable in predicting untruthful corporations. Moreover, every validation method shows that random forest outperforms logistic regression.
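
To make the kind of linguistic features described in this abstract concrete, here is a minimal sketch that computes tone shares and a type-token ratio for one document. Everything in it is an assumption for illustration: the tiny word lists, the tokenizer, and the choice of type-token ratio as the lexical diversity measure are not taken from the study, which does not specify them here.

```python
import re

# Hypothetical mini word lists for illustration only; not the study's dictionaries.
TONE_WORDS = {
    "negative": {"loss", "decline", "adverse", "impairment"},
    "uncertain": {"may", "might", "approximately", "uncertain"},
    "litigious": {"litigation", "plaintiff", "settlement", "court"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def linguistic_features(text):
    """Return the share of tokens in each tone list plus the type-token ratio."""
    tokens = tokenize(text)
    total = len(tokens)
    if total == 0:
        # An empty document has no measurable tone or diversity.
        return {name: 0.0 for name in [*TONE_WORDS, "lexical_diversity"]}
    features = {
        name: sum(tok in words for tok in tokens) / total
        for name, words in TONE_WORDS.items()
    }
    # Type-token ratio: distinct words divided by total words (an assumed measure).
    features["lexical_diversity"] = len(set(tokens)) / total
    return features


sample = "The court approved the settlement. Litigation may cause a loss."
features = linguistic_features(sample)
```

Feature vectors like `features`, computed per annual report, are the sort of input a logistic regression or random forest classifier could be trained on.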

    Usando análises sociais na identificação de nós relevantes em um cenário multirredes: Operação Licitante Fantasma, um estudo de caso / Using social analysis to identify relevant nodes in a multi-network scenario: The Ghost Bidder Operation, a case study

    This article proposes the NDNS (Nodes Detection using Network Science) model, which uses complex networks to find the most relevant nodes in a multi-network scenario more efficiently than established centrality measures. As a case study, the article uses an investigation into corruption in public procurement in Brazil, the Ghost Bidder Operation (Operação Licitante Fantasma). Over a four-year investigation period, NDNS, when compared with four centrality measures (betweenness, eigenvector, weighted degree, and PageRank) and their normalized geometric mean, achieved a precision of 93% and a recall of 94% in detecting fraudulent values, versus 38% and 51%, respectively, for the next-best-ranked measures.
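
For readers unfamiliar with the two metrics reported in this abstract, the following minimal sketch shows how precision and recall are computed from a set of flagged items and a set of items known to be fraudulent. The function name and the item labels are hypothetical and are not drawn from the article's data.

```python
def precision_recall(flagged, actual):
    """Compute precision and recall for flagged items against known positives."""
    flagged, actual = set(flagged), set(actual)
    true_positives = len(flagged & actual)
    # Precision: fraction of flagged items that are truly fraudulent.
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: fraction of truly fraudulent items that were flagged.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical labels: four items flagged by a detector, five truly fraudulent.
flagged = {"A", "B", "C", "D"}
actual = {"B", "C", "D", "E", "F"}
precision, recall = precision_recall(flagged, actual)
```

A detector that flags very few items can score high precision with low recall, which is why the abstract reports both figures together.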