7 research outputs found

    Selective oversampling approach for strongly imbalanced data

    Challenges posed by imbalanced data are encountered in many real-world applications. One possible way to improve classifier performance on imbalanced data is oversampling. In this paper, we propose a new selective oversampling approach (SOA) that first isolates the most representative samples from the minority classes by using an outlier detection technique and then uses these samples for synthetic oversampling. We show that the proposed approach improves the performance of two state-of-the-art oversampling methods, namely the synthetic minority oversampling technique and adaptive synthetic sampling. Prediction performance is evaluated on four synthetic datasets and four real-world datasets, and the proposed SOA methods always achieved the same or better performance than the other existing oversampling methods considered.
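
The two-stage idea in this abstract (filter the minority class first, then oversample only what is kept) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the distance-to-centroid filter is a hypothetical stand-in for the unspecified outlier detection technique, the interpolation step only mimics the core idea of SMOTE, and the names `select_representative`, `smote_like`, `keep_fraction`, and `k` are assumptions made for this example.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def select_representative(minority, keep_fraction=0.8):
    """Keep the minority samples closest to the class centroid.

    Hypothetical stand-in for the outlier detection step; the abstract
    does not say which detector is used.
    """
    c = centroid(minority)
    ranked = sorted(minority, key=lambda p: math.dist(p, c))
    n_keep = max(2, round(keep_fraction * len(ranked)))
    return ranked[:n_keep]


def smote_like(samples, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a sample
    and one of its k nearest neighbours (the core idea behind SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        neighbours = sorted(
            (p for p in samples if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class: four clustered points and one obvious outlier.
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.2], [8.0, 9.0]]
kept = select_representative(minority, keep_fraction=0.8)
new_points = smote_like(kept, n_new=6)
```

Because the outlier is dropped before interpolation, no synthetic point is generated on a segment leading toward it, which is the effect the selective step is meant to achieve.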

    Bankruptcy prediction for limited liability companies using outlier detection methods

    Companies operating in commerce and industry may, as a result of an unfavorable financial situation or unsound trading, run into financial difficulties that later lead to the company's overall bankruptcy. We analyzed data containing thousands of records of limited liability companies (s.r.o.) operating in Slovakia in various sectors of the economy during 2013-2016. We approached the problem as an outlier detection problem, using a support vector method for outlier detection (OneClassSVM). The data consisted of 20 standard economic indicators. In the initial analysis, we focused on predicting the bankruptcy of limited liability companies from the accounting data of a single year and from a combination of two consecutive years. The prediction accuracy achieved ranged from 60.56% to 77.91%, depending on the year in which the company's final status was considered and the year from which the economic indicators were drawn.

    Bankruptcy prediction using ensemble of autoencoders optimized by genetic algorithm

    The prediction of imminent bankruptcy for a company is important to banks, government agencies, business owners, and other business stakeholders. Bankruptcy is influenced by many global and local factors, so it can hardly be anticipated without deeper analysis and economic modeling knowledge. To make the problem even more challenging, the available bankruptcy datasets are usually imbalanced, since even in times of financial crisis bankrupt companies constitute only a fraction of all operating businesses. In this article, we propose a novel bankruptcy prediction approach based on a shallow autoencoder ensemble that is optimized by a genetic algorithm. The goal of the autoencoders is to learn the distribution of the majority class: going-concern businesses. Bankrupt companies are then indicated by higher autoencoder reconstruction errors. The choice of the threshold value for the reconstruction error, which is used to differentiate between bankrupt and nonbankrupt companies, is crucial and determines the final classification decision. In our approach, the threshold for each autoencoder is determined by a genetic algorithm. We evaluate the proposed method on four different datasets containing small and medium-sized enterprises. The results show that the autoencoder ensemble is able to identify bankrupt companies with geometric mean scores ranging from 71% to 93.7%, depending on the industry and evaluation year.
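
The role of the reconstruction-error threshold can be illustrated with a small sketch. It assumes the reconstruction errors have already been computed by some autoencoder, it takes the geometric mean score to be the square root of sensitivity times specificity (a common definition that the abstract does not spell out), and it replaces the genetic algorithm with a plain exhaustive search over candidate thresholds. The function names and the toy numbers are hypothetical.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (label 1 = bankrupt)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        return 0.0
    return math.sqrt((tp / pos) * (tn / neg))


def classify(errors, threshold):
    """Flag a company as bankrupt when its error exceeds the threshold."""
    return [1 if e > threshold else 0 for e in errors]


def best_threshold(errors, y_true):
    """Exhaustive search over observed error values.

    Simple stand-in for the genetic algorithm mentioned in the abstract.
    """
    candidates = sorted(set(errors))
    return max(candidates, key=lambda t: g_mean(y_true, classify(errors, t)))


# Toy reconstruction errors (assumed, not from the paper); 1 = bankrupt.
errors = [0.02, 0.03, 0.05, 0.04, 0.30, 0.06, 0.25, 0.07]
labels = [0, 0, 0, 0, 1, 0, 1, 1]
threshold = best_threshold(errors, labels)
score = g_mean(labels, classify(errors, threshold))
```

The toy data are cleanly separable, so the search finds a threshold with a perfect score. On real, imbalanced data the chosen threshold trades missed bankruptcies against false alarms, which is why the abstract calls this choice crucial.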

    Metatranscriptome Analysis of Nasopharyngeal Swabs across the Varying Severity of COVID-19 Disease Demonstrated Unprecedented Species Diversity

    The recent global emergence of the SARS-CoV-2 pandemic has accelerated research in several areas of science whose outputs and findings can help to address future health challenges posed by emerging infectious agents. We conducted a comprehensive shotgun analysis targeting multiple aspects to compare differences in bacterial spectrum and viral presence through culture-independent RNA sequencing. We compared the microbiome of healthy individuals with that of individuals with varying degrees of COVID-19 severity, in a total of 151 participants. Our findings revealed a noteworthy increase in microbial species diversity among patients with COVID-19, irrespective of disease severity. Specifically, our analysis revealed a significant difference in the abundance of bacterial phyla between healthy individuals and those infected with COVID-19. We found that Actinobacteria, among other bacterial phyla, showed a notably higher abundance in healthy individuals compared to infected individuals. Conversely, Bacteroides showed a lower abundance in the latter group. Infected people, regardless of severity and symptoms, have the same proportional representation of Firmicutes, Proteobacteria, Actinobacteria, Bacteroidetes, and Fusobacteriales. In addition to SARS-CoV-2 and numerous phage groups, we identified sequences of clinically significant viruses such as Human Herpes Virus 1, Human Mastadenovirus D, and Rhinovirus A in several samples. Because the analyses were performed retrospectively, various SARS-CoV-2 WHO variants, such as Alpha (B.1.1.7), Delta (B.1.617.2), Omicron (B.1.1.529), and 20C strains, are represented. Additionally, the presence of specific virus strains has a certain effect on the distribution of individual microbial taxa.