6 research outputs found

    Sparse data embedding and prediction by tropical matrix factorization

    Background: Matrix factorization methods are linear models, with limited capability to model complex relations. In our work, we use the tropical semiring to introduce non-linearity into matrix factorization models. We propose a method called Sparse Tropical Matrix Factorization (STMF) for the estimation of missing (unknown) values in sparse data. Results: We evaluate the efficiency of the STMF method on both synthetic data and biological data in the form of gene expression measurements downloaded from The Cancer Genome Atlas (TCGA) database. Tests on unique synthetic data showed that the STMF approximation achieves a higher correlation than non-negative matrix factorization (NMF), which is unable to recover patterns effectively. On real data, STMF outperforms NMF on six out of nine gene expression datasets. While NMF assumes a normal distribution and tends toward the mean value, STMF can better fit extreme values and distributions. Conclusion: STMF is the first work that uses the tropical semiring on sparse data. We show that in certain cases semirings are useful because they consider the structure, which is different and simpler to understand than it is with standard linear algebra. This work is supported by the Slovene Research Agency, Young Researcher Grant (52096) awarded to AO, and research core funding (P1-0222 to PO and P2-0209 to TC).
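
    As a rough illustration of how a tropical semiring makes a factorization non-linear, the sketch below assumes the max-plus convention, in which "addition" is max and "multiplication" is ordinary +. The abstract does not state which convention STMF uses, and the function names `maxplus_matmul` and `masked_abs_error` are hypothetical; this is not the authors' implementation, only a minimal example of a tropical product and an error measured on observed entries.

```python
def maxplus_matmul(U, V):
    """Tropical (max-plus) product: result[i][j] = max over k of U[i][k] + V[k][j]."""
    r = len(V)
    if any(len(row) != r for row in U):
        raise ValueError("inner dimensions of U and V do not match")
    m = len(V[0])
    return [
        [max(U[i][k] + V[k][j] for k in range(r)) for j in range(m)]
        for i in range(len(U))
    ]


def masked_abs_error(X, approx):
    """Sum of |X - approx| over observed entries only; None marks a missing value."""
    total = 0.0
    for row_x, row_a in zip(X, approx):
        for x, a in zip(row_x, row_a):
            if x is not None:
                total += abs(x - a)
    return total


# Assumed toy factors (rank 2) and a sparse target with one missing entry.
U = [[0, 2], [1, 0]]
V = [[1, 3], [0, 1]]
X = [[2, None], [1, 4]]

approx = maxplus_matmul(U, V)
error = masked_abs_error(X, approx)
```

    Because each entry of the product is decided by a single dominant term rather than a sum of all terms, the approximation can follow extreme values instead of averaging toward the mean, and the missing entry of `X` is simply read off `approx`.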

    Improving power theft detection using efficient clustering and ensemble classification

    One of the main concerns of power generation systems around the world is power theft. This research proposes a framework that combines clustering and classification to detect power theft. Because most datasets contain few or no abnormal samples, we added abnormal samples to the original datasets using artificial attacks to balance the datasets and increase the correct detection rate. We improved the crow search algorithm (CSA) and used the weight feature of crows to improve the performance of the clustering phase. Also, to balance diversification and intensification, we calculated the awareness probability (AP) parameter dynamically across the iterations of the algorithm. To evaluate performance, we used cross-validation and applied the stacking technique in the training phase. The results of extensive experiments on three reference datasets showed high performance in detecting power theft. The evaluation results showed that if the data is collected correctly and in sufficient quantity, this framework can effectively detect power theft in any actual power grid. Also, for new attacks, if their patterns can be detected from the data, it is easy to implement these types of attacks.
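
    The idea of an awareness probability that changes over the iterations can be sketched on the standard crow search update. The abstract does not give the authors' formula, so the linear decay in `dynamic_ap`, its default bounds, and the remaining parameter values are assumptions chosen for illustration. The weight feature of crows and the clustering objective are not modelled here.

```python
import random


def dynamic_ap(t, n_iter, ap_max=0.5, ap_min=0.05):
    """Hypothetical schedule: AP shrinks linearly from ap_max to ap_min."""
    if n_iter <= 1:
        return ap_min
    return ap_max - (ap_max - ap_min) * t / (n_iter - 1)


def crow_search(fitness, dim, n_crows=20, n_iter=100, fl=2.0,
                lo=-5.0, hi=5.0, seed=0):
    """Minimise `fitness` with a basic crow search using a per-iteration AP."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_crows)]
    mem = [p[:] for p in pos]              # best position each crow remembers
    mem_fit = [fitness(p) for p in pos]
    for t in range(n_iter):
        ap = dynamic_ap(t, n_iter)
        for i in range(n_crows):
            j = rng.randrange(n_crows)     # crow i follows a random crow j
            if rng.random() >= ap:
                # Crow j is unaware: move toward its remembered position.
                r = rng.random()
                new = [x + r * fl * (m - x) for x, m in zip(pos[i], mem[j])]
            else:
                # Crow j is aware: crow i is sent to a random position.
                new = [rng.uniform(lo, hi) for _ in range(dim)]
            if all(lo <= x <= hi for x in new):   # accept feasible moves only
                pos[i] = new
                f = fitness(new)
                if f < mem_fit[i]:
                    mem[i], mem_fit[i] = new[:], f
    best = min(range(n_crows), key=mem_fit.__getitem__)
    return mem[best], mem_fit[best]


def sphere(x):
    """Toy objective: sum of squares, minimised at the origin."""
    return sum(v * v for v in x)


best, best_fit = crow_search(sphere, dim=2)
```

    Under this assumed schedule, a high AP early on produces more random relocations (diversification), while a low AP later makes crows follow remembered positions more often (intensification).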