    Unsupervised Feature Selection Algorithm via Local Structure Learning and Kernel Function

    To reduce the dimensionality of high-dimensional data, a series of feature selection algorithms have been proposed. However, these algorithms have two disadvantages: (1) they do not fully consider the nonlinear relationships between data features, and (2) they do not consider the similarity between data features. To solve these two problems, we propose an unsupervised feature selection algorithm based on local structure learning and a kernel function. First, we use the kernel function to map each feature of the data into the kernel space, so that the nonlinear relationships between features can be fully exploited. Second, we apply local structure learning to the features, so that the similarity between features is taken into account. Third, we add a low-rank constraint to capture the global information of the data. Finally, we add sparse learning to perform feature selection. The experimental results show that the proposed algorithm outperforms the comparison methods.
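
    The first two steps can be illustrated with a minimal sketch. It assumes an RBF kernel and uses kernel alignment as a fixed stand-in for the similarity between features; the abstract does not name the kernel or the similarity measure, and the algorithm itself learns the local structure through optimization. The names rbf_gram, alignment, and gamma are hypothetical.

```python
import math

def rbf_gram(column, gamma=1.0):
    """Gram matrix of one feature (one column of the data) under an RBF kernel."""
    return [[math.exp(-gamma * (a - b) ** 2) for b in column] for a in column]

def alignment(K1, K2):
    """Kernel alignment: cosine similarity between two Gram matrices."""
    dot = sum(x * y for r1, r2 in zip(K1, K2) for x, y in zip(r1, r2))
    n1 = math.sqrt(sum(x * x for r in K1 for x in r))
    n2 = math.sqrt(sum(y * y for r in K2 for y in r))
    return dot / (n1 * n2)

# Toy data: 4 samples, 3 features. Feature 1 is the square of feature 0,
# feature 2 is unrelated small noise.
X = [
    [0.0, 0.0, 0.3],
    [1.0, 1.0, 0.1],
    [2.0, 4.0, 0.2],
    [3.0, 9.0, 0.4],
]
columns = list(zip(*X))                  # one tuple of values per feature
grams = [rbf_gram(c) for c in columns]   # each feature mapped to kernel space
S = [[alignment(a, b) for b in grams] for a in grams]  # feature-to-feature similarity
```

    On this toy data the nonlinearly related features 0 and 1 come out more similar to each other than either is to the noise feature, which is the kind of relationship a purely linear similarity can miss.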

    MIAEC: Missing data imputation based on the evidence Chain

    © 2013 IEEE. Missing or incorrect data caused by improper operations can seriously compromise security investigations. Missing data not only damages the integrity of the information but also biases data mining and analysis. It is therefore necessary to impute missing values during data preprocessing, to reduce the impact of data lost through human error and faulty operations. Existing missing value imputation approaches cannot satisfy analysis requirements because of their low accuracy and poor stability; in particular, their imputation accuracy drops rapidly as the rate of missing data increases. In this paper, we propose a novel missing value imputation algorithm based on the evidence chain (MIAEC), which first mines all relevant evidence of missing values in each data tuple and then combines this evidence to build the evidence chain used to estimate the missing values. To extend MIAEC to large-scale data processing, we apply the map-reduce programming model to distribute and parallelize it. Experimental results show that the proposed approach provides higher imputation accuracy than the missing data imputation algorithm based on naive Bayes, the mode imputation algorithm, and the missing data imputation algorithm based on K-nearest neighbor. The imputation accuracy of MIAEC is also maintained as the rate of missing values increases or the position of the missing values changes. MIAEC is also shown to be suitable for distributed computing platforms and can achieve an ideal speedup ratio.
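
    The general shape of "mine evidence, then combine it" in a map-reduce style can be sketched as follows. This is a simplified illustration under stated assumptions, not the MIAEC algorithm: evidence is taken to be a single (attribute, value) pair co-occurring with a target value in complete tuples, and evidence is combined by summing votes. The abstract does not specify how MIAEC builds or weights its evidence chain. All function names and the toy records are hypothetical.

```python
from collections import Counter, defaultdict
from functools import reduce

MISSING = None

def map_tuple(row, target):
    """Map step: emit ((attribute, value), target_value) pairs from one complete tuple."""
    if any(v is MISSING for v in row.values()):
        return []
    return [((attr, val), row[target]) for attr, val in row.items() if attr != target]

def reduce_counts(acc, pair):
    """Reduce step: count how often each piece of evidence co-occurs with each target value."""
    evidence, target_value = pair
    acc[evidence][target_value] += 1
    return acc

def impute(row, target, counts):
    """Combine the evidence present in `row` by summing co-occurrence votes (assumed rule)."""
    votes = Counter()
    for attr, val in row.items():
        if attr != target and val is not MISSING:
            votes.update(counts.get((attr, val), {}))
    return votes.most_common(1)[0][0] if votes else MISSING

rows = [
    {"os": "linux", "port": 22, "service": "ssh"},
    {"os": "linux", "port": 22, "service": "ssh"},
    {"os": "windows", "port": 3389, "service": "rdp"},
    {"os": "linux", "port": 80, "service": "http"},
]
pairs = [p for r in rows for p in map_tuple(r, "service")]
counts = reduce(reduce_counts, pairs, defaultdict(Counter))
imputed = impute({"os": "linux", "port": 22, "service": MISSING}, "service", counts)
```

    Because the map step works on one tuple at a time and the reduce step only adds counts, the same two functions could in principle run on separate partitions of the data and be merged afterwards.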


    Missing values are a serious problem frequently found in real-world data. C4.5 is a popular method for classification predictive modeling because of its ease of implementation. However, C4.5 remains weak when the test data contain a large proportion of missing values. In this study, we use a hybrid approach that combines the bootstrap method and k-NN imputation to handle missing values. The proposed method was tested on Chronic Kidney Disease (CKD) data and evaluated using accuracy and AUC. The results show that the proposed method is superior in handling missing values in the CKD data. It can be concluded that the proposed method is able to handle missing values for chronic kidney disease prediction.
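
    The two ingredients of the hybrid approach can be sketched separately. This is a minimal sketch that assumes numeric features, a root-mean-square distance over the features both rows have observed, and mean aggregation over the k nearest donors. The abstract does not state how the bootstrap and k-NN steps are combined, so the resample-then-impute order in the usage lines is an assumption, and knn_impute and bootstrap_sample are hypothetical names.

```python
import math
import random

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest donor rows."""
    completed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                # Compare only on features observed in both rows.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                    donors.append((dist, other[j]))
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                completed[i][j] = sum(nearest) / len(nearest)
    return completed

def bootstrap_sample(rows, seed=0):
    """Draw len(rows) rows with replacement."""
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in rows]

data = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 12.0],
    [9.0, 8.0, 50.0],
    [1.0, 2.0, None],
]
filled = knn_impute(data, k=2)
resampled = knn_impute(bootstrap_sample(data, seed=0), k=2)
```

    In the toy data the last row sits next to the first two rows, so its missing value is filled from them and not from the distant third row.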

    30th Anniversary of Applied Intelligence: A combination of bibliometrics and thematic analysis using SciMAT

    Applied Intelligence is one of the most important international scientific journals in the field of artificial intelligence. Since 1991, Applied Intelligence has supported research advances in new and innovative intelligent systems, methodologies, and their applications in solving complex real-life problems. In that time, Applied Intelligence has published more than 2,400 papers and received around 31,800 citations. Moreover, Applied Intelligence is recognized by the industrial, academic, and scientific communities as a source of the latest innovative and advanced solutions in intelligent manufacturing, privacy-preserving systems, risk analysis, knowledge-based management, modern techniques to improve healthcare systems, methods to assist government, and solving industrial problems that are too complex to be solved through conventional approaches. Since Applied Intelligence celebrates its 30th anniversary in 2021, it is appropriate to analyze its bibliometric performance, conceptual structure, and thematic evolution. To that end, this paper conducts a bibliometric performance and conceptual structure analysis of Applied Intelligence from 1991 to 2020 using SciMAT. First, the performance of the journal is analyzed using data retrieved from Scopus, focusing on the productivity of the authors, citations, countries, organizations, funding agencies, and most relevant publications. Second, the conceptual structure of the journal is analyzed with the bibliometric software tool SciMAT, identifying the main thematic areas that have been the object of research and their composition, relationship, and evolution during the period analyzed.

    Sparse Nonlinear Feature Selection Algorithm via Local Structure Learning

    In this paper, we propose a new unsupervised feature selection algorithm that considers the nonlinear and similarity relationships within the data. To achieve this, we apply the kernel method and local structure learning to capture the nonlinear relationships and the local similarity between features. Specifically, we use a kernel function to map each feature of the data into the kernel space. In the high-dimensional kernel space, each feature is assigned its own weight, and features with zero weight are unimportant (e.g., redundant features). Furthermore, we consider the similarity between features through local structure learning, and propose an effective optimization method to solve the resulting problem. The experimental results show that the proposed algorithm achieves better performance than the comparison algorithms.
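
    How sparsity turns weights into a feature selection can be shown with a small sketch. It assumes an l1-type penalty, whose proximal operator is soft-thresholding; the abstract does not specify the regularizer or the optimization method, and the weights, the threshold lam, and the feature names below are made up for illustration.

```python
def soft_threshold(weights, lam):
    """Proximal operator of lam * ||w||_1: shrink each weight toward zero by lam."""
    return [max(abs(w) - lam, 0.0) * (1 if w > 0 else -1) for w in weights]

def selected_features(weights, names):
    """Keep the features whose weight is nonzero after shrinkage."""
    return [n for n, w in zip(names, weights) if w != 0.0]

raw_weights = [0.90, -0.05, 0.40, 0.02, -0.70]   # hypothetical per-feature weights
names = ["f0", "f1", "f2", "f3", "f4"]

sparse_weights = soft_threshold(raw_weights, lam=0.1)
kept = selected_features(sparse_weights, names)   # f1 and f3 are shrunk to zero
```

    Weights smaller in magnitude than lam become exactly zero, so the corresponding features drop out, while larger weights are kept with reduced magnitude.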

    Content Analysis of Auditor's Reports of Public Companies: Timeliness and Type of Opinion

    Researchers' interest in the audit of financial statements stems from the many specific issues that auditors encounter in the process, which can themselves be subjects of research, and from the significance that the results of a content analysis of auditor's reports can have for the economy as a whole. Accordingly, the aim of this dissertation is twofold: first, to analyze the content of auditor's reports of public companies in terms of the variables that affect the delay in delivering the report; second, to analyze the type of opinion and the potential variables correlated with it. The research is useful in that its results should help the management of public companies, as audit clients, understand the factors associated with the type of opinion issued or with the time frame within which they can expect the auditor's report. The results can also be useful to potential and existing investors when analyzing the level of investment risk, with regard to the reliability of the information presented in financial statements and the importance of timely financial reporting. The research problem is to determine the key factors, and the nature of their influence, affecting the audit report delay and the type of opinion in auditor's reports of public companies from the Republic of Serbia. The question is which factors significantly influence the delay of auditor's reports and the type of opinion issued to public companies in the Republic of Serbia. In addition, since the timely delivery of auditor's reports is regulated by law, the question arises whether public companies in the Republic of Serbia receive auditor's reports dated within the statutory deadline, and whether the time taken to obtain them deviates significantly from that in developed economies. Based on the defined aim and research hypotheses, an empirical study was designed and carried out. The sample consisted of 241 public companies observed over the research periods (2016-2019), for a total of 964 observations. Variable values for the observed periods were calculated from data taken from the publicly disclosed financial statements and auditor's reports of the sampled companies. The results show that, of the 29 variables selected for examination, half can be linked to the delay of the audit opinion or to the type of audit opinion of public companies from the Republic of Serbia. The dissertation presents the results of an extensive statistical analysis that reveal the direction and strength of the association between the observed variables. The results show that almost all sampled companies received the auditor's report within the statutory deadline. Roughly half of the opinions issued are unmodified, while the qualified opinion is the predominant type of modified opinion. Adverse opinions account for only about 2% on average over the observed period. The average public company receiving an unmodified opinion is profitable, liquid, and has a lower level of indebtedness. Such a company can also expect to receive its auditor's report sooner than other audit clients. In addition, the results indicate that international audit firms, including the Big Four, issue unmodified opinions more often, probably because they are chosen by clients with higher-quality financial reporting, whereas no such effect exists for the audit report delay. An interesting finding is that male and female auditors are equally represented, and that the choice between them has no decisive influence on either the type of opinion or the delay. Viewed from these two aspects, the results demonstrate that rotation of the auditor (audit firm), as an instrument for maintaining professional skepticism, successfully fulfills that role. Finally, issuing a modified opinion requires more resources, measured by the number of days needed to gather audit evidence and express the opinion. On this basis, the recommendation to regulatory bodies would be to tighten the listing requirements for public companies depending on the type of audit opinion received. Potential investors would thereby be able to obtain more up-to-date accounting information as a quality input to their investment decisions.

    Semi-parametric optimization for missing data imputation

    Missing data imputation is an important issue in machine learning and data mining. In this paper, we propose a new and efficient imputation method for one kind of missing data: semi-parametric data. Our imputation method aims to give optimal estimates of the Root Mean Square Error (RMSE), the distribution function, and the quantiles after the missing data are imputed. We evaluate our approach experimentally on both simulated and real data, and demonstrate that our stochastic semi-parametric regression imputation is much better than the existing deterministic semi-parametric regression imputation in efficiency and effectiveness. © Springer Science+Business Media, LLC 2007
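
    The difference between deterministic and stochastic regression imputation can be sketched as follows. For brevity the sketch swaps in an ordinary least-squares line for the semi-parametric regression model of the paper, and the stochastic variant adds a randomly drawn observed residual to each fitted value; both choices are assumptions made for illustration, and fit_line and regression_impute are hypothetical names.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def regression_impute(xs, ys, stochastic=False, seed=0):
    """Fill None entries of ys from a regression on xs fitted to the observed pairs."""
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    slope, intercept = fit_line([x for x, _ in observed], [y for _, y in observed])
    residuals = [y - (slope * x + intercept) for x, y in observed]
    rng = random.Random(seed)
    completed = []
    for x, y in zip(xs, ys):
        if y is None:
            y = slope * x + intercept          # deterministic: the fitted value
            if stochastic:
                y += rng.choice(residuals)     # stochastic: add a resampled residual
        completed.append(y)
    return completed

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, None, 9.8, None]
deterministic = regression_impute(xs, ys)
stochastic = regression_impute(xs, ys, stochastic=True, seed=1)
```

    Deterministic imputation places every filled value exactly on the fitted curve, which understates the spread of the data; adding a residual keeps the imputed values scattered like the observed ones, which matters when the goal is to estimate a distribution function or quantiles and not just a mean.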