Deep heterogeneous ensemble.
In recent years, deep neural networks (DNNs) have emerged as a powerful technique in many areas of machine learning. Although DNNs have achieved major breakthroughs in processing images, video, audio, and text, they also have limitations, such as requiring large amounts of labeled training data and having very many parameters. Ensemble learning, meanwhile, builds a model by combining many different classifiers so that the ensemble outperforms any single classifier. In this study, we propose a deep ensemble framework called Deep Heterogeneous Ensemble (DHE) for supervised learning tasks. In each layer of our algorithm, the input data is first passed through a feature selection method to remove irrelevant features and prevent overfitting. Cross-validation with K learning algorithms is then applied to the selected data to obtain the meta-data and the K base classifiers for the next layer. In this way, each layer outputs the meta-data (which serves as input to the next layer), the base classifiers, and the indices of the selected meta-data. A combining algorithm is then applied to the meta-data of the last layer to obtain the final class prediction. Experiments on 30 datasets confirm that the proposed DHE outperforms a number of well-known benchmark algorithms.
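The per-layer recipe of feature selection followed by cross-validated meta-data can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name, the variance-based feature filter, and the two nearest-centroid base learners standing in for the K heterogeneous algorithms are all assumptions.

```python
import numpy as np

def one_dhe_layer(X, y, n_folds=3, seed=0):
    """One DHE-style layer (hypothetical simplification):
    1) drop near-constant features, 2) build out-of-fold meta-data
    with K diverse base learners (here K=2 nearest-centroid variants)."""
    rng = np.random.default_rng(seed)
    # feature selection: keep columns with non-trivial variance
    keep = X.var(axis=0) > 1e-8
    Xs = X[:, keep]
    classes = np.unique(y)

    def centroid_scores(Xtr, ytr, Xte, ord_):
        # nearest-centroid scores under the given norm (L1 or L2)
        cents = np.stack([Xtr[ytr == c].mean(axis=0) for c in classes])
        d = np.linalg.norm(Xte[:, None, :] - cents[None], ord=ord_, axis=2)
        return -d  # higher score = closer centroid

    # cross-validation: each instance's meta-data comes from a model
    # that never saw it during training
    meta = np.zeros((len(y), 2 * len(classes)))
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    for k in range(n_folds):
        te = folds[k]
        tr = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        meta[te, :len(classes)] = centroid_scores(Xs[tr], y[tr], Xs[te], 2)
        meta[te, len(classes):] = centroid_scores(Xs[tr], y[tr], Xs[te], 1)
    return meta, keep
```

The returned `meta` array would feed the next layer, and `keep` records the indices of the selected features, mirroring the layer outputs described above.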
Stacking-Based Deep Neural Network: Deep Analytic Network for Pattern Classification
Stacking-based deep neural network (S-DNN) aggregates pluralities of basic learning modules, one after another, to synthesize a deep neural network (DNN) alternative for pattern classification. Contrary to DNNs trained end to end by backpropagation (BP), each S-DNN layer, i.e., a self-learnable module, is trained decisively and independently, without BP intervention. In this paper, a ridge regression-based S-DNN, dubbed deep analytic network (DAN), along with its kernelization (K-DAN), are devised for multilayer feature re-learning from pre-extracted baseline features and structured features. Our theoretical formulation demonstrates that DAN/K-DAN re-learn by perturbing the intra/inter-class variations, apart from diminishing the prediction errors. We scrutinize the DAN/K-DAN performance for pattern classification on datasets of varying domains - faces, handwritten digits, and generic objects, to name a few. Unlike typical BP-optimized DNNs, which are trained on gigantic datasets by GPU, we disclose that DAN/K-DAN are trainable using only a CPU, even for small-scale training sets. Our experimental results disclose that DAN/K-DAN outperform the present S-DNNs and also the BP-trained DNNs, including the multilayer perceptron, deep belief network, etc., without data augmentation applied.
Comment: 14 pages, 7 figures, 11 tables
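The core idea of a ridge regression-based analytic layer is that its weights come from a closed-form solve rather than BP. A minimal sketch under stated assumptions: the function names, the regularization value, and the choice to feed each layer's class scores forward alongside the baseline features are illustrative, not the published DAN formulation.

```python
import numpy as np

def ridge_layer(X, Y, lam=1e-2):
    """Closed-form ridge solution W = (X^T X + lam*I)^{-1} X^T Y.
    No gradients, no backpropagation - one linear solve per layer."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def dan_like_forward(X, y, n_layers=2, lam=1e-2):
    """Stack analytic layers: each layer re-learns from the previous
    layer's class scores concatenated with the baseline features."""
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)  # one-hot targets
    H = X
    for _ in range(n_layers):
        W = ridge_layer(H, Y, lam)
        scores = H @ W
        H = np.hstack([X, scores])  # scores + baseline features feed forward
    return scores.argmax(axis=1)
```

Because each layer is a single regularized least-squares solve, the whole stack trains on CPU in closed form, which is the property the abstract emphasizes for small-scale training sets.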
Improving deep forest by confidence screening
Most studies of deep learning are based on neural network models, in which many layers of parameterized, differentiable nonlinear modules are trained by backpropagation. Recently, it has been shown that deep learning can also be realized by non-differentiable modules without backpropagation training, in a model called deep forest. Its representation learning process is based on a cascade of cascades of decision tree forests, where high memory requirements and time costs inhibit the training of large models. In this paper, we propose a simple yet effective approach to improve the efficiency of deep forest. The key idea is to pass instances with high confidence directly to the final stage rather than through all the levels. We also provide a theoretical analysis suggesting a means to vary the model complexity from low to high as the level of the cascade increases, which further reduces the memory requirement and time cost. Our experiments show that the proposed approach achieves highly competitive predictive performance while reducing the time cost and memory requirement by up to one order of magnitude.
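The confidence-screening idea can be sketched independently of the forest internals: at each cascade level, instances whose class-probability estimate already exceeds a threshold are finalized immediately instead of traversing the remaining levels. The function name, threshold value, and the placeholder per-level predictors below are assumptions for illustration.

```python
import numpy as np

def cascade_with_screening(X, level_predict_fns, threshold=0.9):
    """Route instances through cascade levels; emit any instance early
    once its max class probability reaches `threshold`.
    `level_predict_fns` is a list of callables returning per-class
    probability arrays - stand-ins for trained forest levels."""
    n = len(X)
    final = np.full(n, -1)          # -1 = not yet decided
    active = np.arange(n)           # indices still in the cascade
    for predict in level_predict_fns:
        proba = predict(X[active])              # (n_active, n_classes)
        conf = proba.max(axis=1)
        done = conf >= threshold                # high confidence: stop here
        final[active[done]] = proba[done].argmax(axis=1)
        active = active[~done]
        if len(active) == 0:
            break
    if len(active):                             # last level decides the rest
        proba = level_predict_fns[-1](X[active])
        final[active] = proba.argmax(axis=1)
    return final
```

Since later levels only see the shrinking `active` set, both the time cost and the memory needed for augmented features fall as confident instances exit early, which is the efficiency gain the paper targets.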
New Approaches to Developing an Artificial Intelligence Algorithm for Lung Cancer Diagnosis
The relevance of developing an intelligent automated diagnostic system (IADS) for lung cancer (LC) detection stems from the social significance of this disease and its leading position among all cancers. In theory, an IADS can be used both at the screening stage and at the stage of refined LC diagnosis. Recent approaches to training IADS do not take into account the clinical and radiological classification, or the peculiarities of the LC clinical forms, used by the medical community, which creates difficulties and obstacles in applying the available IADS. The authors hold that the closer a developed IADS is to the «doctor's logic», the better the reproducibility and interpretability of its results. Most IADS described in the literature are based on neural networks, which have several disadvantages affecting reproducibility. This paper proposes a composite algorithm combining machine learning methods such as Deep Forest and a Siamese neural network, which can be regarded as a more efficient approach when training data are scarce and as optimal from the reproducibility point of view. The open datasets used for training IADS include annotated objects that in some cases are not confirmed morphologically. The paper describes the LIRA dataset, built from the diagnostic results of the St. Petersburg Clinical Research Center of Specialized Types of Medical Care (Oncology), which includes only computed tomograms of patients with a verified diagnosis. The paper covers the stages of machine learning based on shape features and internal structure features, as well as a newly developed system for differential diagnosis of LC based on Siamese neural networks.
A new approach to feature dimension reduction is also presented in the paper, aimed at more efficient and faster learning of the system.
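The Siamese component above compares two lesions by passing both through the same encoder and scoring the distance between their embeddings. A minimal numpy sketch of that weight-sharing idea follows; the linear-`tanh` encoder, the L1 distance, and the exponential similarity are illustrative stand-ins, not the system's trained network.

```python
import numpy as np

def make_encoder(in_dim, emb_dim, seed=0):
    """A shared (weight-tied) embedding standing in for both Siamese
    branches; a real system would use a trained deep encoder."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((in_dim, emb_dim)) / np.sqrt(in_dim)
    return lambda x: np.tanh(x @ W)

def siamese_similarity(encode, a, b):
    """Both inputs pass through the SAME encoder; similarity is a
    squashed function of the L1 distance between the embeddings."""
    ea, eb = encode(a), encode(b)
    d = np.abs(ea - eb).sum(axis=-1)
    return np.exp(-d)  # 1.0 for identical inputs, -> 0 as they diverge
```

In a differential-diagnosis setting, a query case would be compared against reference cases of each LC form, with the highest-similarity reference suggesting the diagnosis; that pairwise formulation is what makes the approach workable with small training samples.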