12 research outputs found

    Optimization of K-NN algorithm by clustering and reliability coefficients: application to breast-cancer diagnosis

    There is a growing trend towards data mining applications in medicine. Medical practitioners have explored different algorithms to assist their work, and the diagnosis of breast cancer is one such application. Machine learning algorithms are of vital importance to many medical problems: they can help to diagnose a disease, detect its causes, predict the outcome of a treatment, and so on. The K-Nearest Neighbors algorithm (KNN) is one of the simplest algorithms and is widely used in predictive analysis. To optimize its performance and accelerate its execution, this paper proposes a new solution that speeds up the KNN algorithm through clustering and attribute filtering. It also includes another improvement based on reliability coefficients, which ensures a more accurate classification. Thus, the contributions of this paper are three-fold: (i) the clustering of class instances, (ii) the selection of the most significant attributes, and (iii) the weighting of similarities by reliability coefficients. The proposed approach outperformed most known classification techniques, with an average F-measure exceeding 94% on the considered breast-cancer dataset.

    Cancer subtype identification pipeline: a classifusion approach

    Classification of cancer patients into treatment groups is essential for appropriate diagnosis to increase survival. Previously, a series of papers, largely published in the breast cancer domain, have leveraged Computational Intelligence (CI) developments and tools, resulting in groundbreaking advances such as the classification of cancer into newly identified classes, leading to improved treatment options. However, the current literature on the use of CI to achieve this is fragmented, making further advances challenging. This paper captures developments in this area so far, with the goal of establishing a clear, step-by-step pipeline for cancer subtype identification. Having established the pipeline, the paper identifies key potential advances in CI at the individual steps, thus establishing a roadmap for future research. As such, the aim of the paper is to engage the CI community to address the research challenges and leverage the strong potential of CI in this important area. Finally, we present a small set of recent findings on the Nottingham Tenovus Primary Breast Carcinoma Series that enable the classification of a higher number of patients into one of the identified breast cancer groups, and introduce Classifusion: a combination of the results of multiple classifiers.

    Implementation of the K-Nearest Neighbors Algorithm to Determine the Likely Level of Competency Achievement in Computer Skills and Information Management Learning

    Computer skills, mathematics, and English grades from the third year of junior high school (SMP), stored in a database, are used to determine the level of competency achievement in Computer Skills and Information Management (KKPI) learning in the first year of vocational high school (SMK) in the past. Current computer skills, mathematics, and English grades are used to determine the likely level of KKPI competency achievement in the first year of SMK at present, as information for minimizing and making an early diagnosis of factors that can be anticipated sooner. The system is used as a communication medium between mentor teachers and junior teachers in managing learning. The K-Nearest Neighbor algorithm is used to determine how closely past conditions resemble current conditions by means of a similarity formula. The current case is compared with past cases and the similarity is measured. The highest similarity value indicates the closest match between the compared cases and therefore serves as the basis for concluding the likely level of competency achievement. The results show that the system can compute similarity values and draw conclusions about the likely level of KKPI competency achievement based on computer skills, mathematics, and English grades. The system can be used as a model of communication between junior teachers and mentor teachers in the learning process.
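
    The case comparison described in this abstract can be illustrated with a minimal sketch. The abstract does not give the similarity formula, so the one used here (the inverse of one plus the Euclidean distance) is an assumption, and the grades, outcome labels, and function names are hypothetical.

```python
import math

def similarity(a, b):
    """Assumed similarity: 1 / (1 + Euclidean distance), so 1.0 means identical grades."""
    return 1.0 / (1.0 + math.dist(a, b))

def most_similar_case(current, past_cases):
    """Return (outcome, similarity) of the past case that best matches the current one."""
    best_grades, best_outcome = max(
        past_cases, key=lambda case: similarity(current, case[0])
    )
    return best_outcome, similarity(current, best_grades)

# Hypothetical past cases: (computer skills, mathematics, English) -> KKPI outcome.
past_cases = [
    ((80, 75, 70), "achieved"),
    ((55, 60, 50), "not achieved"),
    ((90, 85, 88), "exceeded"),
]
current_case = (78, 74, 72)

outcome, score = most_similar_case(current_case, past_cases)
```

    Here the first past case lies at distance 3 from the current one, so it has the highest similarity and its outcome becomes the predicted one.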

    Finger Vein Recognition Using Principle Component Analysis and Adaptive k-Nearest Centroid Neighbor Classifier

    The k-nearest centroid neighbor (kNCN) classifier is a non-parametric classifier that provides a powerful decision based on the geometrical surrounding neighborhood. The main challenge in kNCN is its slow classification time, because it uses all training samples to find each nearest centroid neighbor. In this work, an adaptive k-nearest centroid neighbor (akNCN) classifier is proposed as an improvement to the kNCN classifier. Two new rules are introduced to adaptively select the neighborhood size of the test sample. The neighborhood size for the test sample is changed in the following ways: 1) The neighborhood size, k, is adapted to j if the centroid distance of the j-th nearest centroid neighbor is greater than the predefined boundary. 2) There is no need to look for further nearest centroid neighbors if the maximum number of samples of the same class is found among the first j nearest centroid neighbors; the size of the neighborhood is then adaptively changed to j. Experimental results on the Finger Vein USM (FV-USM) image database are promising: the classification time of the akNCN classifier is significantly reduced to 51.56% in comparison to the closest competitors, kNCN and limited-kNCN. It also outperforms its competitors by achieving the best reduction ratio of 12.92% while maintaining the classification accuracy.
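
    To show why the baseline kNCN search is slow, the following sketch implements the standard (non-adaptive) nearest centroid neighbor selection: every step scans all remaining training samples. It does not reproduce the two adaptive rules of akNCN, and the toy data and function name are hypothetical.

```python
import math
from collections import Counter

def kncn_classify(query, samples, labels, k):
    """Majority vote over the k nearest centroid neighbors of `query`.

    The first neighbor is the plain nearest neighbor. Each later neighbor is
    the remaining sample whose inclusion brings the centroid of all chosen
    neighbors closest to the query.
    """
    dim = len(query)
    remaining = list(range(len(samples)))
    chosen = []
    running_sum = [0.0] * dim  # coordinate-wise sum of the chosen neighbors

    for _ in range(min(k, len(samples))):
        def centroid_distance(i):
            n = len(chosen) + 1
            centroid = [(running_sum[d] + samples[i][d]) / n for d in range(dim)]
            return math.dist(query, centroid)

        # Full scan of the remaining samples: this is the costly step.
        best = min(remaining, key=centroid_distance)
        remaining.remove(best)
        chosen.append(best)
        running_sum = [running_sum[d] + samples[best][d] for d in range(dim)]

    votes = Counter(labels[i] for i in chosen)
    return votes.most_common(1)[0][0], chosen

# Hypothetical 2-D toy data.
samples = [(1.0, 0.0), (1.05, 0.0), (-1.5, 0.0), (3.0, 3.0)]
labels = ["a", "a", "b", "b"]
label, neighbor_order = kncn_classify((0.0, 0.0), samples, labels, k=3)
```

    In this example the second neighbor chosen is the sample at (-1.5, 0.0) rather than the closer one at (1.05, 0.0), because it balances the centroid around the query. This is the geometric behavior that distinguishes kNCN from plain k-nearest neighbors.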

    A model‐based protocol for the diagnosis of von Willebrand disease

    Von Willebrand disease (VWD) is one of the main inherited coagulation disorders. It is caused by a deficiency and/or a dysfunction of the von Willebrand factor (VWF), a fundamental multimeric glycoprotein involved in the hemostasis process. Correct detection of the disease is not an easy task because the disease manifests itself in many variants and a high intra-subject variability is observed. For these reasons, the diagnostic clinical trials typically rely on a 24-h sampling protocol, which makes the overall test long, stressful, and costly. Using a new pharmacokinetic model derived from Galvanin et al.'s 2014 study, this study aims at i) assessing the theoretical possibility of performing a shorter clinical test and ii) proposing a set of model-based diagnostic methods as a support for the clinical team. A preliminary information analysis is performed to understand which sampling instants are most informative for model identification. This allowed us to propose a novel 8-h diagnostic protocol that still ensures model identifiability. Three alternative diagnostic methods are then proposed based on this shorter clinical protocol. One of them directly uses the pharmacokinetic model, whereas the other two use three indices (two pharmacokinetic indices, namely clearance and total VWF released, and a third index, the basal multimer ratio) to formulate the diagnosis problem as a classification problem. The classification problem is then solved using K-nearest neighbours and linear discriminant analysis. Results show the theoretical feasibility of a VWD diagnosis based on a shorter protocol.

    DEEP LEARNING BASED SEGMENTATION AND CLASSIFICATION FOR IMPROVED BREAST CANCER DETECTION

    Breast cancer is a leading killer of women globally. It is a serious health concern caused by calcifications or abnormal tissue growth in the breast. Screening and identifying the tumor as benign or malignant is important to facilitate early intervention, which drastically decreases the mortality rate. Screening usually relies on ultrasound images, since they are accessible to most people and have no notable drawbacks, unlike mammograms, the other widely used screening technique, which in some cases do not produce a clear scan. In this thesis, the approach to this problem is to build a stacked model that makes predictions on the basis of the shape, pattern, and spread of the tumor. The typical steps are pre-processing of the images, followed by segmentation and classification. For pre-processing, the proposed approach uses histogram equalization, which improves the contrast of the image, makes the tumor stand out from its surroundings, and simplifies the segmentation step. For segmentation, the approach uses the UNet architecture with a ResNet backbone. The UNet architecture is designed specifically for biomedical imaging. The aim of segmentation is to separate the tumor from the ultrasound image so that the classification model can make its predictions from this mask. The segmentation model achieved an F1-score of 97.30%. For classification, a base CNN model extracts features from the provided masks, which are then fed into a neural network that makes the predictions. The base CNN model is ResNet50, and the neural network used for the output is a simple 8-layer network with ReLU activation in the hidden layers and softmax in the final decision-making layer. The ResNet weights are initialized from training on ImageNet. ResNet50 returns 2048 features from each mask, which are then fed into the network for decision-making. The hidden layers of the neural network have 1024, 512, 256, 128, 64, 32, and 10 neurons respectively. The proposed model achieved a classification accuracy of 98.61% with an F1-score of 98.41%. Detailed experimental results are presented along with comparative data.
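
    As an illustration of the pre-processing step, the following is a minimal sketch of textbook global histogram equalization for an 8-bit grayscale image stored as a list of rows. It is not the thesis implementation, which the abstract does not describe; the function name and the tiny sample image are hypothetical.

```python
def equalize_histogram(image, levels=256):
    """Spread the gray levels of `image` over the full range [0, levels - 1]."""
    pixels = [p for row in image for p in row]

    # Histogram of gray levels.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of gray levels.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)

    n = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:
        # Constant image: nothing to stretch, return a copy.
        return [row[:] for row in image]

    # Lookup table mapping each old gray level to its equalized value.
    lut = [
        max(0, round((c - cdf_min) * (levels - 1) / (n - cdf_min)))
        for c in cdf
    ]
    return [[lut[p] for p in row] for row in image]

# Hypothetical low-contrast patch: all values lie between 100 and 103.
patch = [[100, 101, 101],
         [102, 103, 103]]
equalized = equalize_histogram(patch)
```

    After equalization the four nearby gray levels are stretched across the whole 0 to 255 range, which is the contrast gain the abstract relies on to make the tumor stand out before segmentation.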

    Improving a wireless localization system via machine learning techniques and security protocols

    Recent advancements in Internet of Things (IoT) devices have brought new opportunities for technologies and systems to be integrated into our everyday life. In this work, we investigate how edge nodes can effectively utilize 802.11 wireless beacon frames broadcast from pre-existing access points in a building to achieve room-level localization. We explain the hardware and software needed for this system and demonstrate a proof of concept with experimental data analysis. Improvements to localization accuracy are shown via machine learning by implementing the random forest algorithm. With this algorithm, the model can be trained on historical data and make more informed decisions when tracking other nodes in the future. We also include multiple security measures that can be taken to reduce the threat of both physical and digital attacks on the system. These threats include access point spoofing, side channel analysis, and packet sniffing, all of which are often overlooked in IoT devices that are rushed to market. Our research demonstrates the comprehensive combination of affordability, accuracy, and security possible in an IoT beacon frame-based localization system, a combination that has not been fully explored by the localization research community.

    Application of K-nearest neighbors algorithm on breast cancer diagnosis problem.

    This paper addresses the breast cancer diagnosis problem as a pattern classification problem. Specifically, this problem is studied using the Wisconsin-Madison Breast Cancer data set. The K-nearest neighbors algorithm is employed as the classifier. Both conceptually and in implementation, the K-nearest neighbors algorithm is simpler than the other techniques that have been applied to this problem. In addition, the K-nearest neighbors algorithm produces an overall classification result 1.17% better than the best result known for this problem.
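
    The simplicity claimed for the K-nearest neighbors classifier can be seen in a minimal sketch: the whole method is a distance sort followed by a majority vote. The feature vectors below are made-up toy values, not records from the Wisconsin data set, and the choices of Euclidean distance and k = 3 are assumptions rather than the paper's settings.

```python
import math
from collections import Counter

def knn_predict(query, samples, labels, k=3):
    """Predict the label of `query` by majority vote among its k nearest samples."""
    nearest = sorted(
        range(len(samples)), key=lambda i: math.dist(query, samples[i])
    )[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Hypothetical two-feature training set.
samples = [(1, 1), (2, 1), (1, 2), (8, 9), (9, 8), (9, 9)]
labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

prediction_low = knn_predict((2, 2), samples, labels)
prediction_high = knn_predict((8, 8), samples, labels)
```

    There is no training phase: the labeled samples themselves are the model, and every prediction scans all of them.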