
    Melanoma Detection Using Mobile Technology and Feature-Based Classification Techniques

    Melanoma is one of the most dangerous types of skin cancer in terms of its death rate. The probability of death increases when it is diagnosed late; however, melanoma can be treated successfully when diagnosed in its early stages. One of the most common medical methods for diagnosing melanoma is the ABCD (Asymmetry, Border irregularity, Color, and Diameter) method, which involves measuring four features of skin lesions. The main disadvantage of this method is that estimation error and subjectivity affect the accuracy of diagnosis, especially when it is performed by non-specialists, and the scarcity of specialists makes the problem worse. This has led to the development of computer systems to help in melanoma diagnosis; however, while most computer systems can achieve high accuracy with adequate speed, they have problems with usability and flexibility. The emergence of smartphones with increasing image capture and processing capabilities has made it feasible to use such devices to perform medical image analysis such as the diagnosis of melanoma. Our work combines an existing melanoma diagnosis method with the image capture and processing capabilities of smartphones to achieve fast, affordable, widely available, and highly accurate melanoma diagnosis. In this work, we propose a complete smartphone application that captures and processes an image of a suspicious region of the skin in order to estimate its probability of being melanoma. The system can use historical cases to improve its diagnosis accuracy. The system was tested on 164 sample images: 14 images were not well captured and could not be diagnosed, while the remaining 150 cases were successfully processed. In each of these 150 images, the lesion was correctly segmented and its ABCD feature set extracted.
Diagnosis accuracy on the analyzed images ranged between 88% and 94%, with the best results obtained using the SVM classifier and the worst using the KNN classifier.
    Melanoma is one of the most dangerous types of skin cancer in terms of the ratio of deaths to cases. The risk increases for cases treated at late stages, but melanoma can be treated successfully if the disease is detected in its early stages. Several methods are therefore used to diagnose the disease early and refer the patient directly to a specialist whenever the disease is suspected; the best known of these is the ABCD method. The main obstacle facing this method is its inaccurate application by non-specialists: because it relies on characteristics such as size, color, and shape, it is subject to a great deal of estimation and subjectivity in diagnosis, which costs it much of its accuracy. For this reason, many computerized systems have been developed in recent years that aim to assist diagnosis, reduce the error rate, and automate diagnosis so that it does not depend on personal judgment. With the emergence of smart devices, whose use has gone beyond communication to capturing high-resolution images, processing data efficiently, and connecting to the Internet, there has been a trend toward exploiting the flexibility and precision of smart devices in capturing and processing images to provide systems for diagnosing various diseases, and a share of this research and these applications has been devoted to facilitating melanoma diagnosis. Despite the many available systems and studies, results are still at an early stage and several obstacles remain: the accuracy of these systems does not yet reach the required level, which in some cases endangers the patient's life if an affected case is diagnosed as negative. In addition, the image processing, data storage, and classification capabilities of smartphones are new areas that have not been sufficiently tested in previous research, and the new capabilities of smartphones have not been fully exploited; until recently, analyzing, processing, and classifying large amounts of data on smartphones was not feasible, but it has now become possible thanks to major advances in smartphone storage and processing units. This work therefore continues the effort to provide an accurate, effective, and flexible solution to the problem of melanoma diagnosis using smartphones. The image capture and processing capabilities of smartphones were used, together with the ability to store information and use it to classify and predict new cases of the disease, and a complete system was built and tested to achieve this. The results were very satisfactory: the program was tested on a sample of 164 images, and in the image processing stage it succeeded in processing 150 of the 164 images and isolating the lesion region; the success rate of image processing was improved, to overcome errors during image capture, through an interactive user interface. In the classification stage, applied to the 150 images produced by image processing, classification accuracy ranged between 88% and 94% depending on the classification algorithm used. These are good results that can be built upon for using mobile phones in the initial screening of melanoma patients.
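    As a minimal, hedged sketch of the feature-based classification step described above (not the authors' implementation), the following Python snippet trains and compares an SVM and a KNN classifier on pre-extracted ABCD feature vectors; the feature values and labels are illustrative placeholders, not the 164-image dataset.

```python
# Toy sketch: classify lesions from pre-extracted ABCD features.
# The rows and labels are illustrative placeholders, not the authors' data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Each row: [asymmetry, border_irregularity, color_variation, diameter_mm]
X = np.array([
    [0.12, 0.30, 1.0, 3.5],
    [0.55, 0.80, 4.0, 7.2],
    [0.20, 0.25, 2.0, 4.1],
    [0.70, 0.90, 5.0, 8.0],
    [0.15, 0.35, 1.0, 3.0],
    [0.60, 0.75, 4.0, 6.8],
] * 10)  # replicate so the toy set is large enough to split
y = np.array([0, 1, 0, 1, 0, 1] * 10)  # 1 = melanoma, 0 = benign

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

for name, clf in [("SVM", SVC(kernel="rbf", C=1.0)),
                  ("KNN", KNeighborsClassifier(n_neighbors=3))]:
    model = make_pipeline(StandardScaler(), clf)  # scale features, then classify
    model.fit(X_train, y_train)
    print(name, "accuracy:", accuracy_score(y_test, model.predict(X_test)))
```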

    Delineating Knowledge Domains in Scientific Literature using Machine Learning (ML)

    Recent years have witnessed an upsurge in the number of published documents. Organizations are showing an increased interest in text classification for effective use of the information. Manual procedures for text classification can be fruitful for a handful of documents, but they lose credibility as the number of documents increases, besides being laborious and time-consuming. Text mining techniques facilitate assigning text strings to categories, rendering the process of classification fast, accurate, and hence reliable. This paper classifies chemistry documents using machine learning and statistical methods. The procedure of text classification is described step by step: data preparation, followed by processing, transformation, and the application of classification techniques, culminating in validation of the results.
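    The workflow described above (preparation, transformation, classification, validation) can be illustrated with a minimal scikit-learn sketch; the documents, labels, and the choice of TF-IDF plus logistic regression are assumptions for illustration, not the paper's exact setup.

```python
# Minimal sketch of a text classification workflow: prepare text, transform it
# into TF-IDF vectors, train a classifier, and validate with cross-validation.
# The documents and labels are placeholders, not the chemistry corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

docs = [
    "synthesis of benzene derivatives via catalytic hydrogenation",
    "nmr characterization of novel organometallic complexes",
    "machine learning for retrosynthesis route planning",
    "statistical analysis of reaction yield datasets",
] * 5
labels = ["organic", "inorganic", "cheminformatics", "cheminformatics"] * 5

pipeline = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),  # transformation
    LogisticRegression(max_iter=1000),                      # classification
)

# 5-fold cross-validation as a simple validation step
scores = cross_val_score(pipeline, docs, labels, cv=5)
print("mean accuracy:", scores.mean())
```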

    Effect of Term Weighting on Keyword Extraction in Hierarchical Category Structure

    While there have been several studies on the effect of term weighting on classification accuracy, relatively few works have examined how term weighting affects the quality of keywords extracted to characterize a document or a category (i.e., a document collection). Moreover, many tasks require a more complicated category structure, such as a hierarchical or network category structure, rather than a flat one. This paper presents a qualitative and quantitative study of how term weighting affects keyword extraction in a hierarchical category structure, in comparison to a flat category structure. A hierarchical structure introduces special characteristics when assigning a set of keywords or tags to represent a document or a document collection, with the support of statistics drawn from the hierarchy, including the category itself, its parent category, its child categories, and its sibling categories. An enhancement of term weighting is proposed, in the form of a series of modified TF-IDFs, to improve keyword extraction. A text collection of public-hearing opinions is used to evaluate variant TFs and IDFs and to identify which types of information in the hierarchical category structure are useful. The experiments show that the most effective IDF variant, namely TF-IDFr, ranks the usefulness of these sources in the order identity > sibling > child > parent, and that TF-IDFr outperforms the vanilla version of TF-IDF with a centroid-based classifier.
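    A rough illustration of hierarchy-aware term weighting is sketched below; the scope definitions and combination weights are hypothetical and are not the paper's TF-IDFr formula, only a way to show how IDF statistics computed over the category itself, its siblings, its children, and its parent could be combined with per-category term frequency.

```python
# Illustrative sketch of hierarchy-aware term weighting (hypothetical weights,
# not the paper's TF-IDFr): IDF is computed over several hierarchy scopes and
# combined with the term frequency of the target category.
import math
from collections import Counter

def idf(term, doc_sets):
    """IDF of a term over a list of document collections (one per category)."""
    n = len(doc_sets)
    df = sum(1 for docs in doc_sets if any(term in d for d in docs))
    return math.log((1 + n) / (1 + df)) + 1

def keyword_scores(category_docs, scopes, weights=(1.0, 0.5, 0.3, 0.1)):
    """Score terms of a category using IDFs from several hierarchy scopes.

    category_docs: tokenized documents in the target category
    scopes: dict mapping 'identity'/'sibling'/'child'/'parent' to lists of
            document collections used for the corresponding IDF
    weights: hypothetical combination weights (identity > sibling > child > parent)
    """
    tf = Counter(t for doc in category_docs for t in doc)
    scores = {}
    for term, freq in tf.items():
        combined_idf = sum(
            w * idf(term, scopes[s])
            for w, s in zip(weights, ("identity", "sibling", "child", "parent"))
            if s in scopes
        )
        scores[term] = freq * combined_idf
    return scores

# Toy example with tokenized documents
target = [["tax", "reform", "policy"], ["tax", "increase"]]
scopes = {
    "identity": [target],
    "sibling": [[["health", "policy"]], [["education", "budget"]]],
    "child": [[["tax", "vat"]]],
    "parent": [[["policy", "tax", "health", "education"]]],
}
print(sorted(keyword_scores(target, scopes).items(), key=lambda kv: -kv[1])[:3])
```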

    Shape recognition through multi-level fusion of features and classifiers

    Shape recognition is a fundamental problem and a special type of image classification, where each shape is considered as a class. Current approaches to shape recognition mainly focus on designing low-level shape descriptors and classifying them using machine learning approaches. In order to achieve effective learning of shape features, it is essential to ensure that a comprehensive set of high-quality features can be extracted from the original shape data. This has motivated us to develop methods for fusing features and classifiers to advance classification performance. In this paper, we propose a multi-level framework for fusion of features and classifiers in the setting of granular computing. The proposed framework creates diversity among classifiers by adopting feature selection and fusion to create diverse feature sets and by training diverse classifiers using different learning algorithms. The experimental results show that the proposed multi-level framework can effectively create diversity among classifiers, leading to considerable advances in classification performance.
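    A generic sketch of feature- and classifier-level fusion is shown below; the dataset, feature-selection sizes, and voting scheme are assumptions for illustration and do not reproduce the authors' granular-computing framework.

```python
# Illustrative sketch of fusing diverse features and classifiers: each base
# learner sees a different feature subset (feature-level diversity) and uses a
# different algorithm (classifier-level diversity); predictions are fused by
# soft voting. Generic ensemble, not the authors' framework.
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # stand-in for extracted shape descriptors
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base = [
    ("svm", make_pipeline(SelectKBest(f_classif, k=40), StandardScaler(),
                          SVC(probability=True))),
    ("knn", make_pipeline(SelectKBest(f_classif, k=30), KNeighborsClassifier())),
    ("rf",  make_pipeline(SelectKBest(f_classif, k=50),
                          RandomForestClassifier(random_state=0))),
]

fusion = VotingClassifier(estimators=base, voting="soft")  # classifier fusion
fusion.fit(X_train, y_train)
print("fused accuracy:", fusion.score(X_test, y_test))
```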

    On the cyber security issues of the internet infrastructure

    The Internet network has received huge attention from the research community. At first glance, network optimization and scalability issues dominate the efforts of researchers and vendors. Many results have been obtained in the last decades: the Internet's architecture is optimized to be cheap, robust, and ubiquitous. In contrast, such a network has never been perfectly secure. Throughout its evolution, the security threats of the Internet have persisted as a transversal and endless topic. Nowadays, the Internet hosts a multitude of mission-critical activities: electronic voting systems and financial services are carried out through it, and governmental institutions, financial organizations, and businesses depend on its performance and security. This role gives the Internet a critical characterization. At the same time, the Internet is a vector of malicious activities, like Denial of Service attacks; many reports of attacks can be found in both academic outcomes and daily news. In order to mitigate this wide range of issues, many research efforts have been carried out in the past decades; unfortunately, the complex architecture and the scale of the Internet make the evaluation and adoption of such proposals hard. In order to improve the security of the Internet, the research community can benefit from sharing real network data. Unfortunately, privacy and security concerns inhibit the release of these data: it suffices to imagine the amount of private information (e.g., political preferences or religious beliefs) that can be obtained by reading the Internet packets exchanged between users and web services. This scenario motivates my research and represents the context of this dissertation, which contributes to the analysis of the security issues of the Internet infrastructure and describes relevant security proposals. In particular, the main outcomes described in this dissertation are: • the definition of a secure routing protocol for the Internet able to provide cryptographic guarantees against false route announcements and invalid path attacks; • the definition of a new obfuscation technique that allows the research community to publicly release real network flows with formal guarantees of security and privacy; • evidence of a new kind of leakage of sensitive information obtained by attacking the models used by various machine learning algorithms.
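    As a loose illustration of obfuscating flow records before sharing (not the dissertation's technique, which provides formal security and privacy guarantees), the sketch below replaces IP addresses in a flow record with keyed HMAC pseudonyms so that flows from the same host remain linkable without exposing real addresses.

```python
# Illustrative sketch only: keyed pseudonymization of IP addresses in a flow
# record. This is a generic example, not the dissertation's obfuscation scheme.
import hmac
import hashlib

SECRET_KEY = b"rotate-me-regularly"  # hypothetical key held by the data owner

def pseudonymize_ip(ip: str) -> str:
    """Return a truncated keyed hash of an IP address."""
    digest = hmac.new(SECRET_KEY, ip.encode(), hashlib.sha256).hexdigest()
    return digest[:16]

flow = {"src": "192.0.2.10", "dst": "198.51.100.7", "bytes": 4821, "proto": "TCP"}
released = {**flow,
            "src": pseudonymize_ip(flow["src"]),
            "dst": pseudonymize_ip(flow["dst"])}
print(released)
```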

    Machine Learning and Big Data Methodologies for Network Traffic Monitoring

    Over the past 20 years, the Internet has seen an exponential growth of traffic, users, services, and applications. Currently, it is estimated that the Internet is used every day by more than 3.6 billion users, who generate 20 TB of traffic per second. Such a huge amount of data challenges network managers and analysts to understand how the network is performing, how users are accessing resources, how to properly control and manage the infrastructure, and how to detect possible threats. Along with mathematical, statistical, and set theory methodologies, machine learning and big data approaches have emerged to build systems that aim at automatically extracting information from the raw data that network monitoring infrastructures offer. In this thesis I will address different network monitoring solutions, evaluating several methodologies and scenarios. I will show how, following a common workflow, it is possible to exploit mathematical, statistical, set theory, and machine learning methodologies to extract meaningful information from the raw data. Particular attention will be given to machine learning and big data methodologies such as DBSCAN and the Apache Spark big data framework. The results show that, despite being able to take advantage of mathematical, statistical, and set theory tools to characterize a problem, machine learning methodologies are very useful for discovering hidden information in the raw data. Using the DBSCAN clustering algorithm, I will show how to use YouLighter, an unsupervised methodology that groups caches serving YouTube traffic into edge-nodes, and later, using the notion of Pattern Dissimilarity, how to identify changes in their usage over time. By applying YouLighter to 10-month-long traces, I will pinpoint sudden changes in the usage of the YouTube edge-nodes, changes that also impair the end users' Quality of Experience. I will also apply DBSCAN in the deployment of SeLINA, a self-tuning tool implemented in the Apache Spark big data framework to autonomously extract knowledge from network traffic measurements. Using SeLINA, I will show how to automatically detect the changes in the YouTube CDN previously highlighted by YouLighter. Along with these machine learning studies, I will show how to use mathematical and set theory methodologies to investigate the browsing habits of Internauts. Using a two-week dataset, I will show how, over this period, the Internauts continue discovering new websites, and that it is hard to build a reliable profiler using only DNS information. By exploiting mathematical and statistical tools, I will instead show how to characterize Anycast-enabled CDNs (A-CDNs). I will show that A-CDNs are widely used for both stateless and stateful services, that they are quite popular, as more than 50% of web users contact an A-CDN every day, and that stateful services can benefit from A-CDNs, since their paths are very stable over time, as demonstrated by the presence of only a few anomalies in their Round Trip Time. Finally, I will conclude by showing how I used BGPStream, an open-source software framework for the analysis of both historical and real-time Border Gateway Protocol (BGP) measurement data. Using BGPStream in real-time mode, I will show how I detected a Multiple Origin AS (MOAS) event and how I studied the propagation of the black-holing community, showing its effect on the network.
Then, using BGPStream in historical mode together with the Apache Spark big data framework over 16 years of data, I will show further results, such as the continuous growth of IPv4 prefixes and the growth of MOAS events over time. All these studies aim to show how monitoring is a fundamental task in different scenarios, highlighting in particular the importance of machine learning and big data methodologies.
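    A minimal sketch of the DBSCAN step on synthetic measurement data is given below; the two-dimensional features and parameter values are placeholders, not the thesis's YouLighter or SeLINA configurations.

```python
# Minimal DBSCAN sketch on synthetic measurements: points that are close in
# feature space end up in the same cluster, while isolated points are labeled
# as noise (-1). The data below stands in for real traffic measurements.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two dense groups of measurements plus a few outliers
group_a = rng.normal(loc=[10.0, 20.0], scale=0.5, size=(100, 2))
group_b = rng.normal(loc=[50.0, 5.0], scale=0.5, size=(100, 2))
outliers = rng.uniform(low=0, high=60, size=(5, 2))
X = StandardScaler().fit_transform(np.vstack([group_a, group_b, outliers]))

labels = DBSCAN(eps=0.3, min_samples=10).fit_predict(X)
print("clusters found:", len(set(labels) - {-1}),
      "noise points:", int((labels == -1).sum()))
```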