    An effective evidence-theory-based k-nearest neighbor (kNN) classification

    Abstract: In this paper, we study various K-nearest neighbor (KNN
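    The abstract above is truncated, so the paper's own method is not known here. As a point of reference only, the sketch below implements Denoeux's evidence-theoretic k-NN rule, a classic way of combining k-NN with Dempster-Shafer theory. It is not necessarily this paper's method. Each neighbor lends belief to its own class in proportion to alpha * exp(-gamma * d^2), and the per-class evidence is then merged with Dempster's rule. The parameter values alpha = 0.95 and gamma = 1.0 are illustrative assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], Hashable]  # (feature vector, class label)


def evidential_knn(train: List[Sample], query: Sequence[float], k: int = 3,
                   alpha: float = 0.95, gamma: float = 1.0) -> Tuple[Dict[Hashable, float], float]:
    """Return (mass on each singleton class, mass left on the whole frame Omega)."""
    labels = {label for _, label in train}
    # The k training samples closest to the query (Euclidean distance).
    neighbors = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    # For each class, the product of (1 - alpha * exp(-gamma * d^2)) over its neighbors:
    # the mass that class's pooled evidence leaves on Omega (ignorance).
    # Classes with no neighbor among the k keep ignorance 1.0.
    ignorance: Dict[Hashable, float] = defaultdict(lambda: 1.0)
    for features, label in neighbors:
        d2 = math.dist(features, query) ** 2
        ignorance[label] *= 1.0 - alpha * math.exp(-gamma * d2)
    # Dempster's rule across classes: a singleton keeps its own support times the
    # ignorance of every other class; conflicting mass is removed by normalization.
    omega = math.prod(ignorance[c] for c in labels)
    unnormalized = {
        c: (1.0 - ignorance[c]) * math.prod(ignorance[r] for r in labels if r != c)
        for c in labels
    }
    total = sum(unnormalized.values()) + omega
    return {c: m / total for c, m in unnormalized.items()}, omega / total
```

    A decision can then be taken as the class with the largest mass. The leftover mass on Omega measures how uninformative the neighborhood is. This is a property a plain majority-vote kNN does not expose.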

    Analyzing the influence of the sampling rate in the detection of malicious traffic on flow data

    Cyberattacks are a growing concern for companies and public administrations. The literature shows that analyzing network-layer traffic can detect intrusion attempts. However, such detection usually requires inspecting every datagram in a computer network. Routers that handle a significant volume of traffic therefore do not analyze every packet in depth. Instead, they analyze traffic patterns based on network flows. Even gathering and analyzing flow data has a high computational cost, so routers usually apply a sampling rate when generating flow data. Tuning the sampling rate is a tricky problem. If the sampling rate is low, much information is lost and some cyberattacks may go undetected. If it is high, routers cannot keep up with the load. This paper characterizes the influence of this parameter on different machine learning-based detection methods. To do so, we trained and tested malicious-traffic detection models using synthetic flow data gathered at several sampling rates. We then double-checked these models with flow data from the public BoT-IoT dataset and with real flow data collected on RedCAYLE, the regional academic network of Castilla y León.
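    To illustrate why a low sampling rate can hide attacks, here is a minimal sketch of deterministic 1-in-n packet sampling followed by aggregation into flows keyed by a 5-tuple. The function names and the 5-tuple key are hypothetical. Real routers implement sampling internally (for example, in sampled NetFlow or sFlow) and may select packets randomly rather than systematically.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical flow key: (src_ip, dst_ip, src_port, dst_port, protocol)
FlowKey = Tuple[str, str, int, int, str]


def sampled_flow_counts(packets: Iterable[FlowKey], n: int) -> Counter:
    """Systematic 1-in-n packet sampling, then aggregation of kept packets into flows."""
    if n < 1:
        raise ValueError("sampling interval n must be >= 1")
    flows: Counter = Counter()
    for i, key in enumerate(packets):
        if i % n == 0:  # keep every n-th packet, discard the rest
            flows[key] += 1
    return flows


def estimated_packets(flows: Counter, n: int) -> Dict[FlowKey, int]:
    """Scale sampled counts back up by n to estimate each flow's true packet count."""
    return {key: count * n for key, count in flows.items()}
```

    Consider 1,000 background packets and a three-packet attack flow near the start of the stream. A 1-in-100 rate can keep 11 background packets and none of the attack packets. Scaling by n then estimates 1,100 background packets instead of 1,000, and the attack flow never appears in the flow records at all.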

    SQL injection attack detection in network flow data

    SQL injection ranks among the top three risks in the OWASP Top 10. The literature shows that analyzing network datagrams makes it possible to detect or prevent such attacks. Unfortunately, such detection usually requires inspecting every packet flowing through a computer network. Routers in charge of heavy traffic loads therefore usually cannot apply the solutions proposed in the literature. This work demonstrates that SQL injection attacks can be detected on flow data from lightweight protocols. For this purpose, we gathered two datasets of flow data from several SQL injection attacks against the most popular database engines. After evaluating several machine learning-based algorithms, we obtained a detection rate of over 97% with a false alarm rate of less than 0.07% using a Logistic Regression-based model.
    Instituto Nacional de Ciberseguridad de España (INCIBE); Universidad de Leó
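    The abstract reports a detection rate and a false alarm rate without spelling out their formulas. The sketch below assumes the usual definitions: detection rate = TP / (TP + FN) over attack flows, and false alarm rate = FP / (FP + TN) over benign flows. Label 1 marks an attack flow and 0 a benign one. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (detection rate, false alarm rate); label 1 = attack, 0 = benign."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flows flagged (false alarms)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign flows correctly passed
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    return detection_rate, false_alarm_rate
```

    Under these definitions, a false alarm rate below 0.07% means fewer than 7 in every 10,000 benign flows are flagged as attacks.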

    Graph-based Estimation of Information Divergence Functions

    Abstract: Information divergence functions, such as the Kullback-Leibler divergence or the Hellinger distance, play a critical role in statistical signal processing and information theory. However, estimating them can be challenging. Most often, parametric assumptions are made about the two distributions to estimate the divergence of interest. In cases where no parametric model fits the data, non-parametric density estimation is used. In statistical signal processing applications, Gaussianity is usually assumed, since closed-form expressions for common divergence measures have been derived for this family of distributions. Parametric assumptions are preferred when the data are known to follow the model, but this is rarely the case in real-world scenarios. Non-parametric density estimators are characterized by a very large number of parameters that must be tuned with costly cross-validation. In this dissertation, we focus on a specific family of non-parametric estimators, called direct estimators, that bypass density estimation completely and estimate the quantity of interest directly from the data. We introduce a new divergence measure, the $D_p$-divergence, which can be estimated directly from samples without parametric assumptions on the distribution. We show that the $D_p$-divergence bounds the binary, cross-domain, and multi-class Bayes error rates. In certain cases, it also provides provably tighter bounds than the Hellinger divergence. In addition, we propose a new methodology that allows the experimenter to construct direct estimators for existing divergence measures, or to construct new divergence measures with custom properties tailored to the application. To examine the practical efficacy of these new methods, we evaluate them in a statistical learning framework on a series of real-world data science problems involving speech-based monitoring of neuro-motor disorders.
    Dissertation/Thesis; Doctoral Dissertation, Electrical Engineering, 201
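    The abstract does not say how the $D_p$-divergence is estimated. The sketch below assumes the graph-based construction suggested by the title: the Friedman-Rafsky statistic, which counts the minimal-spanning-tree (MST) edges that connect a point of one sample to a point of the other. The estimate 1 - C (N + M) / (2 N M) is stated here as an assumption and should be checked against the dissertation. N and M are the two sample sizes and C is the cross-edge count. The MST is built with Prim's algorithm on the complete Euclidean graph.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean_mst(points: List[Point]) -> List[Tuple[int, int]]:
    """Prim's algorithm on the complete Euclidean graph; returns the n - 1 MST edges."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight linking each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges: List[Tuple[int, int]] = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def dp_divergence_estimate(xs: List[Point], ys: List[Point]) -> float:
    """Assumed direct estimate 1 - C (N + M) / (2 N M), C = MST edges joining xs to ys."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    pooled = list(xs) + list(ys)
    labels = [0] * len(xs) + [1] * len(ys)
    cross = sum(1 for u, v in euclidean_mst(pooled) if labels[u] != labels[v])
    n, m = len(xs), len(ys)
    return 1.0 - cross * (n + m) / (2.0 * n * m)
```

    Take two well-separated one-dimensional samples of five points each. Only one MST edge crosses between them, giving 1 - 10/50 = 0.8. For perfectly interleaved samples, every MST edge crosses, and the finite-sample estimate becomes negative (-0.8). No density is estimated at any point, which is the defining trait of a direct estimator.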

    A machine learning-based brain tumor detection method and application using MR images and MR spectroscopy data

    The full text has been made openly accessible under two instruments: the "Law Amending the Higher Education Law and Certain Laws and Decree-Laws", published in the Official Gazette (Resmi Gazete) dated 06.03.2018, No. 30352, and the "Directive on the Electronic Collection, Arrangement and Open Access of Graduate Theses" dated 18.06.2018.

    Malignant tumors growing and developing in the brain have recently become one of the leading causes of death in humans. Determining the most suitable treatment for a brain tumor depends on the physician accurately identifying its malignancy, type, and grade. Full diagnosis of a brain tumor by experienced radiologists is a complex process involving MR images, MR spectroscopy data, and pathological assessments. An experienced radiologist generally reaches a decision with reasonable accuracy and specificity. Even so, researchers continue to investigate new methods to minimize diagnostic errors. It is therefore important for radiologists and physicians to use Computer-Aided Diagnosis (CAD) systems that can help detect brain tumors with high success rates. In this thesis, novel computer-aided methods that use MR images and MR Spectroscopy (MRS) data are proposed for detecting brain tumors and supporting radiologists' decision-making.

    The first method differentiates benign from malignant brain tumors on MR images using image processing and pattern recognition techniques. To this end, a new image pre-processing technique is proposed to strip the skull region. Several classifiers are also compared to evaluate the effect of the choice of classifier on tumor differentiation. According to detailed experiments on 188 MR images, the proposed method differentiates benign from malignant brain tumors with 96.81% accuracy.

    The second method is a novel Artificial Immune System (AIS)-based CAD system that operates on MR spectroscopy signals. It performs five tasks with high accuracy: benign/malignant differentiation, tumor grading, differentiation of normal brain tissue from brain tumors, differentiation of metastatic from primary brain tumors, and detection of pseudo-tumors. In experiments on a large dataset obtained from an international, multi-center project, it achieved success rates of 96.97%, 100%, 100%, 98.33%, and 98.44%, respectively.
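    As a quick consistency check on the first method's figure, assume accuracy is the fraction of the 188 MR images classified correctly. The reported 96.81% then corresponds to exactly one whole number of correct images:

```latex
\text{accuracy} = \frac{\#\text{correct}}{188}, \qquad
\frac{181}{188} \approx 0.9628, \quad
\frac{182}{188} \approx 0.9681, \quad
\frac{183}{188} \approx 0.9734
\;\;\Rightarrow\;\; \#\text{correct} = 182 .
```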

    Learning From Almost No Data

    The tremendous recent growth in the fields of artificial intelligence and machine learning has largely been tied to the availability of big data and massive amounts of compute. The increasingly popular approach of training large neural networks on large datasets has provided great returns. However, it leaves behind the many researchers, companies, and practitioners who lack sufficient funding, compute power, or data. This thesis aims to rectify this growing imbalance by probing the limits of what machine learning and deep learning methods can achieve with small data. What knowledge does a dataset contain? At the highest level, a dataset is just a collection of samples: images, text, etc. Yet somehow, when we train models on these datasets, they are able to find patterns, make inferences, detect similarities, and otherwise generalize to samples they have never seen before. This suggests that datasets may contain some kind of intrinsic knowledge about the systems or distributions from which they are sampled. Moreover, this knowledge appears to be distributed and duplicated across the samples. We intuitively expect that removing one image from a large training set will have virtually no impact on the final model performance. We develop a framework that explains efficient generalization through three principles: information sharing, information repackaging, and information injection. We use this framework to propose 'less than one'-shot learning, an extreme form of few-shot learning where a learner must recognize N classes from M < N training examples. To achieve this extreme level of efficiency, we develop new framework-consistent methods and theory for lost data restoration, for dataset size reduction, and for few-shot learning with deep neural networks and other popular machine learning models.
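    To make 'less than one'-shot learning concrete, here is a minimal sketch with two training examples that separate three classes. It assumes a distance-weighted soft-label nearest-prototype rule, because the abstract does not name the classifier. Each prototype carries a probability distribution over classes rather than a single label. The prototype positions and soft labels are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Prototype = Tuple[Sequence[float], Sequence[float]]  # (position, soft label over classes)


def soft_label_predict(prototypes: List[Prototype], x: Sequence[float]) -> int:
    """Distance-weighted soft-label rule using every prototype as a neighbor."""
    n_classes = len(prototypes[0][1])
    scores = [0.0] * n_classes
    for position, soft_label in prototypes:
        d = math.dist(position, x)
        if d == 0.0:
            # Query sits exactly on a prototype: use that prototype's soft label directly.
            return max(range(n_classes), key=lambda c: soft_label[c])
        for c in range(n_classes):
            scores[c] += soft_label[c] / d  # closer prototypes contribute more
    return max(range(n_classes), key=lambda c: scores[c])


# Two hypothetical prototypes on a line, three classes (M = 2 < N = 3).
PROTOTYPES: List[Prototype] = [
    ((0.0,), (0.6, 0.4, 0.0)),
    ((1.0,), (0.0, 0.4, 0.6)),
]
```

    Near 0.0 the first prototype dominates and class 0 wins, and near 1.0 class 2 wins. Around 0.5 the shared 0.4 mass on class 1 from both prototypes adds up (0.8 + 0.8 = 1.6 against 1.2 for the others). This creates a third class region that no single training example is labeled with.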

    Specialized IoT systems: Models, Structures, Algorithms, Hardware, Software Tools

    The monograph covers an analysis of the problems, together with models, algorithms, and hardware and software tools for specialized Internet of Things networks. It presents the results of designing and modeling an IoT network, monitoring product quality, and analyzing environmental sound information. It also describes a neural-network-based technology for detecting lung diseases. The monograph is intended for specialists in infocommunications. It may also be useful to students of related specialties, participants in professional development programs, and master's and postgraduate students.