4 research outputs found

    Challenges and Open Questions of Machine Learning in Computer Security

    This habilitation thesis presents advances in machine learning for computer security that arose from problems in network intrusion detection and steganography. The thesis emphasizes the traits shared by steganalysis, network intrusion detection, and other security domains, which set these domains apart from computer vision, speech recognition, and other fields where machine learning is typically studied. It then presents methods developed to at least partially solve the identified problems, with the overall goal of making machine-learning-based intrusion detection systems viable. Most of them are general in the sense that they can be used outside intrusion detection and steganalysis on problems with similar constraints. A common feature of all the methods is that they are generally simple, yet surprisingly effective. According to large-scale experiments, they almost always improve on the prior art, which is likely because they are tailored to security problems and designed for large volumes of data. Specifically, the thesis addresses the following problems: anomaly detection with low computational and memory complexity, so that large data can be processed efficiently; multiple-instance anomaly detection, which improves the signal-to-noise ratio by classifying larger groups of samples; supervised classification of tree-structured data, which simplifies their encoding in neural networks; clustering of structured data; supervised training with an emphasis on precision in the top p% of returned data; and, finally, explanation of anomalies to help humans understand the nature of an anomaly and speed up their decisions. Many of the algorithms and methods presented in this thesis are deployed in a real intrusion detection system protecting millions of computers around the globe.
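
The abstract mentions precision in the top p% of returned data as a training target. As a rough illustration of what that quantity measures (not the thesis's training method), the sketch below computes it for a list of scores. The function name `precision_at_top_p` and the rounding rule (ceiling, with at least one sample kept) are assumptions made for this example.

```python
import math


def precision_at_top_p(scores, labels, p):
    """Fraction of positives among the top p% highest-scored samples.

    scores: list of floats, higher means "more suspicious".
    labels: list of 0/1 ground-truth labels, same length as scores.
    p:      percentage in (0, 100].
    The cut-off size uses a ceiling and keeps at least one sample;
    that rounding rule is an assumption of this sketch.
    """
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")

    # Number of samples that fall inside the top p%.
    k = max(1, math.ceil(len(scores) * p / 100))

    # Rank samples by score, highest first, and keep the top k.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_labels = [label for _, label in ranked[:k]]

    return sum(top_labels) / k


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    print(precision_at_top_p(scores, labels, 20))
```

A metric of this kind rewards placing true positives at the very top of the ranking, which matters when an analyst can only inspect a small fraction of the alerts.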

    Applying novel machine learning technology to optimize computer-aided detection and diagnosis of medical images

    The purpose of developing Computer-Aided Detection (CAD) schemes is to assist physicians (i.e., radiologists) in interpreting medical imaging findings more accurately and in reducing inter-reader variability. In developing CAD schemes, Machine Learning (ML) plays an essential role because it is widely used to identify effective image features from complex datasets and to integrate them optimally with classifiers, with the aim of helping clinicians detect early disease more accurately, classify disease types, and predict treatment outcomes. In my dissertation, across several studies, I assess the feasibility of developing novel CAD systems in medical imaging for different purposes. The first study aims to develop and evaluate a new computer-aided diagnosis (CADx) scheme based on analysis of global mammographic image features to predict the likelihood of cases being malignant. The CADx scheme pre-processes mammograms, generates two image maps in the frequency domain using the discrete cosine transform and the fast Fourier transform, computes bilateral image feature differences between the left and right breasts, and applies a support vector machine (SVM) to predict the likelihood of the case being malignant. This study demonstrates the feasibility of developing a new, high-performing CADx scheme for mammograms based on global image feature analysis. The new CADx approach is more efficient to develop and potentially more robust in future applications because it avoids the difficulty of, and possible errors in, breast lesion segmentation. In the second study, to automatically identify a set of effective mammographic image features and build an optimal breast cancer risk stratification model, I investigate the advantages of applying a machine learning approach embedded with a locality preserving projection (LPP) based feature combination and regeneration algorithm to predict short-term breast cancer risk. 
For this purpose, a computer-aided image processing scheme is applied to segment the fibro-glandular tissue depicted on mammograms and initially compute 44 features related to the bilateral asymmetry of mammographic tissue density distribution between the left and right breasts. Next, an embedded LPP algorithm optimizes the feature space and regenerates a new operational vector with 4 features using a maximal variance approach. This study demonstrates that applying the LPP algorithm effectively reduces feature dimensionality and yields higher and potentially more robust performance in predicting short-term breast cancer risk. In the third study, to classify malignant lesions more precisely, I investigate the feasibility of applying a random projection algorithm to build an optimal feature vector from the large feature pool initially generated by the CAD scheme and thereby improve the performance of the machine learning model. In this process, a CAD scheme is first applied to segment mass regions and initially compute 181 features. An SVM model embedded with the feature dimensionality reduction method is then built to predict the likelihood of lesions being malignant. This study demonstrates that the random projection algorithm is a promising method for generating optimal feature vectors that improve the performance of machine learning models for medical images. The last study aims to develop and test a new CAD scheme for chest X-ray images to detect coronavirus (COVID-19) infected pneumonia. For this purpose, the CAD scheme first applies two image preprocessing steps: it removes the majority of the diaphragm regions, and it processes the original image using a histogram equalization algorithm and a bilateral low-pass filter. Then, the original image and the two filtered images are used to form a pseudo color image. 
This image is fed into the three input channels of a transfer learning-based convolutional neural network (CNN) model to classify chest X-ray images into three classes: COVID-19 infected pneumonia, other community-acquired non-COVID-19 pneumonia, and normal (non-pneumonia) cases. This study demonstrates that adding two image preprocessing steps and generating a pseudo color image play an essential role in developing a deep learning CAD scheme for chest X-ray images that improves accuracy in detecting COVID-19 infected pneumonia. In summary, across these studies I developed and presented several image pre-processing algorithms, feature extraction methods, and data optimization techniques that offer innovative approaches to machine-learning-based quantitative imaging markers. The simulations and results of these studies show the discriminative performance of the proposed CAD schemes in different application fields, which can assist radiologists in diagnosing disease and improve their overall performance.
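
The third study in the abstract above reduces a pool of 181 features with a random projection before training an SVM. The sketch below shows one common form of the technique, a dense Gaussian random projection, using only the Python standard library. It is not the dissertation's implementation: the target dimension of 20, the seed, and the names `random_projection_matrix` and `project` are hypothetical choices for illustration, since the abstract does not state them.

```python
import math
import random


def random_projection_matrix(n_features, n_components, seed=0):
    """Build an n_features x n_components Gaussian random matrix.

    Entries are drawn from N(0, 1) and scaled by 1/sqrt(n_components),
    a common scaling that keeps pairwise distances approximately
    unchanged in expectation. The seed makes the matrix reproducible.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_components)
    return [
        [rng.gauss(0.0, 1.0) * scale for _ in range(n_components)]
        for _ in range(n_features)
    ]


def project(samples, matrix):
    """Map each sample (a list of n_features floats) to n_components floats."""
    n_components = len(matrix[0])
    projected = []
    for sample in samples:
        if len(sample) != len(matrix):
            raise ValueError("sample length must match the matrix row count")
        # Each output coordinate is the dot product of the sample
        # with one column of the random matrix.
        projected.append([
            sum(x * row[j] for x, row in zip(sample, matrix))
            for j in range(n_components)
        ])
    return projected


if __name__ == "__main__":
    # 181 input features as in the abstract; 20 outputs is a hypothetical choice.
    matrix = random_projection_matrix(181, 20, seed=42)
    data_rng = random.Random(7)
    samples = [[data_rng.random() for _ in range(181)] for _ in range(5)]
    reduced = project(samples, matrix)
    print(len(reduced), len(reduced[0]))
```

The reduced vectors would then be passed to a classifier in place of the original feature pool. The same matrix must be reused for training and test data so that both live in the same projected space.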

    Image and Video Forensics

    Images and videos have become the main modalities of information exchanged in everyday life, and their pervasiveness has led the image forensics community to question their reliability, integrity, confidentiality, and security. Multimedia content is generated in many different ways through the use of consumer electronics and high-quality digital imaging devices, such as smartphones, digital cameras, tablets, and wearable and IoT devices. The ever-increasing convenience of image acquisition has facilitated the instant distribution and sharing of digital images on social platforms, generating a great amount of exchanged data. Moreover, the pervasiveness of powerful image editing tools has allowed the manipulation of digital images for malicious or criminal ends, up to the creation of synthesized images and videos using deep learning techniques. In response to these threats, the multimedia forensics community has devoted major research efforts to identifying the source of content and detecting manipulation. In all cases where images and videos serve as critical evidence (e.g., forensic investigations, fake news debunking, information warfare, and cyberattacks), forensic technologies that help to determine the origin, authenticity, and integrity of multimedia content can become essential tools. This book aims to collect a diverse and complementary set of articles that demonstrate new developments and applications in image and video forensics to tackle new and serious challenges in ensuring media authenticity.