1,136 research outputs found

    Spectral Clustering with Imbalanced Data

    Full text link
    Spectral clustering is sensitive to how graphs are constructed from data particularly when proximal and imbalanced clusters are present. We show that Ratio-Cut (RCut) or normalized cut (NCut) objectives are not tailored to imbalanced data since they tend to emphasize cut sizes over cut values. We propose a graph partitioning problem that seeks minimum cut partitions under minimum size constraints on partitions to deal with imbalanced data. Our approach parameterizes a family of graphs, by adaptively modulating node degrees on a fixed node set, to yield a set of parameter dependent cuts reflecting varying levels of imbalance. The solution to our problem is then obtained by optimizing over these parameters. We present rigorous limit cut analysis results to justify our approach. We demonstrate the superiority of our method through unsupervised and semi-supervised experiments on synthetic and real data sets.Comment: 24 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1302.513

    Clustering and Community Detection with Imbalanced Clusters

    Full text link
    Spectral clustering methods which are frequently used in clustering and community detection applications are sensitive to the specific graph constructions particularly when imbalanced clusters are present. We show that ratio cut (RCut) or normalized cut (NCut) objectives are not tailored to imbalanced cluster sizes since they tend to emphasize cut sizes over cut values. We propose a graph partitioning problem that seeks minimum cut partitions under minimum size constraints on partitions to deal with imbalanced cluster sizes. Our approach parameterizes a family of graphs by adaptively modulating node degrees on a fixed node set, yielding a set of parameter dependent cuts reflecting varying levels of imbalance. The solution to our problem is then obtained by optimizing over these parameters. We present rigorous limit cut analysis results to justify our approach and demonstrate the superiority of our method through experiments on synthetic and real datasets for data clustering, semi-supervised learning and community detection.Comment: Extended version of arXiv:1309.2303 with new applications. Accepted to IEEE TSIP

    Deep learning detection of types of water-bodies using optical variables and ensembling

    Get PDF
    Water features are one of the most crucial environmental elements for strengthening climate-change adaptation. Remote sensing (RS) technologies driven by artificial intelligence (AI) have emerged as one of the most sought-after approaches for automating water information extraction and indeed. In this paper, a stacked ensemble model approach is proposed on AquaSat dataset (more than 500,000 images collection via satellite and Google Earth Engine). A one-way Analysis of variance (ANOVA) test and the Kruskal Wallis test are conducted for various optical-based variables at 99% significance level to understand how these vary for different water bodies. An oversampling is done on the training data using Synthetic Minority Oversampling Technique (SMOTE) to solve the problem of class imbalance while the model is tested on an imbalanced data, replicating the real-life situation. To enhance state-of-the-art, the pros of standalone machine learning classifiers and neural networks have been utilized. The stacked model obtained 100% accuracy on the testing data when using the decision tree classifier as the meta model. This study has been cross validated five-fold and will help researchers working in in-situ water bodies detection with the use of stacked model classification

    Object Detection in 20 Years: A Survey

    Full text link
    Object detection, as of one the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as a technical aesthetics under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of cold weapon era. This paper extensively reviews 400+ papers of object detection in the light of its technical evolution, spanning over a quarter-century's time (from the 1990s to 2019). A number of topics have been covered in this paper, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed up techniques, and the recent state of the art detection methods. This paper also reviews some important detection applications, such as pedestrian detection, face detection, text detection, etc, and makes an in-deep analysis of their challenges as well as technical improvements in recent years.Comment: This work has been submitted to the IEEE TPAMI for possible publicatio

    Realistic adversarial machine learning to improve network intrusion detection

    Get PDF
    Modern organizations can significantly benefit from the use of Artificial Intelligence (AI), and more specifically Machine Learning (ML), to tackle the growing number and increasing sophistication of cyber-attacks targeting their business processes. However, there are several technological and ethical challenges that undermine the trustworthiness of AI. One of the main challenges is the lack of robustness, which is an essential property to ensure that ML is used in a secure way. Improving robustness is no easy task because ML is inherently susceptible to adversarial examples: data samples with subtle perturbations that cause unexpected behaviors in ML models. ML engineers and security practitioners still lack the knowledge and tools to prevent such disruptions, so adversarial examples pose a major threat to ML and to the intelligent Network Intrusion Detection (NID) systems that rely on it. This thesis presents a methodology for a trustworthy adversarial robustness analysis of multiple ML models, and an intelligent method for the generation of realistic adversarial examples in complex tabular data domains like the NID domain: Adaptative Perturbation Pattern Method (A2PM). It is demonstrated that a successful adversarial attack is not guaranteed to be a successful cyber-attack, and that adversarial data perturbations can only be realistic if they are simultaneously valid and coherent, complying with the domain constraints of a real communication network and the class-specific constraints of a certain cyber-attack class. A2PM can be used for adversarial attacks, to iteratively cause misclassifications, and adversarial training, to perform data augmentation with slightly perturbed data samples. Two case studies were conducted to evaluate its suitability for the NID domain. The first verified that the generated perturbations preserved both validity and coherence in Enterprise and Internet-of Things (IoT) network scenarios, achieving realism. The second verified that adversarial training with simple perturbations enables the models to retain a good generalization to regular IoT network traffic flows, in addition to being more robust to adversarial examples. The key takeaway of this thesis is: ML models can be incredibly valuable to improve a cybersecurity system, but their own vulnerabilities must not be disregarded. It is essential to continue the research efforts to improve the security and trustworthiness of ML and of the intelligent systems that rely on it.Organizações modernas podem beneficiar significativamente do uso de Inteligência Artificial (AI), e mais especificamente Aprendizagem Automática (ML), para enfrentar a crescente quantidade e sofisticação de ciberataques direcionados aos seus processos de negócio. No entanto, há vários desafios tecnológicos e éticos que comprometem a confiabilidade da AI. Um dos maiores desafios é a falta de robustez, que é uma propriedade essencial para garantir que se usa ML de forma segura. Melhorar a robustez não é uma tarefa fácil porque ML é inerentemente suscetível a exemplos adversos: amostras de dados com perturbações subtis que causam comportamentos inesperados em modelos ML. Engenheiros de ML e profissionais de segurança ainda não têm o conhecimento nem asferramentas necessárias para prevenir tais disrupções, por isso os exemplos adversos representam uma grande ameaça a ML e aos sistemas de Deteção de Intrusões de Rede (NID) que dependem de ML. Esta tese apresenta uma metodologia para uma análise da robustez de múltiplos modelos ML, e um método inteligente para a geração de exemplos adversos realistas em domínios de dados tabulares complexos como o domínio NID: Método de Perturbação com Padrões Adaptativos (A2PM). É demonstrado que um ataque adverso bem-sucedido não é garantidamente um ciberataque bem-sucedido, e que as perturbações adversas só são realistas se forem simultaneamente válidas e coerentes, cumprindo as restrições de domínio de uma rede de computadores real e as restrições específicas de uma certa classe de ciberataque. A2PM pode ser usado para ataques adversos, para iterativamente causar erros de classificação, e para treino adverso, para realizar aumento de dados com amostras ligeiramente perturbadas. Foram efetuados dois casos de estudo para avaliar a sua adequação ao domínio NID. O primeiro verificou que as perturbações preservaram tanto a validade como a coerência em cenários de redes Empresariais e Internet-das-Coisas (IoT), alcançando o realismo. O segundo verificou que o treino adverso com perturbações simples permitiu aos modelos reter uma boa generalização a fluxos de tráfego de rede IoT, para além de serem mais robustos contra exemplos adversos. A principal conclusão desta tese é: os modelos ML podem ser incrivelmente valiosos para melhorar um sistema de cibersegurança, mas as suas próprias vulnerabilidades não devem ser negligenciadas. É essencial continuar os esforços de investigação para melhorar a segurança e a confiabilidade de ML e dos sistemas inteligentes que dependem de ML

    WayFAST: Navigation with Predictive Traversability in the Field

    Full text link
    We present a self-supervised approach for learning to predict traversable paths for wheeled mobile robots that require good traction to navigate. Our algorithm, termed WayFAST (Waypoint Free Autonomous Systems for Traversability), uses RGB and depth data, along with navigation experience, to autonomously generate traversable paths in outdoor unstructured environments. Our key inspiration is that traction can be estimated for rolling robots using kinodynamic models. Using traction estimates provided by an online receding horizon estimator, we are able to train a traversability prediction neural network in a self-supervised manner, without requiring heuristics utilized by previous methods. We demonstrate the effectiveness of WayFAST through extensive field testing in varying environments, ranging from sandy dry beaches to forest canopies and snow covered grass fields. Our results clearly demonstrate that WayFAST can learn to avoid geometric obstacles as well as untraversable terrain, such as snow, which would be difficult to avoid with sensors that provide only geometric data, such as LiDAR. Furthermore, we show that our training pipeline based on online traction estimates is more data-efficient than other heuristic-based methods.Comment: Project website with code and videos: https://mateusgasparino.com/wayfast-traversability-navigation/ Published in the IEEE Robotics and Automation Letters (RA-L, 2022) Accepted for presentation in the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022

    Unresolved Object Detection Using Synthetic Data Generation and Artificial Neural Networks

    Get PDF
    This research presents and solves constrained real-world problems of using synthetic data to train artificial neural networks (ANNs) to detect unresolved moving objects in wide field of view (WFOV) electro-optical/infrared (EO/IR) satellite motion imagery. Objectives include demonstrating the use of the Air Force Institute of Technology (AFIT) Sensor and Scene Emulation Tool (ASSET) as an effective tool for generating EO/IR motion imagery representative of real WFOV sensors and describing the ANN architectures, training, and testing results obtained. Deep learning using a 3-D convolutional neural network (3D ConvNet), long short term memory (LSTM) network, and U-Net are used to solve the problem of EO/IR unresolved object detection. U-Net is shown to be a promising ANN architecture for performing EO/IR unresolved object detection. In two of the experiments, U-Net achieved 90% and 88% pixel prediction accuracy. In addition, the results show ASSET is capable of generating sufficient information needed to train deep learning models

    DIAmante TESS AutoRegressive Planet Search (DTARPS): I. Analysis of 0.9 Million Light Curves

    Full text link
    Nearly one million light curves from the TESS Year 1 southern hemisphere extracted from Full Frame Images with the DIAmante pipeline are processed through the AutoRegressive Planet Search statistical procedure. ARIMA models remove trends and lingering autocorrelated noise, the Transit Comb Filter identifies the strongest periodic signal in the light curve, and a Random Forest machine learning classifier is trained and applied to identify the best potential candidates. Classifier training sets include injections of both planetary transit signals and contaminating eclipsing binaries. The optimized classifier has a True Positive Rate of 92.8% and a False Positive Rate of 0.37% from the labeled training set. The result of this DIAmante TESS autoregressive planet search (DTARPS) analysis is a list of 7,377 potential exoplanet candidates. The classifier has a False Positive Rate of 0.3%, a 64% recall rate for previously confirmed exoplanets, and a 78% negative recall rate for known False Positives. The completeness map of the injected planetary signals shows high recall rates for planets with 8 - 30 R(Earth) radii and periods 0.6-13 days and poor completeness for planets with radii < 2 R(Earth) or periods < 1 day. The list has many False Alarms and False Positives that need to be culled with multifaceted vetting operations (Paper II).Comment: 46 pages, 21 figures, submitted to AAS Journals. A Machine Readable Table for Table 3 is available at https://drive.google.com/drive/folders/1DyxNcNlfcHHAoCdsaipxxIbP5A2FPey
    corecore