4 research outputs found

    A framework for feature selection through boosting

    As the dimensions of datasets in predictive modelling continue to grow, feature selection becomes increasingly practical. Datasets with complex feature interactions and high levels of redundancy still present a challenge to existing feature selection methods. We propose a novel framework for feature selection that relies on boosting, or sample re-weighting, to select sets of informative features in classification problems. The method uses as its basis the feature rankings derived from fast and scalable tree-boosting models, such as XGBoost. We compare the proposed method to standard feature selection algorithms on 9 benchmark datasets. We show that the proposed approach reaches higher accuracies with fewer features on most of the tested datasets, and that the selected features have lower redundancy.
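The idea of selecting features through sample re-weighting can be illustrated with a small sketch. It is not the framework from the abstract: instead of XGBoost feature rankings it uses single-threshold decision stumps and an AdaBoost-style weight update, and the names `best_stump` and `select_features` are hypothetical. It only shows how up-weighting misclassified samples can steer the next pick away from a redundant feature.

```python
import math


def best_stump(X, y, w, feature):
    """Return (weighted_error, threshold, polarity) of the best stump on one feature."""
    best = (float("inf"), None, 1)
    for t in sorted({row[feature] for row in X}):
        for polarity in (1, -1):
            # Predict `polarity` above the threshold and `-polarity` otherwise.
            err = sum(
                weight
                for row, label, weight in zip(X, y, w)
                if (polarity if row[feature] > t else -polarity) != label
            )
            if err < best[0]:
                best = (err, t, polarity)
    return best


def select_features(X, y, n_select):
    """Greedy selection: pick the best feature, then up-weight its mistakes.

    Assumption: labels are -1 or +1, and the AdaBoost update is used as a
    simple stand-in for the boosting step described in the abstract.
    """
    n = len(X)
    w = [1.0 / n] * n
    selected = []
    for _ in range(n_select):
        candidates = [f for f in range(len(X[0])) if f not in selected]
        if not candidates:
            break
        scored = [(best_stump(X, y, w, f), f) for f in candidates]
        (err, t, pol), f = min(scored, key=lambda s: s[0][0])
        selected.append(f)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        w = [
            wi * math.exp(-alpha * yi * (pol if row[f] > t else -pol))
            for wi, yi, row in zip(w, y, X)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return selected


# Toy data: feature 1 duplicates feature 0; feature 2 is weaker on its own
# but is correct on the one sample that feature 0 gets wrong.
y = [-1, -1, -1, -1, 1, 1, 1, 1]
f0 = [0, 0, 0, 0, 1, 1, 1, 0]
f2 = [0, 1, 1, 0, 0, 1, 1, 1]
X = [[a, a, b] for a, b in zip(f0, f2)]

print(select_features(X, y, 2))  # [0, 2]
```

On this toy data a ranking by individual accuracy alone would place the duplicate feature 1 ahead of feature 2. After re-weighting, the duplicate no longer helps on the up-weighted sample, so feature 2 is chosen instead.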

    Improving the performance of a radio-frequency localization system in adverse outdoor applications

    In outdoor RF localization systems, particularly where line of sight cannot be guaranteed or where multipath effects are severe, information about the terrain may improve the position estimate's performance. Given the difficulties in obtaining real data, a ray-tracing fingerprint is a viable option. Nevertheless, despite good simulation results, the performance of systems trained only with simulated features degrades when they are used to process real-life data. This work aims to improve localization accuracy when using ray-tracing fingerprints together with a small amount of field data obtained from an adverse environment where collecting a large number of measurements is not an option. We employ a machine learning (ML) algorithm to explore the multipath information. We selected the random forest and gradient boosting algorithms, both considered efficient tools in the literature. In a strict simulation scenario (simulated data for training, validating, and testing), we obtained the same good results found in the literature (error around 2 m). In a real-world system (simulated data for training, real data for validating and testing), both ML algorithms resulted in a mean positioning error around 100 m. We have also obtained experimental results for noisy (artificially added Gaussian noise) and mismatched (with a null subset of) features. The simulations carried out in this work show that augmenting the ML model with a small amount of real-world data improves overall localization performance. Of the ML algorithms employed herein, we also observed that, under noisy conditions, the random forest algorithm achieved a slightly better result than the gradient boosting algorithm. However, they achieved similar results in the mismatch experiment. This work's practical implication is that multipath information, once rejected in old localization techniques, now represents a significant source of information whenever we have prior knowledge to train the ML algorithm.
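The noisy and mismatched test conditions, and the mean positioning error used to report results, can be sketched as follows. The function names are hypothetical, and reading "a null subset of features" as setting the chosen feature columns to zero is an assumption, not something the abstract specifies.

```python
import math
import random


def add_gaussian_noise(features, sigma, rng):
    """Return a copy of the feature rows with zero-mean Gaussian noise added."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in features]


def null_feature_subset(features, columns):
    """Return a copy of the feature rows with the given columns set to zero.

    Assumption: this is one possible reading of the "mismatched" condition.
    """
    cols = set(columns)
    return [[0.0 if j in cols else v for j, v in enumerate(row)] for row in features]


def mean_positioning_error(predicted, actual):
    """Average Euclidean distance between predicted and true positions."""
    return sum(math.dist(p, a) for p, a in zip(predicted, actual)) / len(actual)


rng = random.Random(0)  # fixed seed so the perturbation is repeatable
clean = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
noisy = add_gaussian_noise(clean, sigma=0.5, rng=rng)
mismatched = null_feature_subset(clean, columns=[1])

# Two predictions: one exact, one 5 m off, so the mean error is 2.5 m.
print(mean_positioning_error([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)]))  # 2.5
```

A trained model would be evaluated on `noisy` and `mismatched` in place of the clean test features, with the same error metric in each case.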

    Application of XGBoost Algorithm in Fingerprinting Localisation Task

    Part 7: Various Aspects of Computer Security. International audience. An Indoor Positioning System (IPS) poses regression and classification challenges in the form of horizontal localisation and floor detection. We propose to apply the XGBoost algorithm to both tasks. The algorithm uses vectors of Received Signal Strengths from Wi-Fi access points to map the obtained fingerprints onto horizontal coordinates and a current floor number. An original application schema for building an IPS with the algorithm is proposed. The algorithm was tested using real data from an academic building. The testing data were split into two datasets. The first dataset contains signals from all observed access points. The second dataset consists of signals from the academic network infrastructure only. The second dataset was created to eliminate temporary hotspots and to improve the stability of the positioning system. The tested algorithm achieved results similar to those of the reference methods on the wider set of access points. On the limited set, the algorithm obtained the best results.
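The difference between the two datasets can be illustrated with a short sketch that turns one Wi-Fi scan into a fixed-order vector of Received Signal Strengths. The access point identifiers, the function name `build_fingerprint`, and the -100 dBm placeholder for access points that were not heard are all assumptions made for illustration; the abstract does not describe how the vectors were built.

```python
MISSING_RSS_DBM = -100.0  # assumed placeholder for access points absent from a scan


def build_fingerprint(scan, access_points):
    """Turn one scan ({ap_id: rss_dbm}) into an RSS vector in a fixed AP order."""
    return [scan.get(ap, MISSING_RSS_DBM) for ap in access_points]


# Hypothetical scan containing two infrastructure APs and one temporary hotspot.
scan = {"infra-ap-1": -48.0, "infra-ap-2": -71.0, "phone-hotspot": -60.0}

# First dataset: every observed access point.
all_aps = sorted(scan)
# Second dataset: only the (assumed) list of infrastructure access points.
infra_aps = ["infra-ap-1", "infra-ap-2", "infra-ap-3"]

print(build_fingerprint(scan, all_aps))    # [-48.0, -71.0, -60.0]
print(build_fingerprint(scan, infra_aps))  # [-48.0, -71.0, -100.0]
```

Restricting the vector to a known list of infrastructure access points keeps its layout stable when temporary hotspots appear or disappear, which matches the motivation given for the second dataset.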