20 research outputs found

    On The Reliability Of Machine Learning Applications In Manufacturing Environments

    The increasing deployment of advanced digital technologies such as Internet of Things (IoT) devices and Cyber-Physical Systems (CPS) in industrial environments is enabling the productive use of machine learning (ML) algorithms in the manufacturing domain. As ML applications move from research to productive use in real-world industrial environments, the question of reliability arises. Since the majority of ML models are trained and evaluated on static datasets, continuous online monitoring of their performance is required to build reliable systems. Furthermore, concept and sensor drift can degrade the accuracy of the algorithm over time, compromising safety, acceptance and economics if undetected and not properly addressed. In this work, we illustrate the severity of the issue on a publicly available industrial dataset recorded over the course of 36 months and explain possible sources of drift. We assess the robustness of ML algorithms commonly used in manufacturing and show that accuracy declines strongly with increasing drift for all tested algorithms. We further investigate how uncertainty estimation may be leveraged for online performance estimation as well as drift detection, as a first step towards continually learning applications. The results indicate that ensemble algorithms like random forests show the least decay of confidence calibration under drift.

    A classification approach for detecting cross-lingual biomedical term translations

    Tagvisor: A Privacy Advisor for Sharing Hashtags

    The hashtag has emerged as a widely used concept in popular culture and campaigns, but its implications for people's privacy have not been investigated so far. In this paper, we present the first systematic analysis of privacy issues induced by hashtags. We concentrate in particular on location, which is recognized as one of the key privacy concerns in the Internet era. By relying on a random forest model, we show that we can infer a user's precise location from hashtags with an accuracy of 70% to 76%, depending on the city. To remedy this situation, we introduce a system called Tagvisor that systematically suggests alternative hashtags if the user-selected ones constitute a threat to location privacy. Tagvisor realizes this by means of three conceptually different obfuscation techniques and a semantics-based metric for measuring the consequent utility loss. Our findings show that obfuscating as few as two hashtags already provides a near-optimal trade-off between privacy and utility in our dataset. This in particular renders Tagvisor highly time-efficient and thus practical in real-world settings.

    Design of Probabilistic Random Forests with Applications to Anticancer Drug Sensitivity Prediction

    Random forests consisting of an ensemble of regression trees with equal weights are frequently used for the design of predictive models. In this article, we consider an extension of the methodology by representing the regression trees in the form of probabilistic trees and analyzing the nature of heteroscedasticity. The probabilistic tree representation allows for analytical computation of confidence intervals (CIs), and the tree weight optimization is expected to provide tighter CIs with comparable performance in mean error. We approached the prediction of the ensemble of probabilistic trees from two perspectives: as a mixture distribution and as a weighted sum of correlated random variables. We applied our methodology to the drug sensitivity prediction problem on synthetic data and the Cancer Cell Line Encyclopedia dataset, and illustrated that tree weights can be selected to reduce the average length of the CI without an increase in mean error.
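The two perspectives named in this abstract rest on standard identities, sketched here for orientation. The notation is an assumption and is not taken from the paper: tree $i$ of $T$ gives a predictive distribution $Y_i$ with mean $\mu_i$ and variance $\sigma_i^2$, and the weights satisfy $w_i \ge 0$, $\sum_i w_i = 1$.

```latex
% Mixture-distribution view: the ensemble prediction is drawn from tree i with probability w_i
\mu_{\text{mix}} = \sum_{i=1}^{T} w_i \mu_i ,
\qquad
\sigma_{\text{mix}}^2 = \sum_{i=1}^{T} w_i \left( \sigma_i^2 + \mu_i^2 \right) - \mu_{\text{mix}}^2

% Weighted-sum view: the ensemble prediction is Y = \sum_i w_i Y_i with correlated Y_i
\mathbb{E}[Y] = \sum_{i=1}^{T} w_i \mu_i ,
\qquad
\operatorname{Var}(Y) = \sum_{i=1}^{T} \sum_{j=1}^{T} w_i w_j \operatorname{Cov}(Y_i, Y_j)
```

Both views give the same mean but generally different variances, which is why the choice of weights can shorten the resulting interval without moving the point prediction much.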

    Design of Probabilistic Random Forests with Applications to Anticancer Drug Sensitivity Prediction- 2016

    Random forests consisting of an ensemble of regression trees with equal weights are frequently used for the design of predictive models. In this article, we consider an extension of the methodology by representing the regression trees in the form of probabilistic trees and analyzing the nature of heteroscedasticity. The probabilistic tree representation allows for analytical computation of confidence intervals (CIs), and the tree weight optimization is expected to provide tighter CIs with comparable performance in mean error. We approached the prediction of the ensemble of probabilistic trees from two perspectives: as a mixture distribution and as a weighted sum of correlated random variables. We applied our methodology to the drug sensitivity prediction problem on synthetic data and the Cancer Cell Line Encyclopedia dataset, and illustrated that tree weights can be selected to reduce the average length of the CI without an increase in mean error.

    Estimating habitat extent and carbon loss from an eroded northern blanket bog using UAV derived imagery and topography

    Peatlands are important reserves of terrestrial carbon and biodiversity, and given that many peatlands across the UK and Europe exist in a degraded state, their conservation is a major area of concern and a focus of considerable research. Aerial surveys are valuable tools for habitat mapping and conservation and provide useful insights into peatland condition. We investigate how structure-from-motion (SfM) photogrammetry-derived topography and habitat classes may be used to construct an estimate of carbon loss from erosion features in a remote blanket bog habitat. An autonomous, unmanned, aerial, fixed-wing remote sensing platform (Quest UAV 300™) collected imagery over Moor House, in the Upper Teesdale National Nature Reserve, a site with a high degree of peatland erosion. The images were used to generate point clouds, orthomosaics and digital surface models using SfM photogrammetry techniques; these products were georeferenced and subsequently used to classify vegetation and peatland features. A classification of peat bog feature types was developed using a random forest classification model trained on field survey data and applied to UAV-captured products including the orthomosaic, digital surface model and derived surfaces such as topographic index, slope and aspect maps. Using the area classified as eroded peat and the derived digital surface model, we estimated a loss of 438 tonnes of carbon from a single gully. The UAV system was relatively straightforward to deploy in such a remote and unimproved area. SfM photogrammetry, imagery and random forest modelling obtained classification accuracies of between 42% and 100%, and were able to distinguish between bare peat, saturated bog and Sphagnum habitats. This paper shows what can be achieved with low-cost UAVs equipped with consumer-grade camera equipment and relatively straightforward ground control, and demonstrates their potential for the carbon and peatland conservation research community.

    Finding Respondents in the Forest: A Comparison of Logistic Regression and Random Forest Models for Response Propensity Weighting and Stratification

    Response rates for modern surveys using many different modes are trending downward, leaving the potential for nonresponse biases in estimates derived from the respondents alone. The reasons for nonresponse may be complex functions of known auxiliary variables or unknown latent variables not measured by practitioners. The degree to which the propensity to respond is associated with survey outcomes casts light on the overall potential for nonresponse biases in estimates of means and totals. The most common methods for nonresponse adjustment to compensate for the potential bias in estimates have been logistic and probit regression models. However, for more complex nonresponse mechanisms that may be nonlinear or involve many interaction effects, these methods may fail to converge and thus fail to generate nonresponse adjustments for the sampling weights. In this paper we compare these traditional techniques to a relatively new data mining technique, random forests, under a simple and a complex nonresponse propensity population model, using both direct and propensity stratification nonresponse adjustments. Random forests appear to offer marginal improvements for the complex response model over logistic regression in direct propensity adjustment, but have some surprising results for propensity stratification across both response models.
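The two adjustment styles named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the estimated propensities are taken as given inputs (in the paper they would come from a logistic regression or random forest model), the function names are hypothetical, and equal-size strata by propensity rank are an assumed choice.

```python
from collections import defaultdict


def direct_adjustment(weights, propensities, responded):
    """Direct adjustment: divide each respondent's base weight by its
    estimated response propensity; nonrespondents get weight 0."""
    return [
        w / p if r else 0.0
        for w, p, r in zip(weights, propensities, responded)
    ]


def stratified_adjustment(weights, propensities, responded, n_strata=5):
    """Propensity stratification: group units into equal-size strata by
    propensity rank, then inflate respondent weights so that respondents
    carry the full base weight of their stratum.

    Assumes every stratum contains at least one respondent."""
    n = len(weights)
    order = sorted(range(n), key=lambda i: propensities[i])
    stratum = [0] * n
    for rank, i in enumerate(order):
        stratum[i] = rank * n_strata // n

    total = defaultdict(float)       # base weight of all sampled units
    resp_total = defaultdict(float)  # base weight of respondents only
    for i in range(n):
        total[stratum[i]] += weights[i]
        if responded[i]:
            resp_total[stratum[i]] += weights[i]

    return [
        weights[i] * total[stratum[i]] / resp_total[stratum[i]]
        if responded[i] else 0.0
        for i in range(n)
    ]
```

In the stratified version the adjusted weights sum to the original base-weight total, because each stratum's weight is redistributed over its respondents; the direct version has no such guarantee and is more sensitive to very small estimated propensities.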