20 research outputs found

On The Reliability Of Machine Learning Applications In Manufacturing Environments

Author: Biegel Tobias
Garcia-Ceja Enrique
Husom Erik Johannes
Jourdan Nicolas
Metternich Joachim
Sen Sagar
Publication venue: Neural Information Processing Systems
Publication date: 01/01/2021

The increasing deployment of advanced digital technologies such as Internet of Things (IoT) devices and Cyber-Physical Systems (CPS) in industrial environments is enabling the productive use of machine learning (ML) algorithms in the manufacturing domain. As ML applications move from research to productive use in real-world industrial environments, the question of reliability arises. Since the majority of ML models are trained and evaluated on static datasets, continuous online monitoring of their performance is required to build reliable systems. Furthermore, concept and sensor drift can degrade the accuracy of an algorithm over time, compromising safety, acceptance and economics if undetected and not properly addressed. In this work, we illustrate the severity of the issue on a publicly available industrial dataset recorded over the course of 36 months and explain possible sources of drift. We assess the robustness of ML algorithms commonly used in manufacturing and show that accuracy declines strongly with increasing drift for all tested algorithms. We further investigate how uncertainty estimation may be leveraged for online performance estimation as well as drift detection as a first step towards continually learning applications. The results indicate that ensemble algorithms like random forests show the least decay of confidence calibration under drift.

arXiv.org e-Print Archive

TUbiblio

SINTEF Open

NORA - Norwegian Open Research Archives

A classification approach for detecting cross-lingual biomedical term translations

Author: Bollegala D
Hakami H
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2017

University of Liverpool Repository

Tagvisor: A Privacy Advisor for Sharing Hashtags

Author: Backes Michael
Humbert Mathias
Li Cheng-Te
Pang Jun
Rahman Tahleen A.
Zhang Yang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

The hashtag has emerged as a widely used concept of popular culture and campaigns, but its implications for people's privacy have not been investigated so far. In this paper, we present the first systematic analysis of privacy issues induced by hashtags. We concentrate in particular on location, which is recognized as one of the key privacy concerns in the Internet era. By relying on a random forest model, we show that we can infer a user's precise location from hashtags with an accuracy of 70% to 76%, depending on the city. To remedy this situation, we introduce a system called Tagvisor that systematically suggests alternative hashtags if the user-selected ones constitute a threat to location privacy. Tagvisor realizes this by means of three conceptually different obfuscation techniques and a semantics-based metric for measuring the consequent utility loss. Our findings show that obfuscating as little as two hashtags already provides a near-optimal trade-off between privacy and utility in our dataset. This in particular renders Tagvisor highly time-efficient, and thus practical in real-world settings.

Infoscience - École polytechnique fédérale de Lausanne

arXiv.org e-Print Archive

Repository for Publications and Research Data

CISPA – Helmholtz-Zentrum für Informationssicherheit

Crossref

Serveur académique lausannois

Open Repository and Bibliography - Luxembourg

Design of Probabilistic Random Forests with Applications to Anticancer Drug Sensitivity Prediction

Author: Ghosh Souparno
Haider Saad
Pal Ranadip
Rahman Raziur
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/01/2015

Random forests consisting of an ensemble of regression trees with equal weights are frequently used for design of predictive models. In this article, we consider an extension of the methodology by representing the regression trees in the form of probabilistic trees and analyzing the nature of heteroscedasticity. The probabilistic tree representation allows for analytical computation of confidence intervals (CIs), and the tree weight optimization is expected to provide stricter CIs with comparable performance in mean error. We approached the prediction of the ensemble of probabilistic trees from two perspectives: as a mixture distribution and as a weighted sum of correlated random variables. We applied our methodology to the drug sensitivity prediction problem on synthetic and Cancer Cell Line Encyclopedia datasets and illustrated that tree weights can be selected to reduce the average length of the CI without an increase in mean error.

DigitalCommons@University of Nebraska

Design of Probabilistic Random Forests with Applications to Anticancer Drug Sensitivity Prediction- 2016

Author: Ghosh Souparno
Haider Saad
Pal Ranadip
Rahman Raziur
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/01/2016

DigitalCommons@University of Nebraska

Estimating habitat extent and carbon loss from an eroded northern blanket bog using UAV derived imagery and topography

Author: Carrasco L.
Dodd B.
McShane G.
Monteith D.
Morton D.
Rose R.
Rowland C.
Scholefield P.
Tebbs E.
Whitfield M.G.
Wood C.
Publication venue: 'SAGE Publications'
Publication date: 01/04/2019

Peatlands are important reserves of terrestrial carbon and biodiversity, and given that many peatlands across the UK and Europe exist in a degraded state, their conservation is a major area of concern and a focus of considerable research. Aerial surveys are valuable tools for habitat mapping and conservation and provide useful insights into peatland condition. We investigate how SfM photogrammetry-derived topography and habitat classes may be used to construct an estimate of carbon loss from erosion features in a remote blanket bog habitat. An autonomous, unmanned, aerial, fixed-wing remote sensing platform (Quest UAV 300™) collected imagery over Moor House, in the Upper Teesdale National Nature Reserve, a site with a high degree of peatland erosion. The images were processed with SfM photogrammetry techniques into point clouds, orthomosaics and digital surface models, which were georeferenced and subsequently used to classify vegetation and peatland features. A classification of peatbog feature types was developed using a random forest classification model trained on field survey data and applied to UAV-captured products including the orthomosaic, digital surface model and derived surfaces such as topographic index, slope and aspect maps. Using the area classified as eroded peat and the derived digital surface model, we estimated a loss of 438 tonnes of carbon from a single gully. The UAV system was relatively straightforward to deploy in such a remote and unimproved area. SfM photogrammetry, imagery and random forest modelling obtained classification accuracies of between 42% and 100%, and were able to discern between bare peat, saturated bog and sphagnum habitats. This paper shows what can be achieved with low-cost UAVs equipped with consumer grade camera equipment and relatively straightforward ground control, and demonstrates their potential for the carbon and peatland conservation research community.

Lancaster E-Prints

King's Research Portal

NERC Open Research Archive

Finding Respondents in the Forest: A Comparison of Logistic Regression and Random Forest Models for Response Propensity Weighting and Stratification

Author: Buskirk Trent D.
Kolenikov Stanislav
Publication venue: DEU
Publication date: 01/01/2015

Response rates for modern surveys across many different modes are trending downward, leaving the potential for nonresponse bias in estimates derived from respondents only. The reasons for nonresponse may be complex functions of known auxiliary variables or of unknown latent variables not measured by practitioners. The degree to which the propensity to respond is associated with survey outcomes casts light on the overall potential for nonresponse bias in estimates of means and totals. The most common methods for nonresponse adjustment to compensate for the potential bias in estimates have been logistic and probit regression models. However, for more complex nonresponse mechanisms that may be nonlinear or involve many interaction effects, these methods may fail to converge and thus fail to generate nonresponse adjustments for the sampling weights. In this paper we compare these traditional techniques to a relatively new data mining technique, random forests, under a simple and a complex nonresponse propensity population model, using both direct and propensity stratification nonresponse adjustments. Random forests appear to offer marginal improvements for the complex response model over logistic regression in direct propensity adjustment, but have some surprising results for propensity stratification across both response models.
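The two adjustment schemes named in the abstract can be sketched as follows, assuming response propensities have already been estimated by some model (logistic regression, a random forest, or anything else). The function names, the equal-size strata and the default of five strata are assumptions for illustration, not details taken from the paper.

```python
def direct_adjustment(base_weights, propensities, responded):
    """Direct adjustment: divide each respondent's weight by its propensity.

    Nonrespondents receive weight 0 because they contribute no survey data.
    """
    return [
        w / p if r else 0.0
        for w, p, r in zip(base_weights, propensities, responded)
    ]


def stratified_adjustment(base_weights, propensities, responded, n_strata=5):
    """Propensity stratification: one common adjustment factor per stratum.

    Units are sorted by estimated propensity and cut into n_strata groups of
    roughly equal size. Within a stratum, respondent weights are scaled so
    that they sum to the stratum's total base weight.
    """
    n = len(base_weights)
    order = sorted(range(n), key=lambda i: propensities[i])
    adjusted = [0.0] * n
    for s in range(n_strata):
        members = order[s * n // n_strata:(s + 1) * n // n_strata]
        total = sum(base_weights[i] for i in members)
        resp = sum(base_weights[i] for i in members if responded[i])
        if resp == 0:
            # No respondents in this stratum; in practice strata would be
            # collapsed. This sketch simply leaves the weights at 0.
            continue
        factor = total / resp
        for i in members:
            if responded[i]:
                adjusted[i] = base_weights[i] * factor
    return adjusted
```

Stratification uses the propensities only to group units, so it is less sensitive to extreme individual estimates than direct division, which may be one reason the two schemes can behave differently for the same propensity model.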

SSOAR - Social Science Open Access Repository