5,946 research outputs found

    A Contextual-Bandit Approach to Personalized News Article Recommendation

    Full text link
    Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web services feature dynamically changing pools of content, rendering traditional collaborative filtering methods inapplicable. Second, the scale of most web services of practical interest calls for solutions that are fast in both learning and computation. In this work, we model personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks. The contributions of this work are three-fold. First, we propose a new, general contextual bandit algorithm that is computationally efficient and well motivated from learning theory. Second, we argue that any bandit algorithm can be reliably evaluated offline using previously recorded random traffic. Finally, using this offline evaluation method, we successfully applied our new algorithm to a Yahoo! Front Page Today Module dataset containing over 33 million events. Results showed a 12.5% click lift compared to a standard context-free bandit algorithm, and the advantage becomes even greater when data are more scarce. Comment: 10 pages, 5 figures
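
    The abstract does not spell out the selection rule, but a common way to realise this kind of contextual bandit is a linear upper-confidence-bound (LinUCB-style) selector. The sketch below is illustrative only, not the authors' code; the class name, the feature dimension `d`, and the exploration width `alpha` are assumptions.

```python
import numpy as np

class LinUCBArm:
    """Per-article ridge-regression state for a LinUCB-style selector (sketch)."""
    def __init__(self, d, alpha=1.0):
        self.A = np.eye(d)      # regularised feature covariance
        self.b = np.zeros(d)    # click-weighted feature sum
        self.alpha = alpha      # exploration width

    def ucb(self, x):
        """Upper confidence bound on the expected click-through rate for context x."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        """Fold in the observed click feedback (reward is 1 for a click, 0 otherwise)."""
        self.A += np.outer(x, x)
        self.b += reward * x

def choose_article(arms, x):
    """Serve the article whose upper confidence bound is largest for this user context."""
    return max(arms, key=lambda article_id: arms[article_id].ucb(x))
```

    In an offline replay evaluation of the kind the abstract describes, `choose_article` would only be scored on logged events where the randomly served article matches the algorithm's choice.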

    A Clustering System for Dynamic Data Streams Based on Metaheuristic Optimisation

    Get PDF
    This article presents the Optimised Stream clustering algorithm (OpStream), a novel approach to clustering dynamic data streams. The proposed system displays desirable features, such as a low number of parameters and good scalability to both high-dimensional data and large numbers of clusters, and it is based on a hybrid structure that combines deterministic clustering methods with stochastic optimisation approaches to optimally centre the clusters. Similar to other state-of-the-art methods in the literature, it uses “microclusters” and other established techniques, such as density-based clustering. Unlike other methods, it makes use of metaheuristic optimisation to maximise performance during the initialisation phase, which precedes the classic online phase. Experimental results show that OpStream outperforms the state-of-the-art methods in several cases, and it is always competitive against the comparison algorithms regardless of the chosen optimisation method. Three variants of OpStream, each equipped with a different optimisation algorithm, are presented in this study. A thorough sensitivity analysis is performed on the best variant to demonstrate OpStream’s robustness to noise and resilience to parameter changes.
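
    To make the two-phase idea concrete, here is a toy sketch of metaheuristic centroid initialisation on a buffered window followed by online nearest-centroid assignment. It is an assumption-laden simplification: the real OpStream maintains micro-clusters and supports several optimisers, none of which are reproduced here, and `differential_evolution` is used only as a stand-in metaheuristic.

```python
import numpy as np
from scipy.optimize import differential_evolution

def init_centroids(window, k, seed=0):
    """Initialisation phase (sketch): let a metaheuristic place k centroids that
    minimise the within-cluster sum of squared errors on a buffered window."""
    n, d = window.shape
    lo, hi = window.min(axis=0), window.max(axis=0)
    bounds = [(lo[j], hi[j]) for _ in range(k) for j in range(d)]

    def sse(flat):
        centroids = flat.reshape(k, d)
        dist = ((window[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        return dist.min(axis=1).sum()

    result = differential_evolution(sse, bounds, maxiter=50, seed=seed)
    return result.x.reshape(k, d)

def assign_online(point, centroids):
    """Online phase (simplified): route each incoming point to its nearest centroid."""
    return int(((centroids - point) ** 2).sum(axis=1).argmin())
```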

    Resampling: an improvement of Importance Sampling in varying population size models

    Get PDF
    Sequential importance sampling algorithms have been defined to estimate likelihoods in models of ancestral population processes. However, these algorithms are based on features of models with constant population size, and they become inefficient when the population size varies over time, making likelihood-based inferences difficult in many demographic situations. In this work, we modify a previous sequential importance sampling algorithm to improve the efficiency of the likelihood estimation. Our procedure is still based on features of the constant-size model, but it uses a resampling technique with a new resampling probability distribution that depends on the pairwise composite likelihood. We tested our algorithm, called sequential importance sampling with resampling (SISR), on simulated data sets under different demographic scenarios. In most cases, the computational cost was halved for the same accuracy of inference, and in some cases it was reduced a hundredfold. This study provides the first assessment of the impact of such resampling techniques on parameter inference using sequential importance sampling, and it extends the range of situations where likelihood inferences can be performed easily.
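
    The skeleton of sequential importance sampling with resampling can be sketched generically as below. This is not the paper's algorithm: the `propose` and `weight` callables are hypothetical stand-ins, and the pairwise-composite-likelihood resampling distribution that is the paper's contribution is only indicated in a comment.

```python
import numpy as np

def sis_with_resampling(propose, weight, n_particles, n_steps, rng=None):
    """Generic sequential importance sampling with multinomial resampling (sketch).

    propose(state, rng) draws the next latent state (state is None at the start);
    weight(state) returns the incremental importance weight. In SISR the
    resampling probabilities would be guided by the pairwise composite
    likelihood rather than the raw weights used here.
    """
    rng = rng or np.random.default_rng()
    states = [propose(None, rng) for _ in range(n_particles)]
    log_likelihood = 0.0
    for _ in range(n_steps):
        w = np.array([weight(s) for s in states])
        log_likelihood += np.log(w.mean())          # running likelihood estimate
        probs = w / w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=probs)  # resample particles
        states = [propose(states[i], rng) for i in idx]
    return log_likelihood
```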

    Development of a breakthrough cancer pain assessment tool

    Get PDF
    Breakthrough cancer pain (BTCP) is a type of pain characterised by transient exacerbations against a background of stable and well-controlled pain. It is a significant problem in cancer patients; however, there are no fully validated diagnostic or measurement instruments to identify and assess this type of pain. The aim of this study was to develop and validate a clinical tool to diagnose and quantify BTCP. The study consisted of two stages. Stage one involved the development of a BTCP diagnostic algorithm, which was tested for diagnostic accuracy in 135 cancer patients. The ‘gold-standard’ BTCP diagnostic test for comparison was a comprehensive clinical assessment with a cancer pain expert. The sensitivity of the diagnostic algorithm to detect ‘true’ cases of BTCP was 0.54 (i.e. 54% of expert-diagnosed BTCP cases screened positively), specificity was 0.78 (78% of non-BTCP patients screened negatively), positive predictive value was 0.84 (84% of cases that screened positively had BTCP), and negative predictive value was 0.60 (60% of those that screened negatively did not have the condition). Stage two involved the development of a BTCP measurement instrument from first principles according to international standards. This instrument was then tested on 100 BTCP patients to assess the measurement properties of validity, reliability, responsiveness and acceptability. Reliability testing confirmed an acceptable degree of measurement error. Validity testing confirmed two underlying BTCP dimensions in the instrument. All items and summary scores correlated appropriately with external measures of BTCP. The instrument demonstrated responsiveness by correlating with the patient impression of change and with clinical measures of change. In summary, this is the first measurement instrument with robust validity and reliability data for the clinical diagnosis and quantification of BTCP. The measurement instrument met all required standards to recommend its general use; however, the diagnostic tool had a lower than expected ability to detect ‘true’ cases of BTCP. The clinical implication of this study is that, once BTCP has been identified, the measurement tool can be used to quantify the severity of BTCP and to monitor the BTCP experience over time.
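
    The four diagnostic indices reported above follow directly from the 2x2 table of algorithm result versus expert diagnosis. The helper below is purely illustrative (it is not part of the study) and simply makes the definitions explicit.

```python
def diagnostic_indices(tp, fp, fn, tn):
    """Standard screening-test indices from a 2x2 confusion table
    (algorithm result vs. expert 'gold-standard' diagnosis)."""
    return {
        "sensitivity": tp / (tp + fn),  # expert-confirmed BTCP cases that screen positive
        "specificity": tn / (tn + fp),  # non-BTCP patients that screen negative
        "ppv":         tp / (tp + fp),  # positive screens that truly have BTCP
        "npv":         tn / (tn + fn),  # negative screens that truly do not
    }
```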

    Optimal treatment allocations in space and time for on-line control of an emerging infectious disease

    Get PDF
    A key component in controlling the spread of an epidemic is deciding where, when and to whom to apply an intervention. We develop a framework for using data to inform these decisions in real time. We formalize a treatment allocation strategy as a sequence of functions, one per treatment period, that map up-to-date information on the spread of an infectious disease to a subset of locations where treatment should be allocated. An optimal allocation strategy optimizes some cumulative outcome, e.g. the number of uninfected locations, the geographic footprint of the disease or the cost of the epidemic. Estimation of an optimal allocation strategy for an emerging infectious disease is challenging because spatial proximity induces interference between locations, the number of possible allocations is exponential in the number of locations, and because disease dynamics and intervention effectiveness are unknown at outbreak. We derive a Bayesian on-line estimator of the optimal allocation strategy that combines simulation–optimization with Thompson sampling. The estimator proposed performs favourably in simulation experiments. This work is motivated by and illustrated using data on the spread of white nose syndrome, which is a highly fatal infectious disease devastating bat populations in North America.
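
    The Thompson-sampling flavour of the estimator can be sketched as a single treatment-period decision: draw one plausible set of disease-dynamics parameters from the current posterior, then pick the allocation that performs best under a simulation with those parameters. This is a rough sketch under stated assumptions, not the paper's estimator; `sample_posterior`, `simulate`, and the enumerated `candidate_allocations` are hypothetical stand-ins (in practice the exponential allocation space is searched by simulation–optimization rather than enumeration).

```python
import numpy as np

def thompson_allocate(sample_posterior, simulate, candidate_allocations, rng=None):
    """One treatment-period allocation decision, Thompson-sampling style (sketch).

    sample_posterior(rng) draws one plausible parameter set for the disease
    dynamics; simulate(params, allocation) returns the simulated cumulative
    outcome (e.g. number of uninfected locations) under that allocation.
    """
    rng = rng or np.random.default_rng()
    params = sample_posterior(rng)                        # one posterior draw
    scores = [simulate(params, a) for a in candidate_allocations]
    return candidate_allocations[int(np.argmax(scores))]  # treat the best-scoring subset
```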

    A New K means Grey Wolf Algorithm for Engineering Problems

    Full text link
    Purpose: The development of metaheuristic algorithms has increased, and researchers use them extensively in the fields of business, science, and engineering. One common metaheuristic optimization algorithm is Grey Wolf Optimization (GWO). The algorithm works by imitating grey wolves' searching and attacking behaviour. The main purpose of this paper is to overcome GWO's tendency to become trapped in local optima. Design/Methodology/Approach: In this paper, the K-means clustering algorithm is used to enhance the performance of the original Grey Wolf Optimization by dividing the population into different parts. The proposed algorithm is called K-means clustering Grey Wolf Optimization (KMGWO). Findings: The results illustrate that KMGWO is more efficient than GWO. To evaluate its performance, KMGWO was applied to 10 CEC2019 benchmark test functions. KMGWO was also compared with Cat Swarm Optimization (CSO), the Whale Optimization Algorithm-Bat Algorithm (WOA-BAT), and WOA, and it achieved the first rank in terms of performance; statistical tests confirmed that its advantage over the compared algorithms is significant. In addition, KMGWO was used to solve a pressure vessel design problem and produced superior results. Originality/value: The results prove that KMGWO is superior to GWO and to the compared algorithms (CSO, WOA-BAT, and WOA), and that it performs well on a classical engineering problem. Comment: 15 pages. World Journal of Engineering, 202
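
    The core idea, partitioning the wolf population with k-means and running the standard GWO update inside each sub-population, can be sketched as below. This is an illustrative reconstruction under assumptions (function name, parameter choices, and the use of each cluster's own alpha/beta/delta wolves), not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmgwo(objective, dim, bounds, n_wolves=30, n_clusters=3, iters=100, seed=0):
    """Sketch of a K-means-partitioned Grey Wolf Optimizer (minimisation)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    wolves = rng.uniform(lo, hi, size=(n_wolves, dim))
    for t in range(iters):
        a = 2 - 2 * t / iters                              # exploration factor, 2 -> 0
        labels = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=seed).fit_predict(wolves)
        for c in range(n_clusters):
            idx = np.where(labels == c)[0]
            if len(idx) < 3:
                continue                                   # need alpha, beta, delta
            group = wolves[idx]
            fitness = np.array([objective(w) for w in group])
            leaders = group[np.argsort(fitness)[:3]]       # this cluster's three best wolves
            for j, w in zip(idx, group):
                moves = []
                for leader in leaders:                     # standard GWO position update
                    A = 2 * a * rng.random(dim) - a
                    C = 2 * rng.random(dim)
                    moves.append(leader - A * np.abs(C * leader - w))
                wolves[j] = np.clip(np.mean(moves, axis=0), lo, hi)
    best = min(wolves, key=objective)
    return best, objective(best)
```

    A quick usage example, under the same assumptions: `kmgwo(lambda x: float((x ** 2).sum()), dim=10, bounds=(-5.0, 5.0))` should return a point near the origin for the sphere function.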

    Decision Support System for Bat Identification using Random Forest and C5.0

    Get PDF
    Morphometric and morphological bat identification is a conventional identification method that requires precision, significant experience, and encyclopedic knowledge. The morphological features of one species may sometimes be similar to those of another species, which causes problems for beginners working with bat taxonomy. The purpose of this study was to implement and analyse the random forest and C5.0 algorithms in order to determine the characteristic features and carry out identification of bat species. It also aimed to develop a decision support system, based on the resulting model, for determining the characteristics and identification of bat species. The study showed that the C5.0 algorithm prevailed and was selected, with a mean accuracy of 98.98%, while the mean accuracy of the random forest was 97.26%. As many as 50 rules were implemented in the DSS to identify common and rare bat species using morphometric and morphological attributes.
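
    A minimal sketch of the kind of comparison described above is given below, assuming a feature matrix `X` of morphometric/morphological attributes and species labels `y`. It is not the study's code: C5.0 is normally run from R's C50 package, so an entropy-based scikit-learn decision tree is used here only as a rough stand-in.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def compare_classifiers(X, y, folds=10, seed=0):
    """Cross-validated accuracy of a random forest vs. a C5.0-like tree (sketch)."""
    models = {
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        "c5.0_like_tree": DecisionTreeClassifier(criterion="entropy", random_state=seed),
    }
    return {name: cross_val_score(model, X, y, cv=folds).mean()
            for name, model in models.items()}
```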