
    Probabilistic Perspectives on Collecting Human Uncertainty in Predictive Data Mining

    In many areas of data mining, data is collected from human beings. In this contribution, we ask how people actually respond to ordinal scales. The main problem observed is that users tend to be volatile in their choices, i.e. complex cognitions do not always lead to the same decision, but to distributions of possible decision outputs. This human uncertainty may have a considerable impact on common data mining approaches, and thus the question of how to effectively model this so-called human uncertainty arises naturally. Our contribution introduces two different approaches for modelling the human uncertainty of user responses. In doing so, we develop techniques to measure this uncertainty at the level of user inputs as well as at the level of user cognition. With the support of comprehensive user experiments and large-scale simulations, we systematically compare both methodologies along with their implications for personalisation approaches. Our findings demonstrate that a significant share of users submit something completely different (action) from what they really have in mind (cognition). Moreover, we demonstrate that statistically sound evidence with respect to algorithm assessment becomes quite hard to realise, especially when explicit rankings are to be built.
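    To make the idea concrete, the sketch below (an illustration of the general approach, not the authors' code) models a user's cognition as a latent value on a 1-5 rating scale and treats each submitted rating as a noisy draw around it, so repeated queries yield a distribution of responses rather than a single value; all parameter values are invented.

```python
# Hypothetical illustration: a latent "cognition" plus response noise
# produces a distribution of ordinal ratings instead of one fixed answer.
import numpy as np

rng = np.random.default_rng(42)

def noisy_rating(cognition: float, sigma: float, n: int) -> np.ndarray:
    """Sample n ordinal ratings by adding Gaussian noise to the latent
    cognition, then rounding and clipping to the 1-5 scale."""
    draws = rng.normal(cognition, sigma, size=n)
    return np.clip(np.rint(draws), 1, 5).astype(int)

ratings = noisy_rating(cognition=3.4, sigma=0.8, n=1000)
values, counts = np.unique(ratings, return_counts=True)
print(dict(zip(values.tolist(), np.round(counts / counts.sum(), 3).tolist())))
# A single point estimate (e.g. the mode) hides how often the user would
# submit a value other than the one they "have in mind".
```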

    Psychometrics in Practice at RCEC

    A broad range of topics is dealt with in this volume: from combining the psychometric generalizability and item response theories to ideas for an integrated formative use of data-driven decision making, assessment for learning, and diagnostic testing. A number of chapters pay attention to computerized (adaptive) and classification testing. Other chapters treat the quality of testing in a general sense, but for topics like maintaining standards or the testing of writing ability, the quality of testing is dealt with more specifically. All authors are connected to RCEC as researchers. They each present one of their current research topics and so provide some insight into the focus of RCEC. The topics were selected and edited so that the book should be of special interest to educational researchers, psychometricians, and practitioners in educational assessment.
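    As a flavour of the psychometric machinery treated in the volume, the sketch below implements the standard two-parameter logistic (2PL) item response model; it is a generic textbook illustration, not code from the book, and the parameter values are invented.

```python
# 2PL item response model: the probability that a person with ability
# theta answers correctly an item with discrimination a and difficulty b.
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """P(correct) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

for theta in (-1.0, 0.0, 1.0):
    print(f"ability {theta:+.1f}: P = {p_correct(theta, a=1.2, b=0.3):.3f}")
```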

    Failure prediction models: performance, disagreements, and internal rating systems. NBB Working Papers. No. 123, 13 December 2007

    We address a number of comparative issues relating to the performance of failure prediction models for small, private firms. We use two models provided by vendors, a model developed by the National Bank of Belgium, and the Altman Z-score model to investigate model power, the extent of disagreement between models in the ranking of firms, and the design of internal rating systems. We also examine the potential gains from combining the output of multiple models. We find that the power of all four models in predicting bankruptcies is very good at the one-year horizon, even though not all of the models were developed using bankruptcy data and the models use different statistical methodologies. Disagreements in firm rankings are nevertheless significant across models, and model choice will have an impact on loan pricing and origination decisions. We find that it is possible to realize substantial gains from combining models with similar power. In addition, we show that it can also be beneficial to combine a weaker model with a stronger one if disagreements across models with respect to failing firms are high enough. Finally, the number of classes in an internal rating system appears to be more important than the distribution of borrowers across classes.
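    For reference, the sketch below computes the classic Altman Z-score, one of the four models compared in the paper; the coefficients are Altman's original (1968) ones, and the input ratios are invented. Revised coefficients are normally used for small private firms, the paper's actual subject.

```python
# Classic Altman Z-score: a weighted sum of five accounting ratios.
def altman_z(wc_ta: float, re_ta: float, ebit_ta: float,
             mve_tl: float, sales_ta: float) -> float:
    """Z = 1.2*X1 + 1.4*X2 + 3.3*X3 + 0.6*X4 + 1.0*X5, where the Xi are
    working capital, retained earnings, EBIT, market value of equity, and
    sales, each scaled by total assets (X4 by total liabilities)."""
    return (1.2 * wc_ta + 1.4 * re_ta + 3.3 * ebit_ta
            + 0.6 * mve_tl + 1.0 * sales_ta)

# Invented ratios for a hypothetical firm:
z = altman_z(wc_ta=0.15, re_ta=0.20, ebit_ta=0.10, mve_tl=0.9, sales_ta=1.1)
print(round(z, 2))  # 2.43; conventionally, Z < 1.81 reads as distressed
```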

    Comment on “Can assimilation of crowdsourced data in hydrological modelling improve flood prediction?” by Mazzoleni et al. (2017)

    Citizen science and crowdsourcing are gaining increasing attention among hydrologists. In a recent contribution, Mazzoleni et al. (2017) investigated the integration of crowdsourced data (CSD) into hydrological models to improve the accuracy of real-time flood forecasts. The authors used synthetic CSD (i.e. not actually measured) because real CSD were not available at the time of the study. In their proof-of-concept study, Mazzoleni et al. (2017) showed that assimilation of CSD improves overall model performance; the impacts of the irregular frequency of available CSD and of data uncertainty were also assessed in depth. However, the use of synthetic CSD in conjunction with (semi-)distributed hydrological models deserves further discussion. As a result of equifinality, poor model identifiability, and deficiencies in model structure, internal states of (semi-)distributed models can hardly mimic the actual states of complex systems away from calibration points. Accordingly, the use of synthetic CSD drawn from model internal states under best-fit conditions can lead to overestimation of the effectiveness of CSD assimilation in improving flood prediction. Operational flood forecasting, which informs decisions of high societal value, requires robust knowledge of model behaviour and an in-depth assessment of both model structure and forcing data. Additional guidelines are given that are useful for the a priori evaluation of CSD for real-time flood forecasting and, hopefully, for planning suitable strategies for both model calibration and CSD collection.
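    As a toy illustration of the kind of assimilation under discussion (not Mazzoleni et al.'s scheme), the sketch below nudges a scalar model state toward a crowdsourced observation with a Kalman-style gain, so that noisier observations move the state less; all numbers are invented.

```python
# Scalar nudging/Kalman-style update of a model state toward an observation.
def nudge(model_state: float, observation: float,
          model_var: float, obs_var: float) -> float:
    """Pull the state toward the observation; the gain shrinks as the
    observation variance (its uncertainty) grows."""
    gain = model_var / (model_var + obs_var)
    return model_state + gain * (observation - model_state)

# A trusted reading versus an uncertain citizen report of water level (m):
print(round(nudge(2.0, 2.6, model_var=0.04, obs_var=0.01), 2))  # 2.48 (trusted)
print(round(nudge(2.0, 2.6, model_var=0.04, obs_var=0.36), 2))  # 2.06 (noisy)
```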
