
    Comparing software prediction techniques using simulation

    The need for accurate software prediction systems increases as software becomes larger and more complex. We believe that the underlying characteristics of a data set (size, number of features, type of distribution, etc.) influence the choice of prediction system. For this reason, we would like to control the characteristics of such data sets in order to systematically explore the relationship between accuracy, choice of prediction system, and data set characteristics. It would also be useful to have a large validation data set. Our solution is to simulate data, which allows both control and large (1000-case) validation sets. We compare four prediction techniques: regression, rule induction, nearest neighbor (a form of case-based reasoning), and neural nets. The results suggest that there are significant differences depending upon the characteristics of the data set. Consequently, researchers should consider prediction context when evaluating competing prediction systems. We observed that the more "messy" the data and the more complex the relationship with the dependent variable, the greater the variability in the results. In the more complex cases, we observed significantly different results depending upon the particular training set sampled from the underlying data set. However, our most important result is that it is more fruitful to ask which is the best prediction system in a particular context than to ask which is the "best" prediction system.
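The simulation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' experimental setup: the data-generating function, the `noise` and `nonlinear` knobs, the sample sizes, and all function names are assumptions chosen for illustration, and only two of the four techniques (regression and nearest neighbor) are included.

```python
import random


def simulate(n, noise, nonlinear, rng):
    """Draw n cases (x, y). The noise and nonlinear knobs are hypothetical
    stand-ins for the controllable data set characteristics."""
    cases = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        y = 3.0 * x + 5.0
        if nonlinear:
            y += 0.8 * (x - 5.0) ** 2  # a more complex relationship
        y += rng.gauss(0.0, noise)     # "messier" data as noise grows
        cases.append((x, y))
    return cases


def fit_regression(train):
    """Ordinary least squares for y = a + b * x."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def fit_nearest_neighbor(train, k=3):
    """Predict the mean y of the k training cases closest in x."""
    def predict(x):
        nearest = sorted(train, key=lambda case: abs(case[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict


def mean_abs_error(predict, cases):
    return sum(abs(predict(x) - y) for x, y in cases) / len(cases)


def compare(noise, nonlinear, seed=0, n_train=50, n_valid=1000):
    """Train on a small sample, then score on a large simulated validation set."""
    rng = random.Random(seed)
    train = simulate(n_train, noise, nonlinear, rng)
    valid = simulate(n_valid, noise, nonlinear, rng)
    return {
        "regression": mean_abs_error(fit_regression(train), valid),
        "nearest_neighbor": mean_abs_error(fit_nearest_neighbor(train), valid),
    }


if __name__ == "__main__":
    for nonlinear in (False, True):
        print("nonlinear =", nonlinear, compare(noise=1.0, nonlinear=nonlinear))
```

Varying `seed` while holding the other arguments fixed gives a rough view of how much the ranking depends on the particular training sample drawn.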

    Automatic coding of short text responses via clustering in educational assessment

    Automatic coding of short text responses opens new doors in assessment. We implemented and integrated baseline methods of natural language processing and statistical modelling by means of software components that are available under open licenses. The accuracy of automatic text coding is demonstrated by using data collected in the Programme for International Student Assessment (PISA) 2012 in Germany. Free text responses of 10 items with Formula responses in total were analyzed. We further examined the effect of different methods, parameter values, and sample sizes on the performance of the implemented system. The system reached fair to good up to excellent agreement with human codings Formula. In particular, items that are solved by naming specific semantic concepts appeared properly coded. The system performed equally well with Formula and somewhat poorer but still acceptable down to Formula. Based on our findings, we discuss potential innovations for assessment that are enabled by automatic coding of short text responses. (DIPF/Orig.)

    The role of social interaction in farmers' climate adaptation choice

    Adaptation to climate change might not always occur, with potentially catastrophic results. Success depends on coordinated actions at both governmental and individual levels (public and private adaptation). Even for a “wet” country like the Netherlands, climate change projections show that the frequency and severity of droughts are likely to increase. Freshwater is an important factor for agricultural production. A deficit damages crop production and consequently causes a loss of income. Adaptation is the key to decreasing farmers’ vulnerability at the micro level and the sector’s vulnerability at the macro level. Individual adaptation decision-making is determined by the behavior of economic agents and the social interaction among them. This can best be studied with agent-based modelling. Given the uncertainty about future weather conditions and about the costs and effectiveness of adaptation strategies, a farmer in the model uses a cognitive process (or heuristic) to make adaptation decisions. In this process, he can rely on his experiences and on information from interactions within his social network. Interaction leads to the spread of information and knowledge, which causes learning. Learning changes the conditions for individual adaptation decision-making. All these interactions cause emergent phenomena: the diffusion of adaptation strategies and a change in the drought vulnerability of the agricultural sector. In this paper, we present a conceptual model and the first implementation of an agent-based model. The aim is to study the role of interaction in a farmer’s social network on adaptation decisions and on the diffusion of adaptation strategies and the vulnerability of the agricultural sector. Micro-level survey data will be used to parameterize agents’ behavioral and interaction rules at a later stage. This knowledge is necessary for the successful design of public adaptation strategies, since governmental adaptation actions need to be fine-tuned to private adaptation behavior.
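The mechanism described in this abstract (decisions driven by own experience plus information from the social network, leading to diffusion) can be sketched as a toy agent-based model. This is not the authors' model: the decision rule, the memory update, the random contact network, the assumption that adoption is irreversible, and every parameter name and value are hypothetical choices made for illustration.

```python
import random


def simulate_adoption(n_farmers=100, n_contacts=4, years=20,
                      drought_prob=0.3, peer_weight=0.5, seed=0):
    """Return the number of adapted farmers after each simulated year."""
    rng = random.Random(seed)
    # Hypothetical social network: each farmer listens to a few random contacts.
    contacts = [
        rng.sample([j for j in range(n_farmers) if j != i], n_contacts)
        for i in range(n_farmers)
    ]
    adapted = [False] * n_farmers
    for i in rng.sample(range(n_farmers), 5):  # a few assumed early adopters
        adapted[i] = True
    experience = [0.0] * n_farmers  # remembered drought damage, kept in [0, 1]
    history = []
    for _ in range(years):
        drought = rng.random() < drought_prob
        # Own experience: unadapted farmers suffer damage in drought years.
        for i in range(n_farmers):
            damage = 1.0 if (drought and not adapted[i]) else 0.0
            experience[i] = 0.7 * experience[i] + 0.3 * damage
        # Social information: share of contacts that had adapted before this year.
        snapshot = list(adapted)
        for i in range(n_farmers):
            if adapted[i]:
                continue  # assumption: adaptation is not reversed
            peer_share = sum(snapshot[j] for j in contacts[i]) / n_contacts
            propensity = (1.0 - peer_weight) * experience[i] + peer_weight * peer_share
            if rng.random() < propensity:
                adapted[i] = True
        history.append(sum(adapted))
    return history


if __name__ == "__main__":
    print(simulate_adoption())
```

Comparing runs with different `peer_weight` values gives a rough sense of how much of the diffusion in this toy setup comes from social interaction rather than from each farmer's own drought experience.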

    Dissemination of Health Information within Social Networks

    In this paper, we investigate how information about a common food-borne health hazard, known as Campylobacter, spreads once it has been delivered to a random sample of individuals in France. The central question addressed here is how individual characteristics and the various aspects of the social network influence the spread of information. A key claim of our paper is that information diffusion processes occur in a patterned network of social ties among heterogeneous actors. Our percolation models show that the characteristics of the recipients of the information matter as much as, if not more than, the characteristics of the sender in deciding whether the information will be transmitted through a particular tie. We also found that, at least for this particular advisory, it is not the perceived need of the recipients for the information that matters but their general interest in the topic.
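A percolation process of this kind can be sketched in a few lines. The code below is an illustration only, not the paper's model: the random network, the single `interest` trait per person, the linear transmission rule, and the `sender_weight` and `recipient_weight` parameters are all assumptions.

```python
import random
from collections import deque


def build_network(n, avg_degree, rng):
    """Random undirected network: every possible tie exists with equal probability."""
    p = avg_degree / (n - 1)
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def spread(neighbors, interest, seeds, sender_weight, recipient_weight, rng):
    """Percolate information outward from the seeds.

    Each tie is tried at most once. The chance of transmission mixes a sender
    trait and a recipient trait (a hypothetical rule, not the paper's)."""
    informed = set(seeds)
    queue = deque(seeds)
    while queue:
        sender = queue.popleft()
        for recipient in sorted(neighbors[sender]):
            if recipient in informed:
                continue
            p = (sender_weight * interest[sender]
                 + recipient_weight * interest[recipient])
            if rng.random() < min(1.0, p):
                informed.add(recipient)
                queue.append(recipient)
    return informed


def run(seed=1, n=300, n_seeds=10, sender_weight=0.2, recipient_weight=0.6):
    """Deliver the information to a random sample and count who ends up informed."""
    rng = random.Random(seed)
    neighbors = build_network(n, avg_degree=6, rng=rng)
    interest = [rng.random() for _ in range(n)]  # general interest in the topic
    seeds = rng.sample(range(n), n_seeds)
    informed = spread(neighbors, interest, seeds,
                      sender_weight, recipient_weight, rng)
    return len(informed)


if __name__ == "__main__":
    print("recipient-driven:", run(sender_weight=0.2, recipient_weight=0.6))
    print("sender-driven:   ", run(sender_weight=0.6, recipient_weight=0.2))
```

Swapping the two weights, as in the final lines, is one simple way to probe whether sender or recipient characteristics drive the spread in this toy setting.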

    From complex questionnaire and interviewing data to intelligent Bayesian network models for medical decision support

    OBJECTIVES: 1) To develop a rigorous and repeatable method for building effective Bayesian network (BN) models for medical decision support from complex, unstructured and incomplete patient questionnaires and interviews that inevitably contain examples of repetitive, redundant and contradictory responses; 2) To exploit expert knowledge in the BN development since further data acquisition is usually not possible; 3) To ensure the BN model can be used for interventional analysis; 4) To demonstrate why using data alone to learn the model structure and parameters is often unsatisfactory even when extensive data is available. METHOD: The method is based on applying a range of recent BN developments targeted at helping experts build BNs given limited data. While most of the components of the method are based on established work, its novelty is that it provides a rigorous consolidated and generalised framework that addresses the whole life-cycle of BN model development. The method is based on two original and recent validated BN models in forensic psychiatry, known as DSVM-MSS and DSVM-P. RESULTS: When employed with the same datasets, the DSVM-MSS demonstrated competitive to superior predictive performance (AUC scores 0.708 and 0.797) against the state-of-the-art (AUC scores ranging from 0.527 to 0.705), and the DSVM-P demonstrated superior predictive performance (cross-validated AUC score of 0.78) against the state-of-the-art (AUC scores ranging from 0.665 to 0.717). More importantly, the resulting models go beyond improving predictive accuracy and into usefulness for risk management purposes through intervention, and enhanced decision support in terms of answering complex clinical questions that are based on unobserved evidence. 
CONCLUSIONS: This development process is applicable to any application domain that involves large-scale decision analysis based on such complex information, rather than on data with hard facts, in conjunction with the incorporation of expert knowledge for decision support via intervention. The novelty extends to challenging decision scientists to reason about building models based on what information is really required for inference, rather than on what data is available, and hence forces them to use available data in a much smarter way.

    Applied statistics: A review

    The main phases of applied statistical work are discussed in general terms. The account starts with the clarification of objectives and proceeds through study design, measurement and analysis to interpretation. An attempt is made to extract some general notions. Comment: Published at http://dx.doi.org/10.1214/07-AOAS113 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Corruption and Inequality as Correlates of Social Trust: Fairness Matters More Than Similarity

    This paper argues that the fairness of a society affects its level of social trust more than its homogeneity does. Societies with fair procedural rules (democracy), fair administration of rules (freedom from corruption), and fair (relatively equal and unskewed) income distribution produce incentives for trustworthy behavior, develop norms of trustworthiness, and enhance interpersonal trust. Based on a multi-level analysis using World Values Surveys data covering 80 countries, I find that (1) freedom from corruption, income equality, and mature democracy are positively associated with trust, while ethnic diversity loses significance once these factors are accounted for; (2) corruption and inequality have an adverse impact on norms and perceptions of trustworthiness; (3) the negative effect of inequality on trust is due to the skewness of income rather than its simple heterogeneity; and (4) the negative effect of minority status is greater in more unequal and undemocratic countries, consistent with the fairness explanation. This publication is Hauser Center Working Paper No. 29. The Hauser Center Working Paper Series was launched during the summer of 2000. The Series enables the Hauser Center to share with a broad audience important works-in-progress written by Hauser Center scholars and researchers.