
    Automatic coding of short text responses via clustering in educational assessment

    Automatic coding of short text responses opens new doors in assessment. We implemented and integrated baseline methods of natural language processing and statistical modelling by means of software components that are available under open licenses. The accuracy of automatic text coding is demonstrated using data collected for the Programme for International Student Assessment (PISA) 2012 in Germany. Free-text responses to 10 items, with [Formula] responses in total, were analyzed. We further examined the effect of different methods, parameter values, and sample sizes on the performance of the implemented system. The system reached fair-to-good, up to excellent, agreement with human codings [Formula]. In particular, items that are solved by naming specific semantic concepts appeared to be properly coded. The system performed equally well with [Formula], and somewhat poorer, but still acceptably, down to [Formula]. Based on our findings, we discuss potential innovations for assessment that are enabled by automatic coding of short text responses. (DIPF/Orig.)
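A baseline pipeline of the kind described can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the toy responses, the human codes, and the TF-IDF/k-means combination are all assumptions.

```python
# Illustrative sketch (not the authors' pipeline): cluster short free-text
# responses with TF-IDF + k-means, then label each cluster by majority vote
# over a small hand-coded subset. Data and parameters are made up.
from collections import Counter

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

responses = [
    "the area of the circle", "circle area", "area of a circle",
    "perimeter of the square", "the square's perimeter", "square perimeter",
]
human_codes = [1, 1, 1, 0, 0, 0]  # hand codings for a calibration subset

X = TfidfVectorizer().fit_transform(responses)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Assign each cluster the most frequent human code among its members.
cluster_to_code = {
    c: Counter(h for h, cl in zip(human_codes, clusters) if cl == c).most_common(1)[0][0]
    for c in set(clusters)
}
auto_codes = [cluster_to_code[c] for c in clusters]
agreement = sum(a == h for a, h in zip(auto_codes, human_codes)) / len(human_codes)
```

Mapping clusters to codes via a labeled subset is what makes such a system semi-automatic: only the calibration responses need human coding, and the mapping is insensitive to arbitrary cluster numbering.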

    Redistribution spurs growth by using a portfolio effect on human capital

    We demonstrate by mathematical analysis and systematic computer simulations that redistribution can lead to sustainable growth in a society. The human capital dynamics of each agent are described by a stochastic multiplicative process which, in the long run, leads to the destruction of individual human capital and the extinction of the individualistic society. When agents are linked by fully redistributive taxation, the situation may turn into individual growth in the long run. We consider a government that collects a proportion of income and reduces it by a fraction as costs for administration (efficiency losses). The remaining public good is redistributed equally to all agents. We derive conditions under which the destruction of human capital can be turned into sustainable growth, despite the losses from the random growth process and despite the administrative costs. Sustainable growth is induced by redistribution. This effect can be explained by a simple portfolio effect which re-balances the individual stochastic processes. The findings are verified for three different tax schemes: a proportional tax, one taking proportionally more from the rich, and one taking proportionally more from the poor. We discuss which of these tax schemes is optimal with respect to maximizing growth under a fixed rate of administrative costs, or with respect to maximizing governmental income. This leads us to some general conclusions about governmental decisions, the relation to public good games, and the use of taxation in a risk-taking society.
    Comment: 12 pages, plus 8 figures, plus MATLAB code to run the simulation and produce the figures
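The core portfolio effect can be illustrated with a toy simulation. The parameters are illustrative and administrative costs are omitted; the point is only that shocks with mean above 1 but negative mean log destroy isolated agents while full redistribution yields growth.

```python
# Toy simulation (illustrative, not the paper's model or parameters):
# each agent's capital follows a multiplicative random process; with full
# redistribution the pooled mean is shared equally each step.
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_steps = 1000, 400

def shocks():
    # E[eta] = 1.05 > 1 but E[log eta] ~ -0.11 < 0: individually ruinous,
    # collectively growing -- the portfolio effect in miniature.
    return rng.choice([0.5, 1.6], size=n_agents)

solo = np.ones(n_agents)    # individualistic society, no redistribution
pooled = np.ones(n_agents)  # full redistribution, no administrative costs
for _ in range(n_steps):
    solo *= shocks()
    pooled = np.full(n_agents, (pooled * shocks()).mean())

median_solo = np.median(solo)
mean_pooled = pooled.mean()
```

The typical isolated agent decays like exp(t·E[log η]), while the pooled society grows roughly like E[η]^t, which is the re-balancing effect the abstract describes.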

    Fast conditional density estimation for quantitative structure-activity relationships

    Many methods for quantitative structure-activity relationships (QSARs) deliver point estimates only, without quantifying the uncertainty inherent in the prediction. One way to quantify the uncertainty of a QSAR prediction is to predict the conditional density of the activity given the structure instead of a point estimate. If a conditional density estimate is available, it is easy to derive prediction intervals of activities. In this paper, we experimentally evaluate and compare three methods for conditional density estimation for their suitability in QSAR modeling. In contrast to traditional methods for conditional density estimation, they are based on generic machine learning schemes, more specifically, class probability estimators. Our experiments show that a kernel estimator based on class probability estimates from a random forest classifier is highly competitive with Gaussian process regression, while taking only a fraction of the time for training. Therefore, generic machine-learning-based methods for conditional density estimation may be a good and fast option for quantifying uncertainty in QSAR modeling.
    http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/view/181
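The generic scheme can be sketched as follows: discretize the target into bins, train a class probability estimator, and kernel-smooth the predicted bin probabilities into a density. The data, bin count, and bandwidth are illustrative assumptions; the paper's exact estimators differ.

```python
# Sketch of conditional density estimation via a class probability
# estimator (illustrative, not the paper's implementation).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, size=500)  # toy "activity"

n_bins = 20
edges = np.linspace(y.min(), y.max(), n_bins + 1)
centers = 0.5 * (edges[:-1] + edges[1:])
labels = np.clip(np.digitize(y, edges) - 1, 0, n_bins - 1)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

def conditional_density(x, grid, bandwidth=0.15):
    """Kernel-smoothed density of y given x, evaluated on `grid`."""
    proba = np.zeros(n_bins)
    proba[clf.classes_] = clf.predict_proba(np.atleast_2d(x))[0]
    # Place a Gaussian kernel at each bin center, weighted by its probability.
    k = np.exp(-0.5 * ((grid[:, None] - centers[None, :]) / bandwidth) ** 2)
    dens = (k * proba).sum(axis=1)
    dens /= dens.sum() * (grid[1] - grid[0])  # normalize (Riemann sum = 1)
    return dens

grid = np.linspace(-2, 2, 200)
dens = conditional_density([1.0], grid)
```

Prediction intervals then fall out directly by integrating the density from the tails inward, which is the advantage over a bare point estimate.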

    Theory of proximity-induced exchange coupling in graphene on hBN/(Co, Ni)

    We perform systematic first-principles calculations of the proximity exchange coupling, induced by cobalt (Co) and nickel (Ni) in graphene, via a few (up to three) layers of hexagonal boron nitride (hBN). We find that the induced spin splitting of the graphene bands is of the order of 10 meV for a monolayer of hBN, decreasing in magnitude but alternating in sign with each added insulating layer. We find that the proximity exchange can be giant if there is a resonant d level of the transition metal close to the Dirac point. Our calculations suggest that this effect could be present in Co heterostructures, in which a d level strongly hybridizes with the valence-band orbitals of graphene. Since this hybridization is spin dependent, the proximity spin splitting is unusually large, about 10 meV even for two layers of hBN. An external electric field can change the offset of the graphene and transition-metal orbitals and can lead to a reversal of the sign of the exchange parameter. This we predict to happen for the case of two monolayers of hBN, enabling electrical control of proximity spin polarization (but also spin injection) in graphene/hBN/Co structures. Nickel-based heterostructures show weaker proximity effects than cobalt-based ones. We introduce two phenomenological models to describe the first-principles data. The minimal model comprises the graphene (effective) p_z orbitals and can be used to study transport in graphene with proximity exchange, while the p_z-d model also includes hybridization with the d orbitals, which is important to capture the giant proximity exchange. Crucial to both models is the pseudospin-dependent exchange coupling, needed to describe the different spin splittings of the valence and conduction bands.
    Comment: 14 pages, 17 figures, 2 tables
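A minimal model of the kind described, with a pseudospin (sublattice) dependent exchange, might be written as follows. This is a plausible sketch in our own notation, not necessarily the paper's parametrization:

```latex
% Dirac Hamiltonian with staggered, pseudospin-dependent exchange
% (sketch; symbols and signs are our assumptions):
H(\mathbf{k}) = \hbar v_F \left(\tau k_x\, \sigma_x + k_y\, \sigma_y\right)\otimes s_0
  + \Delta\, \sigma_z \otimes s_0
  + \lambda_A\, \sigma_+ \otimes s_z
  + \lambda_B\, \sigma_- \otimes s_z,
\qquad \sigma_\pm = \tfrac{1}{2}\left(\sigma_0 \pm \sigma_z\right)
```

Here σ acts on the sublattice (pseudospin), s on the spin, τ = ±1 labels the valley, Δ is a staggered potential, and λ_A, λ_B are sublattice-resolved exchange couplings. When λ_A ≠ λ_B, the valence and conduction bands acquire different spin splittings, which is the pseudospin dependence the abstract says is crucial to both models.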

    A study of hierarchical and flat classification of proteins

    Automatic classification of proteins using machine learning is an important problem that has received significant attention in the literature. One feature of this problem is that expert-defined hierarchies of protein classes exist and can potentially be exploited to improve classification performance. In this article we investigate empirically whether this is the case for two such hierarchies. We compare multi-class classification techniques that exploit the information in those class hierarchies and those that do not, using logistic regression, decision trees, bagged decision trees, and support vector machines as the underlying base learners. In particular, we compare hierarchical and flat variants of ensembles of nested dichotomies. The latter have been shown to deliver strong classification performance in multi-class settings. We present experimental results for synthetic, fold recognition, enzyme classification, and remote homology detection data. Our results show that exploiting the class hierarchy improves performance on the synthetic data, but not in the case of the protein classification problems. Based on this, we recommend that strong flat multi-class methods be used as a baseline to establish the benefit of exploiting class hierarchies in this area.
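A single nested dichotomy can be sketched as below: recursively split the class set in two, train a binary base learner per split, and multiply branch probabilities along the path to each leaf. For clarity the split here is a fixed halving; the paper uses randomized splits averaged over an ensemble, and the data and base learner are illustrative assumptions.

```python
# Sketch of one nested dichotomy (illustrative, not the authors' code).
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_nd(X, y, classes):
    """Recursively split `classes` in half and fit a binary learner per split."""
    if len(classes) == 1:
        return {"leaf": classes[0]}
    mid = len(classes) // 2
    left, right = classes[:mid], classes[mid:]
    mask = np.isin(y, classes)
    z = np.isin(y[mask], right).astype(int)  # 0 = left branch, 1 = right branch
    clf = LogisticRegression(max_iter=1000).fit(X[mask], z)
    return {"clf": clf, "left": build_nd(X, y, left), "right": build_nd(X, y, right)}

def predict_proba_nd(node, x, p=1.0, out=None):
    """Class probabilities: product of branch probabilities along each path."""
    out = {} if out is None else out
    if "leaf" in node:
        out[node["leaf"]] = out.get(node["leaf"], 0.0) + p
        return out
    pr = node["clf"].predict_proba([x])[0]
    predict_proba_nd(node["left"], x, p * pr[0], out)
    predict_proba_nd(node["right"], x, p * pr[1], out)
    return out

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in [(0, 0), (2, 0), (0, 2), (2, 2)]])
y = np.repeat([0, 1, 2, 3], 50)
tree = build_nd(X, y, [0, 1, 2, 3])
probs = predict_proba_nd(tree, [2.0, 2.0])
```

A hierarchical variant would replace the random or fixed split with the expert-defined class hierarchy, which is exactly the comparison the abstract describes.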

    Estimation and uncertainty of reversible Markov models

    Reversibility is a key concept in Markov models and master-equation models of molecular kinetics. The analysis and interpretation of the transition matrix encoding the kinetic properties of the model rely heavily on the reversibility property. The estimation of a reversible transition matrix from simulation data is therefore crucial to the successful application of the previously developed theory. In this work we discuss methods for the maximum likelihood estimation of transition matrices from finite simulation data and present a new algorithm for the case where reversibility with respect to a given stationary vector is desired. We also develop new methods for the Bayesian posterior inference of reversible transition matrices, with and without a given stationary vector, taking into account the need for a suitable prior distribution that preserves the metastable features of the observed process during posterior inference. All algorithms here are implemented in the PyEMMA software (http://pyemma.org) as of version 2.0.
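A standard fixed-point iteration for the reversible maximum-likelihood estimate (without a fixed stationary vector) updates the symmetric weights X via x_ij ← (c_ij + c_ji)/(c_i/x_i + c_j/x_j). The production implementation lives in PyEMMA; this sketch and its count matrix are illustrative.

```python
# Sketch of reversible MLE of a transition matrix from a count matrix C
# via the standard self-consistent iteration (illustrative data).
import numpy as np

def reversible_mle(C, n_iter=1000):
    C = np.asarray(C, dtype=float)
    c_i = C.sum(axis=1)          # row counts
    Csym = C + C.T
    X = 0.5 * Csym               # symmetric initial guess
    for _ in range(n_iter):
        x_i = X.sum(axis=1)
        r = c_i / x_i
        X = Csym / (r[:, None] + r[None, :])   # keeps X symmetric
    x_i = X.sum(axis=1)
    T = X / x_i[:, None]         # row-stochastic reversible estimate
    pi = x_i / x_i.sum()         # stationary distribution of T
    return T, pi

C = np.array([[90, 10, 0], [8, 80, 12], [0, 15, 85]])
T, pi = reversible_mle(C)
```

Because X stays symmetric throughout, detailed balance π_i T_ij = π_j T_ji holds by construction at convergence, which is precisely the reversibility constraint the plain MLE lacks.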