Automatic coding of short text responses via clustering in educational assessment
Automatic coding of short text responses opens new doors in assessment. We implemented and integrated baseline methods of natural language processing and statistical modelling by means of software components that are available under open licenses. The accuracy of automatic text coding is demonstrated using data collected in the Programme for International Student Assessment (PISA) 2012 in Germany. Free-text responses to 10 items, with [formula] responses in total, were analyzed. We further examined the effect of different methods, parameter values, and sample sizes on the performance of the implemented system. The system reached fair to good, up to excellent, agreement with human codings ([formula]). In particular, items that are solved by naming specific semantic concepts appeared to be coded properly. The system performed equally well with [formula] and somewhat poorer, but still acceptably, down to [formula]. Based on our findings, we discuss potential innovations for assessment that are enabled by automatic coding of short text responses.
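A minimal sketch of the clustering route to automatic coding, assuming scikit-learn is available. The toy responses, the hypothetical human codes, and the majority-vote mapping from clusters to codes are illustrative assumptions, not the PISA items or the paper's actual pipeline:

```python
from collections import Counter

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical free-text responses to one item, with made-up human codes
# (1 = correct concept named, 0 = not). Not real PISA data.
responses = [
    "multiply length by width",
    "length times width",
    "the length multiplied by the width",
    "i do not know",
    "no idea",
    "cannot say",
]
human_codes = [1, 1, 1, 0, 0, 0]

# Vectorise the responses and cluster them.
X = TfidfVectorizer().fit_transform(responses)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Map each cluster to the majority human code among its members, then
# score the agreement of the automatic codes with the human codes.
cluster_to_code = {
    c: Counter(h for h, k in zip(human_codes, clusters) if k == c).most_common(1)[0][0]
    for c in set(clusters)
}
auto_codes = [cluster_to_code[k] for k in clusters]
agreement = sum(a == h for a, h in zip(auto_codes, human_codes)) / len(human_codes)
print(agreement)
```

Because each cluster is labelled with its majority code, agreement with the human codings is at least 0.5 by construction; how much higher it gets depends on how cleanly the clusters separate the semantic concepts.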
Redistribution spurs growth by using a portfolio effect on human capital
We demonstrate by mathematical analysis and systematic computer simulations
that redistribution can lead to sustainable growth in a society. The human
capital dynamics of each agent is described by a stochastic multiplicative
process which, in the long run, leads to the destruction of individual human
capital and the extinction of the individualistic society. When agents are
linked by fully redistributive taxation, the situation can change to individual
growth in the long run. We consider a government that collects a proportion of
income and reduces it by a fraction to cover the costs of administration
(efficiency losses). The remaining public good is redistributed equally to all
agents. We
derive conditions under which the destruction of human capital can be turned
into sustainable growth, despite the losses from the random growth process and
despite the administrative costs. Sustainable growth is induced by
redistribution. This effect can be explained by a simple portfolio effect
that re-balances the individual stochastic processes.
The findings are verified for three different tax schemes: a proportional tax,
one taking proportionally more from the rich, and one taking proportionally
more from the poor. We discuss which of these tax schemes is optimal with
respect to maximizing growth under a fixed rate of administrative costs, or
with respect to maximizing governmental income. This leads us to some general
conclusions about governmental decisions, the relation to public-good games,
and the use of taxation in a risk-taking society.
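The core mechanism can be sketched in a few lines: a multiplicative shock whose mean is above 1 but whose mean log is below 1 destroys the typical individual's capital, while pooling part of each period's income and redistributing it equally rescues growth. The shock values, tax rate `tau`, and cost fraction `c` below are illustrative assumptions, not the paper's calibration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T_steps = 500, 200
tau, c = 0.5, 0.05                     # tax rate, administrative cost share

def simulate(tau):
    h = np.ones(N)                     # individual human capital
    for _ in range(T_steps):
        # Multiplicative shock with E[eta] > 1 but E[log eta] < 0:
        # favourable on average, yet ruinous for the typical individual.
        eta = np.where(rng.random(N) < 0.5, 1.8, 0.5)
        h = eta * h
        tax = tau * h                  # government collects a proportion
        pool = (1.0 - c) * tax.sum()   # efficiency losses at rate c
        h = h - tax + pool / N         # equal redistribution of the rest
    return np.log(h).mean()            # mean log capital ~ typical growth

no_redist = simulate(0.0)
with_redist = simulate(tau)
print(no_redist, with_redist)
```

With these parameters the mean log capital is strongly negative without taxation and grows under redistribution, which is the portfolio effect: pooling re-balances the individual stochastic processes each period.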
Fast conditional density estimation for quantitative structure-activity relationships
Many methods for quantitative structure-activity relationships (QSARs) deliver point estimates only, without quantifying the uncertainty inherent in the prediction. One way to quantify the uncertainty of a QSAR prediction is to predict the conditional density of the activity given the structure instead of a point estimate. If a conditional density estimate is available, it is easy to derive prediction intervals for activities. In this paper, we experimentally evaluate and compare three methods for conditional density estimation with respect to their suitability for QSAR modeling. In contrast to traditional methods for conditional density estimation, they are based on generic machine learning schemes, more specifically, class probability estimators. Our experiments show that a kernel estimator based on class probability estimates from a random forest classifier is highly competitive with Gaussian process regression, while taking only a fraction of the time for training. Therefore, generic machine-learning-based methods for conditional density estimation may be a good and fast option for quantifying uncertainty in QSAR modeling.
http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/view/181
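The class-probability route can be sketched as follows: discretise the activity into bins, train a classifier to predict bin membership, then smooth the predicted bin probabilities with Gaussian kernels placed at the bin centres. The data are synthetic, and the bin count and kernel bandwidth are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X[:, 0] + 0.3 * rng.normal(size=1000)    # "activity" driven by feature 0

# Discretise the continuous activity into quantile bins.
n_bins = 10
edges = np.quantile(y, np.linspace(0, 1, n_bins + 1))
centres = 0.5 * (edges[:-1] + edges[1:])
labels = np.digitize(y, edges[1:-1])         # bin index 0 .. n_bins-1

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)

def cond_density(x, grid, bandwidth=0.3):
    """Mixture of Gaussians at the bin centres, weighted by predict_proba."""
    p = clf.predict_proba(x.reshape(1, -1))[0]
    k = np.exp(-0.5 * ((grid[:, None] - centres[None, :]) / bandwidth) ** 2)
    k /= bandwidth * np.sqrt(2.0 * np.pi)
    return k @ p

grid = np.linspace(-4.0, 4.0, 401)
dens = cond_density(np.array([2.0, 0.0, 0.0, 0.0, 0.0]), grid)
integral = np.sum(dens) * (grid[1] - grid[0])
peak = grid[dens.argmax()]
print(integral, peak)
```

The resulting curve is a proper density (it integrates to roughly 1), and prediction intervals follow directly by integrating it to the desired coverage.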
Theory of proximity-induced exchange coupling in graphene on hBN/(Co, Ni)
We perform systematic first-principles calculations of the proximity exchange
coupling, induced by cobalt (Co) and nickel (Ni) in graphene, via a few (up to
three) layers of hexagonal boron nitride (hBN). We find that the induced spin
splitting of the graphene bands is of the order of 10 meV for a monolayer of
hBN, decreasing in magnitude but alternating in sign by adding each new
insulating layer. We find that the proximity exchange can be giant if there is
a resonant level of the transition metal close to the Dirac point. Our
calculations suggest that this effect could be present in Co heterostructures,
in which a level strongly hybridizes with the valence-band orbitals of
graphene. Since this hybridization is spin dependent, the proximity spin
splitting is unusually large, about 10 meV even for two layers of hBN. An
external electric field can change the offset of the graphene and
transition-metal orbitals and can lead to a reversal of the sign of the
exchange parameter. This we predict to happen for the case of two monolayers of
hBN, enabling electrical control of proximity spin polarization (but also spin
injection) in graphene/hBN/Co structures. Nickel-based heterostructures show
weaker proximity effects than cobalt heterostructures. We introduce two
phenomenological models to describe the first-principles data. The minimal
model comprises the effective graphene p_z orbitals and can be used to study
transport in graphene with proximity exchange, while the second model also
includes hybridization with the transition-metal d orbitals, which is
important to capture the giant proximity exchange. Crucial to both models is
the pseudospin-dependent exchange coupling, needed to describe the different
spin splittings of the valence and conduction bands.
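A pseudospin-dependent exchange coupling of this kind can be written schematically as a Dirac Hamiltonian with sublattice-resolved exchange parameters. The symbols below (Delta, lambda_A, lambda_B, and the projectors P_A, P_B) are an illustrative parametrisation, not necessarily the paper's exact notation:

```latex
% Schematic low-energy model near the Dirac point:
% sigma = pseudospin (sublattice), s_z = spin, Delta = orbital gap,
% lambda_A, lambda_B = exchange parameters on the A and B sublattices.
H = \hbar v_F \left( \kappa \sigma_x k_x + \sigma_y k_y \right) s_0
  + \Delta\, \sigma_z s_0
  + s_z \left( \lambda_A P_A - \lambda_B P_B \right),
\qquad P_{A/B} = \tfrac{1}{2}\left( \sigma_0 \pm \sigma_z \right).
```

With unequal lambda_A and lambda_B the valence and conduction bands acquire different spin splittings, since near K each band resides mainly on one sublattice; a sign change of the exchange parameters corresponds to the field-induced reversal described above.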
A study of hierarchical and flat classification of proteins
Automatic classification of proteins using machine learning is an important problem that has received significant attention in the literature. One feature of this problem is that expert-defined hierarchies of protein classes exist and can potentially be exploited to improve classification performance. In this article, we investigate empirically whether this is the case for two such hierarchies. We compare multi-class classification techniques that exploit the information in those class hierarchies and those that do not, using logistic regression, decision trees, bagged decision trees, and support vector machines as the underlying base learners. In particular, we compare hierarchical and flat variants of ensembles of nested dichotomies. The latter have been shown to deliver strong classification performance in multi-class settings. We present experimental results for synthetic, fold recognition, enzyme classification, and remote homology detection data. Our results show that exploiting the class hierarchy improves performance on the synthetic data, but not in the case of the protein classification problems. Based on this, we recommend that strong flat multi-class methods be used as a baseline to establish the benefit of exploiting class hierarchies in this area.
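A single (random) nested dichotomy can be sketched in a few lines: recursively split the set of classes in two, train a binary classifier at each internal node, and multiply the branch probabilities down to the leaves. The ensembles compared in the article average many such trees; this sketch shows just one, with logistic regression as the base learner and iris standing in for the protein data, both illustrative choices:

```python
import random

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

rnd = random.Random(0)

def train_nd(X, y, classes):
    """Recursively build one nested dichotomy over the given classes."""
    if len(classes) == 1:
        return classes[0]                       # leaf: a single class
    classes = list(classes)
    rnd.shuffle(classes)                        # random class split
    left, right = classes[: len(classes) // 2], classes[len(classes) // 2 :]
    mask = np.isin(y, classes)                  # only samples at this node
    clf = LogisticRegression(max_iter=1000).fit(
        X[mask], np.isin(y[mask], left).astype(int)
    )
    return clf, train_nd(X, y, left), train_nd(X, y, right)

def predict_proba_nd(node, x, p=1.0, out=None):
    """Multiply branch probabilities along the tree to get class probabilities."""
    out = {} if out is None else out
    if not isinstance(node, tuple):
        out[node] = p
        return out
    clf, left, right = node
    p_left = clf.predict_proba(x.reshape(1, -1))[0, 1]
    predict_proba_nd(left, x, p * p_left, out)
    predict_proba_nd(right, x, p * (1 - p_left), out)
    return out

X, y = load_iris(return_X_y=True)
tree = train_nd(X, y, [0, 1, 2])
probs = predict_proba_nd(tree, X[0])            # X[0] belongs to class 0
pred = max(probs, key=probs.get)
print(probs, pred)
```

The leaf probabilities sum to one by construction, so each tree yields a proper multi-class distribution; a hierarchical variant would take the class splits from the expert-defined hierarchy instead of drawing them at random.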
Estimation and uncertainty of reversible Markov models
Reversibility is a key concept in Markov models and Master-equation models of
molecular kinetics. The analysis and interpretation of the transition matrix
encoding the kinetic properties of the model relies heavily on the
reversibility property. The estimation of a reversible transition matrix from
simulation data is therefore crucial to the successful application of the
previously developed theory. In this work, we discuss methods for the maximum
likelihood estimation of transition matrices from finite simulation data and
present a new algorithm for the case in which reversibility with respect to a
given stationary vector is desired. We also develop new methods for the
Bayesian posterior inference of reversible transition matrices, with and
without a given stationary vector, taking into account the need for a suitable
prior distribution that preserves the metastable features of the observed
process during posterior inference. All algorithms are implemented in the
PyEMMA software (http://pyemma.org) as of version 2.0.
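The maximum likelihood problem without a fixed stationary vector has a well-known self-consistent solution; the sketch below implements that textbook fixed-point iteration, not necessarily the new algorithm from the paper, and the count matrix C is made-up data:

```python
import numpy as np

def reversible_mle(C, n_iter=1000):
    """Fixed-point iteration for the ML reversible transition matrix."""
    C = np.asarray(C, dtype=float)
    c_i = C.sum(axis=1)                   # row counts
    X = C + C.T                           # symmetric initial guess
    for _ in range(n_iter):
        x_i = X.sum(axis=1)
        # fixed point: x_ij = (c_ij + c_ji) / (c_i / x_i + c_j / x_j)
        X = (C + C.T) / (c_i[:, None] / x_i[:, None] + c_i[None, :] / x_i[None, :])
    X /= X.sum()
    pi = X.sum(axis=1)                    # stationary distribution
    T = X / pi[:, None]                   # reversible transition matrix
    return T, pi

C = np.array([[90, 10, 2],
              [8, 80, 12],
              [3, 15, 70]])
T, pi = reversible_mle(C)
print(np.round(T, 3), np.round(pi, 3))
```

Because the iterate X stays symmetric, the returned matrix satisfies detailed balance, pi_i T_ij = pi_j T_ji, exactly, and pi is its stationary vector by construction.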