Reinforcement learning in large, structured action spaces: A simulation study of decision support for spinal cord injury rehabilitation
Reinforcement learning (RL) has helped improve decision-making in several
applications. However, applying traditional RL is challenging in some
applications, such as rehabilitation of people with a spinal cord injury (SCI).
Among other factors, using RL in this domain is difficult because there are
many possible treatments (i.e., large action space) and few patients (i.e.,
limited training data). Treatments for SCIs have natural groupings, so we
propose two approaches to grouping treatments so that an RL agent can learn
effectively from limited data. One relies on domain knowledge of SCI
rehabilitation and the other learns similarities among treatments using an
embedding technique. We then use Fitted Q Iteration to train an agent that
learns optimal treatments. Through a simulation study designed to reflect the
properties of SCI rehabilitation, we find that both methods can help improve
the treatment decisions of physiotherapists, but the approach based on domain
knowledge offers better performance. Our findings provide a "proof of concept"
that RL can be used to help improve the treatment of those with an SCI and
indicate that continued efforts to gather data and apply RL to this domain are
worthwhile.
Comment: 31 pages, 7 figures
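A minimal sketch of how Fitted Q Iteration over grouped treatments might look, to make the abstract's idea concrete. It is not the authors' implementation: the `action_group` mapping, the toy transitions, and the function names are hypothetical, and the regression step is replaced by a simple tabular average per (state, group) pair, where a real study would more likely use a learned regressor.

```python
from collections import defaultdict


def fitted_q_iteration(transitions, action_group, n_iters=50, gamma=0.9):
    """Fitted Q Iteration over action groups (illustrative sketch).

    transitions: list of (state, action, reward, next_state, done) tuples.
    action_group: hypothetical dict mapping each treatment to its group.
    """
    groups = sorted(set(action_group.values()))
    q = defaultdict(float)  # Q[(state, group)], implicitly 0 at the start
    for _ in range(n_iters):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for s, a, r, s_next, done in transitions:
            # Bellman target computed from the previous iteration's Q.
            best_next = 0.0 if done else max(q[(s_next, g)] for g in groups)
            key = (s, action_group[a])
            sums[key] += r + gamma * best_next
            counts[key] += 1
        # "Regression" step: tabular fit, i.e. the mean target per key.
        new_q = defaultdict(float)
        for key, total in sums.items():
            new_q[key] = total / counts[key]
        q = new_q
    return q


def greedy_group(q, state, groups):
    """Return the action group with the highest estimated value."""
    return max(groups, key=lambda g: q[(state, g)])


# Toy data (assumed): treatments t1 and t2 share group "A", t3 is in "B".
action_group = {"t1": "A", "t2": "A", "t3": "B"}
transitions = [
    (0, "t1", 0.0, 1, False),
    (0, "t2", 0.0, 1, False),
    (0, "t3", 0.5, 0, True),
    (1, "t1", 0.0, 1, True),
    (1, "t3", 1.0, 1, True),
]
q = fitted_q_iteration(transitions, action_group)
print(greedy_group(q, 0, ["A", "B"]))
```

Pooling t1 and t2 into one group means both transitions contribute to the same Q estimate, which is the sense in which grouping lets an agent learn from fewer samples per treatment.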
Selecting the Number of Clusters with a Stability Trade-off: an Internal Validation Criterion
Model selection is a major challenge in non-parametric clustering. There is
no universally accepted way to evaluate clustering results for the obvious
reason that there is no ground truth against which results could be tested, as
in supervised learning. The difficulty of finding a universal evaluation criterion
is a direct consequence of the fundamentally ill-defined objective of
clustering. In this perspective, clustering stability has emerged as a natural
and model-agnostic principle: an algorithm should find stable structures in the
data. If data sets are repeatedly sampled from the same underlying
distribution, an algorithm should find similar partitions. However, it turns
out that stability alone is not a well-suited tool to determine the number of
clusters. For instance, it is unable to detect if the number of clusters is too
small. We propose a new principle for clustering validation: a good clustering
should be stable, and within each cluster, there should exist no stable
partition. This principle leads to a novel internal clustering validity
criterion based on between-cluster and within-cluster stability, overcoming
limitations of previous stability-based methods. We empirically show the
superior ability of additive noise to discover structures, compared with
sampling-based perturbation. We demonstrate the effectiveness of our method for
selecting the number of clusters through a large number of experiments and
compare it with existing evaluation methods.
Comment: 43 pages
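The general notion of stability under additive noise can be sketched as follows. This is only an illustration of the underlying principle, not the between-cluster and within-cluster criterion the abstract proposes: the 1-D k-means routine, the Rand index as the agreement measure, and the parameters `sigma` and `n_rounds` are all assumptions chosen for brevity.

```python
import random


def kmeans_1d(xs, k, n_iters=20):
    """Lloyd's algorithm on scalars, with centres started at evenly
    spaced order statistics so the result is deterministic."""
    s = sorted(xs)
    centres = [s[(2 * j + 1) * len(s) // (2 * k)] for j in range(k)]
    for _ in range(n_iters):
        labels = [min(range(k), key=lambda j: abs(x - centres[j])) for x in xs]
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centres[j] = sum(members) / len(members)
    return [min(range(k), key=lambda j: abs(x - centres[j])) for x in xs]


def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree
    (same cluster in both, or different clusters in both)."""
    n = len(a)
    agree = sum(
        (a[i] == a[j]) == (b[i] == b[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return agree / (n * (n - 1) / 2)


def noise_stability(xs, k, sigma=0.3, n_rounds=10, seed=0):
    """Mean agreement between the clustering of the original data and
    clusterings of copies perturbed with Gaussian noise."""
    rng = random.Random(seed)
    base = kmeans_1d(xs, k)
    scores = []
    for _ in range(n_rounds):
        noisy = [x + rng.gauss(0, sigma) for x in xs]
        scores.append(rand_index(base, kmeans_1d(noisy, k)))
    return sum(scores) / len(scores)


# Two well-separated groups of points (toy data, assumed).
xs = [0.0, 0.1, 0.2, 0.3, 10.0, 10.1, 10.2, 10.3]
for k in (1, 2, 4):
    print(k, noise_stability(xs, k))
```

With k = 1 every point always lands in the same cluster, so the score is trivially 1. That is the limitation the abstract points to: stability alone cannot signal that the number of clusters is too small.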