
    Clustering student skill set profiles in a unit hypercube using mixtures of multivariate betas

    Get PDF
    This paper presents a finite mixture of multivariate betas as a new model-based clustering method tailored to applications where the feature space is constrained to the unit hypercube. The mixture component densities are taken to be conditionally independent, univariate unimodal beta densities (from the subclass of reparameterized beta densities given by Bagnato and Punzo 2013). The EM algorithm used to fit this mixture is discussed in detail, and results from both this beta mixture model and the more standard Gaussian model-based clustering are presented for simulated skill mastery data from a common cognitive diagnosis model and for real data from the Assistment System online mathematics tutor (Feng et al 2009). The multivariate beta mixture appears to outperform the standard Gaussian model-based clustering approach, as would be expected on the constrained space. Fewer components are selected (by BIC-ICL) in the beta mixture than in the Gaussian mixture, and the resulting clusters seem more reasonable and interpretable. This article is in technical report form; the final publication is available at http://www.springerlink.com/openurl.asp?genre=article&id=doi:10.1007/s11634-013-0149-z
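As a rough illustration of the model class, the sketch below computes EM-style responsibilities under a two-component mixture of conditionally independent beta densities on the unit square. The parameters are purely illustrative and this plain-beta parameterization is a stand-in for the reparameterized subclass the paper actually uses:

```python
import math

def beta_logpdf(x, a, b):
    # Log density of Beta(a, b) at x in (0, 1).
    return ((a - 1) * math.log(x) + (b - 1) * math.log(1 - x)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

def responsibilities(x, components):
    # E-step for a mixture of conditionally independent beta densities:
    # each component is (weight, [(a_d, b_d) for each dimension d]).
    logs = []
    for w, params in components:
        ll = math.log(w) + sum(beta_logpdf(xd, a, b)
                               for xd, (a, b) in zip(x, params))
        logs.append(ll)
    m = max(logs)                      # log-sum-exp for numerical stability
    total = sum(math.exp(l - m) for l in logs)
    return [math.exp(l - m) / total for l in logs]

# Two illustrative components on [0,1]^2: one concentrated near "mastery"
# (both skills high), one near "non-mastery" (both skills low).
mix = [(0.5, [(8, 2), (8, 2)]),   # mode near (0.85, 0.85)
       (0.5, [(2, 8), (2, 8)])]   # mode near (0.15, 0.15)
r = responsibilities([0.9, 0.8], mix)
```

A profile like (0.9, 0.8) is assigned almost entirely to the high-mastery component; the M-step (not shown) would re-estimate weights and beta parameters from these responsibilities.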

    A survey of popular R packages for cluster analysis

    Get PDF
    Cluster analysis is a set of statistical methods for discovering new group/class structure when exploring datasets. This article reviews the following popular libraries/commands in the R software language for applying different types of cluster analysis: from the stats library, the kmeans and hclust functions; the mclust library; the poLCA library; and the clustMD library. The packages/functions cover a variety of cluster analysis methods for continuous data, categorical data or a collection of the two. The contrasting methods in the different packages are briefly introduced and basic usage of the functions is discussed. The use of the different methods is compared and contrasted and then illustrated on example data. In the discussion, links to information on other available libraries for different clustering methods and extensions beyond basic clustering methods are given. The code for the worked examples in Section 2 is available at http://www.stats.gla.ac.uk/~nd29c/Software/ClusterReviewCode.
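As a language-neutral illustration of what the surveyed `kmeans` function computes, here is a minimal sketch of Lloyd's algorithm on toy data (in R the equivalent call would be `kmeans(x, centers = 2)`); this is a simplified stand-in, not the algorithm variants `stats::kmeans` actually offers:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    # Minimal Lloyd's algorithm: alternate assignment and mean-update steps.
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center
        # (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Update step: recompute each center as its cluster mean.
        new = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl
               else centers[j]
               for j, cl in enumerate(clusters)]
        if new == centers:             # converged
            break
        centers = new
    return centers, clusters

# Two well-separated groups of three points each.
pts = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0),
       (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, clusters = kmeans(pts, 2)
```

On this toy data the algorithm recovers the two groups of three; `hclust`, `mclust`, `poLCA` and `clustMD` address the same grouping task with very different models.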

    Identifying Clusters in Bayesian Disease Mapping

    Full text link
    Disease mapping is the field of spatial epidemiology interested in estimating the spatial pattern in disease risk across n areal units. One aim is to identify units exhibiting elevated disease risks, so that public health interventions can be made. Bayesian hierarchical models with a spatially smooth conditional autoregressive prior are used for this purpose, but they cannot identify the spatial extent of high-risk clusters. Therefore we propose a two stage solution to this problem, with the first stage being a spatially adjusted hierarchical agglomerative clustering algorithm. This algorithm is applied to data prior to the study period, and produces n potential cluster structures for the disease data. The second stage fits a separate Poisson log-linear model to the study data for each cluster structure, which allows for step-changes in risk where two clusters meet. The most appropriate cluster structure is chosen by model comparison techniques, specifically by minimising the Deviance Information Criterion. The efficacy of the methodology is established by a simulation study, and is illustrated by a study of respiratory disease risk in Glasgow, Scotland.
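A minimal sketch of the first stage, under the assumption that "spatially adjusted" means only adjacent clusters may merge and that dissimilarity is the gap in mean risk (the paper's exact dissimilarity measure may differ):

```python
def constrained_agglomeration(values, adjacency):
    # Spatially adjusted agglomerative clustering: start with one cluster
    # per areal unit, repeatedly merge the pair of *adjacent* clusters
    # whose mean risks are closest, and record every intermediate
    # structure as a candidate.
    clusters = [{i} for i in range(len(values))]
    structures = [[set(c) for c in clusters]]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Only clusters sharing at least one border may merge.
                if any((a, b) in adjacency or (b, a) in adjacency
                       for a in clusters[i] for b in clusters[j]):
                    mi = sum(values[a] for a in clusters[i]) / len(clusters[i])
                    mj = sum(values[b] for b in clusters[j]) / len(clusters[j])
                    d = abs(mi - mj)
                    if best is None or d < best[0]:
                        best = (d, i, j)
        if best is None:               # no adjacent pairs left
            break
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]
        structures.append([set(c) for c in clusters])
    return structures                  # up to n candidate cluster structures

# Four areas on a line (0-1-2-3) with a risk step-change between 1 and 2.
risk = [1.0, 1.1, 2.0, 2.1]
adj = {(0, 1), (1, 2), (2, 3)}
cands = constrained_agglomeration(risk, adj)
```

The intermediate structure that splits the line at the step-change ({0,1} vs {2,3}) appears among the candidates; the second stage then picks among these by model comparison.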

    Bayesian cluster detection via adjacency modelling

    Get PDF
    Disease mapping aims to estimate the spatial pattern in disease risk across an area, identifying units which have elevated disease risk. Existing methods use Bayesian hierarchical models with spatially smooth conditional autoregressive priors to estimate risk, but these methods are unable to identify the geographical extent of spatially contiguous high-risk clusters of areal units. Our proposed solution to this problem is a two-stage approach, which produces a set of potential cluster structures for the data and then chooses the optimal structure via a Bayesian hierarchical model. The first stage uses a spatially adjusted hierarchical agglomerative clustering algorithm. The second stage fits a Poisson log-linear model to the data to estimate the optimal cluster structure and the spatial pattern in disease risk. The methodology was applied to a study of chronic obstructive pulmonary disease (COPD) in local authorities in England, where a number of high-risk clusters were identified.
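The second stage can be caricatured as follows: fit a Poisson model with one relative-risk parameter per cluster to each candidate structure, and keep the structure with the best penalized fit. The sketch below uses maximum likelihood and AIC as a simple stand-in for the Bayesian hierarchical fit and DIC comparison described in the abstract:

```python
import math

def poisson_loglik(obs, exp_, structure):
    # ML fit of a Poisson model with one relative-risk parameter per
    # cluster: risk_c = sum(O_i) / sum(E_i) over the units in cluster c.
    ll, n_par = 0.0, len(structure)
    for cluster in structure:
        risk = sum(obs[i] for i in cluster) / sum(exp_[i] for i in cluster)
        for i in cluster:
            mu = risk * exp_[i]
            ll += obs[i] * math.log(mu) - mu - math.lgamma(obs[i] + 1)
    return ll, n_par

def best_structure(obs, exp_, candidates):
    # Pick the candidate minimising AIC = 2k - 2*loglik (a crude
    # stand-in for the DIC comparison used in the papers).
    scores = []
    for s in candidates:
        ll, k = poisson_loglik(obs, exp_, s)
        scores.append(2 * k - 2 * ll)
    return min(range(len(candidates)), key=scores.__getitem__)

obs = [10, 12, 30, 33]          # observed disease counts
exp_ = [10, 10, 10, 10]         # expected counts under a reference rate
cands = [[[0, 1, 2, 3]],        # one cluster: no step-changes in risk
         [[0, 1], [2, 3]],      # step-change between units 1 and 2
         [[0], [1], [2], [3]]]  # every unit its own cluster
i = best_structure(obs, exp_, cands)
```

The two-cluster structure wins here: it captures the step-change in risk without paying the parameter penalty of the fully saturated structure.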

    Spatial clustering of average risks and risk trends in Bayesian disease mapping

    Get PDF
    Spatiotemporal disease mapping focuses on estimating the spatial pattern in disease risk across a set of nonoverlapping areal units over a fixed period of time. The key aim of such research is to identify areas that have a high average level of disease risk or where disease risk is increasing over time, thus allowing public health interventions to be focused on these areas. Such aims are well suited to the statistical approach of clustering, and while much research has been done in this area in a purely spatial setting, only a handful of approaches have focused on spatiotemporal clustering of disease risk. Therefore, this paper outlines a new modeling approach for clustering spatiotemporal disease risk data, by clustering areas based on both their mean risk levels and the behavior of their temporal trends. The efficacy of the methodology is established by a simulation study, and is illustrated by a study of respiratory disease risk in Glasgow, Scotland.
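A toy version of the two summaries being clustered: each area's risk series is reduced to its mean level and its least-squares trend (closed-form simple regression). The paper estimates these jointly within a Bayesian hierarchical model rather than in two separate steps, so this is only a caricature:

```python
def mean_and_trend(series):
    # Closed-form least-squares slope and mean of one area's risk series,
    # with time points t = 0, 1, ..., n-1.
    n = len(series)
    t_bar = (n - 1) / 2
    y_bar = sum(series) / n
    num = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(series))
    den = sum((t - t_bar) ** 2 for t in range(n))
    return y_bar, num / den

# Illustrative risk series for three areas over five time periods:
areas = {
    "A": [1.0, 1.0, 1.1, 1.0, 1.0],   # low and flat
    "B": [2.0, 2.1, 2.0, 2.1, 2.0],   # high average risk, flat trend
    "C": [1.0, 1.2, 1.4, 1.6, 1.8],   # low now, but rising over time
}
features = {name: mean_and_trend(s) for name, s in areas.items()}
```

Area B stands out on mean level and area C on trend; clustering on both summaries flags both kinds of area for intervention, which a purely spatial analysis of average risk would miss for C.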

    Variable selection and updating in model-based discriminant analysis for high dimensional data with food authenticity applications

    Get PDF
    Food authenticity studies are concerned with determining if food samples have been correctly labelled or not. Discriminant analysis methods are an integral part of the methodology for food authentication. Motivated by food authenticity applications, a model-based discriminant analysis method that includes variable selection is presented. The discriminant analysis model is fitted in a semi-supervised manner using both labelled and unlabelled data. The method is shown to give excellent classification performance on several high-dimensional multiclass food authenticity datasets with more variables than observations. The variables selected by the proposed method provide information about which variables are meaningful for classification purposes. A headlong search strategy for variable selection is shown to be efficient in terms of computation and achieves excellent classification performance. In applications to several food authenticity datasets, our proposed method outperformed default implementations of Random Forests, AdaBoost, transductive SVMs and Bayesian Multinomial Regression by substantial margins.
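The headlong search strategy can be sketched as follows: scan the variables in order and accept the first one whose inclusion improves a score by more than a threshold, rather than searching for the best candidate at every step. The score below (a standardised mean difference between two classes) is an illustrative stand-in for the model-based criterion (BIC differences) used in the paper:

```python
def score(data, labels, vars_):
    # Stand-in scoring rule: summed absolute standardised mean difference
    # between the two classes over the currently selected variables.
    if not vars_:
        return 0.0
    total = 0.0
    for v in vars_:
        g0 = [x[v] for x, y in zip(data, labels) if y == 0]
        g1 = [x[v] for x, y in zip(data, labels) if y == 1]
        m0, m1 = sum(g0) / len(g0), sum(g1) / len(g1)
        var0 = sum((x - m0) ** 2 for x in g0) / len(g0)
        var1 = sum((x - m1) ** 2 for x in g1) / len(g1)
        sd = ((var0 + var1) / 2) ** 0.5 or 1.0   # guard against zero sd
        total += abs(m0 - m1) / sd
    return total

def headlong_search(data, labels, n_vars, eps=0.5):
    # Headlong search: accept the FIRST variable whose inclusion improves
    # the score by more than eps, and stop once a full sweep adds nothing.
    selected = []
    improved = True
    while improved:
        improved = False
        for v in range(n_vars):
            if v in selected:
                continue
            gain = (score(data, labels, selected + [v])
                    - score(data, labels, selected))
            if gain > eps:
                selected.append(v)
                improved = True
                break
    return selected

# Variable 0 separates the classes; variable 1 is pure noise.
data = [(0.10, 5.0), (0.20, 4.9), (0.15, 5.1),   # class 0
        (0.90, 5.0), (0.80, 5.1), (0.85, 4.9)]   # class 1
labels = [0, 0, 0, 1, 1, 1]
chosen = headlong_search(data, labels, n_vars=2)
```

Accepting the first improving variable instead of the best one is what makes the search cheap on high-dimensional data with far more variables than observations.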

    Q-learning: flexible learning about useful utilities

    Get PDF
    Dynamic treatment regimes are fast becoming an important part of medicine, with the corresponding change in emphasis from treatment of the disease to treatment of the individual patient. Because of the limited number of trials to evaluate personally tailored treatment sequences, inferring optimal treatment regimes from observational data has increased importance. Q-learning is a popular method for estimating the optimal treatment regime, originally in randomized trials but more recently also in observational data. Previous applications of Q-learning have largely been restricted to continuous utility end-points with linear relationships. This paper is the first attempt at both extending the framework to discrete utilities and moving the modelling of covariates from linear models to more flexible modelling using the generalized additive model (GAM) framework. Simulated data results show that the GAM-adapted Q-learning typically outperforms Q-learning with linear models and other frequently-used methods based on propensity scores in terms of coverage and bias/MSE. This represents a promising step toward a more fully general Q-learning approach to estimating optimal dynamic treatment regimes.
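A minimal sketch of Q-learning's backward induction for a two-stage regime, with per-(state, action) means standing in for the linear or GAM regressions discussed above (toy discrete data, not the paper's setup):

```python
from collections import defaultdict

def q_learning(trajectories):
    # Backward induction for a two-stage treatment regime. Each trajectory
    # is (x1, a1, x2, a2, utility); actions are coded 0/1.
    # Stage 2: estimate Q2(x2, a2) as the mean observed utility.
    q2 = defaultdict(list)
    for x1, a1, x2, a2, u in trajectories:
        q2[(x2, a2)].append(u)
    Q2 = {k: sum(v) / len(v) for k, v in q2.items()}
    # Stage 1: regress the stage-2 *optimal* value onto (x1, a1) --
    # i.e. assume the best action will be taken at stage 2.
    q1 = defaultdict(list)
    for x1, a1, x2, a2, u in trajectories:
        v2 = max(Q2.get((x2, a), float("-inf")) for a in (0, 1))
        q1[(x1, a1)].append(v2)
    Q1 = {k: sum(v) / len(v) for k, v in q1.items()}
    # Estimated optimal regime at stage 1: argmax over actions.
    pi1 = {x1: max((0, 1), key=lambda a: Q1.get((x1, a), float("-inf")))
           for x1, _, _, _, _ in trajectories}
    return Q1, Q2, pi1

# Toy observational data: treating at stage 1 leads to a better
# intermediate state, which in turn yields higher utilities.
trajs = [
    ("sick", 1, "better", 1, 10),
    ("sick", 1, "better", 0, 6),
    ("sick", 0, "worse", 1, 3),
    ("sick", 0, "worse", 0, 1),
]
Q1, Q2, pi1 = q_learning(trajs)
```

Replacing these per-cell means with linear models gives classical Q-learning; replacing them with GAMs gives the flexible variant the paper proposes.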

    One Step Forward, Two Step Backwards: Addressing Objections to the ICC's Prescriptive and Adjudicative Powers

    Get PDF
    The Rome Statute of the International Criminal Court (ICC) permits the ICC to exercise subject-matter jurisdiction over individuals who engage in war crimes, genocide, crimes against humanity, and crimes of aggression. However, under Article 13, the ICC may only exercise personal jurisdiction over persons referred by the Security Council under Chapter VII, or over nationals of a state party, or persons whose alleged criminal conduct occurred on the territory of a state party. This article evaluates the interplay between principles of public international law and international criminal law in determining whether the ICC's grant of jurisdiction under the Rome Statute is performed within the limits of international law when exercised against non-party nationals. The importance of this inquiry is ever-increasing in light of greater and more expansive breaches of international humanitarian law in the modern world, committed by nations and individuals who have refused to become parties to the Rome Statute. This paper suggests that the ICC's grant of authority to exercise jurisdiction over non-state nationals is consistent with customary norms of international law, even where the state of nationality has not consented to the Court's jurisdiction. This paper concludes that the ICC was established to prevent impunity by reinvigorating national institutions. It is the culmination of historical lessons that teach against non-cooperation. The Nuremberg and Tokyo tribunals, along with the tribunals in Rwanda and Yugoslavia, were built for the precise purpose of accounting for crimes which transcend national borders. Objections to the Rome Statute based on national interests and sovereignty fail in light of the crimes sought to be prevented by an International Criminal Court. Without the safety net provided by international cooperation and prevention of crimes, the world risks facing the dangers it promised "never again."

    Appeasing the International Conscience or Providing Post-Conflict Justice: Expanding the Khmer Rouge Tribunal's Restorative Role

    Get PDF
    Three decades after the Cambodian civil war, the leaders of the Khmer Rouge will finally be brought before an internationalized domestic tribunal. While the majority of those most responsible have died or received immunity for their conduct, the Khmer Rouge Tribunal has the historic possibility of reaffirming the importance of international criminal justice and providing an historical narrative of the crimes committed and victims created. This commentary evaluates the importance of restoration in transitional justice and the importance victims and witnesses play in post-conflict justice. This article will argue that previous post-conflict remedies required a balance of restoration and retribution in order to effectuate transitional justice. In turn, the incorporation and protection of witnesses and victims was vital to reconciliation. This article summarizes the importance of victims and witnesses in the context of Cambodia and describes mechanisms the Khmer Rouge Tribunal can use to enhance their participation and protection. By expanding the Khmer Rouge Tribunal's restorative role, it can provide post-conflict justice rather than appease international guilt.

    Diversity driven Attention Model for Query-based Abstractive Summarization

    Full text link
    Abstractive summarization aims to generate a shorter version of the document covering all the salient points in a compact and coherent fashion. On the other hand, query-based summarization highlights those points that are relevant in the context of a given query. The encode-attend-decode paradigm has achieved notable success in machine translation, extractive summarization, dialog systems, etc. But it suffers from the drawback of generation of repeated phrases. In this work we propose a model for the query-based summarization task based on the encode-attend-decode paradigm with two key additions: (i) a query attention model (in addition to a document attention model) which learns to focus on different portions of the query at different time steps (instead of using a static representation for the query) and (ii) a new diversity based attention model which aims to alleviate the problem of repeating phrases in the summary. In order to enable the testing of this model we introduce a new query-based summarization dataset building on Debatepedia. Our experiments show that with these two additions the proposed model clearly outperforms vanilla encode-attend-decode models with a gain of 28% (absolute) in ROUGE-L scores.
    Comment: Accepted at ACL 201
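One way to realise a diversity-based attention of the kind described is sketched below, using an additive penalty on already-attended positions. Note this is a simplified stand-in: the paper's actual mechanism orthogonalises successive context vectors rather than penalising attention logits directly.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def diversity_attention(scores_per_step, penalty=2.0):
    # At each decoding step, subtract a penalty proportional to the
    # attention mass already spent on each source position before the
    # softmax, discouraging the decoder from re-attending to (and so
    # re-generating) the same phrases.
    spent = [0.0] * len(scores_per_step[0])
    out = []
    for scores in scores_per_step:
        att = softmax([s - penalty * c for s, c in zip(scores, spent)])
        spent = [c + a for c, a in zip(spent, att)]
        out.append(att)
    return out

# Identical raw alignment scores at both steps: plain attention would
# repeat itself exactly, while the diversity penalty shifts the second
# step's attention away from the position already covered.
raw = [[2.0, 1.0, 0.5], [2.0, 1.0, 0.5]]
steps = diversity_attention(raw)
```

The first step concentrates on position 0; at the second step the accumulated penalty redistributes mass toward the other positions, which is the behaviour that suppresses repeated phrases in the generated summary.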