
    Clustering student skill set profiles in a unit hypercube using mixtures of multivariate betas

    This paper presents a finite mixture of multivariate betas as a new model-based clustering method tailored to applications where the feature space is constrained to the unit hypercube. The mixture component densities are taken to be conditionally independent, univariate unimodal beta densities (from the subclass of reparameterized beta densities given by Bagnato and Punzo 2013). The EM algorithm used to fit this mixture is discussed in detail, and results from both this beta mixture model and the more standard Gaussian model-based clustering are presented for simulated skill mastery data from a common cognitive diagnosis model and for real data from the Assistment System online mathematics tutor (Feng et al. 2009). The multivariate beta mixture appears to outperform the standard Gaussian model-based clustering approach, as would be expected on the constrained space. Fewer components are selected (by BIC-ICL) in the beta mixture than in the Gaussian mixture, and the resulting clusters seem more reasonable and interpretable.

    This article is in technical report form; the final publication is available at http://www.springerlink.com/openurl.asp?genre=article&id=doi:10.1007/s11634-013-0149-z
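    As a sketch of the model structure described above (notation assumed here, not taken from the paper): with G components, mixing weights \pi_g, and conditionally independent beta densities in each of the p dimensions, the mixture density on the unit hypercube is

        f(x) = \sum_{g=1}^{G} \pi_g \prod_{j=1}^{p} f_B(x_j; \alpha_{gj}, \beta_{gj}),   x \in [0, 1]^p,

    and the EM algorithm alternates between computing component membership probabilities (E-step) and updating \pi_g and the beta parameters (M-step).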

    A survey of popular R packages for cluster analysis

    Cluster analysis is a set of statistical methods for discovering new group/class structure when exploring datasets. This article reviews the following popular libraries/commands in the R software language for applying different types of cluster analysis: from the stats library, the kmeans and hclust functions; the mclust library; the poLCA library; and the clustMD library. The packages/functions cover a variety of cluster analysis methods for continuous data, categorical data or a combination of the two. The contrasting methods in the different packages are briefly introduced and basic usage of the functions is discussed. The use of the different methods is compared and contrasted and then illustrated on example data. In the discussion, links to information on other available libraries for different clustering methods and extensions beyond basic clustering methods are given. The code for the worked examples in Section 2 is available at http://www.stats.gla.ac.uk/~nd29c/Software/ClusterReviewCode.
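    All of the functions named above ship with base R or are available from CRAN; a minimal usage sketch in the spirit of the article's worked examples (on simulated data, not the article's) is:

        # Simulated two-group continuous data
        set.seed(1)
        x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
                   matrix(rnorm(100, mean = 3), ncol = 2))

        # k-means and hierarchical clustering from the stats package
        km        <- kmeans(x, centers = 2)
        hc_groups <- cutree(hclust(dist(x), method = "ward.D2"), k = 2)

        # Gaussian model-based clustering via mclust
        library(mclust)       # install.packages("mclust") if needed
        mc <- Mclust(x)       # number of components chosen by BIC
        table(km$cluster, mc$classification)

    (poLCA and clustMD cover the categorical and mixed-data settings and follow a similar fit-then-inspect pattern.)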

    Identifying Clusters in Bayesian Disease Mapping

    Disease mapping is the field of spatial epidemiology interested in estimating the spatial pattern in disease risk across n areal units. One aim is to identify units exhibiting elevated disease risks, so that public health interventions can be made. Bayesian hierarchical models with a spatially smooth conditional autoregressive prior are used for this purpose, but they cannot identify the spatial extent of high-risk clusters. Therefore we propose a two-stage solution to this problem, with the first stage being a spatially adjusted hierarchical agglomerative clustering algorithm. This algorithm is applied to data prior to the study period, and produces n potential cluster structures for the disease data. The second stage fits a separate Poisson log-linear model to the study data for each cluster structure, which allows for step-changes in risk where two clusters meet. The most appropriate cluster structure is chosen by model comparison techniques, specifically by minimising the Deviance Information Criterion. The efficacy of the methodology is established by a simulation study, and is illustrated by a study of respiratory disease risk in Glasgow, Scotland.
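    A sketch of the second-stage model, in notation assumed here rather than taken from the paper: for areal unit k with observed disease count Y_k, expected count E_k, and cluster label Z(k) under a candidate structure,

        Y_k ~ Poisson(E_k \theta_k),   log(\theta_k) = \beta_0 + \lambda_{Z(k)},

    so the cluster-specific effects \lambda permit step-changes in risk at cluster boundaries, and the candidate structure minimising the DIC is retained.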

    Spatial clustering of average risks and risk trends in Bayesian disease mapping

    Spatiotemporal disease mapping focuses on estimating the spatial pattern in disease risk across a set of nonoverlapping areal units over a fixed period of time. The key aim of such research is to identify areas that have a high average level of disease risk or where disease risk is increasing over time, thus allowing public health interventions to be focused on these areas. Such aims are well suited to the statistical approach of clustering, and while much research has been done in this area in a purely spatial setting, only a handful of approaches have focused on spatiotemporal clustering of disease risk. Therefore, this paper outlines a new modeling approach for clustering spatiotemporal disease risk data, by clustering areas based on both their mean risk levels and the behavior of their temporal trends. The efficacy of the methodology is established by a simulation study, and is illustrated by a study of respiratory disease risk in Glasgow, Scotland.
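    In assumed notation (not the paper's), the clustering target can be sketched as a piecewise-linear risk surface: for unit k at time t,

        Y_{kt} ~ Poisson(E_{kt} \theta_{kt}),   log(\theta_{kt}) = \mu_{Z(k)} + \delta_{W(k)} (t - \bar{t}),

    where Z(k) groups units by common mean risk level \mu and W(k) groups them by common temporal trend \delta.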

    Bayesian cluster detection via adjacency modelling

    Disease mapping aims to estimate the spatial pattern in disease risk across an area, identifying units which have elevated disease risk. Existing methods use Bayesian hierarchical models with spatially smooth conditional autoregressive priors to estimate risk, but these methods are unable to identify the geographical extent of spatially contiguous high-risk clusters of areal units. Our proposed solution to this problem is a two-stage approach, which produces a set of potential cluster structures for the data and then chooses the optimal structure via a Bayesian hierarchical model. The first stage uses a spatially adjusted hierarchical agglomerative clustering algorithm. The second stage fits a Poisson log-linear model to the data to estimate the optimal cluster structure and the spatial pattern in disease risk. The methodology was applied to a study of chronic obstructive pulmonary disease (COPD) in local authorities in England, where a number of high-risk clusters were identified.
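    One simple way to make agglomerative clustering respect spatial adjacency, offered here as a toy illustration rather than the authors' algorithm, is to inflate the dissimilarity between non-adjacent units so that only contiguous merges occur early in the dendrogram:

        # Simulated stand-ins for unit-level risks and a binary adjacency matrix
        set.seed(1)
        n    <- 20
        risk <- rnorm(n)
        adj  <- matrix(rbinom(n * n, 1, 0.3), n, n)
        adj  <- pmax(adj, t(adj))          # symmetrise
        diag(adj) <- 0

        d <- as.matrix(dist(risk))
        d[adj == 0] <- 1e6                 # discourage merging non-adjacent units
        hc <- hclust(as.dist(d), method = "single")
        clusters <- cutree(hc, k = 4)      # one candidate cluster structure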

    Variable selection and updating in model-based discriminant analysis for high dimensional data with food authenticity applications

    Food authenticity studies are concerned with determining whether food samples have been correctly labelled or not. Discriminant analysis methods are an integral part of the methodology for food authentication. Motivated by food authenticity applications, a model-based discriminant analysis method that includes variable selection is presented. The discriminant analysis model is fitted in a semi-supervised manner using both labelled and unlabelled data. The method is shown to give excellent classification performance on several high-dimensional multiclass food authenticity datasets with more variables than observations. The variables selected by the proposed method provide information about which variables are meaningful for classification purposes. A headlong search strategy for variable selection is shown to be computationally efficient and achieves excellent classification performance. In applications to several food authenticity datasets, our proposed method outperformed default implementations of Random Forests, AdaBoost, transductive SVMs and Bayesian Multinomial Regression by substantial margins.
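    The supervised backbone of this approach, model-based discriminant analysis, is available in R as mclust's MclustDA; the sketch below shows that baseline only, since the semi-supervised updating and headlong variable search are the paper's own contributions and are not reproduced here.

        # Model-based discriminant analysis with a hold-out evaluation
        library(mclust)
        data(iris)                        # stand-in for a food authenticity dataset
        set.seed(1)
        train <- sample(nrow(iris), 100)
        fit   <- MclustDA(iris[train, 1:4], class = iris$Species[train])
        pred  <- predict(fit, newdata = iris[-train, 1:4])
        mean(pred$classification == iris$Species[-train])   # hold-out accuracy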

    Q-learning: flexible learning about useful utilities

    Dynamic treatment regimes are fast becoming an important part of medicine, with the corresponding change in emphasis from treatment of the disease to treatment of the individual patient. Because of the limited number of trials to evaluate personally tailored treatment sequences, inferring optimal treatment regimes from observational data has increased importance. Q-learning is a popular method for estimating the optimal treatment regime, originally in randomized trials but more recently also in observational data. Previous applications of Q-learning have largely been restricted to continuous utility end-points with linear relationships. This paper is the first attempt both to extend the framework to discrete utilities and to move the modelling of covariates from linear models to more flexible modelling using the generalized additive model (GAM) framework. Simulated data results show that the GAM-adapted Q-learning typically outperforms Q-learning with linear models and other frequently used methods based on propensity scores in terms of coverage and bias/MSE. This represents a promising step toward a more fully general Q-learning approach to estimating optimal dynamic treatment regimes.
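    A one-stage sketch of Q-learning with a GAM Q-function, using mgcv on simulated data (variable names and the data-generating model are illustrative, not the paper's); in the multi-stage case the same fit is applied backwards from the final stage, with each stage's pseudo-outcome taken as the maximum of the next stage's predicted Q-values:

        library(mgcv)
        set.seed(1)
        n  <- 500
        x  <- runif(n, -2, 2)                             # patient covariate
        a  <- factor(rbinom(n, 1, 0.5))                   # randomised binary treatment
        y  <- sin(x) * ifelse(a == 1, 1, -1) + rnorm(n)   # utility
        df <- data.frame(x, a, y)

        # Q(x, a): a separate smooth in x for each treatment arm
        q  <- gam(y ~ a + s(x, by = a), data = df)

        # Estimated optimal rule: treat iff predicted utility is higher under a = 1
        q1 <- predict(q, transform(df, a = factor(1, levels = levels(a))))
        q0 <- predict(q, transform(df, a = factor(0, levels = levels(a))))
        rule <- as.numeric(q1 > q0)                       # approximates 1{sin(x) > 0}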

    One Step Forward, Two Step Backwards: Addressing Objections to the ICC’s Prescriptive and Adjudicative Powers

    The Rome Statute of the International Criminal Court (ICC) permits the ICC to exercise subject-matter jurisdiction over individuals who engage in war crimes, genocide, crimes against humanity, and crimes of aggression. However, under Article 13, the ICC may only exercise personal jurisdiction over persons referred by the Security Council under Chapter VII, over nationals of a state party, or over persons whose alleged criminal conduct occurred on the territory of a state party. This article evaluates the interplay between principles of public international law and international criminal law in determining whether the ICC’s grant of jurisdiction under the Rome Statute is performed within the limits of international law when exercised against non-party nationals. The importance of this inquiry is ever-increasing in light of greater and more expansive breaches of international humanitarian law in the modern world, committed by nations and individuals who have refused to become parties to the Rome Statute. This paper suggests that the ICC’s grant of authority to exercise jurisdiction over non-party nationals is consistent with customary norms of international law, even where the state of nationality has not consented to the Court’s jurisdiction. This paper concludes that the ICC was established to prevent impunity by reinvigorating national institutions. It is the culmination of historical lessons that teach against non-cooperation. The Nuremberg and Tokyo tribunals, along with the tribunals in Rwanda and Yugoslavia, were built for the precise purpose of accounting for crimes which transcend national borders. Objections to the Rome Statute based on national interests and sovereignty fail in light of the crimes sought to be prevented by an International Criminal Court. Without the safety net provided by international cooperation and prevention of crimes, the world risks facing the dangers it promised “never again.”

    Compaction and strength characteristics of lime-activated flyash with GGBS as an admixture

    Flyash is the very fine by-product generated by the coal combustion process at thermal power plants; the part of the ash that falls to the bottom of the boiler is known as bottom ash. Of the total waste material produced, flyash accounts for approximately 80% and bottom ash for 20% (by weight). In India, total flyash production was 184.14 MT in 2014-15, of which 102.59 MT (55.71%) was utilized; in 2015-16, production was 176.74 MT, of which 107.77 MT (60.97%) was utilized. Although the utilization rate is improving, roughly 40% of the flyash still goes unused. Unutilized flyash is deposited in landfills and creates environmental problems: heavy metals such as mercury, cadmium and boron, together with the very fine flyash particles, leach from these landfills into the groundwater and contaminate it, and unused flyash is also a major source of air pollution. In the present study, an attempt has been made to utilize flyash effectively as a geoengineering material. The material used was class-F flyash obtained from Adhunik Metaliks Limited, Sundergarh. The geotechnical properties of the virgin flyash, namely specific gravity, optimum moisture content (OMC), maximum dry density (MDD) and unconfined compressive strength (UCS), were evaluated. To enhance these properties, the flyash was mixed with lime and ground granulated blast-furnace slag (GGBS) in different proportions: lime at 0%, 2%, 4%, 8% and 12%, and slag at 0%, 5%, 10%, 15% and 20%. A number of flyash-lime-slag combinations were prepared for testing. Light compaction tests were conducted to determine the OMC and MDD of the different flyash-GGBS-lime mixes; in total, 25 compaction tests were carried out. UCS tests were then performed on the different combinations compacted to their respective MDD at OMC. The samples were sealed in wax and cured at an average temperature of 28°C for curing periods of 0, 7, 14 and 28 days before the UCS values were determined. Hydrometer analysis showed that the flyash particles are uniformly graded, with particle sizes ranging between fine sand and silt. The virgin flyash had a low MDD at a relatively high OMC; after treatment with lime and slag, the OMC decreased and the MDD increased. The UCS of the virgin flyash was very low; when treated with lime it increased marginally when tested immediately, and the UCS of the lime-treated samples increased further with longer curing periods. The UCS of the slag-treated samples was very low when tested immediately and increased to some extent with longer curing. The strength of flyash treated with both lime and slag was highest after 28 days of curing. At a given curing period, flyash mixed with both slag and lime showed a higher UCS than flyash treated with the same percentage of lime alone, indicating a definite advantage in adding slag. Slag is rich in pozzolanic materials such as silica and alumina and also contains a substantial amount of lime; however, GGBS possesses latent hydraulic properties that must be activated in an alkaline environment, and here lime was used to provide that alkaline environment and initiate the pozzolanic reaction of the slag.

    Diversity driven Attention Model for Query-based Abstractive Summarization

    Abstractive summarization aims to generate a shorter version of a document covering all the salient points in a compact and coherent fashion. On the other hand, query-based summarization highlights those points that are relevant in the context of a given query. The encode-attend-decode paradigm has achieved notable success in machine translation, extractive summarization, dialog systems, etc., but it suffers from the drawback of generating repeated phrases. In this work we propose a model for the query-based summarization task based on the encode-attend-decode paradigm with two key additions: (i) a query attention model (in addition to the document attention model) which learns to focus on different portions of the query at different time steps (instead of using a static representation for the query), and (ii) a new diversity-based attention model which aims to alleviate the problem of repeating phrases in the summary. To enable the testing of this model we introduce a new query-based summarization dataset built on Debatepedia. Our experiments show that with these two additions the proposed model clearly outperforms vanilla encode-attend-decode models, with a gain of 28% (absolute) in ROUGE-L scores.

    Accepted at ACL 2017.
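    The diversity mechanism mentioned above can be sketched in assumed notation as follows: if c_t is the attention context vector at decoder step t, repetition is discouraged by subtracting from it the component already expressed at the previous step, e.g.

        c'_t = c_t - ((c_t^T c'_{t-1}) / (c'_{t-1}^T c'_{t-1})) c'_{t-1},

    so that successive contexts are orthogonal and the decoder is pushed towards new content; the paper also explores richer, LSTM-based variants of this idea.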