
    Summary Statistics for Partitionings and Feature Allocations

    Infinite mixture models are commonly used for clustering. One can sample from the posterior of mixture assignments by Monte Carlo methods or find its maximum a posteriori solution by optimization. However, in some problems the posterior is diffuse and it is hard to interpret the sampled partitionings. In this paper, we introduce novel statistics based on block sizes for representing sample sets of partitionings and feature allocations. We develop an element-based definition of entropy to quantify segmentation among their elements. Then we propose a simple algorithm called entropy agglomeration (EA) to summarize and visualize this information. Experiments on various infinite mixture posteriors as well as a feature allocation dataset demonstrate that the proposed statistics are useful in practice. Comment: Accepted to NIPS 2013: https://nips.cc/Conferences/2013/Program/event.php?ID=376
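    The block-size statistics the abstract describes can be illustrated with a small sketch. This is not the paper's element-based entropy or its EA algorithm, only a conventional Shannon entropy over block sizes, averaged across hypothetical posterior samples:

```python
import math

def partition_entropy(partition):
    """Shannon entropy of the block-size distribution of one partitioning.

    `partition` is a list of blocks. This is a standard block-size
    entropy, used only to illustrate the kind of per-sample summary the
    abstract describes; the paper's element-based definition differs.
    """
    n = sum(len(block) for block in partition)
    return -sum((len(b) / n) * math.log(len(b) / n) for b in partition)

def mean_sample_entropy(samples):
    """Average block-size entropy over a set of sampled partitionings."""
    return sum(partition_entropy(p) for p in samples) / len(samples)

# Two hypothetical posterior samples over elements {a, b, c, d}:
samples = [
    [["a", "b"], ["c", "d"]],  # two equal blocks: entropy log 2
    [["a", "b", "c", "d"]],    # a single block:   entropy 0
]
```

    A diffuse posterior yields samples whose entropies vary widely, which is exactly the situation where a summary statistic over the sample set becomes useful.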

    School accountability: (how) can we reward schools and avoid cream-skimming?

    Introducing school accountability may create incentives for efficiency. However, if the performance measure used does not correct for pupil characteristics, it will lead to an inequitable treatment of schools and create perverse incentives for cream-skimming. We apply the theory of fair allocation to show how to integrate empirical information about the educational production function in a coherent theoretical framework. The requirements of rewarding performance and correcting for pupil characteristics are incompatible if we want the funding scheme to be applicable for all educational production functions. However, we characterize an attractive subsidy scheme under specific restrictions on the educational production function. This subsidy scheme uses only information which can be controlled easily by the regulator. We show with Flemish data how the proposed funding scheme can be implemented. Correcting for pupil characteristics has a strong impact on the subsidies (and on the underlying performance ranking) of schools.
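    The abstract gives no formulas, but the idea of "correcting for pupil characteristics" can be sketched as a value-added adjustment: score each school by the residual of its raw performance after regressing on a pupil-background index. All numbers below are hypothetical, and the paper's axiomatically derived subsidy scheme is not this regression:

```python
def value_added(scores, pupil_index):
    """Toy 'corrected performance': residuals of school mean scores
    after a one-variable least-squares fit on a pupil-background index.
    Illustrative only; the paper derives its subsidy scheme from
    fair-allocation theory, not from this regression."""
    n = len(scores)
    mx = sum(pupil_index) / n
    my = sum(scores) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(pupil_index, scores))
    var = sum((x - mx) ** 2 for x in pupil_index)
    slope = cov / var
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(pupil_index, scores)]

# Three hypothetical schools: raw scores rise with pupil background,
# so rewarding raw scores would reward cream-skimming.
residuals = value_added([1.0, 2.0, 4.0], [0.0, 1.0, 2.0])
```

    Ranking schools by residual rather than raw score removes the incentive to select pupils by background, which is the intuition behind the abstract's claim that correction has a strong impact on the underlying performance ranking.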

    Allocation Strategies for Data-Oriented Architectures

    Data orientation is a common design principle in distributed data management systems. In contrast to process-oriented or transaction-oriented system designs, data-oriented architectures are based on data locality and function shipping. The tight coupling of data and processing thereon is implemented in different systems in a variety of application scenarios such as data analysis, database-as-a-service, and data management on multiprocessor systems. Data-oriented systems, i.e., systems that implement a data-oriented architecture, bundle data and operations together in tasks which are processed locally on the nodes of the distributed system. Allocation strategies, i.e., methods that decide the mapping from tasks to nodes, are core components in data-oriented systems. Good allocation strategies can lead to balanced systems, while bad allocation strategies cause skew in the load and therefore suboptimal application performance and infrastructure utilization. Optimal allocation strategies are hard to find given the complexity of the systems, the complicated interactions of tasks, and the huge solution space. To ensure the scalability of data-oriented systems and to keep them manageable with hundreds of thousands of tasks, thousands of nodes, and dynamic workloads, fast and reliable allocation strategies are mandatory. In this thesis, we develop novel allocation strategies for data-oriented systems based on graph partitioning algorithms. To this end, we show that systems from different application scenarios with different abstraction levels can be generalized to generic infrastructure and workload descriptions. We use weighted graph representations to model infrastructures with bounded and unbounded, i.e., overcommitted, resources and possibly non-linear performance characteristics. Based on our generalized infrastructure and workload model, we formalize the allocation problem, which seeks valid and balanced allocations that minimize communication.
Our allocation strategies partition the workload graph using solution heuristics that work with single and multiple vertex weights. Novel extensions to these solution heuristics can be used to balance penalized and secondary graph partition weights. These extensions enable the allocation strategies to handle infrastructures with non-linear performance behavior. On top of the basic algorithms, we propose methods to incorporate heterogeneous infrastructures and to react to changing workloads and infrastructures by incrementally updating the partitioning. We evaluate all components of our allocation strategy algorithms and show their applicability and scalability with synthetic workload graphs. In end-to-end performance experiments in two actual data-oriented systems, a database-as-a-service system and a database management system for multiprocessor systems, we show that our allocation strategies outperform alternative state-of-the-art methods.
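    The graph-partitioning formulation can be illustrated with a toy greedy heuristic: vertices are tasks with load weights, edges are task interactions, and the goal is balanced parts with a small edge cut. This is only a sketch of the problem, not the thesis's multilevel algorithms with multiple and penalized vertex weights:

```python
def greedy_partition(weights, edges, k):
    """Place tasks (heaviest first) on the node with the best score:
    low current load, discounted by affinity to already-placed
    neighbors so that communicating tasks stay together. Toy
    heuristic, not the thesis's multilevel partitioners."""
    adj = {v: set() for v in weights}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    parts = [set() for _ in range(k)]
    load = [0.0] * k
    for v in sorted(weights, key=weights.get, reverse=True):
        best = min(range(k), key=lambda p: load[p] - 2 * len(adj[v] & parts[p]))
        parts[best].add(v)
        load[best] += weights[v]
    return parts, load

def edge_cut(parts, edges):
    """Edges crossing parts, a proxy for communication volume."""
    where = {v: i for i, part in enumerate(parts) for v in part}
    return sum(where[u] != where[v] for u, v in edges)

# Four tasks, two tightly coupled pairs, two nodes:
weights = {"a": 2.0, "b": 2.0, "c": 1.0, "d": 1.0}
edges = [("a", "b"), ("c", "d")]
parts, load = greedy_partition(weights, edges, 2)
```

    The affinity discount trades balance against communication: with it, each coupled pair lands on one node and the cut is zero, at the cost of some load skew.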

    Applications of remote sensing, volume 2

    The author has identified the following significant results. The overall spectral response of the strata, measured by a mean vector and covariance matrix for each stratum, did not show differences among the LACIE phase 3 strata using the machine clustering procedures. This was expected, since the large strata gave rise to broad normal distributions with a great deal of overlap. The static stratification of Kansas contains strata which are small in size. The distributions for these strata are not as broad as those based on the LACIE phase 3 partitions, but there is still some confusion, since strata from different categories are not spectrally distinct.
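    The per-stratum summary used above (a mean vector and covariance matrix of spectral responses) can be computed as follows; the data here are made up, and the actual LACIE strata and clustering procedure are not reproduced:

```python
def stratum_stats(spectra):
    """Mean vector and sample covariance matrix of one stratum's pixel
    spectra, each spectrum a vector of band values. Broad, overlapping
    covariances across strata are what caused the confusion described
    in the abstract."""
    n, d = len(spectra), len(spectra[0])
    mean = [sum(x[j] for x in spectra) / n for j in range(d)]
    cov = [
        [
            sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in spectra) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]
    return mean, cov

# Two hypothetical two-band spectra from one stratum:
mean, cov = stratum_stats([[1.0, 2.0], [3.0, 4.0]])
```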

    Identification of a novel clinical phenotype of severe malaria using a network-based clustering approach

    The parasite Plasmodium falciparum is the main cause of severe malaria (SM). Despite treatment with antimalarial drugs, more than 400,000 deaths are reported every year, mainly in African children. The diversity of clinical presentations associated with SM highlights important differences in disease pathogenesis that often require specific therapeutic options. The clinical heterogeneity of SM is largely unresolved. Here we report a network-based analysis of clinical phenotypes associated with SM in 2,915 Gambian children admitted to hospital with Plasmodium falciparum malaria. We used a network-based clustering method which revealed a strong correlation between disease heterogeneity and mortality. The analysis identified four distinct clusters of SM and respiratory distress that departed from the WHO definition. Patients in these clusters characteristically presented with liver enlargement and high concentrations of brain natriuretic peptide (BNP), supporting a potential role of circulatory overload and/or right-sided heart failure as a mechanism of disease. The role of heart failure is controversial in SM, and our work suggests that standard clinical management may not be appropriate. We find that our clustering can be a powerful data exploration tool to identify novel disease phenotypes and therapeutic options to reduce malaria-associated mortality.
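    The general shape of a network-based clustering of patients can be sketched in a few lines: link patients whose clinical feature sets are sufficiently similar, then read clusters off the resulting graph. Patient IDs, features, and the similarity threshold below are hypothetical, and the study's actual similarity measure and clustering method are more sophisticated:

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets of clinical features."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_patients(features, threshold):
    """Link patients whose feature sets exceed a similarity threshold,
    then return connected components as phenotype clusters. A minimal
    sketch of network-based clustering, not the study's pipeline."""
    ids = list(features)
    adj = {i: set() for i in ids}
    for u, v in combinations(ids, 2):
        if jaccard(features[u], features[v]) >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    clusters, seen = [], set()
    for i in ids:
        if i in seen:
            continue
        comp, stack = set(), [i]
        while stack:
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters

# Hypothetical patients with binary clinical features:
feats = {
    "p1": {"coma", "anaemia"},
    "p2": {"coma", "anaemia", "acidosis"},
    "p3": {"resp_distress"},
}
clusters = cluster_patients(feats, 0.5)
```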

    Real-time tomographic reconstruction

    With tomography it is possible to reconstruct the interior of an object without destroying it. It is an important technique for many applications in, e.g., science, industry, and medicine. The runtime of conventional reconstruction algorithms is typically much longer than the time it takes to perform the tomographic experiment, and this prohibits the real-time reconstruction and visualization of the imaged object. The research in this dissertation introduces various techniques, such as new parallelization schemes, data partitioning methods, and a quasi-3D reconstruction framework, that significantly reduce the time it takes to run conventional tomographic reconstruction algorithms without affecting image quality. The resulting methods and software implementations put reconstruction times in the same ballpark as the time it takes to do a tomographic scan, so that we can speak of real-time tomographic reconstruction.
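    One of the partitioning ideas mentioned above can be sketched as splitting the reconstruction volume into contiguous slabs of z-slices, each of which a worker can reconstruct independently and in parallel. The helper below is hypothetical and only illustrates the data-partitioning step, not the dissertation's actual schemes:

```python
def partition_slices(num_slices, num_workers):
    """Split a volume's z-slices into near-equal contiguous slabs, one
    per worker, so slabs can be reconstructed in parallel. Illustrative
    only; real schemes must also partition projection data and balance
    heterogeneous hardware."""
    base, extra = divmod(num_slices, num_workers)
    slabs, start = [], 0
    for w in range(num_workers):
        size = base + (1 if w < extra else 0)  # spread the remainder
        slabs.append(range(start, start + size))
        start += size
    return slabs

# Ten slices over three workers: slab sizes 4, 3, 3.
slabs = partition_slices(10, 3)
```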