Beyond subjective and objective in statistics
We argue that the words "objectivity" and "subjectivity" in statistics
discourse are used in a mostly unhelpful way, and we propose to replace each of
them with broader collections of attributes, with objectivity replaced by
transparency, consensus, impartiality, and correspondence to observable
reality, and subjectivity replaced by awareness of multiple perspectives and
context dependence. The advantage of these reformulations is that the
replacement terms do not oppose each other. Instead of debating over whether a
given statistical method is subjective or objective (or normatively debating
the relative merits of subjectivity and objectivity in statistical practice),
we can recognize desirable attributes such as transparency and acknowledgment
of multiple perspectives as complementary goals. We demonstrate the
implications of our proposal with recent applied examples from pharmacology,
election polling, and socioeconomic stratification.
Comment: 35 pages
From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation
Starting from a high-level problem description in terms of partial
differential equations using abstract tensor notation, the Chemora framework
discretizes, optimizes, and generates complete high performance codes for a
wide range of compute architectures. Chemora extends the capabilities of
Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient
manner for complex applications, without low-level code tuning. Chemora
achieves parallelism through MPI and multi-threading, combining OpenMP and
CUDA. Optimizations include high-level code transformations, efficient loop
traversal strategies, dynamically selected data and instruction cache usage
strategies, and JIT compilation of GPU code tailored to the problem
characteristics. The discretization is based on higher-order finite differences
on multi-block domains. Chemora's capabilities are demonstrated by simulations
of black hole collisions. This problem provides an acid test of the framework,
as the Einstein equations contain hundreds of variables and thousands of terms.Comment: 18 pages, 4 figures, accepted for publication in Scientific
Programmin
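Chemora's discretization rests on higher-order finite differences. As a toy illustration of what such a stencil computes (this is not Chemora-generated code, and the function name is made up for this sketch), here is the standard fourth-order central difference for a second derivative on a periodic 1-D grid:

```python
import numpy as np

def fd2_4th_order(u, dx):
    """Fourth-order central finite-difference approximation of u'' on a
    periodic 1-D grid. Stencil coefficients: (-1, 16, -30, 16, -1) / 12."""
    return (-np.roll(u, 2) + 16 * np.roll(u, 1) - 30 * u
            + 16 * np.roll(u, -1) - np.roll(u, -2)) / (12 * dx**2)

# Verify on u = sin(x), where the exact second derivative is -sin(x)
x = np.linspace(0, 2 * np.pi, 128, endpoint=False)
dx = x[1] - x[0]
err = np.max(np.abs(fd2_4th_order(np.sin(x), dx) + np.sin(x)))
print(err)  # tiny: the scheme is fourth-order accurate
```

A production framework generates such stencils symbolically for hundreds of variables across multi-block domains; the point here is only the shape of one stencil.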
Measuring and Managing Answer Quality for Online Data-Intensive Services
Online data-intensive services parallelize query execution across distributed
software components. Interactive response time is a priority, so online query
executions return answers without waiting for slow running components to
finish. However, data from these slow components could lead to better answers.
We propose Ubora, an approach to measure the effect of slow running components
on the quality of answers. Ubora randomly samples online queries and executes
them twice. The first execution elides data from slow components and provides
fast online answers; the second execution waits for all components to complete.
Ubora uses memoization to speed up mature executions by replaying network
messages exchanged between components. Our systems-level implementation works
for a wide range of platforms, including Hadoop/Yarn, Apache Lucene, the
EasyRec Recommendation Engine, and the OpenEphyra question answering system.
Ubora computes answer quality much faster than competing approaches that do not
use memoization. With Ubora, we show that answer quality can and should be used
to guide online admission control. Our adaptive controller processed 37% more
queries than a competing controller guided by the rate of timeouts.
Comment: Technical Report
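Ubora's core idea — sample a query, execute it once eliding slow components and once to completion, then compare the two answers — can be sketched in a few lines. Everything below (component latencies, the set-overlap quality metric, all names) is a hypothetical stand-in for illustration, not Ubora's implementation:

```python
import random

def fast_answer(query, components, timeout=0.05):
    """Online execution: elide components slower than the timeout."""
    return [c["answer"](query) for c in components if c["latency"] <= timeout]

def full_answer(query, components):
    """Mature execution: wait for every component to complete."""
    return [c["answer"](query) for c in components]

def answer_quality(fast, full):
    """Fraction of the mature answer that was recovered online."""
    return len(set(fast) & set(full)) / len(set(full))

# Hypothetical components with simulated latencies (seconds)
components = [
    {"latency": 0.01, "answer": lambda q: q + ":a"},
    {"latency": 0.02, "answer": lambda q: q + ":b"},
    {"latency": 0.30, "answer": lambda q: q + ":c"},  # slow straggler
]

# Randomly sample online queries and execute each twice, as Ubora does
queries = [f"q{i}" for i in range(100)]
sampled = random.sample(queries, 10)
qualities = [answer_quality(fast_answer(q, components),
                            full_answer(q, components)) for q in sampled]
print(sum(qualities) / len(qualities))  # 2/3: one of three components elided
```

The real system additionally memoizes network messages so the mature execution replays cheaply rather than recomputing.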
Extracting 3D parametric curves from 2D images of Helical objects
Helical objects occur in medicine, biology, cosmetics, nanotechnology, and engineering. Extracting a 3D parametric curve from a 2D image of a helical object has many practical applications, in particular being able to extract metrics such as tortuosity, frequency, and pitch. We present a method that is able to straighten the image object and derive a robust 3D helical curve from peaks in the object boundary. The algorithm has a small number of stable parameters that require little tuning, and the curve is validated against both synthetic and real-world data. The results show that the extracted 3D curve comes within a close Hausdorff distance of the ground truth, and has near-identical tortuosity for helical objects with a circular profile. Parameter insensitivity and robustness against high levels of image noise are demonstrated thoroughly and quantitatively.
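One of the metrics mentioned, tortuosity, is commonly defined as arc length divided by end-to-end chord length (the paper may use a different variant; this sketch assumes the common definition and uses an ideal helix as test data):

```python
import numpy as np

def tortuosity(points):
    """Arc length / chord length for a polyline of 3-D points."""
    segments = np.diff(points, axis=0)
    arc = np.sum(np.linalg.norm(segments, axis=1))
    chord = np.linalg.norm(points[-1] - points[0])
    return arc / chord

# Ideal helix: x = r cos t, y = r sin t, z = c t, over two full turns
t = np.linspace(0, 4 * np.pi, 2000)
r, c = 1.0, 0.5
helix = np.column_stack([r * np.cos(t), r * np.sin(t), c * t])

# Analytically: arc = sqrt(r^2 + c^2) * T, chord = c * T  ->  sqrt(5) here
print(tortuosity(helix))  # ~2.236
```

A straight line gives tortuosity 1.0; the tighter the winding relative to the pitch, the larger the value.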
Principal Boundary on Riemannian Manifolds
We consider the classification problem and focus on nonlinear methods for
classification on manifolds. For multivariate datasets lying on an embedded
nonlinear Riemannian manifold within a higher-dimensional ambient space, we
aim to acquire a classification boundary between the labelled classes using
the intrinsic metric on the manifold. Motivated by finding an optimal boundary
between the two classes, we introduce a novel approach: the principal
boundary. From the perspective of classification, the principal boundary is
defined as an optimal curve that moves between the principal flows traced out
from the two classes of data and, at every point, maximizes the margin between
the two classes. We estimate the boundary and its direction, supervised by the
two principal flows. We show that the principal boundary locally coincides
with the decision boundary found by the support vector machine. Some
optimality and convergence properties of the random principal boundary and its
population counterpart are also shown. We illustrate how to find, use, and
interpret the principal boundary with an application to real data.
Comment: 31 pages, 10 figures
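To fix intuition for a boundary that sits between two flows and maximizes the pointwise margin, here is a deliberately flat (Euclidean, not Riemannian) toy: two synthetic "flows" as matched polylines, with the midcurve as the boundary. This is only a caricature of the geometry, not the paper's manifold construction:

```python
import numpy as np

# Two matched 'principal flows' in the plane (hypothetical data)
t = np.linspace(0, 1, 50)
flow_a = np.column_stack([t, np.sin(2 * np.pi * t) + 1.0])  # class-1 flow
flow_b = np.column_stack([t, np.sin(2 * np.pi * t) - 1.0])  # class-2 flow

# Midcurve between matched points: equidistant from both flows, so it
# maximizes the pointwise margin (distance to the nearer flow)
boundary = 0.5 * (flow_a + flow_b)
margin = 0.5 * np.linalg.norm(flow_a - flow_b, axis=1)

print(boundary[0], margin[0])  # boundary starts at [0, 0] with margin 1.0
```

On a curved manifold the midpoint and distances would be computed with the intrinsic metric (geodesics), which is where the actual method departs from this sketch.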
A hierarchical Mamdani-type fuzzy modelling approach with new training data selection and multi-objective optimisation mechanisms: A special application for the prediction of mechanical properties of alloy steels
In this paper, a systematic data-driven fuzzy modelling methodology is proposed which allows one to construct Mamdani fuzzy models considering both accuracy (precision) and transparency (interpretability) of fuzzy systems. The new methodology employs a fast hierarchical clustering algorithm to generate an initial fuzzy model efficiently; a training data selection mechanism is developed to identify appropriate and efficient data as learning samples; a high-performance Particle Swarm Optimisation (PSO) based multi-objective optimisation mechanism is developed to further improve the fuzzy model in terms of both structure and parameters; and a new tolerance analysis method is proposed to derive the confidence bands of the final elicited models. The proposed modelling approach is evaluated using two benchmark problems and is shown to outperform other modelling approaches. Furthermore, it is successfully applied to complex high-dimensional modelling problems in the manufacturing of alloy steels, using ‘real’ industrial data. These problems concern the prediction of the mechanical properties of alloy steels by correlating them with the heat treatment process conditions as well as the weight percentages of the chemical compositions.
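The Mamdani inference at the heart of such models is standard: fuzzify the input, fire each rule by min-implication, aggregate by max, and defuzzify by centroid. Below is a minimal sketch of that core (two invented rules on a toy universe; the paper's hierarchical clustering, data selection, and PSO optimisation are not represented):

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def mamdani(x_in, rules, y):
    """Minimal Mamdani inference: min implication, max aggregation,
    centroid defuzzification over the output universe y."""
    agg = np.zeros_like(y)
    for antecedent, consequent in rules:
        strength = antecedent(x_in)                       # firing degree
        agg = np.maximum(agg, np.minimum(strength, consequent(y)))
    return np.sum(y * agg) / np.sum(agg)                  # centroid

y = np.linspace(0.0, 10.0, 1001)
# Two hypothetical rules: low input -> low output, high input -> high output
rules = [
    (lambda x: tri(x, -5, 0, 5),  lambda yy: tri(yy, 0, 2, 4)),
    (lambda x: tri(x, 0, 5, 10),  lambda yy: tri(yy, 6, 8, 10)),
]
print(mamdani(2.5, rules, y))  # both rules fire equally -> centroid 5.0
```

The accuracy/transparency trade-off the paper targets shows up in how many such rules are kept and how their membership functions are shaped; the inference mechanism itself stays this simple.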