A statistical method for estimating activity uncertainty parameters to improve project forecasting
Just like any physical system, projects have entropy that must be managed by spending energy. The entropy is the project's tendency to move to a state of disorder (schedule delays, cost overruns), and the energy process is an inherent part of any project management methodology. To manage this inherent uncertainty, accurate estimates (for durations, costs, resources, …) are crucial for making informed decisions. Without such estimates, managers must fall back on their own intuition and experience, which are undoubtedly valuable for decision making but are often subject to biases and hard to quantify. This paper builds on two published calibration methods that extract data from real projects and calibrate it to better estimate the parameters of the probability distributions of activity durations. Both methods rely on the lognormal distribution to model uncertainty in activity durations and perform a sequence of statistical hypothesis tests that account for the possible presence of two human biases. Based on these two existing methods, a new statistical partitioning heuristic is presented that integrates the best elements of both to further improve the accuracy of estimating the distribution of activity duration uncertainty. A computational experiment was carried out on an empirical database of 83 projects. The experiment shows that the new statistical partitioning method performs at least as well as, and often better than, the two existing calibration methods. The improvement allows a better quantification of activity duration uncertainty, which eventually leads to a better prediction of the project schedule and more realistic expectations about project outcomes. Consequently, the project manager will be able to better cope with the inherent uncertainty (entropy) of projects with minimum managerial effort (energy).
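The core calibration step the abstract describes can be illustrated with a minimal sketch: model the log of the actual-to-planned duration ratio as normal (the lognormal assumption) and test that assumption with a hypothesis test before deciding whether to partition and refit. The function name, the choice of Shapiro-Wilk as the test, and the sample data are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from scipy import stats

def fit_lognormal_uncertainty(planned, actual, alpha=0.05):
    """Fit a lognormal model to activity duration ratios.

    Models log(actual/planned) as normal, per the lognormal
    assumption, and checks that assumption with Shapiro-Wilk.
    """
    log_ratios = np.log(np.asarray(actual) / np.asarray(planned))
    mu, sigma = log_ratios.mean(), log_ratios.std(ddof=1)
    # Hypothesis test: are the log-ratios plausibly normal?
    # A rejection suggests mixed regimes (e.g., biased subsets of
    # activities) that a partitioning heuristic would split and refit.
    w_stat, p_value = stats.shapiro(log_ratios)
    return {"mu": mu, "sigma": sigma,
            "lognormal_ok": p_value >= alpha, "p_value": p_value}

# Illustrative data: ratios > 1 are overruns, < 1 are early finishes.
planned = [10, 5, 8, 12, 20, 7, 15]
actual = [12, 5, 9, 15, 22, 8, 14]
print(fit_lognormal_uncertainty(planned, actual))
```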
Characterizing the Quantum Confined Stark Effect in Semiconductor Quantum Dots and Nanorods for Single-Molecule Electrophysiology
We optimized the performance of quantum confined Stark effect (QCSE) based voltage nanosensors. A high-throughput approach for single-particle QCSE characterization was developed and used to screen a library of such nanosensors. Type II ZnSe/CdS seeded nanorods were found to have the best performance among the different nanosensors evaluated in this work. The degree of correlation between intensity changes and spectral changes of the exciton emission under an applied field was characterized. An upper limit for the temporal response of individual ZnSe/CdS nanorods to voltage modulation was determined by high-throughput, high-temporal-resolution intensity measurements using a novel photon counting camera. The measured 3.5 µs response time is limited by the voltage modulation electronics and represents about 30 times higher bandwidth than is needed for recording an action potential in a neuron.
Comment: 36 pages, 6 figures
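As a rough illustration of the correlation analysis mentioned above, the sketch below computes the Pearson correlation between a particle's per-frame intensity and its fitted emission-peak position under a modulated field. All traces are synthetic and the numbers are placeholders, not measured values from the paper.

```python
import numpy as np
from scipy import stats

# Hypothetical per-frame traces for one nanoparticle: total counts and
# fitted emission-peak position, recorded under a square-wave field.
rng = np.random.default_rng(0)
n_frames = 2000
field_on = (np.arange(n_frames) // 50) % 2 == 1   # modulation pattern
intensity = 1000 - 80 * field_on + rng.normal(0, 30, n_frames)
peak_nm = 620.0 + 1.5 * field_on + rng.normal(0, 0.5, n_frames)

# Degree of correlation between intensity changes and spectral shifts
r, p = stats.pearsonr(intensity, peak_nm)
print(f"Pearson r = {r:.2f} (p = {p:.1e})")

# Field-on vs field-off contrast, a typical single-particle QCSE metric
dI = intensity[field_on].mean() - intensity[~field_on].mean()
print(f"mean intensity change under field: {dI:.1f} counts/frame")
```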
The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning
The nascent field of fair machine learning aims to ensure that decisions
guided by algorithms are equitable. Over the last several years, three formal
definitions of fairness have gained prominence: (1) anti-classification,
meaning that protected attributes---like race, gender, and their proxies---are
not explicitly used to make decisions; (2) classification parity, meaning that
common measures of predictive performance (e.g., false positive and false
negative rates) are equal across groups defined by the protected attributes;
and (3) calibration, meaning that conditional on risk estimates, outcomes are
independent of protected attributes. Here we show that all three of these
fairness definitions suffer from significant statistical limitations. Requiring
anti-classification or classification parity can, perversely, harm the very
groups they were designed to protect; and calibration, though generally
desirable, provides little guarantee that decisions are equitable. In contrast
to these formal fairness criteria, we argue that it is often preferable to
treat similarly risky people similarly, based on the most statistically
accurate estimates of risk that one can produce. Such a strategy, while not
universally applicable, often aligns well with policy objectives; notably, this
strategy will typically violate both anti-classification and classification
parity. In practice, it requires significant effort to construct suitable risk
estimates. One must carefully define and measure the targets of prediction to
avoid retrenching biases in the data. But, importantly, one cannot generally
address these difficulties by requiring that algorithms satisfy popular
mathematical formalizations of fairness. By highlighting these challenges in
the foundation of fair machine learning, we hope to help researchers and
practitioners productively advance the area.
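The criteria reviewed above are straightforward to compute on held-out data. Below is a minimal sketch of two of them: classification parity (false positive rates by group) and calibration (outcome rates conditional on binned risk estimates, by group). The function and the synthetic data are illustrative assumptions, not the authors' code.

```python
import numpy as np
import pandas as pd

def fairness_report(y_true, risk, group, threshold=0.5, bins=5):
    """Compute two of the reviewed criteria on held-out data:
    classification parity (equal error rates across groups) and
    calibration (equal outcome rates given the risk estimate)."""
    df = pd.DataFrame({"y": y_true, "risk": risk, "g": group})
    df["pred"] = (df["risk"] >= threshold).astype(int)

    # Classification parity: false positive rate per group.
    neg = df[df["y"] == 0]
    fpr = neg.groupby("g")["pred"].mean()

    # Calibration: P(y=1 | risk bin, group) should match across groups.
    df["bin"] = pd.qcut(df["risk"], bins, duplicates="drop")
    calib = df.groupby(["bin", "g"], observed=True)["y"].mean().unstack()
    return fpr, calib

# Synthetic example with two groups and a group-shifted risk profile.
rng = np.random.default_rng(1)
g = rng.integers(0, 2, 5000)
risk = np.clip(rng.beta(2, 5, 5000) + 0.1 * g, 0, 1)
y = rng.binomial(1, risk)
fpr, calib = fairness_report(y, risk, g)
print(fpr)
print(calib)
```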
Proceedings of the 2011 New York Workshop on Computer, Earth and Space Science
The purpose of the New York Workshop on Computer, Earth and Space Sciences is
to bring together the New York area's finest astronomers, statisticians, computer scientists, and space and earth scientists to explore potential synergies
between their respective fields. The 2011 edition (CESS2011) was a great
success, and we would like to thank all of the presenters and participants for
attending. This year was also special as it included authors from the upcoming
book titled "Advances in Machine Learning and Data Mining for Astronomy". Over
two days, the latest advanced techniques used to analyze the vast amounts of
information now available for the understanding of our universe and our planet
were presented. These proceedings attempt to provide a small window into what
the current state of research is in this vast interdisciplinary field and we'd
like to thank the speakers who spent the time to contribute to this volume.
Comment: Author lists modified. 82 pages. Workshop proceedings from CESS 2011 in New York City, Goddard Institute for Space Studies
Exclusion Limits on the WIMP-Nucleon Cross-Section from the First Run of the Cryogenic Dark Matter Search in the Soudan Underground Lab
The Cryogenic Dark Matter Search (CDMS-II) employs low-temperature Ge and Si
detectors to seek Weakly Interacting Massive Particles (WIMPs) via their
elastic scattering interactions with nuclei. Simultaneous measurements of both
ionization and phonon energy provide discrimination against interactions of
background particles. For recoil energies above 10 keV, events due to
background photons are rejected with >99.99% efficiency. Electromagnetic events
very near the detector surface can mimic nuclear recoils because of reduced
charge collection, but these surface events are rejected with >96% efficiency
by using additional information from the phonon pulse shape. Efficient use of
active and passive shielding, combined with the 2090 m.w.e. overburden at
the experimental site in the Soudan mine, makes the background from neutrons
negligible for this first exposure. All cuts are determined in a blind manner
from in situ calibrations with external radioactive sources without any prior
knowledge of the event distribution in the signal region. Resulting
efficiencies are known to ~10%. A single event with a recoil of 64 keV passes
all of the cuts and is consistent with the expected misidentification rate of
surface-electron recoils. Under the assumptions for a standard dark matter
halo, these data exclude previously unexplored parameter space for both
spin-independent and spin-dependent WIMP-nucleon elastic scattering. The
resulting limit on the spin-independent WIMP-nucleon elastic-scattering
cross-section has a minimum of 4x10^-43 cm^2 at a WIMP mass of 60 GeV/c^2. The
minimum of the limit for the spin-dependent WIMP-neutron elastic-scattering
cross-section is 2x10^-37 cm^2 at a WIMP mass of 50 GeV/c^2.
Comment: 37 pages, 42 figures
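The ionization-versus-phonon discrimination described above amounts to a cut on the ionization yield (ionization energy over recoil energy): electron recoils sit near a yield of 1, nuclear recoils near 0.3. The toy selection below illustrates the idea; the band edges and threshold are placeholder values, not the actual CDMS-II cuts.

```python
import numpy as np

def select_nuclear_recoils(recoil_keV, ionization_keV,
                           e_min=10.0, band=(0.1, 0.5)):
    """Toy ionization-yield discrimination: electron recoils have a
    yield near 1, nuclear recoils (WIMP candidates) near 0.3. The
    numbers here are illustrative, not the experiment's cut values."""
    recoil_keV = np.asarray(recoil_keV, dtype=float)
    yield_ = np.asarray(ionization_keV, dtype=float) / recoil_keV
    above_threshold = recoil_keV > e_min            # analysis threshold
    in_band = (yield_ > band[0]) & (yield_ < band[1])
    return above_threshold & in_band

recoils = [64.0, 25.0, 8.0, 40.0]      # keV recoil energies
ionization = [20.0, 24.0, 2.5, 13.0]   # keV ionization energies
print(select_nuclear_recoils(recoils, ionization))  # [ True False False  True]
```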
Single Cell Proteomics in Biomedicine: High-dimensional Data Acquisition, Visualization and Analysis
New insights into cellular heterogeneity over the last decade have provoked the development of a variety of single-cell omics tools at a lightning pace. The resulting high-dimensional single-cell data generated by these tools require new theoretical approaches and analytical algorithms for effective visualization and interpretation. In this review, we briefly survey state-of-the-art single-cell proteomic tools, with a particular focus on data acquisition and quantification, followed by an elaboration of a number of statistical and computational approaches developed to date for dissecting the high-dimensional single-cell data. The underlying assumptions, unique features, and limitations of these analytical methods, together with the biological questions they seek to answer, will be discussed. Particular attention is given to those information-theoretical approaches that are anchored in a set of first principles of physics and can yield detailed (and often surprising) predictions.
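As a concrete, if generic, example of the dimensionality-reduction step such analyses typically begin with, the sketch below projects a hypothetical cells-by-proteins matrix onto its first two principal components. The data and subpopulation structure are entirely synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical single-cell proteomic matrix: rows are cells, columns
# are measured protein abundances (assumed already normalized).
rng = np.random.default_rng(2)
cells = rng.normal(size=(500, 40))
cells[:250, :5] += 2.0   # an artificial subpopulation

# Linear dimensionality reduction is a common first step before
# nonlinear embeddings (e.g., t-SNE or UMAP) or clustering.
pca = PCA(n_components=2)
embedding = pca.fit_transform(cells)
print("explained variance ratio:", pca.explained_variance_ratio_)
print("embedding shape:", embedding.shape)
```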
Reliable ABC model choice via random forests
Approximate Bayesian computation (ABC) methods provide an elaborate approach
to Bayesian inference on complex models, including model choice. Both
theoretical arguments and simulation experiments indicate, however, that model
posterior probabilities may be poorly evaluated by standard ABC techniques. We
propose a novel approach based on a machine learning tool named random forests
to conduct selection among the highly complex models covered by ABC algorithms.
We thus modify the way Bayesian model selection is both understood and
operated, in that we rephrase the inferential goal as a classification problem,
first predicting the model that best fits the data with random forests and
postponing the approximation of the posterior probability of the predicted MAP
for a second stage also relying on random forests. Compared with earlier
implementations of ABC model choice, the ABC random forest approach offers
several potential improvements: (i) it often has a larger discriminative power
among the competing models, (ii) it is more robust against the number and
choice of statistics summarizing the data, (iii) the computing effort is drastically reduced (with a gain in computational efficiency of at least a factor of fifty), and (iv) it includes an approximation of the posterior probability of the selected model. The call to random forests will undoubtedly extend the range of dataset sizes and model complexities that ABC can handle. We illustrate the power of this novel methodology by analyzing controlled experiments as well as genuine population genetics datasets. The proposed methodologies are implemented in the R package abcrf, available on CRAN.
Comment: 39 pages, 15 figures, 6 tables
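The two-stage procedure the abstract describes, first classifying the model index with a random forest, then approximating the posterior probability of the selected model with a second (regression) forest trained on the out-of-bag error, can be sketched in a few lines. The sketch below uses scikit-learn as a stand-in for the R package abcrf; the two candidate models and the summary statistics are toy assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(3)

def summaries(x):
    # Toy summary statistics of one simulated dataset
    return [x.mean(), x.std(), np.median(x)]

# Reference table: simulate datasets from two candidate models
n = 2000
models, stats_ = [], []
for _ in range(n):
    m = rng.integers(0, 2)
    theta = rng.uniform(0, 2)
    x = rng.normal(theta, 1, 100) if m == 0 else rng.exponential(1 + theta, 100)
    models.append(m)
    stats_.append(summaries(x))
X, y = np.array(stats_), np.array(models)

# Stage 1: classify the model index from the summary statistics
clf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
clf.fit(X, y)

# Stage 2: regress the out-of-bag misclassification indicator on the
# summaries; 1 - predicted error approximates P(selected model | data)
oob_wrong = (clf.oob_decision_function_.argmax(axis=1) != y).astype(float)
reg = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, oob_wrong)

x_obs = rng.normal(1.0, 1, 100)          # stand-in for observed data
s_obs = np.array([summaries(x_obs)])
map_model = clf.predict(s_obs)[0]
post = 1.0 - reg.predict(s_obs)[0]
print(f"selected model: {map_model}, approx. posterior probability: {post:.2f}")
```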