Globally Optimal Crowdsourcing Quality Management
We study crowdsourcing quality management, that is, given worker responses to
a set of tasks, our goal is to jointly estimate the true answers for the tasks,
as well as the quality of the workers. Prior work on this problem relies
primarily on applying Expectation-Maximization (EM) on the underlying maximum
likelihood problem to estimate true answers as well as worker quality.
Unfortunately, EM only provides a locally optimal solution rather than a
globally optimal one. Other solutions to the problem (that do not leverage EM)
fail to provide global optimality guarantees as well. In this paper, we focus
on filtering, where tasks require the evaluation of a yes/no predicate, and
rating, where tasks elicit integer scores from a finite domain. We design
algorithms for finding the globally optimal estimates of correct task answers and
worker quality for the underlying maximum likelihood problem, and characterize
the complexity of these algorithms. Our algorithms conceptually consider all
mappings from tasks to true answers (typically a very large number), leveraging
two key ideas to reduce, by several orders of magnitude, the number of mappings
under consideration, while preserving optimality. We also demonstrate that
these algorithms often find more accurate estimates than EM-based algorithms.
This paper makes an important contribution towards understanding the inherent
complexity of globally optimal crowdsourcing quality management.
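As a rough illustration of the search space described above, the sketch below enumerates every mapping from tasks to yes/no answers for a tiny filtering instance and keeps the mapping with the highest likelihood. It assumes a simplified model in which each worker has a single accuracy parameter, set to its maximum-likelihood value for the candidate mapping. The function names and the data layout are hypothetical, and the two pruning ideas mentioned in the abstract are not implemented.

```python
from itertools import product
from math import log


def log_likelihood(responses, truth):
    """Log-likelihood of a candidate mapping `truth` (task -> 0/1).

    Assumption: each worker answers correctly with one accuracy p,
    and p is set to its MLE (fraction of answers agreeing with `truth`).
    """
    total = 0.0
    for answers in responses.values():
        n = len(answers)
        if n == 0:
            continue  # a worker with no answers contributes nothing
        agree = sum(1 for task, a in answers.items() if truth[task] == a)
        p = agree / n
        # Terms with a zero count are skipped (0 * log 0 is taken as 0).
        if agree:
            total += agree * log(p)
        if n - agree:
            total += (n - agree) * log(1 - p)
    return total


def global_mle(responses, tasks):
    """Exhaustive search over all 2^len(tasks) mappings (tiny inputs only)."""
    best_ll, best_truth = None, None
    for labels in product([0, 1], repeat=len(tasks)):
        truth = dict(zip(tasks, labels))
        ll = log_likelihood(responses, truth)
        if best_ll is None or ll > best_ll:
            best_ll, best_truth = ll, truth
    return best_ll, best_truth
```

Under this symmetric one-parameter model a mapping and its complement have the same likelihood, so the search returns whichever of the two is enumerated first. The cost grows as 2^(number of tasks), which is why reducing the number of mappings considered matters.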
Multi-Avalanche Correlations in Directed Sandpile Models
Multiple avalanches, initiated by simultaneously toppling neighbouring sites,
are studied in three different directed sandpile models. It is argued that,
while the single avalanche exponents are different for the three models, a
suitably defined two-avalanche distribution has identical exponents. The origin
of this universality is traced to particle conservation.
Optimization of the detection of microbes in blood from immunocompromised patients with haematological malignancies
The present study aimed to improve the rate of detection of blood-borne microbes by using PCRs with pan-bacterial and Candida specificity. Seventeen per cent of the blood samples (n = 178) collected from 107 febrile patients with haematological malignancies were positive using standard culture (BacT/Alert system). Candida PCR was positive in 12 patients, only one of whom scored culture-positive. Bacterial PCR using fresh blood samples was often negative, but the detection rate increased when the blood was pre-incubated for 2 days. These data indicate that PCR assays might be a complement for the detection of blood-borne opportunists in immunocompromised haematology patients.
Efficient crowdsourcing for multi-class labeling
Crowdsourcing systems like Amazon's Mechanical Turk have emerged as an effective large-scale human-powered platform for performing tasks in domains such as image classification, data entry, recommendation, and proofreading. Since workers are low-paid (a few cents per task) and tasks performed are monotonous, the answers obtained are noisy and hence unreliable. To obtain reliable estimates, it is essential to utilize appropriate inference algorithms (e.g. majority voting) coupled with structured redundancy through task assignment. Our goal is to obtain the best possible trade-off between reliability and redundancy. In this paper, we consider a general probabilistic model for noisy observations for crowdsourcing systems and pose the problem of minimizing the total price (i.e. redundancy) that must be paid to achieve a target overall reliability. Concretely, we show that it is possible to obtain an answer to each task correctly with probability 1-ε as long as the redundancy per task is O((K/q) log (K/ε)), where each task is equally likely to have any of the K distinct answers and q is the crowd-quality parameter defined through a probabilistic model. Further, effectively this is the best possible redundancy-accuracy trade-off any system design can achieve. Such a single-parameter crisp characterization of the (order-)optimal trade-off between redundancy and reliability has various useful operational consequences. Further, we analyze the robustness of our approach in the presence of adversarial workers and provide a bound on their influence on the redundancy-accuracy trade-off.
Unlike recent prior work [GKM11, KOS11, KOS11], our result applies to non-binary (i.e. K>2) tasks. In effect, we utilize algorithms for binary tasks (with an inhomogeneous error model, unlike that in [GKM11, KOS11, KOS11]) as a key subroutine to obtain answers for K-ary tasks. Technically, the algorithm is based on a low-rank approximation of the weighted adjacency matrix for a random regular bipartite graph, weighted according to the answers provided by the workers. National Science Foundation (U.S.
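To get a feel for how the stated bound O((K/q) log (K/ε)) scales, the sketch below evaluates the expression inside the O-notation. The function name and the input checks are assumptions made for illustration. The constant hidden by the O-notation is not given in the abstract, so the returned value is only useful for comparing settings, not as an actual number of workers per task.

```python
from math import log


def redundancy_scale(num_classes, crowd_quality, epsilon):
    """Return (K/q) * log(K/eps), the growth term inside the O(.) bound.

    num_classes   -- K, the number of distinct answers per task
    crowd_quality -- q, the crowd-quality parameter (assumed positive here)
    epsilon       -- target error probability per task, assumed in (0, 1)
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    if crowd_quality <= 0:
        raise ValueError("crowd quality must be positive")
    if not 0 < epsilon < 1:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    return (num_classes / crowd_quality) * log(num_classes / epsilon)


# Relative cost of moving from binary to 4-ary tasks at the same q and eps.
ratio = redundancy_scale(4, 0.5, 0.1) / redundancy_scale(2, 0.5, 0.1)
```

The term grows slightly faster than linearly in K and only logarithmically as ε shrinks, which is the practical reading of the trade-off.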
Limitations of Majority Agreement in Crowdsourced Image Interpretation
Crowdsourcing can efficiently complete tasks that are difficult to automate, but the quality of crowdsourced data is tricky to evaluate. Algorithms to grade volunteer work often assume that all tasks are similarly difficult, an assumption that is frequently false. We use a cropland identification game with over 2,600 participants and 165,000 unique tasks to investigate how best to evaluate the difficulty of crowdsourced tasks and to what extent this is possible based on volunteer responses alone. Inter-volunteer agreement exceeded 90% for about 80% of the images and was negatively correlated with volunteer-expressed uncertainty about image classification. A total of 343 relatively difficult images were independently classified as cropland, non-cropland or impossible by two experts. The experts disagreed weakly (one said impossible while the other rated as cropland or non-cropland) on 27% of the images, but disagreed strongly (cropland vs. non-cropland) on only 7%. Inter-volunteer disagreement increased significantly with inter-expert disagreement. While volunteers agreed with expert classifications for most images, over 20% would have been mis-categorized if only the volunteers’ majority vote was used. We end with a series of recommendations for managing the challenges posed by heterogeneous tasks in crowdsourcing campaigns.
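The two quantities discussed above, the majority vote and the level of inter-volunteer agreement, can be computed per image along the following lines. This is a minimal sketch: the function name, the label strings and the idea of flagging images below an agreement threshold are assumptions made for illustration, not the procedure used in the study.

```python
from collections import Counter


def majority_and_agreement(labels):
    """Return (majority label, fraction of votes that match it)."""
    if not labels:
        raise ValueError("at least one vote is required")
    counts = Counter(labels)
    top_label, top_count = counts.most_common(1)[0]
    return top_label, top_count / len(labels)


def low_agreement_images(votes_by_image, threshold):
    """Images whose agreement falls below `threshold` (a hypothetical cut-off)."""
    flagged = []
    for image_id, labels in votes_by_image.items():
        _, agreement = majority_and_agreement(labels)
        if agreement < threshold:
            flagged.append(image_id)
    return flagged
```

High agreement does not by itself show that the majority label is correct, which is the limitation the abstract points to; a measure like this can only help decide which images to send for expert review.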
Area distribution and the average shape of a Lévy bridge
We consider a one dimensional Lévy bridge x_B of length n and index 0 <
\alpha < 2, i.e. a Lévy random walk constrained to start and end at the
origin after n time steps, x_B(0) = x_B(n) = 0. We compute the distribution
P_B(A,n) of the area A = \sum_{m=1}^n x_B(m) under such a Lévy bridge and
show that, for large n, it has the scaling form P_B(A,n) \sim n^{-1-1/\alpha}
F_\alpha(A/n^{1+1/\alpha}), with the asymptotic behavior F_\alpha(Y) \sim
Y^{-2(1+\alpha)} for large Y. For \alpha=1, we obtain an explicit expression of
F_1(Y) in terms of elementary functions. We also compute the average profile
\langle \tilde x_B(m) \rangle at time m of a Lévy bridge with fixed area A. For large n
and large m and A, one finds the scaling form \langle \tilde x_B(m) \rangle = n^{1/\alpha}
H_\alpha(m/n, A/n^{1+1/\alpha}), where, at variance with the Brownian bridge,
H_\alpha(X,Y) is a non-trivial function of the rescaled time X = m/n and rescaled
area Y = A/n^{1+1/\alpha}. Our analytical results are verified by numerical
simulations. Comment: 21 pages, 4 figures
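One way to read the scaling form of P_B(A,n) is to check that it is compatible with normalization. The short derivation below is a sketch that treats the \sim as an equality at leading order for large n and integrates the area over the whole real line. The change of variable Y = A/n^{1+1/\alpha} absorbs the prefactor n^{-1-1/\alpha}, so F_\alpha itself must integrate to one.

```latex
\int_{-\infty}^{\infty} P_B(A,n)\,\mathrm{d}A
  \simeq \int_{-\infty}^{\infty} n^{-1-1/\alpha}\,
         F_\alpha\!\left(\frac{A}{n^{1+1/\alpha}}\right)\mathrm{d}A
  = \int_{-\infty}^{\infty} F_\alpha(Y)\,\mathrm{d}Y = 1,
\qquad Y = \frac{A}{n^{1+1/\alpha}} .
```

The stated tail F_\alpha(Y) \sim Y^{-2(1+\alpha)} decays faster than Y^{-2} for every \alpha > 0, so the integral over Y converges at large |Y|.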
Selected reactive oxygen species and antioxidant enzymes in common bean after Pseudomonas syringae pv. phaseolicola and Botrytis cinerea infection
Phaseolus vulgaris cv. Korona plants were
inoculated with the bacterium Pseudomonas syringae pv.
phaseolicola (Psp), the necrotrophic fungus Botrytis cinerea
(Bc), or both pathogens sequentially. The aim of the
experiment was to determine how plants cope with multiple
infections by pathogens with different attack strategies.
Possible suppression of the non-specific infection with
the necrotrophic fungus Bc by earlier Psp inoculation was
examined. Concentrations of reactive oxygen species
(ROS), such as superoxide anion (O2−) and H2O2, and
activities of antioxidant enzymes such as superoxide dismutase
(SOD), catalase (CAT) and peroxidase (POD) were
determined 6, 12, 24 and 48 h after inoculation. The
measurements were made on the cytosolic fraction for ROS and
on the cytosolic or apoplastic fraction for enzymes. Infection with
Psp caused a significant increase in ROS levels from the
beginning of the experiment. Activity of the apoplastic
enzymes also increased markedly at the beginning of the
experiment, in contrast to the cytosolic ones. Cytosolic
SOD and guaiacol peroxidase (GPOD) activities reached
their maximum values 48 h after treatment. Additional forms
of the examined enzymes were identified after specific Psp
infection; however, they were not present after single Bc
inoculation. Subsequent Bc infection resulted only in
changes of H2O2 and SOD, which proved to be especially
important during plant–pathogen interaction. Cultivar Korona
of common bean is considered to be resistant to Psp and mobilises its system upon infection with these bacteria.
We put forward the hypothesis that the extent of the defence
reaction was so great that subsequent infection did not
trigger a significant additional response.
Skeletal concentrations of lead, cadmium, zinc, and silver in ancient North American Pecos Indians.
Bone samples of 14 prehistoric North American Pecos Indians from circa 1400 A.D. were analyzed for lead, cadmium, zinc, and silver by graphite furnace atomic absorption spectrometry to establish the baseline levels of these elements in an ancient North American population. Measurements of outer and inner bone fractions indicate the former were contaminated postmortem for lead, zinc, and cadmium. The contamination-adjusted average (mean ± SD) level of lead (expressed as the ratio of atomic lead to atomic calcium) in bones of the Indians was (8.4 ± 4.4) × 10^-7, which was similar to ratios in bones of ancient Peruvians (0.9 to 7.7 × 10^-7) and significantly lower than ratios in bones of modern adults in England and the United States (210 to 350 × 10^-7). The adjusted average concentrations (micrograms per gram dry weight) of biologic cadmium, silver, and zinc in the Pecos Indian bones were 0.032 ± 0.013, 0.094 ± 0.044, and 130 ± 66, as compared to concentrations of 1.8, 0.01 to 0.44, and 75 to 170 in the bones of modern people, respectively. Therefore, cadmium concentrations in Pecos Indian bones are also approximately 50-fold lower than those of contemporary humans. These data support earlier findings that most previously reported natural concentrations of lead in human tissues are erroneously high and indicate that natural concentrations of cadmium are also between one and two orders of magnitude lower than contemporary concentrations.
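The fold differences quoted above can be reproduced directly from the reported averages. The sketch below is plain arithmetic on the figures in the abstract; the variable names are illustrative, and lead is kept in units of 10^-7 (atomic Pb/Ca ratio) so that the values match the text.

```python
# Lead, as atomic Pb/Ca ratio in units of 1e-7 (values from the abstract).
pecos_lead = 8.4
modern_lead_range = (210.0, 350.0)

# Cadmium, in micrograms per gram dry weight (values from the abstract).
pecos_cadmium = 0.032
modern_cadmium = 1.8

# How many times higher the modern values are than the Pecos averages.
lead_fold = tuple(value / pecos_lead for value in modern_lead_range)
cadmium_fold = modern_cadmium / pecos_cadmium

print(f"lead: {lead_fold[0]:.0f}- to {lead_fold[1]:.0f}-fold higher in modern bones")
print(f"cadmium: {cadmium_fold:.0f}-fold higher in modern bones")
```

The cadmium ratio comes out at roughly 56, in line with the "approximately 50-fold" stated in the abstract, and both elements fall between one and two orders of magnitude.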
Challenging the heterogeneity of disease presentation in malignant melanoma-impact on patient treatment
There is an increasing global interest in supporting research areas that can assist in understanding disease and improving patient care. The National Cancer Institute (NCI) at the NIH has identified precision medicine-based approaches as key research strategies to expedite advances in cancer research. The Cancer Moonshot program (https://www.cancer.gov/research/key-initiatives/moonshot-cancer-initiative) is the largest cancer program of all time, and has been launched to accelerate cancer research that aims to increase the availability of therapies to more patients and, ultimately, to eradicate cancer. Mass spectrometry-based proteomics has been extensively used to study the molecular mechanisms of cancer, to define molecular subtypes of tumors, to map cancer-associated protein interaction networks and post-translational modifications, and to aid in the development of new therapeutics and new diagnostic and prognostic tests. To provide the basis for our melanoma studies, we have established the Southern Sweden Malignant Melanoma Biobank. Tissues collected over many years have been accurately characterized with respect to the tumor and patient information. The extreme variability displayed in the protein profiles and the detection of missense mutations have confirmed the complexity and heterogeneity of the disease. It is envisaged that the combined analysis of clinical, histological, and proteomic data will provide patients with a more personalized medical treatment. With respect to disease presentation, targeted treatment, and medical mass spectrometry analysis and imaging, this overview report will outline and summarize the current achievements and status within malignant melanoma. We present data generated by our cancer research center in Lund, Sweden, where we have built extensive capabilities in biobanking, proteogenomics, and patient treatments over a long period.