Mislearning from Censored Data: The Gambler's Fallacy in Optimal-Stopping Problems
I study endogenous learning dynamics for people expecting systematic
reversals from random sequences - the "gambler's fallacy." Biased agents face
an optimal-stopping problem. They are uncertain about the underlying
distribution and learn its parameters from predecessors. Agents stop when early
draws are "good enough," so predecessors' experiences contain negative streaks
but not positive streaks. Since biased agents understate the likelihood of
consecutive below-average draws, society converges to over-pessimistic beliefs
about the distribution's mean and stops too early. Agents uncertain about the
distribution's variance overestimate it to an extent that depends on
predecessors' stopping thresholds. Subsidizing search partially mitigates
long-run belief distortions.
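The censoring mechanism described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's model: draws are i.i.d. Gaussian, predecessors stop after the first draw whenever it clears a fixed cutoff, and the biased agent is assumed to believe E[x2 | x1] = mu2 - gamma * (x1 - mu1). The function names, the parameter gamma, and the moment-matching estimator are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_censored_histories(n, mu, sigma, cutoff, rng):
    """Each predecessor draws x1 and stops if x1 >= cutoff.

    Only predecessors with a disappointing first draw go on to draw x2,
    so second draws are observed only after low first draws.
    """
    histories = []
    for _ in range(n):
        x1 = rng.gauss(mu, sigma)
        x2 = rng.gauss(mu, sigma) if x1 < cutoff else None
        histories.append((x1, x2))
    return histories


def estimate_second_mean(histories, gamma):
    """Moment-matching estimate of mu2 under the assumed biased model.

    The agent believes E[x2 | x1] = mu2 - gamma * (x1 - mu1), i.e. it
    expects a reversal after a low first draw. gamma = 0 is an unbiased agent.
    """
    mu1_hat = statistics.fmean([x1 for x1, _ in histories])
    continued = [(x1, x2) for x1, x2 in histories if x2 is not None]
    mean_x1 = statistics.fmean([x1 for x1, _ in continued])
    mean_x2 = statistics.fmean([x2 for _, x2 in continued])
    # Solve mean_x2 = mu2 - gamma * (mean_x1 - mu1_hat) for mu2.
    return mean_x2 + gamma * (mean_x1 - mu1_hat)


rng = random.Random(0)
histories = simulate_censored_histories(20000, mu=0.0, sigma=1.0, cutoff=0.0, rng=rng)
unbiased = estimate_second_mean(histories, gamma=0.0)
biased = estimate_second_mean(histories, gamma=0.3)
print(f"unbiased estimate of the second-draw mean: {unbiased:.3f}")
print(f"biased estimate of the second-draw mean:   {biased:.3f}")
```

Because second draws are recorded only after below-cutoff first draws, the agent who expects reversals reads ordinary second draws as disappointing and ends up with a lower estimate of the mean than the unbiased agent does.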
Validating the predictions of case-based decision theory
Real-life decision-makers typically do not know all possible outcomes arising from alternative courses of action. Instead, when people face a problem, they may rely on the recollection of their past personal experience: the situation, the action taken, and the accompanying consequence. In addition, the applicability of a past experience in decision-making may depend on how similar the current problem is to situations encountered previously. Case-based decision theory (CBDT), proposed by Itzhak Gilboa and David Schmeidler (1995), formalises this type of analogical reasoning. While CBDT is intuitively appealing, only a few experimental and empirical studies have attempted to validate its predictions. This thesis reports two laboratory experiments and an empirical study that attempt to confirm the predictive power of CBDT vis-à-vis Bayesian reasoning.
Toward a Robust and Universal Crowd Labeling Framework
The advent of fast and economical computers with large electronic storage has led to a large volume of data, most of which is unlabeled. While computers provide expeditious, accurate and low-cost computation, they still lag behind in many tasks that require human intelligence, such as labeling medical images, videos or text. Consequently, current research focuses on combining computer accuracy and human intelligence to complete labeling tasks. In most cases labeling needs to be done by domain experts; however, because of the variability in expertise, experience, and intelligence of human beings, experts can be scarce.
As an alternative to using domain experts, help is sought from non-experts, also known as the crowd, to complete tasks that cannot be readily automated. Since crowd labelers are non-experts, multiple labels per instance are acquired for quality purposes. The final label is obtained by combining these multiple labels. Very commonly, the ground truth, the instance difficulty, and the labeler ability are all unknown. The aggregation task therefore becomes a “chicken and egg” problem from the start.
Despite the fact that much research using machine learning and statistical techniques has been conducted in this area (e.g., [Dekel and Shamir, 2009; Hovy et al., 2013a; Liu et al., 2012; Donmez and Carbonell, 2008]), many questions remain unresolved. These include: (a) What are the best ways to evaluate labelers? (b) It is common to use expert-labeled instances (ground truth) to evaluate labeler ability (e.g., [Le et al., 2010; Khattak and Salleb-Aouissi, 2011; Khattak and Salleb-Aouissi, 2012; Khattak and Salleb-Aouissi, 2013]). The question is, what should be the cardinality of the set of expert-labeled instances to have an accurate evaluation? (c) Which factors other than labeler expertise (e.g., difficulty of instance, prevalence of class, bias of a labeler toward a particular class) can affect the labeling accuracy? (d) Is there any optimal way to combine multiple labels to get the best labeling accuracy? (e) Should the labels provided by oppositional/malicious labelers be discarded and blocked? Or is there a way to use the “information” provided by oppositional/malicious labelers? (f) How can labelers and instances be evaluated if the ground truth is not known with certitude?
In this thesis, we investigate these questions. We present methods that rely on a few expert-labeled instances (usually 0.1% to 10% of the dataset) to evaluate various parameters using a frequentist and a Bayesian approach. The estimated parameters are then used for label aggregation to produce one final label per instance.
In the first part of this thesis, we propose a method called Expert Label Injected Crowd Estimation (ELICE) and extend it to different versions and variants. ELICE is based on a frequentist approach for estimating the underlying parameters. The first version of ELICE estimates the parameters, i.e., labeler expertise and data instance difficulty, using the accuracy of crowd labelers on expert-labeled instances [Khattak and Salleb-Aouissi, 2011; Khattak and Salleb-Aouissi, 2012]. The multiple labels for each instance are combined using weighted majority voting. These weights are the scores of labeler reliability on any given instance, which are obtained by inputting the parameters in the logistic function.
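The idea of weighting votes by logistic reliability scores can be sketched as follows. This is a simplified illustration, not the thesis's exact formulas: it estimates only a labeler expertise score from the expert-labeled instances and ignores instance difficulty. The function names, the `scale` parameter, and the toy data are assumptions made for the example. Labels are binary and coded as +1 / -1.

```python
import math


def estimate_expertise(crowd, expert):
    """Score each labeler in [-1, 1] from agreement with the expert labels.

    crowd:  list of label lists, one list per labeler (entries are +1 or -1)
    expert: dict mapping instance index -> expert label (+1 or -1)
    """
    scores = []
    for labels in crowd:
        agreement = sum(1 if labels[i] == y else -1 for i, y in expert.items())
        scores.append(agreement / len(expert))
    return scores


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def weighted_majority_vote(crowd, scores, scale=4.0):
    """Combine labels per instance, weighting each labeler by logistic(scale * score)."""
    weights = [logistic(scale * s) for s in scores]
    final = []
    for i in range(len(crowd[0])):
        total = sum(w * labels[i] for w, labels in zip(weights, crowd))
        final.append(1 if total >= 0 else -1)  # ties go to +1 by convention
    return final


# Toy data: three labelers, six instances; instances 0 and 1 have expert labels.
crowd = [
    [1, -1, 1, 1, -1, -1],   # agrees with the experts on both instances
    [-1, 1, -1, 1, -1, 1],   # disagrees with the experts on both instances
    [1, 1, -1, 1, -1, 1],    # agrees on one, disagrees on the other
]
expert = {0: 1, 1: -1}

scores = estimate_expertise(crowd, expert)
final_labels = weighted_majority_vote(crowd, scores)
print(scores)
print(final_labels)
```

On instances 2 and 5 the two weaker labelers outvote the first labeler under plain majority voting, while the weighted vote follows the labeler who matched the experts.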
In the second version of ELICE [Khattak and Salleb-Aouissi, 2013], we introduce entropy as a way to estimate the uncertainty of labeling. This provides an advantage of differentiating between good, random and oppositional/malicious labelers. The aggregation of labels for ELICE version 2 flips the label (for binary classification) provided by the oppositional/malicious labeler thus utilizing the information that is generally discarded by other labeling methodologies.
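One way to picture how an entropy-based score can separate good, random, and oppositional labelers is sketched below. This is a hypothetical construction for illustration only, not the scoring rule defined in the thesis: the signed score `1 - H(accuracy)` and the function names are assumptions. A labeler's accuracy is taken to be the fraction of expert-labeled instances they labeled correctly, and labels are coded as +1 / -1.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def signed_reliability(accuracy):
    """Near +1 for good labelers, near 0 for random ones, near -1 for oppositional ones."""
    sign = 1.0 if accuracy >= 0.5 else -1.0
    return sign * (1.0 - binary_entropy(accuracy))


def aggregate_with_flipping(crowd, accuracies):
    """Weighted vote in which a negative weight flips an oppositional labeler's label."""
    weights = [signed_reliability(a) for a in accuracies]
    final = []
    for i in range(len(crowd[0])):
        total = sum(w * labels[i] for w, labels in zip(weights, crowd))
        final.append(1 if total >= 0 else -1)  # ties go to +1 by convention
    return final


# Toy data: a mostly correct labeler and a mostly oppositional one.
crowd = [
    [1, -1, 1],
    [-1, 1, -1],
]
accuracies = [0.9, 0.1]
flipped_vote = aggregate_with_flipping(crowd, accuracies)
print(flipped_vote)
```

In this sketch a labeler at 50% accuracy gets weight zero, so their labels carry no information, whereas a consistently wrong labeler contributes as much as a consistently right one once the label is flipped.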
Both versions of ELICE have a cluster-based variant in which rather than making a random choice of instances from the whole dataset, clusters of data are first formed using any clustering approach e.g., K-means. Then an equal number of instances from each cluster are chosen randomly to get expert-labels. This is done to ensure equal representation of each class in the test dataset.
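The cluster-based selection step can be sketched as below, assuming the cluster assignment has already been computed by some clustering method such as K-means. The function name, the `per_cluster` parameter, and the toy cluster IDs are illustrative assumptions and do not come from the thesis.

```python
import random
from collections import defaultdict


def sample_per_cluster(cluster_ids, per_cluster, rng):
    """Pick up to `per_cluster` random instance indices from every cluster.

    cluster_ids: list where cluster_ids[i] is the cluster of instance i
    Returns the sorted indices of the instances to send to the experts.
    """
    members = defaultdict(list)
    for index, cluster in enumerate(cluster_ids):
        members[cluster].append(index)

    chosen = []
    for cluster in sorted(members):
        pool = members[cluster]
        k = min(per_cluster, len(pool))  # a small cluster contributes all it has
        chosen.extend(rng.sample(pool, k))
    return sorted(chosen)


# Toy assignment of nine instances to three clusters.
cluster_ids = [0, 0, 0, 0, 1, 1, 2, 2, 2]
to_label = sample_per_cluster(cluster_ids, per_cluster=2, rng=random.Random(0))
print(to_label)
```

Sampling the same number of instances from each cluster keeps a small cluster from being missed, which a uniform random draw over the whole dataset does not guarantee.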
Besides taking advantage of expert-labeled instances, the third version of ELICE [Khattak and Salleb-Aouissi, 2016] incorporates pairwise/circular comparison of labelers to labelers and instances to instances. The idea here is to improve accuracy by using the crowd labels, which unlike expert-labels, are available for the whole dataset and may provide a more comprehensive view of the labeler ability and instance difficulty. This is especially helpful for the case when the domain experts do not agree on one label and ground truth is not known for certain. Therefore, incorporating more information beyond expert labels can provide better results.
We test the performance of ELICE on simulated labels as well as real labels obtained from Amazon Mechanical Turk. Results show that ELICE is effective compared with state-of-the-art methods. All versions and variants of ELICE are capable of delaying phase transition. The main contribution of ELICE is that it makes use of all the information available from the crowd and the experts. We also present a theoretical framework to estimate the number of expert-labeled instances needed to achieve a certain labeling accuracy. Experiments are presented to demonstrate the utility of the theoretical bound.
In the second part of this thesis, we present Crowd Labeling Using Bayesian Statistics (CLUBS) [Khattak and Salleb-Aouissi, 2015; Khattak et al., 2016b; Khattak et al., 2016a], a new approach for crowd labeling to estimate labeler and instance parameters along with label aggregation. Our approach is inspired by Item Response Theory (IRT). We introduce new parameters and refine the existing IRT parameters to fit the crowd labeling scenario. The main challenge is that unlike IRT, in the crowd labeling case, the ground truth is not known and has to be estimated based on the parameters. To overcome this challenge, we acquire expert-labels for a small fraction of instances in the dataset. Our model estimates the parameters based on the expert-labeled instances. The estimated parameters are used for weighted aggregation of crowd labels for the rest of the dataset. Experiments conducted on synthetic data and real datasets with heterogeneous quality crowd-labels show that our methods perform better than many state-of-the-art crowd labeling methods.
We also conduct significance tests between our methods and other state-of-the-art methods to check the significance of the accuracy of these methods. The results show the superiority of our method in most cases. Moreover, we present experiments to demonstrate the impact of the accuracy of final aggregated labels when used as training data. The results essentially emphasize the need for high accuracy of the aggregated labels.
In the last part of the thesis, we present past and contemporary research related to crowd labeling. We conclude with the future of crowd labeling and further research directions. To summarize, in this thesis, we have investigated different methods for estimating crowd labeling parameters and using them for label aggregation. We hope that our contribution will be useful to the crowd labeling community.
Homeostatic epistemology : reliability, coherence and coordination in a Bayesian virtue epistemology
How do agents with limited cognitive capacities flourish in informationally impoverished or unexpected circumstances? Aristotle argued that human flourishing emerged from knowing about the world and our place within it. If he is right, then the virtuous processes that produce knowledge best explain flourishing. Influenced by Aristotle, virtue epistemology defends an analysis of knowledge where beliefs are evaluated for their truth and the intellectual virtue or competences relied on in their creation. However, human flourishing may emerge from how degrees of ignorance are managed in an uncertain world. Perhaps decision-making in the shadow of knowledge best explains human wellbeing: a Bayesian approach? In this dissertation I argue that a hybrid of virtue and Bayesian epistemologies explains human flourishing, which I term homeostatic epistemology.
Homeostatic epistemology supposes that an agent has a rational credence p when p is the product of reliable processes aligned with the norms of probability theory; whereas an agent knows that p when a rational credence p is the product of reliable processes such that: 1) p meets some relevant threshold for belief (such that the agent acts as though p were true and indeed p is true), 2) p coheres with a satisficing set of relevant beliefs and, 3) the relevant set of beliefs is coordinated appropriately to meet the integrated aims of the agent.
Homeostatic epistemology recognizes that justificatory relationships between beliefs are constantly changing to combat uncertainties and to take advantage of predictable circumstances. Contrary to holism, justification is built up and broken down across limited sets like the anabolic and catabolic processes that maintain homeostasis in the cells, organs and systems of the body. It is the coordination of choristic sets of reliably produced beliefs that creates the greatest flourishing given the limitations inherent in the situated agent.
A critical analysis of the role of statistical significance testing in education research: With special attention to mathematics education
This study analyzes the role of statistical significance testing (SST) in education. Although the basic logic underlying SST (a hypothesis is rejected because the observed data would be very unlikely if the hypothesis were true) appears so obvious that many people are tempted to accept it, it is in fact fallacious. In the light of its historical background and conceptual development, discussed in Chapter 2, Fisher’s significance testing, Neyman-Pearson hypothesis testing and their hybrids are clearly distinguished. We argue that the probability of obtaining the observed or more extreme outcomes (p value) can hardly act as a measure of the strength of evidence against the null hypothesis. After discussing the five major interpretations of probability, we conclude that if we do not accept the subjective theory of probability, talking about the probability of a hypothesis that is not the outcome of a chance process is unintelligible. But the subjective theory itself has many intractable difficulties that can hardly be resolved. If we insist on assigning a probability value to a hypothesis in the same way as we assign one to a chance event, we have to accept that it is the hypothesis with low probability, rather than high probability, that we should aim at when conducting scientific research. More important, the inferences behind SST are shown to be fallacious from three different perspectives.
The attempt to invoke the likelihood ratio with the observed or more extreme data instead of the probability of a hypothesis in defending the use of the p value as a measure of the strength of evidence against the null hypothesis is also shown to be misleading, because it can be demonstrated that the use of a tail region to represent a result that is actually on the border would overstate the evidence against the null hypothesis. Although Neyman-Pearson hypothesis testing does not involve the concept of the probability of a hypothesis, it does have some other serious problems that can hardly be resolved. We show that it cannot address researchers' genuine concerns. By explaining why the level of significance must be specified or fixed prior to the analysis of data and why a blurring of the distinction between the p value and the significance level would lead to undesirable consequences, we conclude that Neyman-Pearson hypothesis testing cannot provide an effective means for rejecting false hypotheses. After a thorough discussion of common misconceptions associated with SST and the major arguments for and against SST, we conclude that SST has insurmountable problems that could misguide the research paradigm, although some other criticisms of SST are not as well justified. We also analyze various proposed alternatives to SST and conclude that confidence intervals (CIs) are no better than SST for the purpose of testing hypotheses and that it is unreasonable to expect the existence of a statistical test that could provide researchers with algorithms or rigid rules by conforming to which all problems about testing hypotheses could be solved. Finally, we argue that falsificationism could eschew the disadvantages of SST and other similar statistical inductive inferences and we discuss how it could bring education research into a more fruitful situation in which to their practices.
Although we pay special attention to mathematics education, the core of the discussion in the thesis might apply equally to other educational contexts.
General Course Catalog [July-December 2020]
Undergraduate Course Catalog, July-December 2020