    Beyond Condorcet: Optimal Aggregation Rules Using Voting Records

    The difficulty of optimal decision making in uncertain dichotomous choice settings is that it requires information on the expertise of the decision makers (voters). This paper presents a method of optimally weighting voters even without testing them against questions with known right answers. The method is based on the realization that if we can see how voters vote on a variety of questions, it is possible to gauge their respective degrees of expertise by comparing their votes in a suitable fashion, even without knowing the right answers.
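
    As a concrete illustration of the idea, the sketch below estimates each voter's competence from agreement with the per-question majority (a simple proxy, not necessarily the paper's estimator) and then aggregates with the classical log-odds weights w_i = log(p_i / (1 - p_i)), which are optimal in the dichotomous choice model when the competences are known.

```python
import numpy as np

def estimate_competence(votes):
    """Proxy estimate of voter competence without ground truth:
    each voter's rate of agreement with the per-question majority."""
    majority = (votes.mean(axis=0) > 0.5).astype(int)
    p = (votes == majority).mean(axis=1)
    return np.clip(p, 0.01, 0.99)  # keep the log-odds finite

def weighted_majority(votes, p):
    """Decide each question by weighted majority with the classical
    optimal weights w_i = log(p_i / (1 - p_i))."""
    w = np.log(p / (1 - p))
    return ((w @ votes) > w.sum() / 2).astype(int)

# votes[i, q] = 1 if voter i answered "yes" on question q
votes = np.array([[1, 0, 1, 1],
                  [1, 1, 1, 0],
                  [0, 0, 1, 1]])
decisions = weighted_majority(votes, estimate_competence(votes))
```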

    Beyond Condorcet: Optimal Aggregation Rules Using Voting Records

    In certain judgmental situations where a “correct” decision is presumed to exist, optimal decision making requires evaluation of the decision-makers' capabilities and the selection of the appropriate aggregation rule. The major and so far unresolved difficulty is the former requirement. This paper presents the optimal aggregation rule that simultaneously satisfies these two interdependent requirements. In our setting, some record of the voters' past decisions is available, but the correct decisions are not known. We observe that any arbitrary evaluation of the decision-makers' capabilities as probabilities yields some optimal aggregation rule that, in turn, yields a maximum-likelihood estimation of decisional skills. Thus, a skill-evaluation equilibrium can be defined as an evaluation of decisional skills that yields itself as a maximum-likelihood estimation of decisional skills. We show that such an equilibrium exists and offer a procedure for finding one. The obtained equilibrium is locally optimal and is shown empirically to generally be globally optimal in terms of the correctness of the resulting collective decisions. Interestingly, under minimally competent (almost symmetric) skill distributions that allow unskilled decision makers, the optimal rule considerably outperforms the common simple majority rule (SMR). Furthermore, a sufficient record of past decisions ensures that the collective probability of making a correct decision converges to 1, as opposed to an accuracy of about 0.7 under SMR. Our proposed optimal voting procedure relaxes the fundamental (and sometimes unrealistic) assumptions of Condorcet's celebrated theorem and its extensions, such as sufficiently high decision-making quality, skill homogeneity, or the existence of a sufficiently large group of decision makers.
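
    A minimal sketch of the equilibrium idea described above, assuming the log-odds weighted rule and using agreement with the rule's own decisions as the skill re-estimate (the paper's exact procedure may differ):

```python
import numpy as np

def skill_equilibrium(votes, n_iter=100, eps=1e-6):
    """Iterate skills -> optimal weighted rule -> decisions -> re-estimated
    skills until the skill vector reproduces itself (a fixed point)."""
    p = np.full(votes.shape[0], 0.6)      # arbitrary initial skill evaluation
    decisions = None
    for _ in range(n_iter):
        w = np.log(p / (1 - p))           # optimal weights for current p
        decisions = ((w @ votes) > w.sum() / 2).astype(int)
        # re-estimate each skill as agreement with the collective decisions
        p_new = np.clip((votes == decisions).mean(axis=1), 0.01, 0.99)
        if np.max(np.abs(p_new - p)) < eps:
            break                          # skill-evaluation equilibrium reached
        p = p_new
    return p, decisions
```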

    DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew

    We present DictaBERT, a new state-of-the-art pre-trained BERT model for modern Hebrew, outperforming existing models on most benchmarks. Additionally, we release two fine-tuned versions of the model, designed to perform two specific foundational tasks in the analysis of Hebrew texts: prefix segmentation and morphological tagging. These fine-tuned models allow any developer to perform prefix segmentation and morphological tagging of a Hebrew sentence with a single call to a HuggingFace model, without the need to integrate any additional libraries or code. In this paper we describe the details of the training as well as the results on the different benchmarks. We release the models to the community, along with sample code demonstrating their use, as part of our goal to help further research and development in Hebrew NLP.
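
    For instance, once the models are on the HuggingFace hub, the base model can be queried through the standard `transformers` API. The repository ID `dicta-il/dictabert` and the example sentence below are assumptions for illustration; check the authors' release for the exact names of the base and fine-tuned models.

```python
from transformers import pipeline

# Masked-word prediction with the base DictaBERT model.
# NOTE: the model ID is assumed, not confirmed by the paper text.
fill = pipeline("fill-mask", model="dicta-il/dictabert")

# "[MASK] is the capital city of Israel"
for pred in fill("[MASK] היא עיר הבירה של ישראל"):
    print(pred["token_str"], round(pred["score"], 3))
```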

    Identification of Parallel Passages Across a Large Hebrew/Aramaic Corpus

    We propose a method for efficiently finding all parallel passages in a large corpus, even if the passages are not quite identical due to rephrasing and orthographic variation. The key ideas are the representation of each word in the corpus by its two most infrequent letters, finding matched pairs of strings of four or five words that differ by at most one word, and then identifying clusters of such matched pairs. Using this method, over 4,600 parallel pairs of passages were identified in the Babylonian Talmud, a Hebrew-Aramaic corpus of over 1.8 million words, in just over 30 seconds. Empirical comparisons on sample data indicate that the coverage obtained by our method is essentially the same as that obtained using slow exhaustive methods.
    Comment: Submitted to the Journal of Data Mining and Digital Humanities (Special Issue on Computer-Aided Processing of Intertextuality in Ancient Languages).
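
    The two core ideas lend themselves to a short sketch (an illustration of the technique, not the authors' implementation): each word is reduced to its two most infrequent letters, and n-grams of these signatures are indexed once per position with that position wildcarded, so that two word strings differing in at most one word collide in some bucket.

```python
from collections import Counter, defaultdict
from itertools import combinations

def rare_letter_signatures(docs):
    """Map each word to its two most infrequent letters,
    measured by corpus-wide letter frequency."""
    freq = Counter(ch for doc in docs for word in doc for ch in word)
    def sig(word):
        return "".join(sorted(set(word), key=lambda c: freq[c])[:2])
    return [[sig(w) for w in doc] for doc in docs]

def candidate_parallels(sig_docs, n=4):
    """Index every n-gram of signatures n times, once with each position
    wildcarded, so n-grams differing in at most one word share a bucket."""
    buckets = defaultdict(set)
    for d, sigs in enumerate(sig_docs):
        for i in range(len(sigs) - n + 1):
            gram = tuple(sigs[i:i + n])
            for skip in range(n):
                buckets[gram[:skip] + ("*",) + gram[skip + 1:]].add((d, i))
    pairs = set()
    for occs in buckets.values():
        pairs.update(combinations(sorted(occs), 2))
    return pairs  # clustering of overlapping matched pairs would follow
```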