Beyond Condorcet: Optimal Aggregation Rules Using Voting Records
The central difficulty of optimal decision making in uncertain dichotomous choice settings is that it requires information on the expertise of the decision makers (voters). This paper presents a method of optimally weighting voters even without testing them on questions with known right answers. The method is based on the realization that if we can see how voters vote on a variety of questions, it is possible to gauge their respective degrees of expertise by comparing their votes in a suitable fashion, even without knowing the right answers.
Beyond Condorcet: Optimal Aggregation Rules Using Voting Records
In certain judgmental situations where a “correct” decision is presumed to exist, optimal decision making requires evaluation of the decision makers' capabilities and the selection of the appropriate aggregation rule. The major, so far unresolved, difficulty is the former requirement. This paper presents the optimal aggregation rule that simultaneously satisfies these two interdependent necessary requirements. In our setting, some record of the voters' past decisions is available, but the correct decisions are not known. We observe that any arbitrary evaluation of the decision makers' capabilities as probabilities yields some optimal aggregation rule that, in turn, yields a maximum-likelihood estimation of decisional skills. Thus, a skill-evaluation equilibrium can be defined as an evaluation of decisional skills that yields itself as a maximum-likelihood estimation of decisional skills. We show that such an equilibrium exists and offer a procedure for finding one. The obtained equilibrium is locally optimal and is shown empirically to generally be globally optimal in terms of the correctness of the resulting collective decisions. Interestingly, under minimally competent (almost symmetric) skill distributions that allow unskilled decision makers, the optimal rule considerably outperforms the common simple majority rule (SMR). Furthermore, a sufficiently long record of past decisions ensures that the collective probability of making a correct decision converges to 1, as opposed to an accuracy of about 0.7 under SMR. Our proposed optimal voting procedure relaxes the fundamental (and sometimes unrealistic) assumptions of Condorcet's celebrated theorem and its extensions, such as sufficiently high decision-making quality, skill homogeneity, or the existence of a sufficiently large group of decision makers.
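The fixed-point idea in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's procedure: it uses the classical log-odds weights (the optimal weighted-majority rule when independent skills are known) and re-estimates each voter's skill as the rate of agreement with the current collective decisions, iterating until the estimates reproduce themselves. The function name, the ±1 vote encoding, and the initial guess are all hypothetical.

```python
import math

def skill_equilibrium(votes, iters=50, eps=1e-3):
    """votes: one row per question, each row a list of +1/-1 votes.
    Alternates (a) weighted-majority aggregation with log-odds
    weights and (b) re-estimating each voter's skill as the rate of
    agreement with the collective decisions, until a fixed point."""
    n_voters = len(votes[0])
    p = [0.6] * n_voters  # initial guess: everyone slightly competent
    clip = lambda x: min(max(x, eps), 1 - eps)  # keep log-odds finite
    for _ in range(iters):
        # optimal weights for the current skill estimates
        w = [math.log(clip(pi) / (1 - clip(pi))) for pi in p]
        # collective decision per question: sign of the weighted vote sum
        decisions = [1 if sum(wi * v for wi, v in zip(w, row)) >= 0 else -1
                     for row in votes]
        # re-estimate each skill as agreement rate with the collective
        new_p = [clip(sum(row[i] == d for row, d in zip(votes, decisions))
                      / len(votes))
                 for i in range(n_voters)]
        if new_p == p:  # skill estimates reproduce themselves
            break
        p = new_p
    return p, decisions
```

On a toy record where voter 0 always agrees with the emerging majority, the procedure assigns that voter a sharply higher skill estimate than the others.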
DictaBERT: A State-of-the-Art BERT Suite for Modern Hebrew
We present DictaBERT, a new state-of-the-art pre-trained BERT model for
modern Hebrew, outperforming existing models on most benchmarks. Additionally,
we release two fine-tuned versions of the model, designed to perform two
specific foundational tasks in the analysis of Hebrew texts: prefix
segmentation and morphological tagging. These fine-tuned models allow any
developer to perform prefix segmentation and morphological tagging of a Hebrew
sentence with a single call to a HuggingFace model, without the need to
integrate any additional libraries or code. In this paper we describe the
details of the training as well as the results on the different benchmarks.
We release the models to the community, along with sample code demonstrating
their use, as part of our goal to help further research and development in
Hebrew NLP.
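The "single call" workflow described above might look as follows with the HuggingFace `transformers` library. This is a hedged sketch, not the authors' documented API: the model id and the shape of the `predict` call are assumptions (custom heads shipped via `trust_remote_code` typically expose such a helper, but the exact interface may differ).

```python
def tag_hebrew(sentence: str, model_name: str = "dicta-il/dictabert-morph"):
    """Morphologically tag a Hebrew sentence with a fine-tuned
    DictaBERT model. Model id and call shape are assumptions."""
    # imported lazily so the sketch loads without transformers installed
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # trust_remote_code=True lets the hub repo supply its custom
    # prediction head alongside the weights
    model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
    model.eval()
    # assumed helper: returns per-token morphological analyses
    return model.predict([sentence], tokenizer)
```

Swapping in the segmentation model id would give prefix segmentation through the same one-call pattern.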
Identification of Parallel Passages Across a Large Hebrew/Aramaic Corpus
We propose a method for efficiently finding all parallel passages in a large
corpus, even if the passages are not quite identical due to rephrasing and
orthographic variation. The key ideas are the representation of each word in
the corpus by its two most infrequent letters, finding matched pairs of strings
of four or five words that differ by at most one word and then identifying
clusters of such matched pairs. Using this method, over 4600 parallel pairs of
passages were identified in the Babylonian Talmud, a Hebrew-Aramaic corpus of
over 1.8 million words, in just over 30 seconds. Empirical comparisons on
sample data indicate that the coverage obtained by our method is essentially
the same as that obtained using slow exhaustive methods.

Comment: Submission to the Journal of Data Mining and Digital Humanities
(Special Issue on Computer-Aided Processing of Intertextuality in Ancient
Languages)
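The matching step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `find_parallels` and its key scheme are assumptions, and it keeps only the two core ideas, representing each word by its two most infrequent letters and hashing word n-grams with one position skipped so that grams differing in at most one word collide. The final clustering of matched pairs into full passages is omitted.

```python
from collections import Counter

def find_parallels(corpus_a, corpus_b, n=4):
    """Return positions (i, j) where the n-word grams of the two
    corpora agree, under the two-rarest-letter word signature, in
    all but at most one aligned position."""
    # 1. letter frequencies over both corpora
    freq = Counter(ch for w in corpus_a + corpus_b for ch in w)

    def sig(word):
        # represent a word by its two most infrequent letters,
        # emitted in a fixed order so spelling variants often collide
        rare = sorted(set(word), key=lambda c: (freq[c], c))[:2]
        return "".join(sorted(rare))

    def keys(words, start):
        gram = tuple(sig(w) for w in words[start:start + n])
        # one key per skipped position: two grams that differ in at
        # most one aligned word share at least one key
        for skip in range(n):
            yield (skip, gram[:skip] + gram[skip + 1:])

    # 2. index every n-gram of corpus_a under all its keys
    index = {}
    for i in range(len(corpus_a) - n + 1):
        for k in keys(corpus_a, i):
            index.setdefault(k, set()).add(i)

    # 3. scan corpus_b with the same keys to collect matched pairs
    matches = set()
    for j in range(len(corpus_b) - n + 1):
        for k in keys(corpus_b, j):
            for i in index.get(k, ()):
                matches.add((i, j))
    return sorted(matches)
```

Because matching is done on signatures rather than full words, a near-quotation that swaps one word (e.g. "created" for "made") still produces a matched pair, which is the robustness to rephrasing the method relies on.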