TOWARDS WORD SENSES AND LINKS BETWEEN THEM
In this study, we demonstrate an unsupervised approach for constructing a semantic network that unites word senses (or word concepts) rather than coarse-grained concepts. The reported study was funded by RFBR (project no. 16-37-00354 мол_а) and by RFH (project no. 16-04-12019, "Integration of the RussNet and YARN Thesauri").
CROWDSOURCING AS A HUMAN-COMPUTER SYSTEM WITH FEEDBACK
Crowdsourcing is an established approach to problems such as data gathering, annotation, and cleaning. Given a set of simple and verifiable tasks, many participants execute them voluntarily or for pay. Since resources are constrained, it is crucial to evaluate the effort of each participant and to focus the crowdsourcing process accordingly. We discuss the representation of crowdsourcing as a human-computer system with feedback and propose a reference model of such a system. The proposed approach is being implemented within the open project Yet Another RussNet [1]. The work was supported by RFH grant no. 13-04-12020, "A New Open Electronic Thesaurus of the Russian Language".
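The abstract does not say how participant effort is evaluated. As a minimal sketch, assuming simple majority-vote aggregation (a common choice, not necessarily the reference model proposed here), one feedback round can score each participant by agreement with the consensus and flag low-agreement tasks where further annotation effort should be focused. The function name `feedback_round` and the `min_share` threshold are hypothetical.

```python
from collections import Counter, defaultdict


def feedback_round(answers, min_share=0.7):
    """Aggregate one round of crowd answers and compute feedback signals.

    answers: list of (worker, task, label) triples.
    Returns (consensus label per task,
             agreement rate per worker,
             tasks whose majority share is below min_share).
    """
    by_task = defaultdict(list)
    for _worker, task, label in answers:
        by_task[task].append(label)

    consensus, uncertain = {}, []
    for task, labels in by_task.items():
        # Majority vote; on ties, Counter.most_common keeps first-seen order.
        label, votes = Counter(labels).most_common(1)[0]
        consensus[task] = label
        if votes / len(labels) < min_share:
            uncertain.append(task)  # candidate for more annotations

    # Feedback on participants: share of their answers matching the consensus.
    hits, totals = Counter(), Counter()
    for worker, task, label in answers:
        totals[worker] += 1
        hits[worker] += int(label == consensus[task])
    agreement = {w: hits[w] / totals[w] for w in totals}
    return consensus, agreement, uncertain


# Toy round with hypothetical workers and tasks.
answers = [
    ("alice", "t1", "yes"), ("bob", "t1", "yes"), ("carol", "t1", "yes"),
    ("alice", "t2", "no"), ("bob", "t2", "yes"), ("carol", "t2", "yes"),
]
consensus, agreement, uncertain = feedback_round(answers)
print(consensus, agreement, uncertain)
```

Here task `t2` reaches only a two-thirds majority, so it is returned for more work, and `alice` receives an agreement score of 0.5. In a feedback loop, such scores could weight future votes or route tasks.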
Collaborative dataflow computations: relational models and algorithms
Recently, microtask crowdsourcing has become a popular approach to various data mining problems, including the analysis of unstructured data. Crowdsourcing workflows for such problems are composed of several data processing stages that require a consistent representation to make the work reproducible. This paper addresses the reproducibility and formalization of the microtask crowdsourcing process. We propose a collaborative dataflow computational model based on an extended relational model and a dataflow computational model. The model processes input data represented as relations by executing microtask annotation stages and automatic synchronization stages in parallel. Data processing stages and the connections between them are expressed as collaborative computation workflows, each represented as a weakly connected directed acyclic graph. We describe a synchronous algorithm for executing such workflows. The model has been evaluated on two tasks from computational linguistics: refining concept lexicalizations in electronic thesauri and establishing hierarchical relations between such concepts. The "Add–Remove–Confirm" procedure adds missing lexemes to concepts and removes odd ones. The "Genus–Species–Match" procedure establishes "is-a" (hyponymy–hypernymy) relations between concepts based on the corresponding genus–species word pairs. Experiments on an open electronic thesaurus of the Russian language, involving both volunteers from popular online social networks and workers from crowdsourcing marketplaces paid in micropayments, confirm the applicability of these procedures for enhancing lexical resources.
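A minimal sketch of synchronous workflow execution, assuming each stage is a Python function of its upstream outputs. In every round, all stages whose predecessors have finished run together, which mirrors the parallel execution of annotation and synchronization stages described in the abstract. The names `run_workflow` and `is_weakly_connected` and the toy lexicalization stages are hypothetical; real annotation stages would dispatch microtasks to workers instead of returning fixed sets.

```python
from graphlib import TopologicalSorter


def is_weakly_connected(stages):
    """True if the workflow graph is connected when edge directions are ignored."""
    neighbours = {name: set(preds) for name, (_, preds) in stages.items()}
    for name, (_, preds) in stages.items():
        for p in preds:
            neighbours[p].add(name)
    start = next(iter(neighbours))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(neighbours)


def run_workflow(stages, source_data):
    """Execute a DAG of stages round by round.

    stages: name -> (function, list of predecessor names); each function
    receives the list of its predecessors' outputs, and stages without
    predecessors receive [source_data].
    Returns (outputs per stage, list of stage names run in each round).
    """
    graph = {name: set(preds) for name, (_, preds) in stages.items()}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    outputs, rounds = {}, []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every stage whose inputs exist
        for name in ready:
            func, preds = stages[name]
            args = [outputs[p] for p in preds] if preds else [source_data]
            outputs[name] = func(args)
        sorter.done(*ready)  # synchronization point between rounds
        rounds.append(ready)
    return outputs, rounds


stages = {
    "input": (lambda xs: xs[0], []),
    # Stand-ins for crowd annotation stages (hypothetical fixed answers).
    "add": (lambda xs: {"automobile"}, ["input"]),
    "remove": (lambda xs: {"carriage"}, ["input"]),
    # Automatic synchronization stage: apply additions, then removals.
    "merge": (lambda xs: (xs[0] | xs[1]) - xs[2], ["input", "add", "remove"]),
}
outputs, rounds = run_workflow(stages, {"car", "auto", "carriage"})
print(rounds)
print(sorted(outputs["merge"]))
```

The toy workflow runs `input`, then `add` and `remove` in the same round, then `merge`, producing the lexicalization `{"car", "auto", "automobile"}`. Using `graphlib.TopologicalSorter` keeps the acyclicity check and the readiness bookkeeping in the standard library.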
What can crowd computing do for the next generation of AI systems?
The unprecedented rise in the adoption of artificial intelligence techniques and automation in many contexts has been accompanied by shortcomings of such technology with respect to robustness, interpretability, usability, and trustworthiness. Crowd computing offers a viable means to leverage human intelligence at scale for data creation, enrichment, and interpretation, and shows great potential to improve the performance of AI systems and to increase the adoption of AI in general. Existing research and practice have mainly focused on leveraging crowd computing for training data creation. However, this perspective limits how fully AI can benefit from crowd computing. In this vision paper, we identify opportunities in crowd computing to propel better AI technology, and argue that to make such progress, fundamental problems need to be tackled from both computation and interaction standpoints. We discuss important research questions in both these themes, with the aim of shedding light on the research needed to pave the way to a future where humans and AI can work together seamlessly while benefiting from each other.
Improving hypernymy extraction with distributional semantic classes
In this paper, we show how distributionally induced semantic classes can help in extracting hypernyms. We present methods for inducing sense-aware semantic classes using distributional semantics and for using these induced classes to filter noisy hypernymy relations. Hypernyms are denoised by labeling each semantic class with its hypernyms. On the one hand, this allows us to filter out wrong extractions using the global structure of distributionally similar senses. On the other hand, we infer missing hypernyms via label propagation to the terms of each cluster. We conduct a large-scale crowdsourcing study showing that processing automatically extracted hypernyms with our approach improves the quality of hypernymy extraction in terms of both precision and recall. Furthermore, we show the utility of our method in the domain taxonomy induction task, achieving state-of-the-art results on the SemEval'16 task on taxonomy induction.
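A minimal sketch of the class-labeling idea, assuming the semantic classes are already induced and given as sets of terms. Each class is labeled with the hypernyms extracted for a sufficient share of its members, extractions that are not class labels are filtered out, and the labels are propagated to every member. The frequency threshold `min_share`, the function name, and the toy data are hypothetical simplifications; the paper's actual label scoring may differ.

```python
from collections import Counter


def denoise_hypernyms(clusters, hypernyms, min_share=0.6):
    """Label each semantic class with its frequent hypernyms and share them.

    clusters: list of sets of terms (induced semantic classes; a term may
              appear in several classes, one per sense).
    hypernyms: term -> set of noisy extracted hypernyms.
    Returns term -> set of cleaned hypernyms.
    """
    cleaned = {}
    for cluster in clusters:
        counts = Counter(h for term in cluster for h in hypernyms.get(term, ()))
        # Class labels: hypernyms extracted for at least min_share of members.
        labels = {h for h, c in counts.items() if c / len(cluster) >= min_share}
        for term in cluster:
            # Filtering drops extractions that are not class labels;
            # propagation gives every member all labels, even if none were extracted.
            cleaned.setdefault(term, set()).update(labels)
    return cleaned


clusters = [{"apple", "mango", "pear", "kiwi"}, {"python", "java"}]
hypernyms = {
    "apple": {"fruit", "company"},
    "mango": {"fruit"},
    "pear": {"fruit", "tree"},
    "python": {"language", "snake"},
    "java": {"language", "island"},
}
cleaned = denoise_hypernyms(clusters, hypernyms)
print(cleaned)
```

In this toy run the noisy `company` and `snake` are dropped, while `kiwi`, which had no extracted hypernyms, inherits `fruit` from its class.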
Unsupervised, knowledge-free, and interpretable word sense disambiguation
Interpretability of a predictive model is a powerful feature that earns users' trust in the correctness of its predictions. In word sense disambiguation (WSD), knowledge-based systems tend to be much more interpretable than knowledge-free counterparts, as they rely on a wealth of manually encoded elements representing word senses, such as hypernyms, usage examples, and images. We present a WSD system that bridges the gap between these two so far disconnected groups of methods. Namely, our system, which provides access to several state-of-the-art WSD models, aims to be as interpretable as a knowledge-based system while remaining completely unsupervised and knowledge-free. The presented tool features a Web interface for all-words disambiguation of texts that makes sense predictions human-readable by providing interpretable word sense inventories, sense representations, and disambiguation results. We provide a public API, enabling seamless integration
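To illustrate how an induced, knowledge-free sense inventory can still yield human-readable decisions, here is a deliberately simplified sketch. Each sense is assumed to be a set of related words (as a clustering-based sense induction method might produce), and a target word is disambiguated by word overlap with its context, returning the overlapping words as the explanation. This overlap scoring is a stand-in chosen for brevity, not one of the models offered by the system; `disambiguate` and the toy inventory are hypothetical.

```python
def disambiguate(word, context, inventory):
    """Pick the sense whose related words overlap most with the context.

    inventory: word -> {sense_id: set of related words} (induced senses).
    Returns (best sense id or None, sorted context words supporting it);
    the supporting words serve as a human-readable explanation.
    """
    context_words = {w.lower().strip(".,;:!?") for w in context.split()}
    best, evidence = None, []
    for sense_id, related in inventory[word].items():
        overlap = sorted(context_words & related)
        if len(overlap) > len(evidence):  # ties keep the earlier sense
            best, evidence = sense_id, overlap
    return best, evidence


inventory = {
    "python": {
        "python#1": {"snake", "reptile", "boa", "venom"},
        "python#2": {"java", "code", "programming", "library", "language"},
    }
}
sense, why = disambiguate(
    "python", "I wrote the parsing code in Python using a library.", inventory
)
print(sense, why)
```

Returning the overlapping words alongside the sense label is what makes the decision inspectable: a reader can see that `code` and `library` drove the choice of the programming-language sense.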