A Dataset of Argumentative Dialogues on Scientific Papers
With recent advances in question-answering models, various datasets have been collected to improve and study the effectiveness of these models on scientific texts. Questions and answers in these datasets explore a scientific paper by seeking factual information from its content. However, these datasets do not tackle the argumentative content of scientific papers, which is central to the persuasiveness of a scientific discussion. We introduce ArgSciChat, a dataset of 41 argumentative dialogues between scientists on 20 NLP papers. The unique property of our dataset is that it includes both exploratory and argumentative questions and answers in a dialogue discourse grounded in a scientific paper. Moreover, the size of ArgSciChat demonstrates the difficulty of collecting dialogues in specialized domains. Our dataset is therefore a challenging resource for evaluating dialogue agents in low-resource domains, where collecting training data is costly. We annotate all dialogue sentences in ArgSciChat and analyze them extensively. The results confirm that dialogues in ArgSciChat include both exploratory and argumentative interactions. Furthermore, we use our dataset to fine-tune and evaluate a pre-trained document-grounded dialogue agent. The agent achieves low performance on our dataset, motivating the need for dialogue agents capable of reasoning and arguing about their answers. We publicly release ArgSciChat.
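To make the dataset's structure concrete, below is a minimal Python sketch of how one such dialogue might be represented for corpus analysis. The class and field names (Turn, Dialogue, the intent labels) are illustrative assumptions, not the released ArgSciChat file format.

```python
# A minimal sketch of how one dialogue from a dataset like ArgSciChat might be
# represented for analysis. Field names are illustrative assumptions.
from dataclasses import dataclass
from typing import List, Literal

@dataclass
class Turn:
    speaker: str                                       # e.g. "scientist_A" or "scientist_B"
    text: str                                          # the utterance
    intent: Literal["exploratory", "argumentative"]    # sentence-level label

@dataclass
class Dialogue:
    paper_title: str                                   # the NLP paper the dialogue is grounded in
    turns: List[Turn]

def intent_distribution(dialogues: List[Dialogue]) -> dict:
    """Count exploratory vs. argumentative turns across a corpus."""
    counts = {"exploratory": 0, "argumentative": 0}
    for d in dialogues:
        for t in d.turns:
            counts[t.intent] += 1
    return counts
```

A count like this is one way to check, as the abstract reports, that a dialogue corpus actually mixes exploratory and argumentative interactions rather than collapsing into factual Q&A.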
Mining legal arguments in court decisions
Identifying, classifying, and analyzing arguments in legal discourse has been a prominent area of research since the inception of the argument mining field. However, there has been a major discrepancy between the way natural language processing (NLP) researchers model and annotate arguments in court decisions and the way legal experts understand and analyze legal argumentation. While computational approaches typically simplify arguments into generic premises and claims, arguments in legal research usually exhibit a rich typology that is important for gaining insight into the particular case and into applications of law in general. We address this problem and make several substantial contributions to move the field forward. First, we design a new annotation scheme for legal arguments in proceedings of the European Court of Human Rights (ECHR) that is deeply rooted in the theory and practice of legal argumentation research. Second, we compile and annotate a large corpus of 373 court decisions (2.3M tokens and 15k annotated argument spans). Finally, we train an argument mining model that outperforms state-of-the-art models in the legal NLP domain and provide a thorough expert-based evaluation. All datasets and source code are available under open licenses at https://github.com/trusthlt/mining-legal-arguments
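As a rough illustration of the mining step, the following Python sketch frames argument extraction as BIO span tagging, a common formulation for extracting typed argument spans from court decisions. The label names are placeholders, not the paper's annotation scheme; the actual scheme and data are in the linked repository.

```python
# Sketch: collapsing a BIO tag sequence into typed argument spans.
# Label names below are illustrative placeholders only.
from typing import List, Tuple

LABELS = ["O", "B-PRECEDENT", "I-PRECEDENT", "B-INTERPRETATION", "I-INTERPRETATION"]

def spans_from_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Return (argument_type, span_text) pairs from token-level BIO tags."""
    spans, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            if current_type:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type:
        spans.append((current_type, " ".join(current_tokens)))
    return spans
```

In this framing, the "rich typology" the abstract emphasizes simply becomes a larger, legally grounded label inventory rather than the generic premise/claim pair.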
Arithmetic-based pretraining: improving numeracy of pretrained language models
State-of-the-art pretrained language models tend to perform below their capabilities when applied out of the box to tasks that require understanding and working with numbers. Recent work suggests two main reasons for this: (1) popular tokenisation algorithms have limited expressiveness for numbers, and (2) common pretraining objectives do not target numeracy. Approaches that address these shortcomings usually require architectural changes or pretraining from scratch. In this paper, we propose a new extended pretraining approach called Arithmetic-Based Pretraining that jointly addresses both shortcomings in one extended pretraining step without requiring architectural changes or pretraining from scratch. Arithmetic-Based Pretraining combines contrastive learning, to improve number representations, with a novel extended pretraining objective called the Inferable Number Prediction Task, to improve numeracy. Our experiments show the effectiveness of Arithmetic-Based Pretraining on three tasks that require improved numeracy: reading comprehension on the DROP dataset, inference on tables on the InfoTabs dataset, and table-to-text generation on the WikiBio and SciGen datasets.
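The sketch below illustrates, schematically, how the two objectives could be combined into a single extended-pretraining loss. The encoders, batching, temperature, and the weighting term alpha are assumptions for illustration, not the paper's exact setup.

```python
# Schematic sketch of a joint loss: a contrastive term over number
# representations plus a prediction term for masked-out numbers.
import torch
import torch.nn.functional as F

def joint_pretraining_loss(number_emb_a, number_emb_b, lm_logits, target_ids, alpha=0.5):
    """number_emb_a / number_emb_b: two views of the same numbers' representations,
    shape (batch, dim). lm_logits: (batch, seq, vocab) predictions for the numbers
    masked out in the Inferable Number Prediction Task. target_ids: (batch, seq)."""
    # Contrastive term: pull representations of matching numbers together
    # (InfoNCE-style; the 0.07 temperature is an assumed value).
    a = F.normalize(number_emb_a, dim=-1)
    b = F.normalize(number_emb_b, dim=-1)
    logits = a @ b.t() / 0.07
    targets = torch.arange(a.size(0), device=a.device)
    contrastive = F.cross_entropy(logits, targets)
    # Prediction term: token-level cross-entropy on the masked-out numbers.
    prediction = F.cross_entropy(lm_logits.flatten(0, 1), target_ids.flatten())
    # Weighted sum of both objectives; alpha is an illustrative assumption.
    return alpha * contrastive + (1 - alpha) * prediction
```

The point of such a joint formulation is that both terms operate on the existing model, which is what allows the approach to avoid architectural changes or pretraining from scratch.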
Falsesum: generating document-level NLI examples for recognizing factual inconsistency in summarization
Modelling the preparation of masters of professional education for activities in the information and digital environment
The article theoretically substantiates a model for training future vocational education teachers and forming their readiness to work in the information and educational environment based on the development of their digital competencies. By analyzing the types of activities of a future vocational education teacher at the master's level, as specified in the relevant professional standard, and the fields of application of their digital competencies, the paper reveals their interconnection and presents possibilities for developing these graduate competencies within the proposed educational and professional master's programme. The programme is built on a study of the applied profile of the future teacher's digital competencies. The research methodology rests on an analysis of teachers' digital competencies and the logic of structuring the subject preparation of future masters of vocational education for successful and effective professional activity in an information and digital environment.
The study analyses the content of a vocational education teacher's basic digital competencies; it proposes a model for training graduates and forming their professional readiness to work in the information and digital environment based on the development of the relevant competencies; and it presents the content of the educational and professional programme "Computer Technologies in Management and Education" for the master's degree in the field of knowledge 01 Education/Pedagogy, specialty 015.39 Vocational Education (Digital Technologies), built on developing the digital competencies of the future specialist within his or her professional training.
The research demonstrates that, to make teachers' work more effective under the current informatization and digitalization of society and the emerging virtual social and educational system, subjects aimed at developing digital competencies to a defined level must be included in the content of their professional training.
Sketch-Based Interfaces: Exploiting Spatio-temporal Context for Automatic Stroke Grouping
A new benchmark dataset with production methodology for short text semantic similarity algorithms
This research presents a new benchmark dataset for evaluating Short Text Semantic Similarity (STSS) measurement algorithms, together with the methodology used to create it. The power of the dataset is demonstrated by using it to compare two established algorithms, STASIS and Latent Semantic Analysis. The dataset focuses on measures for use in conversational agents; other potential applications include email processing and data mining of social networks. Such applications involve integrating an STSS algorithm into a complex system, but STSS algorithms must first be evaluated in their own right and compared with others before system integration. Semantic similarity is an artifact of human perception; its evaluation is therefore inherently empirical and requires benchmark datasets derived from human similarity ratings. The new dataset of 64 sentence pairs, STSS-131, has been designed to meet these requirements, drawing on a range of resources from traditional grammar to cognitive neuroscience. The human ratings were obtained in a set of trials using new and improved experimental methods with validated measures and statistics. The results illustrate the increased challenge and the potential longevity of the STSS-131 dataset as a gold standard for future STSS algorithm evaluation.
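The following toy Python sketch shows the kind of evaluation such a benchmark supports: correlating an algorithm's similarity scores with human ratings over sentence pairs. The inline pairs, ratings, and the lexical-overlap scorer are stand-ins, not STSS-131 or STASIS/LSA themselves.

```python
# Toy evaluation: correlate an algorithm's similarity scores with human ratings.
from statistics import correlation  # Pearson's r, available in Python 3.10+

def bag_of_words_similarity(s1: str, s2: str) -> float:
    """Crude lexical-overlap baseline, used only to exercise the evaluation."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    return len(w1 & w2) / max(len(w1 | w2), 1)

# Toy sentence pairs with made-up human ratings on a 0-4 similarity scale.
pairs = [
    ("a gem is a jewel or stone", "a jewel is a precious stone"),
    ("the coast is the land next to the sea", "midday is twelve o'clock in the day"),
    ("a cord is strong thin rope", "string is a thin rope for tying things"),
]
human_ratings = [3.4, 0.2, 2.9]

scores = [bag_of_words_similarity(a, b) for a, b in pairs]
print("Pearson r against human ratings:", correlation(human_ratings, scores))
```

Because the benchmark's ground truth is human judgement, agreement statistics of this kind, rather than task accuracy, are the natural way to compare STSS algorithms against it.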
