Search CORE

26 research outputs found

The role of verb semantics in Hungarian verb-object order

Author: Demszky Dorottya
Publication venue: 'Linguistic Society of America'
Publication date: 20/03/2021

Hungarian is often referred to as a discourse-configurational language, since the structural position of constituents is determined by their logical function (topic or comment) rather than by their grammatical function (e.g., subject or object). We build on work by Komlósy (1989) and argue that, in addition to discourse context, the lexical semantics of the verb also plays a significant role in determining Hungarian word order. To investigate this role, we conduct a large-scale, data-driven analysis of the ordering of 380 transitive verbs and their objects, as observed in hundreds of thousands of examples extracted from the Hungarian Gigaword Corpus. We test the effect of lexical semantics on verb-object ordering by grouping the verbs into 11 semantic classes. In addition to the semantic class of the verb, we include two control features related to information structure (object definiteness and object NP weight), chosen to allow a comparison of their effect sizes with that of verb semantics. Our results suggest that all three features have a significant effect on verb-object ordering in Hungarian and that, among these features, the semantic class of the verb has the largest effect. Specifically, we find that stative verbs, such as fed 'cover', jelent 'mean' and övez 'surround', tend to be OV-preferring (with the exception of psych verbs, which are strongly VO-preferring), whereas non-stative verbs, such as bírál 'judge', csökkent 'reduce' and csókol 'kiss', tend to be VO-preferring. These findings support our hypothesis that lexical semantic factors influence word order in Hungarian.

Proceedings Published by the LSA (Linguistic Society of America)

The NCTE Transcripts: A Dataset of Elementary Math Classroom Transcripts

Author: Demszky Dorottya
Hill Heather
Publication date: 21/11/2022

Classroom discourse is a core medium of instruction; analyzing it can provide a window into teaching and learning and can drive the development of new tools for improving instruction. We introduce the largest dataset of mathematics classroom transcripts available to researchers and demonstrate how these data can help improve instruction. The dataset consists of 1,660 observations of 4th- and 5th-grade elementary mathematics classrooms, each 45 to 60 minutes long, collected by the National Center for Teacher Effectiveness (NCTE) between 2010 and 2013. The anonymized transcripts represent data from 317 teachers across 4 school districts that serve largely historically marginalized students. The transcripts come with rich metadata, including turn-level annotations for dialogic discourse moves, classroom observation scores, demographic information, survey responses, and student test scores. We demonstrate that our natural language processing model, trained on our turn-level annotations, can learn to identify dialogic discourse moves, and that these moves are correlated with better classroom observation scores and learning outcomes. This dataset opens up several possibilities for researchers, educators, and policymakers to learn about and improve K-12 instruction. The data and its terms of use can be accessed here: https://github.com/ddemszky/classroom-transcript-analysi

arXiv.org e-Print Archive

A tudás szerepe a (poszt)modern társadalmakban [The role of knowledge in (post)modern societies]

Author: Demszky Alma Míra
Publication venue: Eszterházy Károly Egyetem Líceum Kiadó

EKE Repository of Publications

Lebensführung und alltägliche Vergesellschaftung in einer Plattenbausiedlung in Budapest [Conduct of life and everyday sociation in a prefabricated housing estate in Budapest]

Author: Hagen-Demszky Alma von der
Publication venue: Frankfurt am Main
Publication date: 01/01/2004

"This contribution presents the results of the author's dissertation, completed in 2005 at Chemnitz University of Technology. The empirical study examined how sociality emerges and is sustained in everyday life. It investigated the forms and logic of everyday sociation (Vergesellschaftung): the individually specific ways in which people make social contacts and, in doing so, allow society to arise anew each day. Both the social networks of the micro level and the integration of the individual person into society, and thus the link between the micro and macro levels, were studied. In framing the research question, the author deliberately avoided defining the terms 'sociation' and 'society' too narrowly. The work was meant to capture the perspective of the individuals and not to impose anything on them. As a first approximation, society means everything social that surrounds people: their family, their neighbours, their colleagues, the newsreader, the politicians, the teacher at school. 'Society' is at first to be understood in its everyday sense: one is in company, that is, not alone. It is about the company of fellow human beings that everyone experiences day after day. Nobody who has a family, goes to work and lives in a big city is ever truly alone. Everyone has regular contact with other people and must adapt to them, whether at home at the dinner table, on the bus with other passengers, or with colleagues at work. In these contacts, seemingly 'in passing', what sociology calls 'society' comes into being. The study revolved around the following questions: How do people connect their individual lives, day after day, to the lives of others, and how do they become, every day, part of the society that surrounds them? What everyday achievements and efforts does this require? How does society arise out of millions of individual lives? What do these millions of people do to bring it about? This fundamental theoretical question was examined using the example of a prefabricated panel-block housing estate (Plattenbausiedlung) in Budapest. Although the work is devoted not primarily to urban-sociological research on this type of settlement but to the study of a general question in a concrete place, an analysis of the estate, its history and its local community was carried out. The housing estate is the setting of the respondents' everyday lives. It is one of the platforms on which everyday sociation becomes tangible and immediate: through the estate, sociation could be not 'only' reconstructed from the respondents' accounts but also observed 'live'. In this way, the analysis of the estate opened up an additional dimension of the study. Hungary as the country of study and Budapest as the study site also made it possible to trace the particular features of a post-communist society. The work thus also offers the beginnings of a comparison between Hungarian and German social conditions: it points out similarities and differences between Germany and Hungary both in the organisation of everyday life and in sociation. The research design interlinked theoretical and empirical research steps. Reviewing the state of research helped to refine the research question, develop the author's own theoretical framework, and work out the methodology. The collection and analysis of the empirical data were guided and checked by theory. The work concluded with a discussion of the theoretical significance of the empirical findings." (author's abstract)

SSOAR - Social Science Open Access Repository

Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults

Author: Ai Wei
Attia Ahmed Adel
Demszky Dorottya
Espy-Wilson Carol
Liu Jing
Publication date: 18/09/2023

Recent advancements in Automatic Speech Recognition (ASR) systems, exemplified by Whisper, have demonstrated the potential of these systems to approach human-level performance given sufficient data. However, this progress does not readily extend to ASR for children, owing to the limited availability of suitable child-specific databases and the distinct characteristics of children's speech. A recent study investigated leveraging the My Science Tutor (MyST) children's speech corpus to enhance Whisper's performance in recognizing children's speech and demonstrated some improvement on a limited test set. This paper builds on these findings by enhancing the utility of the MyST dataset through more efficient data preprocessing. We reduce the Word Error Rate (WER) on the MyST test set from 13.93% to 9.11% with Whisper-Small and from 13.23% to 8.61% with Whisper-Medium, and show that this improvement can generalize to unseen datasets. We also highlight important challenges for improving children's ASR performance. The results showcase the viable and efficient integration of Whisper for effective children's speech recognition.

arXiv.org e-Print Archive
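The WER figures in the Kid-Whisper abstract can be made concrete with a short sketch. It assumes the standard definition of WER (word-level edit distance, i.e. substitutions + deletions + insertions, divided by the number of reference words) and assumes transcripts that are already normalized; it is not the paper's evaluation pipeline, which may apply its own text normalization. The function names `word_error_rate` and `relative_reduction` are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / number of reference words.

    Assumes both strings are already normalized (case, punctuation).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j];
    # row 0 is j insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions to reach an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an error rate drops, relative to its starting value."""
    if before == 0:
        raise ValueError("baseline error rate must be non-zero")
    return (before - after) / before
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion over six reference words, giving 1/3. Applied to the reported numbers, `relative_reduction(13.93, 9.11)` is about 0.346 (Whisper-Small) and `relative_reduction(13.23, 8.61)` is about 0.349 (Whisper-Medium), i.e. roughly a one-third relative drop in WER in both cases.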

Learning to Recognize Dialect Features

Author: Clark J
Demszky D
Eisenstein J
Prabhakaran V
Sharma D
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 30/06/2021

Building NLP systems that serve everyone requires accounting for dialect differences. But dialects are not monolithic entities: rather, distinctions between and within dialects are captured by the presence, absence, and frequency of dozens of dialect features in speech and text, such as the deletion of the copula in “He ∅ running”. In this paper, we introduce the task of dialect feature detection, and present two multitask learning approaches, both based on pretrained transformers. For most dialects, large-scale annotated corpora for these features are unavailable, making it difficult to train recognizers. We train our models on a small number of minimal pairs, building on how linguists typically define dialect features. Evaluation on a test set of 22 dialect features of Indian English demonstrates that these models learn to recognize many features with high accuracy, and that a few minimal pairs can be as effective for training as thousands of labeled examples. We also demonstrate the downstream applicability of dialect feature detection both as a measure of dialect density and as a dialect classifier.

Queen Mary Research Online

Stance detection on social media: State of the art and trends

Author: Aker
Al-Ayyoub
Aldayel
Aldayel
Allaway
Allcott
Augenstein
Baird
Banegas
Bar-Haim
Barberá
Bassiouney
Beigman Klebanov
Belkaroui
Benamara
Bessi
Biber
Borge-Holthoefer
Borges
Bucholtz
Chauhan
Cignarella
Conforti
Cramér
Darwish
Darwish
Darwish
Demszky
Dong
Dori-Hacohen
Du Bois
Ebrahimi
Ferreira
Ferreira
Fraisier
Fuchs
Garimella
Garimella
Gautam
Ghosh
Gottipati
Graells-Garrido
Grcar
Gu
Gu
Hanawa
Himelboim
Jaffe
Jang
Jurafsky
Küçük
Lahoti
Lai
Lai
Lai
Li
Li
Liebetrau
Lin
Ma
Magdy
McKendrick
Mohammad
Mohammad
Mohammad
Mohtarami
Murakami
Newman
Pang
Pennacchiotti
Qazvinian
Qiu
Rajadesingan
Sen
Shu
Siddiqua
Siddiqua
Siddiqua
Simaki
Simaki
Singh
Sobhani
Sobhani
Sobhani
Somasundaran
Stefanov
Sun
Tanaka
Taulé
Thonet
Trabelsi
Walker
Walker
Wang
Weber
Wei
Wei
Xi
Zhang
Zhang
Zhou
Zhu
Zubiaga
Zubiaga
Publication venue: 'Elsevier BV'
Publication date: 24/02/2021

Stance detection on social media is an emerging opinion mining paradigm for various social and political applications in which sentiment analysis may be sub-optimal. There has been growing research interest in developing effective stance detection methods across multiple communities, including natural language processing, web science, and social computing. This paper surveys the work on stance detection within those communities and situates its usage within current opinion mining techniques in social media. It presents an exhaustive review of stance detection techniques on social media, including the task definition, the different types of targets in stance detection, the feature sets used, and the various machine learning approaches applied. The survey reports state-of-the-art results on the existing benchmark datasets for stance detection and discusses the most effective approaches. In addition, this study explores emerging trends and different applications of stance detection on social media. The study concludes by discussing the gaps in existing research and highlights possible future directions for stance detection on social media. Comment: We sincerely request withdrawal of this article. We will re-edit this paper. Please withdraw this article before we finish the new version.

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

PISA, policy and persuasion: translating complex conditions into education ‘best practice’

Author: Alexander R
Apart Worlds
Barber M
Barber M
Barber M.
Barnes B.
Bloor D
Demszky A.
DfE
Euan Auld
Fairclough N
Farnsworth V.
Flew A
Hopmann S. T.
Institute Grattan
IPPR (Institute for Public Policy)
Jenkins-Smith H. C.
Kallo J
Lemke J. L.
Majone G
Markkanen R.
McKinsey
McKinsey
OECD
OECD
OECD
OECD
OECD
Ozga J
Paul Morris
Rappleye J
Sabatier P. A.
Schleicher A
Schleicher A.
Stone D. A
Stone D. A
Tardy C. M
Tucker M
Tucker M.
Waldow F
Publication venue: 'Informa UK Limited'

Crossref