13 research outputs found

Grounding event references in news

Author: Altena R.
Geerlings W.A.
Klingeren B. van
Lange W.C.M. de
Werf T.S.
Publication venue: School of Information Technologies
Publication date: 01/01/2000

Events are frequently discussed in natural language, and their accurate identification is central to language understanding. Yet they are diverse and complex in ontology and reference, so computational processing proves challenging. News provides a shared basis for communication by reporting events. We perform several studies into news event reference. One annotation study characterises each news report in terms of its update and topic events, but finds that topic is better considered through explicit references to background events. In this context, we propose the event linking task which, analogous to named entity linking or disambiguation, models the grounding of references to notable events. It defines the disambiguation of an event reference as a link to the archival article that first reports it. When two references are linked to the same article, they need not be references to the same event. Event linking aims to provide an intuitive approximation to coreference, erring on the side of over-generation in contrast with the literature. The task is also distinguished in considering event references from multiple perspectives over time. We diagnostically evaluate the task by first linking references to past, newsworthy events in news and opinion pieces to an archive of the Sydney Morning Herald. The intensive annotation results in only a small corpus of 229 distinct links. However, we observe that a number of hyperlinks targeting online news correspond to event links. We thus acquire two large corpora of hyperlinks at very low cost. From these we learn weights for temporal and term overlap features in a retrieval system. These noisy data lead to significant performance gains over a bag-of-words baseline. While our initial system can accurately predict many event links, most will require deep linguistic processing for their disambiguation.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Sydney eScholarship

Radboud Repository

Dissertations of the University of Groningen

Grounding event references in news

Author: Nothman Joel
Publication venue: Faculty of Engineering and Information Technologies, School of Information Technologies
Publication date: 01/01/2014

Sydney eScholarship

The People’s Encyclopedia Under the Gaze of the Sages: A Systematic Review of Scholarly Research on Wikipedia

Author: Lanamäki Arto
Mehdi Mohamad
Mesgari Mostafa
Nielsen Finn Årup
Okoli Chitu
Publication venue: Elsevier BV
Publication date: 01/01/2012

Crossref

Online Research Database In Technology

Les résumés de la Conférence canadienne sur l'éducation médicale 2021

Author: Hickey Heather
Publication venue: Canadian Medical Education Journal
Publication date: 12/04/2021

University of Calgary Journal Hosting

Synchronous collaborative L2 writing with technology

Author: Steinberger Franz
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 09/11/2017

This study explored the process of synchronous collaborative L2 writing using Google Docs in an English for medical purposes setting at university level. The research design is qualitative in nature, as the collaborative practices of 24 German medical students in eight groups of three were investigated. The study focussed on the approximately 45- to 50-minute collaborative writing process of the eight groups with respect to their negotiation of the collaborative process. In other words, how did the students use Google Docs synchronously in terms of channel usage? What aspects of the collaborative task did those groups of L2 students decide to make a subject of discussion, and what does that tell us about the nature of the process? Finally, how did students experience this synchronous collaborative writing process? The data collection relied primarily on the built-in recording features of Google Docs. The resulting data (chat logs, revision history of the co-authored texts, comments history) was compiled into a chronologically organised data set. In addition, participants took part in a post-activity survey. The participants’ collaborative practices and their answers in the survey were analysed utilising a qualitative content analysis approach. The results of the analysis revealed three major findings: First, students participated very actively in the activity, resulting in many opportunities for creating and negotiating language output, a necessary condition for second language development. Students focussed primarily on content- and workflow-related discussions, which is in line with findings from collaborative writing research. Students also engaged in ‘languaging’, i.e. language-related metatalk, which raises their language awareness, another facilitator of second language development.
Due to the computer-mediated nature of the student discourse and the students’ high language level, surface-level matters like layout or spelling were not discussed by the students. Second, the analysis of discussion episodes revealed that the participants verbalised certain aspects of the writing process in their task-related meta discussions. An initial peak in workflow- and content-related discussions resembled a planning phase, the following rise in language- and structure-related discussions represented the translating phase. The final phase, which resembled a revision phase, saw a decrease of all discussions. Third, it had been hoped that two distinct patterns of solving a task together, namely collaboration and cooperation, could be identified by investigating instances of synchronous channel usage. However, synchronous activity in the text or overlap of activity in the chat and text did not prove to be a reliable indicator of either pattern. Due to the synchronous and all-written nature of the activity, it seemed plausible to classify synchronous collaborative writing as collaboration by default. The analysis also revealed a negative correlation between chat activity and performance in the final text. Groups who performed worst in the final texts dedicated substantially more time to chatting (about content- and workflow-related matters) than more successful groups. These groups seemed to struggle to establish a common content and workflow understanding, which is further supported by the post-activity survey. An all-written, multi-modal environment proved to be a challenge for some students, who could have benefitted from pedagogical guidance. The exploratory investigation of the synchronous collaborative L2 writing process with Google Docs led to several implications for foreign language teaching and research. 
First, the implementation of web-based technology can pose a serious legal and ethical challenge for educators and researchers, in Germany in particular, as user data is surrendered to global cloud-based systems; this problem can only be solved by relying on locally installed, open source software. Second, shared documents can be a powerful tool to bridge the gap between classroom activities and the online component in blended learning settings. Third, shared documents make learning processes visible and, hence, assessable, although a shift from a product-oriented to a process-oriented assessment approach poses several pedagogical and pragmatic challenges. Fourth, shared documents are a feasible way for educators to collect user data for research but could benefit from the inclusion of more sophisticated means of data collection, such as eye-tracking or screen recording. Finally, the exploratory setup of this study revealed that a new way of working together requires guidelines on how to best exploit the possibilities of shared documents technology to work collaboratively on a joint project, a valuable avenue for future research.

Making digital history: The impact of digitality on public participation and scholarly practices in historical research

Author: Ridge Mia
Publication date: 29/06/2016

This thesis investigates two key questions: firstly, how do two broad groups - academic, family and local historians, and the public - evaluate, use, and contribute to digital history resources? And consequently, what impact have digital technologies had on public participation and scholarly practices in historical research? It analyses the impact of design on participant experiences and the reception of digital historiography, demonstrating the value of methods drawn from human-computer interaction, including heuristic evaluation, trace ethnography and semi-structured interviews. This thesis also investigates the relationship between heritage crowdsourcing projects (which ask the public to help with meaningful, inherently rewarding tasks that contribute to a shared, significant goal or research interest related to cultural heritage collections or knowledge) and the development of historical skills and interests. It situates crowdsourcing and citizen history within the broader field of participatory digital history and then focuses on the impact of digitality on the research practices of faculty and community historians. Chapter 1 provides an overview of over 400 digital history projects aimed at engaging the public or collecting, creating or enhancing records about historical materials for scholarly and general audiences. Chapter 2 discusses design factors that may influence the success of crowdsourcing projects. Following this, Chapter 3 explores the ways in which some crowdsourcing projects encourage deeper engagement with history or science, and the role of communities of practice in citizen history. Chapter 4 shifts the focus from public participation to scholarly practices in historical research, presenting the results of interviews conducted with 29 faculty and community historians.
Finally, the Conclusion draws together the threads that link public participation and scholarly practices, teasing out the ways in which the practices of discovering, gathering, creating and sharing historical materials and knowledge have been affected by digital methods, tools and resources.

Open Research Online (The Open University)

Moving towards the semantic web: enabling new technologies through the semantic annotation of social contents.

Author: Vicient Monllaó Carlos
Publication venue: Universitat Rovira i Virgili
Publication date: 01/01/2015

Social Web technologies have caused an exponential growth in the documents available through the Web, making enormous amounts of textual electronic resources available. Users may be overwhelmed by such a volume of content and, therefore, the automatic analysis and exploitation of all this information is of interest to the data mining community. Data mining algorithms exploit features of the entities in order to characterise, group or classify them according to their resemblance. Data by itself does not carry any meaning; it needs to be interpreted to convey information. Classical data analysis methods did not aim to “understand” the content: the data were treated as meaningless numbers, and statistics were calculated on them to build models that were interpreted manually by human domain experts. Nowadays, motivated by the Semantic Web, many researchers have proposed semantically grounded data classification and clustering methods that are able to exploit textual data at a conceptual level. However, they usually rely on pre-annotated inputs to be able to semantically interpret textual data such as the content of Web pages. The usability of all these methods is related to the linkage between data and its meaning.
This work focuses on the development of a general methodology able to detect the most relevant features of a particular textual resource, finding out their semantics (associating them with concepts modelled in ontologies) and detecting its main topics. The proposed methods are unsupervised (avoiding the manual annotation bottleneck), domain-independent (applicable to any area of knowledge) and flexible (being able to deal with heterogeneous resources: raw text documents, semi-structured user-generated documents such as Wikipedia articles, or short and noisy tweets). The methods have been evaluated in different fields (Tourism, Oncology). This work is a first step towards the automatic semantic annotation of documents, needed to pave the way towards the Semantic Web vision.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Tesis Doctorals en Xarxa