13 research outputs found

    Grounding event references in news

    Events are frequently discussed in natural language, and their accurate identification is central to language understanding. Yet they are diverse and complex in ontology and reference; computational processing hence proves challenging. News provides a shared basis for communication by reporting events. We perform several studies into news event reference. One annotation study characterises each news report in terms of its update and topic events, but finds that topic is better considered through explicit references to background events. In this context, we propose the event linking task which, analogous to named entity linking or disambiguation, models the grounding of references to notable events. It defines the disambiguation of an event reference as a link to the archival article that first reports it. When two references are linked to the same article, they need not be references to the same event. Event linking hopes to provide an intuitive approximation to coreference, erring on the side of over-generation in contrast with the literature. The task is also distinguished in considering event references from multiple perspectives over time. We diagnostically evaluate the task by first linking references to past, newsworthy events in news and opinion pieces to an archive of the Sydney Morning Herald. The intensive annotation results in only a small corpus of 229 distinct links. However, we observe that a number of hyperlinks targeting online news correspond to event links. We thus acquire two large corpora of hyperlinks at very low cost. From these we learn weights for temporal and term overlap features in a retrieval system. These noisy data lead to significant performance gains over a bag-of-words baseline. While our initial system can accurately predict many event links, most will require deep linguistic processing for their disambiguation.

    Abstracts of the Canadian Conference on Medical Education 2021 (Les résumés de la Conférence canadienne sur l'éducation médicale 2021)


    Synchronous collaborative L2 writing with technology

    This study explored the process of synchronous collaborative L2 writing using Google Docs in an English for medical purposes setting at university level. The research design is qualitative in nature, as the collaborative practices of 24 German medical students in eight groups of three were investigated. The study focussed on the approximately 45- to 50-minute collaborative writing process of the eight groups with respect to their negotiation of the collaborative process. In other words, how did the students use Google Docs synchronously in terms of channel usage? What aspects of the collaborative task did those groups of L2 students decide to make a subject of discussion, and what does that tell us about the nature of the process? Finally, how did students experience this synchronous collaborative writing process? The data collection relied primarily on the built-in recording features of Google Docs. The resulting data (chat logs, revision history of the co-authored texts, comments history) was compiled into a chronologically organised data set. In addition, participants took part in a post-activity survey. The participants’ collaborative practices and their answers in the survey were analysed using a qualitative content analysis approach. The results of the analysis revealed three major findings. First, students participated very actively in the activity, resulting in many opportunities for creating and negotiating language output, a necessary condition for second language development. Students focussed primarily on content- and workflow-related discussions, which is in line with findings from collaborative writing research. Students also engaged in ‘languaging’, i.e. language-related metatalk, which raises their language awareness, another facilitator of second language development.
Due to the computer-mediated nature of the student discourse and the students’ high language level, surface-level matters like layout or spelling were not discussed by the students. Second, the analysis of discussion episodes revealed that the participants verbalised certain aspects of the writing process in their task-related meta discussions. An initial peak in workflow- and content-related discussions resembled a planning phase, while the following rise in language- and structure-related discussions represented the translating phase. The final phase, which resembled a revision phase, saw a decrease in all discussions. Third, it had been hoped that two distinct patterns of solving a task together, namely collaboration and cooperation, could be identified by investigating instances of synchronous channel usage. However, synchronous activity in the text or overlap of activity in the chat and text did not prove to be a reliable indicator of either pattern. Due to the synchronous and all-written nature of the activity, it seemed plausible to classify synchronous collaborative writing as collaboration by default. The analysis also revealed a negative correlation between chat activity and performance in the final text. Groups who performed worst in the final texts dedicated substantially more time to chatting (about content- and workflow-related matters) than more successful groups. These groups seemed to struggle to establish a common content and workflow understanding, which is further supported by the post-activity survey. An all-written, multi-modal environment proved to be a challenge for some students, who could have benefitted from pedagogical guidance. The exploratory investigation of the synchronous collaborative L2 writing process with Google Docs led to several implications for foreign language teaching and research.
First, the implementation of web-based technology can pose a serious legal and ethical challenge for educators and researchers, in Germany in particular, as user data is surrendered to global cloud-based systems, a problem which can only be solved by relying on locally installed, open source software. Second, shared documents can be a powerful tool to bridge the gap between classroom activities and the online component in blended learning settings. Third, shared documents make learning processes visible and, hence, assessable, although a shift from a product-oriented to a process-oriented assessment approach poses several pedagogical and pragmatic challenges. Fourth, shared documents are a feasible way for educators to collect user data for research but could benefit from the inclusion of more sophisticated means of data collection, such as eye-tracking or screen recording. Finally, the exploratory setup of this study revealed that a new way of working together requires guidelines on how to best exploit the possibilities of shared-document technology to work collaboratively on a joint project, a valuable avenue for future research.

    Moving towards the semantic web: enabling new technologies through the semantic annotation of social contents.

    Social Web technologies have caused an exponential growth of the documents available through the Web, making enormous amounts of textual electronic resources available. Users may be overwhelmed by such an amount of content and, therefore, the automatic analysis and exploitation of all this information is of interest to the data mining community. Data mining algorithms exploit features of the entities in order to characterise, group or classify them according to their resemblance. Data by itself does not carry any meaning; it needs to be interpreted to convey information. Classical data analysis methods did not aim to “understand” the content: the data were treated as meaningless numbers, and statistics were calculated on them to build models that were interpreted manually by human domain experts. Nowadays, motivated by the Semantic Web, many researchers have proposed semantically grounded data classification and clustering methods that are able to exploit textual data at a conceptual level. However, they usually rely on pre-annotated inputs to be able to semantically interpret textual data such as the content of Web pages. The usability of all these methods is related to the linkage between data and its meaning. This work focuses on the development of a general methodology able to detect the most relevant features of a particular textual resource, finding out their semantics (associating them to concepts modelled in ontologies) and detecting its main topics. The proposed methods are unsupervised (avoiding the manual annotation bottleneck), domain-independent (applicable to any area of knowledge) and flexible (being able to deal with heterogeneous resources: raw text documents, semi-structured user-generated documents such as Wikipedia articles, or short and noisy tweets). The methods have been evaluated in different fields (Tourism, Oncology). This work is a first step towards the automatic semantic annotation of documents, needed to pave the way towards the Semantic Web vision.
