175 research outputs found
Enhancing knowledge acquisition systems with user generated and crowdsourced resources
This thesis investigates how to enhance knowledge acquisition systems with collaboratively
created data and crowdsourced work from the internet. We propose two strategies and apply
them to build effective entity linking and question answering (QA) systems.
The first strategy integrates an information extraction system with online collaborative
knowledge bases, such as Wikipedia and Freebase. We construct a Cross-Lingual Entity
Linking (CLEL) system to connect Chinese entities, such as people and locations, with
the corresponding English pages in Wikipedia.
The main focus is to break the language barrier between Chinese entities and the English
knowledge base (KB), and to resolve the synonymy and polysemy of Chinese entities. To
address these problems, we create a cross-lingual taxonomy and a Chinese KB. We
investigate two methods of connecting the query representation with the KB representation.
Building on the CLEL system that we entered in the TAC KBP 2011 evaluation, we finally
propose a simple and effective generative model, which achieved much better performance.
The second strategy creates annotations for QA systems with the help of crowdsourcing.
Crowdsourcing means distributing a task via the internet and recruiting a large number of
people to complete it simultaneously. Various kinds of annotated data are required to train
the data-driven statistical machine learning algorithms behind the components of our QA
system. This thesis demonstrates how to convert the annotation task into crowdsourcing
micro-tasks, investigates different statistical methods for enhancing the quality of
crowdsourced annotation, and finally uses the enhanced annotation to train learning-to-rank models for passage
ranking algorithms for QA.
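As an illustration of what enhancing the quality of crowdsourced annotation can involve, here is a minimal sketch of majority-vote aggregation over redundant worker judgments. The abstract does not say which statistical methods the thesis uses, so majority voting is only an assumed baseline. The function name `aggregate_by_majority`, the item and worker identifiers, and the labels are all hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_by_majority(judgments):
    """Collapse redundant worker labels into one label per item.

    `judgments` is an iterable of (item_id, worker_id, label) triples.
    Returns {item_id: (majority_label, agreement)}, where agreement is the
    fraction of workers who chose the majority label.
    Ties go to the label that was seen first for that item.
    """
    votes = defaultdict(list)
    for item_id, _worker_id, label in judgments:
        votes[item_id].append(label)

    aggregated = {}
    for item_id, labels in votes.items():
        label, count = Counter(labels).most_common(1)[0]
        aggregated[item_id] = (label, count / len(labels))
    return aggregated


# Hypothetical micro-task results: is a passage relevant to a question?
judgments = [
    ("q1-p1", "w1", "relevant"),
    ("q1-p1", "w2", "relevant"),
    ("q1-p1", "w3", "irrelevant"),
    ("q1-p2", "w1", "irrelevant"),
    ("q1-p2", "w2", "irrelevant"),
]
aggregated = aggregate_by_majority(judgments)
```

The agreement score could be used to filter out low-consensus items before training a ranking model. Whether the thesis does anything of the kind is not stated in the abstract.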
First-year international Chinese undergraduate students' academic writing in the digital age
Driven by the desire to expand and deepen the understanding of the academic performance and multiliteracies development of international Chinese undergraduate students, and by the current scarcity of research on this topic, this study examined first-year international Chinese students' academic writing process, how this process was situated in the cultural-historical context, and how it was mediated by students' use of web-enabled resources and their interaction with social others. This research project comprised two parts, namely, a survey study and a multiple case study. The data were collected through survey responses, interviews, and real-time screen recordings.
The most common challenges and strategies in academic writing for this group of students were investigated. While using digital resources and relying on past ESL training and writing experience were unsurprisingly chosen as the most convenient strategies, asking instructors and teaching assistants for help was also found to be popular among the respondents. In addition, the participants were aware that digital tools had so far been unable to solve all the challenges they encountered in academic writing, and that these challenges resulted from an intricate combination of influencing factors. Students did not demonstrate highly advanced skills in searching for resources and determining their credibility and authorship. They did not seem to be bothered much by the frequency of all the transactions during the writing process, and were generally happy with what they could find at their fingertips for achieving the short-term goal of finishing the academic assignment. Meanwhile, they were able to articulate a series of strategies and criteria to illustrate their basic multiliteracies skills, and they were cognizant of the inadequacy of their multiliteracies. These students interacted with various social others within the bounded system, and both influenced and were influenced by others during this process.
They generally preferred working alone and thinking independently on their writing assignments, but they also wished for clearer instructions and communication of expectations from the instructors and more opportunities to exchange views and ideas with other student groups. These students' academic writing and multiliteracies practices were deeply situated in their cultural, historical, and educational backgrounds, as well as the current social-academic context. Their decision-making process in relation to writing strategies and digital resource use embodied their constant negotiation with their multiple identities evolving from the past into the present. In addition to verifying previous research findings and filling gaps in the literature, identifying the emerging contradictions was another objective of this study. Gaps and misfits found at different levels and among different components in the bounded system provided implications for pedagogy, curriculum development, and student academic support, and were expected to inspire further exploration of the related issues in future academic writing and multiliteracies studies.
CLARIN. The infrastructure for language resources
CLARIN, the "Common Language Resources and Technology Infrastructure", has established itself as a major player in the field of research infrastructures for the humanities. This volume provides a comprehensive overview of the organization, its members, its goals and its functioning, as well as of the tools and resources hosted by the infrastructure. The many contributors representing various fields, from computer science to law to psychology, analyse a wide range of topics, such as the technology behind the CLARIN infrastructure, the use of CLARIN resources in diverse research projects, the achievements of selected national CLARIN consortia, and the challenges that CLARIN has faced and will face in the future.
The book will be published in 2022, 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium by the European Commission (Decision 2012/136/EU).
Proceedings of the 17th Annual Conference of the European Association for Machine Translation
Proceedings of the 17th Annual Conference of the European Association for Machine Translation (EAMT)
CLARIN
The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure – CLARIN – for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium.
A new approach to CALL content authoring
[no abstract]
Cultura-Inspired Intercultural Exchanges: Focus on Asian and Pacific Languages
Although many online intercultural exchanges have been conducted based on the groundbreaking Cultura model, most to date have involved European languages. This volume presents several chapters with a focus on exchanges involving Asian and Pacific languages. Many of the benefits and challenges of these exchanges are similar to those reported for European languages; however, some of the difficulties reported in the Chinese and Japanese exchanges might be due to the significant linguistic differences between English and East Asian languages. This volume adds to the body of emerging studies of telecollaboration among learners of Asian and Pacific languages.
New Data-Driven Approaches to Text Simplification
A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy.
Many texts we encounter in our everyday lives are lexically and syntactically very complex. This makes them difficult to understand for people with intellectual or reading impairments, and difficult for various natural language processing systems to process. This motivates the need for text simplification (TS), which transforms texts into simpler variants. Given that this is still a relatively new research area, many challenges remain. The focus of this thesis is on better understanding the current problems in automatic text simplification (ATS) and proposing new data-driven approaches to solving them. We propose methods for learning sentence splitting and deletion decisions, built upon parallel corpora of original and manually simplified Spanish texts, which outperform existing comparable systems. Our experiments in adapting those methods to different text genres and target populations report promising results, thus offering one possible solution to the scarcity of parallel corpora for text simplification aimed at specific target populations, which is currently one of the main issues in ATS. The results of our extensive analysis of the phrase-based statistical machine translation (PB-SMT) approach to ATS reject the widespread assumption that the success of that approach largely depends on the size of the training and development datasets. They point to more influential factors for the success of the PB-SMT approach to ATS, and reveal some important differences between cross-lingual MT and the monolingual MT used in ATS. Our event-based system for simplifying news stories in English (EventSimplify) overcomes some of the main problems in ATS. It requires neither a large number of handcrafted simplification rules nor parallel data, and it performs significant content reduction.
The automatic and human evaluations conducted show that it produces grammatical text and increases readability, preserving and simplifying relevant content and reducing irrelevant content. Finally, this thesis addresses another important issue in TS: how to automatically evaluate the performance of TS systems, given that access to the target users might be difficult. Our experiments indicate that existing readability metrics can successfully be used for this task when enriched with human evaluation of grammaticality and preservation of meaning.
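To make the idea of scoring simplification output with a readability metric concrete, here is a minimal sketch of the Flesch Reading Ease formula, a widely used readability measure for English. The abstract does not name the metrics evaluated in the thesis, so this choice is an assumption. The vowel-group syllable counter is a rough heuristic, and the two example sentences are invented for illustration.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels (min. 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate easier text.

    score = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


# Invented example pair: an original sentence and a manual simplification.
original = ("The municipality implemented comprehensive regulations "
            "concerning residential construction.")
simplified = "The city made new rules. The rules are about building homes."

score_original = flesch_reading_ease(original)
score_simplified = flesch_reading_ease(simplified)
```

A score like this captures only surface difficulty (sentence length and word length). That is consistent with the abstract's point that such metrics need to be complemented by human judgments of grammaticality and meaning preservation.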
Principles and Applications of Data Science
Data science is an emerging multidisciplinary field at the intersection of computer science, statistics, and mathematics, with diverse applications and close ties to data mining, deep learning, and big data. This Special Issue on “Principles and Applications of Data Science” focuses on the latest developments in the theories, techniques, and applications of data science. The topics include data cleansing, data mining, machine learning, deep learning, and applications in medicine and healthcare, as well as social media.