1,585 research outputs found

Menetelmiä luonnollisella kielellä kirjoitettujen raporttien automaattiseen tuottamiseen (Methods for the automated generation of reports written in natural language)

Author: Leppänen Leo
Publication venue: 'University of Helsinki Libraries'
Publication date: 21/04/2023

The use of computer software to automatically produce natural language texts expressing factual content is of interest to practitioners of multiple fields, ranging from journalists to researchers to educators. This thesis studies natural language report generation from structured data for the purposes of journalism. The topic is approached from three directions. First, we approach the problem from the perspective of analysing what requirements the journalistic domain imposes on the software, and how software might be architectured to account for the requirements. This includes identifying the key domain norms (such as the "objectivity norm") and business requirements (such as system transferability) and mapping them to software requirements. Based on the identified requirements, we then describe how a modular data-to-text approach to natural language generation can be implemented in the specific context of hard news reporting. Second, we investigate how the highly domain-specific natural language generation subtask of document planning - deciding what information is to be included in an automatically produced text, and in what order - might be conducted in a less domain-specific manner. To this end, we describe an approach to operationalizing the complex concept of "newsworthiness" in a manner where a natural language generation system can employ it. We also present a broadly applicable baseline method for structuring the content in a data-to-text setting without explicit domain knowledge. Third, we discuss how bias in text generation systems is perceived by key stakeholders, and whether those perceptions align with the reality of news automation. This discussion includes identifying how automated systems might exhibit bias and how the biases might be - potentially unconsciously - embedded in the systems. 
As a result, we conclude that common perceptions of automated journalism as fundamentally "unbiased" are unfounded, and that beliefs about "unbiased" automation might have the negative effect of further entrenching pre-existing biases in organizations or society. Together, through these three avenues, the thesis sketches out a way towards more widespread use of news automation in newsrooms, taking into account the various ethical questions associated with the use of such systems.
This dissertation deals with the automatic generation of natural language (for example, Finnish or English) in contexts where the factual correctness of the content is critical. Such computer systems are used, for example, to write weather reports, sports and financial news, and patient descriptions. The dissertation approaches the topic from three perspectives, focusing especially on journalism. First, it examines how the journalistic context affects the way a computer system that generates natural language should be built. It analyses the norms and practices of journalism and translates them into software engineering requirements. Based on these requirements, it identifies a software architecture for natural language generation that suits journalistic purposes. Second, the dissertation examines one subproblem of natural language generation, document planning. In the document planning stage, the pieces of information to be included in the text are selected and then ordered so that they form a comprehensible text. This stage has generally been regarded as one of the most "application-dependent" stages of text generation. This means that it has to be solved separately for each application: a method for structuring election news is not necessarily suitable for structuring financial news. The dissertation analyses the concept of "newsworthiness" used in journalism and describes a method based on it for selecting pieces of information. It also presents a broadly applicable method for ordering the pieces of information. Together, these methods simplify the construction of new text generation systems in certain contexts. Third, the dissertation addresses bias in text generation systems. It describes how people in key positions with respect to the journalistic use of automated text generation perceive the threat of bias, and how well these perceptions correspond to the reality of automated text generation. More specifically, it describes what kinds of bias may be found in automated text generation systems and how the biases can end up in the systems. In this respect, the dissertation concludes that automated text generation systems should not be regarded as inherently less biased than humans, and that beliefs about the built-in "fairness" of automated methods may lead to undesirable effects by entrenching the biases of organizations and society. Through these three perspectives, the dissertation outlines a path towards wider use of automated text generation systems, especially in newsrooms, in an ethically sustainable way.

Helsingin yliopiston digitaalinen arkisto

Ludwig Wittgenstein & Gertrude Stein – Meeting in Language

Author: Melzer Tine
Publication venue: Plymouth University
Publication date: 01/01/2014

Former Director of Studies: Professor Antonio Caronia
The purpose of this study is to show transitions between verbal and visual meaning in ordinary language, based on philosophical concepts and conceptual artworks. It offers models for artistic research and collaboration in arts and science. Shared experiences in ordinary language are fundamental to this thesis and make it an accessible and trans-disciplinary study. Language as such is approached from different practices and disciplines and becomes the central object of investigation. The research introduces a general set of mechanisms in language, stemming from the Wittgensteinian notion of the language-game. The study examines the possibility of a meeting between the philosopher Ludwig Wittgenstein and the writer Gertrude Stein in a linguistic, biographical and poetic sense. The main claim is that Wittgenstein and Stein share the understanding of language as a game, which is a fruitful principle for artistic and poetic production. Gertrude Stein developed a dimension in her writing which partly succeeds in showing this notion of creating meaning-as-practice and making sense on the ‘edge’ of conventional meaning. In this way she augments Wittgenstein’s idea of the language-game and puts it into practice, tests its limits on her own language and on the reader’s habits. The artistic works represented in this thesis are equally experimental tests of Wittgenstein’s meaning-as-use hypothesis. They put his ideas into practice. They extend the research with strategies from the arts, poetry and fiction. The methodology of the research is based on Wittgenstein’s notion of meaning as context-dependent use. This concept defines the meaning of a word by the way it is used in a specific context. This perspective is then challenged with visual artistic work.
This hypothesis is tested throughout the research by applying tools and concepts from several practices, such as computational linguistic tools, collaboration with writers and artists from other fields, and autonomous visual and poetic work to augment the study of facts. Conceptual artworks, often produced in collaboration, function as language experiments, or language-games. The Wittgensteinian differentiation between what can be shown and what can be said is examined. The context of the research lies in the practices developed as a conceptual artist in which theoretical research informs artistic practice. This thesis, on the border between verbal and visual language, is founded upon antecedent studies in philosophy of language and the practice of Fine Arts. Against this background the research focuses on the relationship between word, context and meaning: issues of communication, ordinary language, words and their composition, context-based meaning, naming visual phenomena, examination of word-and-world-relationships and vocabularies. Main sources are the major works and biographies of Ludwig Wittgenstein and Gertrude Stein, the critical work of Marjorie Perloff, language philosophers concerned with ordinary language, and the contrastive corpus linguistic approach. The results of this research are generated by several interdisciplinary productive methods: artworks and poetic and scientific work, all of which employ modes of language and whose domains overlap. Additionally, the notion of meeting acts as a model metaphor for the development of a solid trans-disciplinary methodology for research between science and the arts. One major result of comparing their ideas on language is reflected in the meeting of the language used by Wittgenstein and Stein. Their meeting is materialized in the computer-generated Shared Vocabulary, which is a list of words which both Wittgenstein and Stein used in their writing.
It applies linguistic tools from contrastive corpus linguistics to compare their vocabularies (corpora), which offers new methods for investigating the works of the philosopher Wittgenstein and the writer Stein. Generally, this thesis may act as an introduction to language as an ideal foundation for interdisciplinary study. The application of the principle of the language-game (Wittgenstein) is a significant way of displaying possible strategies for artists and researchers who work transdisciplinarily. The research results directly inform practice and practitioners from other fields, which means that collaboration is central to the research. It implies that language permeates every sort of research, art and its discourse. It also suggests that the meaning of words and images depends on their use, which extends the Wittgensteinian meaning-as-use hypothesis to visual language. The findings of the research on vocabularies are quite specific, but they also yield simple general mechanisms of the language-game. The consequent alliance of the discussion with the language of the everyday makes the research a general contribution for everyone who is genuinely interested in language and the arts.
Parts of this research were supported by The Netherlands Foundation for Visual Arts, Design and Architecture (Fonds BKVB, Studiebeurs) and Prins Bernhard Cultuurfonds Amsterdam (Cultuurfondsbeurs).
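
A shared vocabulary of the kind described in this abstract amounts to the intersection of two word lists. The following is a minimal Python sketch of that idea, not the tooling used in the thesis: the tokenizer, the function names `vocabulary` and `shared_vocabulary`, and the two placeholder strings are all assumptions made for illustration.

```python
import re


def vocabulary(text: str) -> set[str]:
    """Return the set of lowercased word types (letters only) in a text."""
    # [^\W\d_]+ matches runs of Unicode letters, so umlauts are kept.
    return set(re.findall(r"[^\W\d_]+", text.lower()))


def shared_vocabulary(text_a: str, text_b: str) -> list[str]:
    """Alphabetical list of word types that occur in both texts."""
    return sorted(vocabulary(text_a) & vocabulary(text_b))


# Placeholder strings standing in for two corpora (hypothetical data).
sample_a = "a word has a use in a game"
sample_b = "the game of a word is in the saying"

print(shared_vocabulary(sample_a, sample_b))
```

A real comparison would probably also need lemmatization and a decision about which language editions of each author to compare, neither of which this sketch attempts.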

Berner Fachhochschule: ARBOR

Plymouth Electronic Archive and Research Library

Jewish Studies in the Digital Age

Publication venue: 'Walter de Gruyter GmbH'
Publication date: 21/11/2022

The digitisation boom of the last two decades, and the rapid advancement of digital tools to analyse data in myriad ways, have opened up new avenues for humanities research. This volume discusses how the so-called digital turn has affected the field of Jewish Studies, explores the current state of the art and probes how digital developments can be harnessed to address the specific questions, challenges and problems in the field

Directory of Open Access Books (DOAB)

Processing temporal information in unstructured documents

Author: Costa Francisco Nuno Quintiliano Mendonça Carapeto, 1980-
Publication date: 01/01/2013

Doctoral thesis, Informatics (Computer Science), Universidade de Lisboa, Faculdade de Ciências, 2013
Temporal information processing has received substantial attention in the last few years, due to the appearance of evaluation challenges focused on the extraction of temporal information from texts written in natural language. This research area belongs to the broader field of information extraction, which aims to automatically find specific pieces of information in texts, producing structured representations of that information, which can then be easily used by other computer applications. It has the potential to be useful in several applications that deal with natural language, given that reference to chronological time is almost ubiquitous in many languages, Portuguese among them. Despite that, temporal processing is still incipient for many languages, Portuguese being one of them. The present dissertation has various goals. On the one hand, it addresses this gap by developing and making available resources that support the development of tools for this task in Portuguese, and also by developing precisely such tools. On the other hand, its purpose is also to report on important results of the research in this area of temporal processing. This work shows how temporal processing requires and benefits from modeling different kinds of knowledge: grammatical knowledge, logical knowledge, knowledge about the world, etc. Additionally, both machine learning methods and rule-based approaches are explored and used in the development of hybrid systems that are capable of taking advantage of the strengths of each of these two types of approach.
Fundação para a Ciência e a Tecnologia (FCT, SFRH/BD/40140/2007)

Universidade de Lisboa: Repositório.UL

Dublin Institute of Technology, Kevin Street : Calendar 1991/92

Author: Dublin Institute of Technology
Publication venue: Dublin Institute of Technology
Publication date: 01/01/1991

Calendar for the academic year 1991/92. Contents include: DIT courses, fee structures, undergraduate programmes, short courses, fees, research & development, campus companies, student services, college regulations, graduates and prizewinners, awards and external examiners, advisory services for prospective students, college structures, college staff and college library. Foreword by F.M. Brennan, President.

Arrow@TUDublin

B!SON: A Tool for Open Access Journal Recommendation

Author: Entrup Elias; Eppelin Anita; Ewerth Ralph; Hartwig Josephine; Hoppe Anett; Tullney Marco; Wohlgemuth Michael
Publication venue: Heidelberg : Springer
Publication date: 01/01/2022

Finding a suitable open access journal in which to publish scientific work is a complex task: researchers have to navigate a constantly growing number of journals, institutional agreements with publishers, funders’ conditions and the risk of predatory publishers. To help with these challenges, we introduce a web-based journal recommendation system called B!SON. It was developed on the basis of a systematic requirements analysis, is built on open data, gives publisher-independent recommendations and works across domains. It suggests open access journals based on the title, abstract and references provided by the user. The recommendation quality has been evaluated using a large test set of 10,000 articles. Development by two German scientific libraries ensures the longevity of the project.

Repositorium für Naturwissenschaften und Technik

Jewish Studies in the Digital Age

Publication venue: 'Walter de Gruyter GmbH'

OAPEN Library

Teaching Classics in the Digital Age

Publication venue: 'Universitätsbibliothek Kiel'
Publication date: 01/01/2021

The papers and videos presented here are the result of the international conference 'Teaching Classics in the Digital Age', held online on 15 and 16 June 2020. As digital media provide new possibilities for teaching and outreach in Classics, the conference aimed to present current approaches to digital teaching and to share best practices by bringing together different projects and practitioners from all fields of Classics (including Classical Archaeology, Greek and Latin Studies and Ancient History). It also aimed to start a discussion about the principles, problems and future of teaching Classics in the 21st century, within and beyond its individual fields.

MACAU: Open Access Repository of Kiel University

Creating a frequency-based Turkish-English Loanword Cognates Word List (TELCWL)

Author: Altunel Veysel; Yu Xiaoli
Publication venue: 'Kare Publishing'
Publication date: 01/12/2021

This lexical study aims to establish a frequency-based Turkish-English Loanword Cognates Word List (TELCWL) to support Turkish learners of English and the corresponding pedagogical practice. A final list of 582 Turkish-English loan-based cognate word pairs was derived from the New General Service List (NGSL) and the Frequency Dictionary of Turkish (FDT). For pedagogical purposes, the TELCWL was divided into five sublists according to the cognates' features in spelling and pronunciation. On average, the coverage of the TELCWL was particularly high in discipline- and field-specific corpora, where it accounted for more than 7%, compared with general service written (5%) and spoken (3.5%) corpora. This result suggests that the TELCWL may be more beneficial for enhancing learners’ reading and writing ability; in addition, not only general Turkish learners of English but also learners who need to improve their English proficiency in specific disciplines can benefit from the TELCWL. Further pedagogical implications are drawn for English instructors regarding the use of the TELCWL in English classrooms in Turkey.
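
The coverage figures quoted in this abstract can be read as the share of running words in a corpus that belong to the word list. The sketch below illustrates that calculation in Python under stated assumptions: the study's actual counting unit (tokens, lemmas or word families) is not given in the abstract, and the function name `coverage`, the toy word list and the toy corpus are hypothetical and are not taken from the TELCWL.

```python
import re


def coverage(word_list: set[str], corpus_text: str) -> float:
    """Share of running words (tokens) in the corpus that are in the word list."""
    tokens = re.findall(r"[^\W\d_]+", corpus_text.lower())
    if not tokens:
        return 0.0  # an empty corpus has no coverage to report
    hits = sum(1 for token in tokens if token in word_list)
    return hits / len(tokens)


# Toy data for illustration only (not from the TELCWL or its corpora).
toy_list = {"doctor", "hotel", "radio"}
toy_corpus = "The doctor left the hotel and turned on the radio in the car"

print(f"{coverage(toy_list, toy_corpus):.1%}")
```

Running the same function over a general corpus and a discipline-specific corpus would give the kind of side-by-side percentages the abstract reports.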

FELT - Focus on ELT Journal

OpenMETU (Middle East Technical University)

CyberResearch on the Ancient Near East and Eastern Mediterranean

Publication venue: 'Brill'
Publication date: 01/04/2020

CyberResearch on the Ancient Near East and Neighboring Regions provides case studies on archaeology, objects, cuneiform texts, and online publishing, digital archiving, and preservation. Eleven chapters present a rich array of material, spanning the fifth through the first millennium BCE, from Anatolia, the Levant, Mesopotamia, and Iran. Customized cyber- and general glossaries support readers who lack either a technical background or familiarity with the ancient cultures. Edited by Vanessa Bigot Juloux, Amy Rebecca Gansell, and Alessandro Di Ludovico, this volume is dedicated to broadening the understanding and accessibility of digital humanities tools, methodologies, and results to Ancient Near Eastern Studies. Ultimately, this book provides a model for introducing cyber-studies to the mainstream of humanities research

Directory of Open Access Books (DOAB)