1,585 research outputs found

    Menetelmiä luonnollisella kielellä kirjoitettujen raporttien automaattiseen tuottamiseen

    The use of computer software to automatically produce natural language texts expressing factual content is of interest to practitioners in multiple fields, from journalists to researchers to educators. This thesis studies natural language report generation from structured data for the purposes of journalism. The topic is approached from three directions. First, we analyse what requirements the journalistic domain imposes on the software, and how software might be architected to account for those requirements. This includes identifying the key domain norms (such as the "objectivity norm") and business requirements (such as system transferability) and mapping them to software requirements. Based on the identified requirements, we then describe how a modular data-to-text approach to natural language generation can be implemented in the specific context of hard news reporting. Second, we investigate how the highly domain-specific natural language generation subtask of document planning - deciding what information is to be included in an automatically produced text, and in what order - might be conducted in a less domain-specific manner. To this end, we describe an approach to operationalizing the complex concept of "newsworthiness" in a manner where a natural language generation system can employ it. We also present a broadly applicable baseline method for structuring the content in a data-to-text setting without explicit domain knowledge. Third, we discuss how bias in text generation systems is perceived by key stakeholders, and whether those perceptions align with the reality of news automation. This discussion includes identifying how automated systems might exhibit bias and how the biases might be - potentially unconsciously - embedded in the systems. 
As a result, we conclude that common perceptions of automated journalism as fundamentally "unbiased" are unfounded, and that beliefs about "unbiased" automation might have the negative effect of further entrenching pre-existing biases in organizations or society. Together, through these three avenues, the thesis sketches out a way towards more widespread use of news automation in newsrooms, taking into account the various ethical questions associated with the use of such systems.
This dissertation deals with the automatic production of natural language (for example Finnish or English) in contexts where the correctness of the factual content is critical. Such computer systems are used, for example, to write weather reports, sports and financial news, and patient reports. The dissertation approaches the topic from three perspectives, focusing in particular on journalism. First, it examines how the journalistic context affects the way a computer system that produces natural language should be built. It analyses the norms and practices of journalism and translates them into software engineering requirements. Based on these requirements, it identifies a software architecture for natural language generation that suits journalistic purposes. Second, the dissertation looks into one subproblem of natural language generation, document planning. In the document planning stage, the pieces of information to be included in the text are selected and then ordered so that they form an understandable text. This stage has generally been regarded as one of the "most application-dependent" stages of text generation. This means that it has to be solved separately for each application: a method that structures election news is not necessarily suitable for structuring financial news. 
The dissertation analyses the concept of "newsworthiness" used in journalism and describes a method based on it for selecting pieces of information. It also presents a broadly applicable method for ordering them. Together, these methods simplify the construction of new text generation systems in certain contexts. Third, the dissertation addresses bias in text generation systems. It describes how the people who hold key positions with respect to the journalistic use of automated text generation perceive the threat of bias, and how well these perceptions match the reality of automated text generation. More specifically, it describes what kinds of biases may be found in automated text generation systems and how they can end up there. On this point, the dissertation concludes that automated text generation systems should not be regarded as inherently less biased than humans, and that beliefs about the built-in "fairness" of automated methods may have undesirable effects by entrenching the biases of organizations and society. Through these three perspectives, the dissertation outlines a path toward wider use of automated text generation systems, especially in newsrooms, in an ethically sustainable way.

    Ludwig Wittgenstein & Gertrude Stein – Meeting in Language

    Former Director of Studies: Professor Antonio Caronia. Tine Melzer: Ludwig Wittgenstein & Gertrude Stein – Meeting in Language. The purpose of this study is to show transitions between verbal and visual meaning in ordinary language, based on philosophical concepts and conceptual artworks. It offers models for artistic research and collaboration in arts and science. Shared experiences in ordinary language are fundamental to this thesis and make it an accessible and trans-disciplinary study. Language as such is approached from different practices and disciplines and becomes the central object of investigation. The research introduces a general set of mechanisms in language, stemming from the Wittgensteinian notion of the language-game. The study examines the possibility of a meeting between the philosopher Ludwig Wittgenstein and the writer Gertrude Stein in a linguistic, biographical and poetic sense. The main claim is that Wittgenstein and Stein share the understanding of language as a game, which is a fruitful principle for artistic and poetic production. Gertrude Stein developed a dimension in her writing which partly succeeds in showing this notion of creating meaning-as-practice and making sense on the 'edge' of conventional meaning. In this way she augments Wittgenstein's idea of the language-game and puts it into practice, testing its limits on her own language and on the reader's habits. The artistic works represented in this thesis are equally experimental tests of Wittgenstein's meaning-as-use hypothesis. They put his ideas into practice. They extend the research with strategies from the arts, poetry and fiction. The methodology of the research is based on Wittgenstein's notion of meaning as context-dependent use. This concept defines the meaning of a word by the way it is used in a specific context. This perspective is then challenged with visual artistic work. 
This hypothesis is tested throughout the research by applying tools and concepts from several practices, such as computational linguistic tools, collaboration with writers and artists from other fields, and autonomous visual and poetic work, to augment the study of facts. Conceptual artworks, often produced in collaboration, function as language experiments, or language-games. The Wittgensteinian differentiation between what can be shown and what can be said is examined. The context of the research lies in the practices developed as a conceptual artist in which theoretical research informs artistic practice. This thesis, on the border between verbal and visual language, is founded upon antecedent studies in philosophy of language and the practice of Fine Arts. Against this background the research focuses on the relationship between word, context and meaning: issues of communication, ordinary language, words and their composition, context-based meaning, naming visual phenomena, examination of word-and-world-relationships and vocabularies. Main sources are the major works and biographies of Ludwig Wittgenstein and Gertrude Stein, the critical work of Marjorie Perloff, language philosophers concerned with ordinary language and the contrastive corpus linguistic approach. The results of this research are generated by several interdisciplinary productive methods. These include artworks and poetic and scientific work, all of which employ modes of language and whose domains overlap. Additionally, the notion of meeting acts as a model metaphor for the development of a solid trans-disciplinary methodology for research between science and the arts. One major result of comparing their ideas on language is reflected in the meeting of the language used by Wittgenstein and Stein. Their meeting is materialized in the computer-generated Shared Vocabulary, which is a list of words which both Wittgenstein and Stein used in their writing. 
It applies linguistic tools from contrastive corpus linguistics to compare their vocabularies (corpora), which offers new methods for investigating the works of the philosopher Wittgenstein and the writer Stein. Generally, this thesis may act as an introduction to language as an ideal foundation for interdisciplinary study. The application of the principle of the language-game (Wittgenstein) is a significant way of displaying possible strategies for artists and researchers who work transdisciplinarily. The research results directly inform practice and practitioners from other fields, which means that collaboration is central to the research. It implies that language permeates every sort of research, art and its discourse. It also suggests that the meaning of words and images depends on their use, which extends the Wittgensteinian meaning-as-use hypothesis to visual language. The findings of the research on vocabularies are quite specific, but they also offer simple general mechanisms of the language-game. The consistent alliance of the discussion with the language of the everyday makes the research a general contribution for everyone who is genuinely interested in language and the arts. Parts of this research were supported by The Netherlands Foundation for Visual Arts, Design and Architecture (Fonds BKVB, Studiebeurs) and Prins Bernhard Cultuurfonds Amsterdam (Cultuurfondsbeurs).
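The Shared Vocabulary described in this abstract, a list of words used by both authors, can be thought of as a set intersection over two corpora. The following is only a minimal sketch of that idea, not the thesis's actual tooling: the function names `vocabulary` and `shared_vocabulary` are hypothetical, the tokenizer is a deliberately crude lowercase regular expression, and the two input strings are short illustrative stand-ins for full corpora.

```python
import re


def vocabulary(text):
    """Return the set of distinct lowercase word forms in a text.

    Assumption: a word is any run of ASCII letters; real corpus work
    would use a proper tokenizer and probably lemmatization.
    """
    return set(re.findall(r"[a-z]+", text.lower()))


def shared_vocabulary(text_a, text_b):
    """Return the alphabetically sorted words that occur in both texts."""
    return sorted(vocabulary(text_a) & vocabulary(text_b))


if __name__ == "__main__":
    # Toy inputs standing in for the two authors' corpora.
    corpus_a = "A rose is a rose"
    corpus_b = "The world is all that is the case"
    print(shared_vocabulary(corpus_a, corpus_b))
```

A contrastive analysis would typically go further and compare relative frequencies, not just presence, but the intersection is the part the abstract names explicitly.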

    Jewish Studies in the Digital Age

    The digitisation boom of the last two decades, and the rapid advancement of digital tools to analyse data in myriad ways, have opened up new avenues for humanities research. This volume discusses how the so-called digital turn has affected the field of Jewish Studies, explores the current state of the art and probes how digital developments can be harnessed to address the specific questions, challenges and problems in the field

    Processing temporal information in unstructured documents

    Doctoral thesis, Informatics (Computer Science), Universidade de Lisboa, Faculdade de Ciências, 2013. Temporal information processing has received substantial attention in the last few years, due to the appearance of evaluation challenges focused on the extraction of temporal information from texts written in natural language. This research area belongs to the broader field of information extraction, which aims to automatically find specific pieces of information in texts, producing structured representations of that information, which can then be easily used by other computer applications. It has the potential to be useful in several applications that deal with natural language, given the almost ubiquitous reference to chronological time in many languages, Portuguese among them. Despite that, temporal processing is still incipient for many languages, Portuguese being one of them. The present dissertation has various goals. On the one hand, it addresses this gap by developing and making available resources that support the development of tools for this task in Portuguese, and also by developing tools of precisely this kind. On the other hand, its purpose is also to report on important results of the research in this area of temporal processing. This work shows how temporal processing requires and benefits from modeling different kinds of knowledge: grammatical knowledge, logical knowledge, knowledge about the world, etc. Additionally, both machine learning methods and rule-based approaches are explored and used in the development of hybrid systems that are capable of taking advantage of the strengths of each of these two types of approach.
Fundação para a Ciência e a Tecnologia (FCT, SFRH/BD/40140/2007)

    Dublin Institute of Technology, Kevin Street : Calendar 1991/92

    Calendar of academic year 1991/92. Contents include: DIT courses, fee structures, undergraduate programmes, short courses, fees, research & development, campus companies, student services, college regulations, graduates and prizewinners, awards and external examiners, advisory services for prospective students, college structures, college staff and college library. Foreword by F.M. Brennan, President.

    Teaching Classics in the Digital Age

    The papers and videos presented here are the result of the international conference 'Teaching Classics in the Digital Age', held online on 15 and 16 June 2020. As digital media provide new possibilities for teaching and outreach in Classics, the conference aimed to present current approaches to digital teaching and to share best practices by bringing together different projects and practitioners from all fields of Classics (including Classical Archaeology, Greek and Latin Studies and Ancient History). Furthermore, it aimed to start a discussion about the principles, problems and future of teaching Classics in the 21st century within and beyond its individual fields.

    Creating a frequency-based Turkish-English Loanword Cognates Word List (TELCWL)

    This lexical study aims to establish a frequency-based Turkish-English Loanword Cognates Word List (TELCWL) to support Turkish learners of English and the corresponding pedagogical practice. A final list of 582 Turkish-English loan-based cognate word pairs was derived from the New General Service List (NGSL) and the Frequency Dictionary of Turkish (FDT). For pedagogical purposes, the TELCWL was divided into five sublists according to how the cognates differ in spelling and pronunciation. On average, the coverage of the TELCWL was higher in discipline- and field-specific corpora (more than 7%) than in general service written corpora (5%) and spoken corpora (3.5%). This result suggests that the TELCWL may be more beneficial for enhancing learners' reading and writing ability; in addition, not only general Turkish learners of English but also learners who need to improve their English proficiency in specific disciplines can benefit from the TELCWL. Further pedagogical implications are drawn for English instructors regarding the use of the TELCWL in English classrooms in Turkey.
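The coverage figures quoted in this abstract are percentages of running words (tokens) in a corpus that belong to the word list. As a rough illustration of how such a figure is computed, here is a minimal sketch under stated assumptions: the function names `tokenize` and `coverage` are hypothetical, the tokenizer is a simple lowercase regular expression, and the word list and sentence are made-up toy data, not items from the TELCWL or the study's corpora.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def coverage(word_list, corpus_text):
    """Return the percentage of corpus tokens that appear in word_list.

    Assumption: matching is on exact lowercase word forms; a real study
    would usually match on lemmas or word families instead.
    """
    tokens = tokenize(corpus_text)
    if not tokens:
        return 0.0
    targets = {word.lower() for word in word_list}
    hits = sum(1 for token in tokens if token in targets)
    return 100.0 * hits / len(tokens)


if __name__ == "__main__":
    # Toy data for illustration only.
    toy_list = ["doctor", "hotel"]
    toy_corpus = "The doctor went to the hotel near the station"
    print(f"{coverage(toy_list, toy_corpus):.1f}% of tokens are on the list")
```

Running the same function over a general corpus and a discipline-specific corpus would give the kind of side-by-side comparison the abstract reports.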

    CyberResearch on the Ancient Near East and Eastern Mediterranean

    CyberResearch on the Ancient Near East and Neighboring Regions provides case studies on archaeology, objects, cuneiform texts, and online publishing, digital archiving, and preservation. Eleven chapters present a rich array of material, spanning the fifth through the first millennium BCE, from Anatolia, the Levant, Mesopotamia, and Iran. Customized cyber- and general glossaries support readers who lack either a technical background or familiarity with the ancient cultures. Edited by Vanessa Bigot Juloux, Amy Rebecca Gansell, and Alessandro Di Ludovico, this volume is dedicated to broadening the understanding and accessibility of digital humanities tools, methodologies, and results to Ancient Near Eastern Studies. Ultimately, this book provides a model for introducing cyber-studies to the mainstream of humanities research