
    Tool for journalists to edit the text generation logic of an automated journalist

    Automated journalism is the use of algorithms or software to write fact-based articles from structured data. Its advantages are scalability, speed and lower costs; its limitations are fluency, writing quality and limited perception. This thesis compares different implementation methods for automated journalism: templates, decision trees, the fact ranking method and various machine learning solutions. No method was found to be strictly better than the others; each has distinct advantages and disadvantages, which should be weighed when selecting an implementation method. The thesis then discusses Voitto-robot, the automated journalist of the Finnish national broadcasting company Yle. Voitto’s implementation is based on templates and decision trees. Although this makes Voitto’s text generation easily modifiable and transparent, only programmers could take advantage of it: the decision trees were implemented directly in the code, which made them hard to understand, and the template files were too complex to be edited easily. In this thesis, a proof-of-concept web application was built that lets journalists and other content creators edit Voitto’s templates and decision trees independently. An analysis of the software found that it helped journalists understand the text generation and modify it as they wanted. Even in its proof-of-concept state, it was good enough to be used to automate election reporting for the Finnish parliamentary election of 2019.
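
    The combination of templates and decision trees described in this abstract can be illustrated with a minimal Python sketch. It is not Voitto's actual code: the class names, the `ELECTION_TREE` example, the fact keys (`party`, `area`, `seats`, `change`) and the template wording are all hypothetical. A decision tree tests the structured facts and selects a template, which is then filled in with those facts.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    """A leaf of the decision tree: holds the sentence template to use."""
    template: str


@dataclass
class Node:
    """An inner node: a yes/no test on the facts and two subtrees."""
    test: Callable[[dict], bool]
    if_true: Union["Node", Leaf]
    if_false: Union["Node", Leaf]


def choose_template(tree: Union[Node, Leaf], facts: dict) -> str:
    """Walk the tree from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.if_true if tree.test(facts) else tree.if_false
    return tree.template


def render(tree: Union[Node, Leaf], facts: dict) -> str:
    """Pick a template for the facts and fill in its placeholders."""
    # Derived value so that templates never have to print a negative number.
    values = dict(facts, abs_change=abs(facts["change"]))
    return choose_template(tree, facts).format(**values)


# Hypothetical tree for a one-sentence election result (assumed data shape).
ELECTION_TREE = Node(
    test=lambda f: f["change"] > 0,
    if_true=Leaf("{party} gained {abs_change} seats in {area}."),
    if_false=Node(
        test=lambda f: f["change"] < 0,
        if_true=Leaf("{party} lost {abs_change} seats in {area}."),
        if_false=Leaf("{party} kept its {seats} seats in {area}."),
    ),
)

if __name__ == "__main__":
    facts = {"party": "Party A", "area": "Helsinki", "seats": 5, "change": 2}
    print(render(ELECTION_TREE, facts))
```

    Keeping the tree and the templates as plain data, separate from the rendering functions, is the kind of separation that would let an editing tool expose them to non-programmers; the sketch ignores details such as singular and plural forms.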

    Automatic Generation of Factual News Headlines in Finnish

    Peer reviewed

    Menetelmiä luonnollisella kielellä kirjoitettujen raporttien automaattiseen tuottamiseen (Methods for the Automated Generation of Natural-Language Reports)

    The use of computer software to automatically produce natural language texts expressing factual content is of interest to practitioners of multiple fields, ranging from journalists to researchers to educators. This thesis studies natural language report generation from structured data for the purposes of journalism. The topic is approached from three directions. First, we approach the problem from the perspective of analysing what requirements the journalistic domain imposes on the software, and how software might be architected to account for the requirements. This includes identifying the key domain norms (such as the "objectivity norm") and business requirements (such as system transferability) and mapping them to software requirements. Based on the identified requirements, we then describe how a modular data-to-text approach to natural language generation can be implemented in the specific context of hard news reporting. Second, we investigate how the highly domain-specific natural language generation subtask of document planning - deciding what information is to be included in an automatically produced text, and in what order - might be conducted in a less domain-specific manner. To this end, we describe an approach to operationalizing the complex concept of "newsworthiness" in a manner where a natural language generation system can employ it. We also present a broadly applicable baseline method for structuring the content in a data-to-text setting without explicit domain knowledge. Third, we discuss how bias in text generation systems is perceived by key stakeholders, and whether those perceptions align with the reality of news automation. This discussion includes identifying how automated systems might exhibit bias and how the biases might be - potentially unconsciously - embedded in the systems.
As a result, we conclude that common perceptions of automated journalism as fundamentally "unbiased" are unfounded, and that beliefs about "unbiased" automation might have the negative effect of further entrenching pre-existing biases in organizations or society. Together, through these three avenues, the thesis sketches out a way towards more widespread use of news automation in newsrooms, taking into account the various ethical questions associated with the use of such systems.
This dissertation addresses the automatic generation of natural language (for example, Finnish or English) in settings where the factual correctness of the content is critical. Such computer systems are used, for example, to write weather reports, sports and financial news, and patient descriptions. The dissertation approaches the topic from three perspectives, focusing in particular on journalism. First, it examines how the journalistic context affects the way a computer system that produces natural language should be built. It analyses the norms and practices of journalism and translates them into software engineering requirements. Based on these requirements, it identifies a software architecture for natural language generation that suits journalistic purposes. Second, the dissertation looks into one subproblem of natural language generation, document planning. In the document planning stage, the pieces of information to be included in the text are selected and then ordered so that they form a comprehensible text. This stage has generally been regarded as one of the most "application-dependent" stages of text generation. That is, it has to be solved separately for each application: a method that structures election news is not necessarily suitable for structuring financial news. The dissertation analyses the concept of "newsworthiness" used in journalism and describes a method based on it for selecting pieces of information. It also presents a broadly applicable method for ordering them. Together, these methods simplify the building of new text generation systems in certain contexts. Third, the dissertation addresses bias in text generation systems. It describes how people in key positions with respect to the journalistic use of automated text generation perceive the threat of bias, and how these perceptions correspond to the reality of automated text generation. More specifically, it describes what kinds of bias may be found in automated text generation systems and how the biases can end up in them. On this point, the dissertation concludes that automated text generation systems should not be regarded as inherently less biased than humans, and that beliefs in the built-in "fairness" of automated methods may have undesirable effects by entrenching biases in organizations and society. Through these three perspectives, the dissertation sketches a path towards wider use of automated text generation systems, especially in newsrooms, in an ethically sustainable way.

    Automation in Sports Reporting: Strategies of Data Providers, Software Providers, and Media Outlets

    This study examines how algorithmic processing affects structures and practices in sports journalism in Germany. A multi-level perspective is used to determine which strategies data providers, software providers, and media outlets use to develop automated reporting, which compiles perspectives across the entire line of news production. The results of 11 in-depth interviews show that non-journalistic actors are vital partners in the news production process, as all actors work together in data handling, training, and software development. Moreover, automation can generate additional content such as match and historical coverage to help address shortfalls in capacity. However, given the business case for automation, amateur football (soccer) is currently the only viable candidate for its use. Many actors involved in the process argue that automated content is an added value for their readers, but claim that content quality has to be put before quantity. This means that some media outlets edit automated articles to increase the quality of their sports journalism, but that this is done only on a small scale. Media outlets do not perceive their roles to be changing, but see automation as a helpful tool that complements their work; a few use automatically created articles as a baseline for in-depth reporting. Moreover, the so-called ‘meta-writer’ has not become a reality yet, as data-processing and news writing are still kept separate. This article sheds new light on the use of automation in the sports beat, highlighting the growing role of non-journalistic actors in the news production process

    Conditional Neural Headline Generation for Finnish

    Automatic headline generation has the potential to significantly assist editors charged with headlining articles. Approaches to automation in the headlining process can range from tools as creative aids to complete end-to-end automation. The latter is difficult to achieve, as journalistic requirements imposed on headlines must be met with little room for error, with the requirements depending on the news brand in question. This thesis investigates automatic headline generation in the context of the Finnish newsroom. The primary question I seek to answer is how well the current state of text generation using deep neural language models can be applied to the headlining process in Finnish news media. To answer this, I have implemented and pre-trained a Finnish generative language model based on the Transformer architecture. I have fine-tuned this language model for headline generation as autoregression of headlines conditioned on the article text. I have designed and implemented a variation of the Diverse Beam Search algorithm, with additional parameters, to perform the headline generation in order to generate a diverse set of headlines for a given text. The evaluation of the generative capabilities of this system was done with real-world usage in mind. I asked domain experts in headlining to evaluate a generated set of text-headline pairs. The task was to accept or reject the individual headlines on key criteria. The responses to this survey were then quantitatively and qualitatively analyzed. Based on the analysis and feedback, this model can already be useful as a creative aid in the newsroom despite being far from ready for automation. I have identified concrete improvement directions based on the most common types of errors, and this provides interesting future work.