    The Craft Edition: Unbox Caravan - Field Notes, from Goa 2017

    Developing a Finnish-language geoparser: how to obtain location information from unstructured text data

    Ever more data is produced and shared through the internet. This data often has a spatial dimension and can take many forms, one of which is digital text, such as online articles or social media posts. The geospatial links in these texts are made through place names, also called toponyms, but traditional GIS methods are unable to deal with such fuzzy linguistic information.
This creates the need to transform the linguistic location information into an explicit coordinate form. Several geoparsers have been developed to recognize and locate toponyms in free-form texts: the task of these systems is to be a reliable source of location information. Geoparsers have been applied to topics ranging from disaster management to literary studies. The major language of study in geoparser research has been English, and geoparsers tend to be language-specific, which threatens to leave unexplored both the insights gained by studying smaller languages and the experiences expressed in them. This thesis seeks to answer three research questions related to geoparsing: What are the most advanced geoparsing methods? What linguistic and geographical features complicate this multi-faceted problem? And how can the reliability and usability of geoparsers be evaluated? The major contributions of this work are an open-source geoparser for Finnish texts, Finger, and two test datasets, or corpora, for testing Finnish geoparsers. One of the datasets consists of tweets and the other of news articles. All of these resources, including the relevant code for acquiring the test data and evaluating the geoparser, are shared openly. Geoparsing can be divided into two sub-tasks: recognizing toponyms amid text flows and resolving them to the correct coordinate location, possibly from among several candidates. Both tasks have seen a recent turn to deep learning methods and models, where the input texts are encoded as, for example, word embeddings. Geoparsers are evaluated against gold standard datasets where toponyms and their coordinates are marked. Performance is measured with equivalence-based metrics for toponym recognition and distance-based metrics for toponym resolution. Finger uses a toponym recognition classifier built on a Finnish BERT model and a simple gazetteer query to resolve the toponyms to coordinate points. The program outputs structured, tabular geodata containing the input texts, any toponyms recognized in them, and their coordinate locations.
While the datasets represent different text types in terms of formality and topics, there is little difference in performance when evaluating Finger against them. The overall performance is comparable to that of geoparsers for English texts. Error analysis reveals multiple error sources, caused either by the inherent ambiguity of the studied language and the geographical world or by the processing itself, for example by the lemmatizer. Finger can be improved in multiple ways, such as refining how it analyzes texts and creating more comprehensive evaluation datasets. Similarly, the geoparsing task should move towards more complex linguistic and geographical descriptions than just toponyms and coordinate points. Finger is not, in its current state, a ready source of geodata that should be used uncritically. However, the system is a promising first step for Finnish geoparsers and a stepping stone for future applied research.
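
The two evaluation measures described in this abstract (match-based scoring for toponym recognition, distance-based scoring for toponym resolution) can be sketched roughly as follows. This is a minimal illustration and not Finger's actual evaluation code: the function names, the exact-span matching rule, the span format `(doc_id, start, end)` and the toy coordinates are all assumptions made for the example.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def recognition_scores(predicted_spans, gold_spans):
    """Precision, recall and F1 over exactly matching toponym spans (sets)."""
    true_positives = len(predicted_spans & gold_spans)
    precision = true_positives / len(predicted_spans) if predicted_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


def resolution_errors_km(predicted_coords, gold_coords):
    """Distance error for every toponym span present in both mappings."""
    shared = sorted(predicted_coords.keys() & gold_coords.keys())
    return [haversine_km(*predicted_coords[s], *gold_coords[s]) for s in shared]


# Hypothetical toy data: span -> (latitude, longitude).
gold_coords = {
    ("doc1", 0, 8): (60.1699, 24.9384),
    ("doc2", 5, 12): (61.4978, 23.7610),
    ("doc2", 20, 24): (65.0121, 25.4651),
}
predicted_coords = {
    ("doc1", 0, 8): (60.1699, 24.9384),   # recognized and resolved correctly
    ("doc2", 5, 12): (60.1699, 24.9384),  # recognized, resolved to the wrong place
    ("doc3", 2, 7): (60.4518, 22.2666),   # false positive
}

precision, recall, f1 = recognition_scores(set(predicted_coords), set(gold_coords))
errors = resolution_errors_km(predicted_coords, gold_coords)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(f"median resolution error: {median(errors):.1f} km")
```

Scoring resolution only on spans that were also recognized keeps the two sub-tasks separate; an end-to-end score would instead have to penalize missed toponyms as well.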

    Developmentally appropriate guidelines for technology augmented pre-schooler toys

    Kim To Tse investigated the concerns involved in creating developmentally appropriate technology-augmented pre-schooler toys. He found that parents and child development specialists view the care of pre-schoolers from different angles. His research outcomes advocate and support the vision of a healthy implementation of technology in early childhood while sustaining the toy industry.

    Today's businesses and emoji use within the framework of marketing communication

    xii, 122 pages: color illustrations, figures; 29 cm; 1 CD.

    The hieroglyphs and cuneiform scripts used by ancient civilizations can be seen to have turned into emojis, which are increasingly popular. Emojis enable emotions to be conveyed quickly and accurately without the use of words, or when words are insufficient. The rapid development of emojis and their rapid adoption by people of all ages, from 7 to 70, are a result of the relentless, day-by-day advance of technology.
Emojis, which are used abroad as one of the tools of marketing communication, have become important for conveying emotions by taking the place of words. This study, in addition to being exploratory, uses the case study method to cover the adaptation of marketing communication elements to emojis, and evaluates the emoji work of successful brands through document review. The first chapter explains marketing communication and its elements; the second chapter reviews the literature and explains the differences between emojis and emoticons; and the third chapter uses the research method to examine how brands have adapted to emojis and which emoji applications have been successful.

    Predicting Paid Certification in Massive Open Online Courses

    Massive open online courses (MOOCs) have been proliferating because of the free or low-cost offering of content for learners, attracting the attention of many stakeholders across the entire educational landscape. Since 2012, dubbed “the Year of the MOOCs”, several platforms have gathered millions of learners in just a decade. Nevertheless, the certification rate of both free and paid courses has been low: only about 4.5–13% and 1–3%, respectively, of the total number of enrolled learners obtain a certificate at the end of their courses. Still, most research concentrates on completion, ignoring the certification problem, and especially its financial aspects. Thus, the research described in the present thesis aimed to investigate paid certification in MOOCs, for the first time, in a comprehensive way, and as early as the first week of the course, by exploring its various levels. First, the latent correlation between learner activities and their paid certification decisions was examined by (1) statistically comparing the activities of non-paying learners with those of course purchasers and (2) predicting paid certification using different machine learning (ML) techniques. Our temporal (weekly) analysis showed statistical significance at various levels when comparing the activities of non-paying learners with those of the certificate purchasers across the five courses analysed. Furthermore, we used the learners’ activities (number of step accesses, attempts, correct and wrong answers, and time spent on learning steps) to build our paid certification predictor, which achieved promising balanced accuracies (BAs), ranging from 0.77 to 0.95. Having employed simple predictions based on a few clickstream variables, we then analysed in more depth what other information can be extracted from MOOC interaction (namely discussion forums) for paid certification prediction.
However, to better explore the learners’ discussion forums, we built, as an original contribution, MOOCSent, a cross-platform review-based sentiment classifier, using over 1.2 million MOOC sentiment-labelled reviews. MOOCSent addresses various limitations of current sentiment classifiers, including (1) using a single source of data (previous literature on sentiment classification in MOOCs was based on single platforms only, and hence less generalisable, with a relatively low number of instances compared to our dataset); (2) limited model outputs, where most current models are 2-polar classifiers (positive or negative only); (3) disregarding important sentiment indicators, such as emojis and emoticons, during text embedding; and (4) reporting average performance metrics only, preventing the evaluation of model performance at the level of class (sentiment). Finally, with the help of MOOCSent, we used the learners’ discussion forums to predict paid certification after annotating learners’ comments and replies with their sentiment. This multi-input model contains raw data (learner textual inputs), sentiment classification generated by MOOCSent, computed features (number of likes received for each textual input), and several features extracted from the texts (character counts, word counts, and part of speech (POS) tags for each textual instance). This experiment adopted various deep predictive approaches, specifically ones that allow a multi-input architecture, to investigate early (i.e., weekly) whether data obtained from MOOC learners’ interaction in discussion forums can predict learners’ purchase decisions (certification).
Considering the staggeringly low rate of paid certification in MOOCs, the present thesis contributes to the field of MOOC learner analytics by predicting paid certification, for the first time, at such a comprehensive (with data from over 200 thousand learners from 5 courses in different disciplines), actionable (analysing learners’ decisions from the first week of the course) and longitudinal (with 23 runs from 2013 to 2017) scale. The present thesis contributes by (1) investigating various conventional and deep ML approaches for predicting paid certification in MOOCs using learner clickstreams (Chapter 5) and course discussion forums (Chapter 7); (2) building the largest MOOC sentiment classifier (MOOCSent), which is based on learners’ reviews of courses from the leading MOOC platforms, namely Coursera, FutureLearn and Udemy, and which handles emojis and emoticons using dedicated lexicons that contain over three thousand corresponding explanatory words/phrases; and (3) proposing and developing, for the first time, a multi-input model for predicting certification based on data from discussion forums, which synchronously processes the textual (comments and replies) and numerical (number of likes posted and received, sentiments) data from the forums, adapting a suitable classifier for each type of data, as explained in detail in Chapter 7.
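
Balanced accuracy, the metric reported for the paid certification predictor, is the mean of the per-class recalls, which matters here because certificate purchasers are a small minority of learners. The sketch below illustrates the metric only. It is not the thesis's code, and the learner counts are invented, chosen merely to echo a low purchase rate.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, computed over the classes present in y_true."""
    recalls = []
    for label in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


def accuracy(y_true, y_pred):
    """Plain accuracy, shown for contrast."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical cohort: 95 non-paying learners (0) and 5 purchasers (1).
y_true = [0] * 95 + [1] * 5

# A degenerate model that predicts "non-paying" for everyone.
always_negative = [0] * 100

# A model that finds 4 of the 5 purchasers at the cost of 5 false alarms.
useful_model = [0] * 90 + [1] * 5 + [1] * 4 + [0]

print(accuracy(y_true, always_negative), balanced_accuracy(y_true, always_negative))
print(accuracy(y_true, useful_model), balanced_accuracy(y_true, useful_model))
```

The degenerate model scores high on plain accuracy but only 0.5 on balanced accuracy, which is why the latter is the more informative figure for heavily imbalanced purchase data.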

    Information Visualisation Practices for Improving Patient Readability of Blood Pressure, Health Data, and Health Literacy

    Personal health data obtained through self-monitoring is often presented through standardised representations with little intrinsic meaning for those who may need it the most, since low health literacy is associated with poor health. By failing to inform users about their health status, these representations can be dangerous, leaving patients feeling lost, confused, anxious, or even depressed. Information Visualisation can play an important role in helping patients make sense of their health data and health status, as long as it is aligned with their needs, motivations, and goals. Following Human Centred Design practices, user research methods were applied to understand the context of self-monitoring and to identify which metrics differed the most from participants' mental models. Based on quantitative data obtained from a survey, Blood Pressure was identified as the most problematic health variable. A series of interviews allowed patients with chronic conditions to voice the challenges they faced in managing their conditions. Taking into account the information obtained in the previous steps, multiple ways to map blood pressure data onto design elements were explored and different visualisations were designed. Finally, these visualisations were tested through guided interviews with patients with blood pressure problems. Results showed that participants preferred different visualisations for different goals, and enjoyed being able to choose freely among them; participants with lower literacy but who were deeply invested in monitoring their health found tables to be the most informative visualisations; finally, participants identified colour scales as the most intuitive method to represent health status and health risk.


    Combining Augmented Reality with spatially and temporally robust Historical Spatial Data Infrastructures may provide users with interpretive and educational opportunities they otherwise would not have. Adapting research-oriented historical GIS projects, such as the Copper Country Historical Spatial Data Infrastructure, for use as interpretive material through “off the shelf” augmented reality applications such as AuGeo has the potential to expand the utility and reach of that research data outside of the lab, while providing new interpretive opportunities by allowing users to see that data in its original spatial context and giving them the freedom to explore it in their own way.