115 research outputs found

    Geodatification: epistemologies of a metahuman presence

    Get PDF

    Suomenkielisen geojäsentimen kehittäminen: kuinka hankkia sijaintitietoa jäsentelemättömistä tekstiaineistoista

    Get PDF
    Alati enemmän aineistoa tuotetaan ja jaetaan internetin kautta. Aineistot ovat vaihtelevia muodoiltaan, kuten verkkoartikkelien ja sosiaalisen media julkaisujen kaltaiset digitaaliset tekstit, ja niillä on usein spatiaalinen ulottuvuus. Teksteissä geospatiaalisuutta ilmaistaan paikannimien kautta, mutta tavanomaisilla paikkatietomenetelmillä ei kyetä käsittelemään tietoa epätäsmällisessä kielellisessä asussaan. Tämä on luonut tarpeen muuntaa tekstimuotoisen sijaintitiedon näkyvään muotoon, koordinaateiksi. Ongelmaa ratkaisemaan on kehitetty geojäsentimiä, jotka tunnistavat ja paikantavat paikannimet vapaista teksteistä, ja jotka oikein toimiessaan voisivat toimia paikkatiedon lähteenä maantieteellisessä tutkimuksessa. Geojäsentämistä onkin sovellettu katastrofihallinnasta kirjallisuudentutkimukseen. Merkittävässä osassa geojäsentämisen tutkimusta tutkimusaineiston kielenä on ollut englanti ja geojäsentimetkin ovat kielikohtaisia – tämä jättää pimentoon paitsi geojäsentimien kehitykseen vaikuttavat havainnot pienemmistä kielistä myös kyseisten kielten puhujien näkemykset. Maisterintutkielmassani pyrin vastaamaan kolmeen tutkimuskysymykseen: Mitkä ovat edistyneimmät geojäsentämismenetelmät? Mitkä kielelliset ja maantieteelliset monitulkintaisuudet vaikeuttavat tämän monitahoisen ongelman ratkaisua? Ja miten arvioida geojäsentimien luotettavuutta ja käytettävyyttä? Tutkielman soveltavassa osuudessa esittelen Fingerin, geojäsentimen suomen kielelle, ja kuvaan sen kehitystä sekä suorituskyvyn arviointia. Arviointia varten loin kaksi testiaineistoa, joista toinen koostuu Twitter-julkaisuista ja toinen uutisartikkeleista. Finger-geojäsennin, testiaineistot ja relevantit ohjelmakoodit jaetaan avoimesti. Geojäsentäminen voidaan jakaa kahteen alitehtävään: paikannimien tunnistamiseen tekstivirrasta ja paikannimien ratkaisemiseen oikeaan koordinaattipisteeseen mahdollisesti useasta kandidaatista. Molemmissa vaiheissa uusimmat metodit nojaavat syväoppimismalleihin ja -menetelmiin, joiden syötteinä ovat sanaupotusten kaltaiset vektorit. Geojäsentimien suoriutumista testataan aineistoilla, joissa paikannimet ja niiden koordinaatit tiedetään. Mittatikkuna tunnistamisessa on vastaavuus ja ratkaisemisessa etäisyys oikeasta sijainnista. Finger käyttää paikannimitunnistinta, joka hyödyntää suomenkielistä BERT-kielimallia, ja suoraviivaista tietokantahakua paikannimien ratkaisemiseen. Ohjelmisto tuottaa taulukkomuotoiseksi jäsenneltyä paikkatietoa, joka sisältää syötetekstit ja niistä mahdollisesti tunnistetut paikannimet koordinaattisijainteineen. Testiaineistot eroavat aihepiireiltään, mutta Finger suoriutuu niillä likipitäen samoin, ja suoriutuu englanninkielisillä aineistoilla tehtyihin arviointeihin suhteutettuna kelvollisesti. Virheanalyysi paljastaa useita virhelähteitä, jotka johtuvat kielten tai maantieteellisen todellisuuden luontaisesta epäselvyydestä tai ovat prosessoinnin aiheuttamia, kuten perusmuotoistamisvirheet. Kaikkia osia Fingerissä voidaan parantaa, muun muassa kehittämällä kielellistä käsittelyä pidemmälle ja luomalla kattavampia testiaineistoja. Samoin tulevaisuuden geojäsentimien tulee kyetä käsittelemään monimutkaisempia kielellisiä ja maantieteellisiä kuvaustapoja kuin pelkät paikannimet ja koordinaattipisteet. Finger ei nykymuodossaan tuota valmista paikkatietoa, jota kannattaisi kritiikittä käyttää. Se on kuitenkin lupaava ensiaskel suomen kielen geojäsentimille ja astinlauta vastaisuuden soveltavalle tutkimukselle.Ever more data is available and shared through the internet. The big data masses often have a spatial dimension and can take many forms, one of which are digital texts, such as articles or social media posts. The geospatial links in these texts are made through place names, also called toponyms, but traditional GIS methods are unable to deal with the fuzzy linguistic information. This creates the need to transform the linguistic location information to an explicit coordinate form. Several geoparsers have been developed to recognize and locate toponyms in free-form texts: the task of these systems is to be a reliable source of location information. Geoparsers have been applied to topics ranging from disaster management to literary studies. Major language of study in geoparser research has been English and geoparsers tend to be language-specific, which threatens to leave the experiences provided by studying and expressed in smaller languages unexplored. This thesis seeks to answer three research questions related to geoparsing: What are the most advanced geoparsing methods? What linguistic and geographical features complicate this multi-faceted problem? And how to evaluate the reliability and usability of geoparsers? The major contributions of this work are an open-source geoparser for Finnish texts, Finger, and two test datasets, or corpora, for testing Finnish geoparsers. One of the datasets consists of tweets and the other of news articles. All of these resources, including the relevant code for acquiring the test data and evaluating the geoparser, are shared openly. Geoparsing can be divided into two sub-tasks: recognizing toponyms amid text flows and resolving them to the correct coordinate location. Both tasks have seen a recent turn to deep learning methods and models, where the input texts are encoded as, for example, word embeddings. Geoparsers are evaluated against gold standard datasets where toponyms and their coordinates are marked. Performance is measured on equivalence and distance-based metrics for toponym recognition and resolution respectively. Finger uses a toponym recognition classifier built on a Finnish BERT model and a simple gazetteer query to resolve the toponyms to coordinate points. The program outputs structured geodata, with input texts and the recognized toponyms and coordinate locations. While the datasets represent different text types in terms of formality and topics, there is little difference in performance when evaluating Finger against them. The overall performance is comparable to the performance of geoparsers of English texts. Error analysis reveals multiple error sources, caused either by the inherent ambiguousness of the studied language and the geographical world or are caused by the processing itself, for example by the lemmatizer. Finger can be improved in multiple ways, such as refining how it analyzes texts and creating more comprehensive evaluation datasets. Similarly, the geoparsing task should move towards more complex linguistic and geographical descriptions than just toponyms and coordinate points. Finger is not, in its current state, a ready source of geodata. However, the system has potential to be the first step for geoparsers for Finnish and it can be a steppingstone for future applied research

    APREGOAR: Development of a geospatial database applied to local news in Lisbon

    Get PDF
    Project Work presented as the partial requirement for obtaining a Master's degree in Geographic Information Systems and ScienceHá informações valiosas em formato de texto não estruturado sobre a localização, calendarização e a essências dos eventos disponíveis no conteúdo de notícias digitais. Vários trabalhos em curso já tentam extrair detalhes de eventos de fontes de notícias digitais, mas muitas vezes não com a nuance necssária para representar com precisão onde as coisas realmente acontecem. Alternativamente, os jornalistas poderiam associar manualmente atributos a eventos descritos nos seus artigos enquanto publicam, melhorando a exatidão e a confiança nestes atributos espaciais e temporais. Estes atributos poderiam então estar imediatamente disponíveis para avaliar a cobertura temática, temporal e espacial do conteúdo de uma agência, bem como melhorar a experiência do utilizador na exploração do conteúdo, fornecendo dimensões adicionais que podem ser filtradas. Embora a tecnologia de atribuição de dimensões geoespaciais e temporais para o emprego de aplicaçãoes voltadas para o consumidor não seja novidade, tem ainda de ser aplicada à escala das notícias. Além disso, a maioria dos sistemas existentes suporta apenas uma definição pontual da localização dos artigos, que pode não representar bem o(s) local(is) real(ais) dos eventos descritos. Este trabalho define uma aplicação web de código aberto e uma base de dados espacial subjacente que suporta i) a associação de múltiplos polígonos a representar o local onde cada evento ocorre, os prazos associados aos eventos, em linha com os atributos temáticos tradicionais associados aos artigos de notícias; ii) a contextualização de cada artigo através da adição de mapas de eventos em linha para esclarecer aos leitores onde os eventos do artigo ocorrem; e iii) a exploração dos corpora adicionados através de filtros temáticos, espaciais e temporais que exibem os resultados em mapas de cobertura interactivos e listas de artigos e eventos. O projeto foi aplicado na área da grande Lisboa de Portugal. Para além da funcionalidade acima referida, este projeto constroi gazetteers progressivos que podem ser reutilizados como associações de lugares, ou para uma meta-análise mais aprofundada do lugar, tal como é percebido coloquialmente. Demonstra a facilidade com que estas dimensões adicionais podem ser incorporadas com grade confiança na precisão da definição, geridas, e alavancadas para melhorar a gestão de conteúdo das agências noticiosas, a compreensão dos leitores, a exploração dos investigadores, ou extraídas para combinação com outros conjuntos dos dados para fornecer conhecimentos adicionais.There is valuable information in unstructured text format about the location, timing, and nature of events available in digital news content. Several ongoing efforts already attempt to extract event details from digital news sources, but often not with the nuance needed to accurately represent the where things actually happen. Alternatively, journalists could manually associate attributes to events described in their articles while publishing, improving accuracy and confidence in these spatial and temporal attributes. These attributes could then be immediately available for evaluating thematic, temporal, and spatial coverage of an agency’s content, as well as improve the user experience of content exploration by providing additional dimensions that can be filtered. Though the technology of assigning geospatial and temporal dimensions for the employ of consumer-facing applications is not novel, it has yet to be applied at scale to the news. Additionally, most existing systems only support a single point definition of article locations, which may not well represent the actual place(s) of events described within. This work defines an open source web application and underlying spatial database that supports i) the association of multiple polygons representing where each event occurs, time frames associated with the events, inline with the traditional thematic attributes associated with news articles; ii) the contextualization of each article via the addition of inline event maps to clarify to readers where the events of the article occur; and iii) the exploration of the added corpora via thematic, spatial, and temporal filters that display results in interactive coverage maps and lists of articles and events. The project was applied to the greater Lisbon area of Portugal. In addition to the above functionality, this project builds progressive gazetteers that can be reused as place associations, or for further meta analysis of place as it is colloquially understood. It demonstrates the ease of which these additional dimensions may be incorporated with a high confidence in definition accuracy, managed, and leveraged to improve news agency content management, reader understanding, researcher exploration, or extracted for combination with other datasets to provide additional insights

    Parsing Perceptions of Place: Locative and Textual Representations of Place Émilie-Gamelin on Twitter

    Get PDF
    We increasingly engage in geographies mediated by social media, which is changing how we experience and produce places. This raises questions about how ‘place’ is conceived and received in networked virtual spaces. Place has remained difficult to grasp in both geography and communications studies that utilize social media data. To attend to this, I first develop a conceptual framework that bridges the phenomenology of spatiality with the communication of place. I then present a case study of Place Émilie-Gamelin in Montreal: a plaza located atop the city’s busiest transit hub. Despite its geographic centrality, it is a liminal space appropriated by marginalized groups and contentious political movements. Since 2015, it has been subject to a city-led revitalization program with intentions of attracting party-goers and tourists. Using a communications geography framework, I collected a year’s worth of tweets, first, employing a filter to capture georeferenced tweets in and around the study site, and second, using the site’s toponyms to retrieve tweets through textual queries. To understand these representations, I coded them by relevance, theme and communicative function. Results showed a place evolving in scope, name and meaning, reflecting diverging flows and uses. I found that there were more textual connotations of the study site than there were geotweets, and that the former were more diverse in their representation of place. The thesis demonstrates how promotional content on Twitter should be more critically analyzed in concert with expressive and descriptive tweets and geotweets, and that this implies spatial ontologies and data collection methods that consider a place on social media as a discursive construction. This is especially so since Twitter has become increasingly ‘platial’ through internal changes and its entwinement with other social media platforms: changes which require consideration in all Twitter-based spatial and textual analyses. The study provides an updated perspective on Twitter’s use in the spatial humanities, GIScience and geography and contributes to those interested in applying more nuanced cartographies of places

    Parsing Perceptions of Place: Locative and Textual Representations of Place Émilie-Gamelin on Twitter

    Get PDF
    We increasingly engage in geographies mediated by social media, which is changing how we experience and produce places. This raises questions about how ‘place’ is conceived and received in networked virtual spaces. Place has remained difficult to grasp in both geography and communications studies that utilize social media data. To attend to this, I first develop a conceptual framework that bridges the phenomenology of spatiality with the communication of place. I then present a case study of Place Émilie-Gamelin in Montreal: a plaza located atop the city’s busiest transit hub. Despite its geographic centrality, it is a liminal space appropriated by marginalized groups and contentious political movements. Since 2015, it has been subject to a city-led revitalization program with intentions of attracting party-goers and tourists. Using a communications geography framework, I collected a year’s worth of tweets, first, employing a filter to capture georeferenced tweets in and around the study site, and second, using the site’s toponyms to retrieve tweets through textual queries. To understand these representations, I coded them by relevance, theme and communicative function. Results showed a place evolving in scope, name and meaning, reflecting diverging flows and uses. I found that there were more textual connotations of the study site than there were geotweets, and that the former were more diverse in their representation of place. The thesis demonstrates how promotional content on Twitter should be more critically analyzed in concert with expressive and descriptive tweets and geotweets, and that this implies spatial ontologies and data collection methods that consider a place on social media as a discursive construction. This is especially so since Twitter has become increasingly ‘platial’ through internal changes and its entwinement with other social media platforms: changes which require consideration in all Twitter-based spatial and textual analyses. The study provides an updated perspective on Twitter’s use in the spatial humanities, GIScience and geography and contributes to those interested in applying more nuanced cartographies of places

    Multimodal Content Delivery for Geo-services

    Get PDF
    This thesis describes a body of work carried out over several research projects in the area of multimodal interaction for location-based services. Research in this area has progressed from using simulated mobile environments to demonstrate the visual modality, to the ubiquitous delivery of rich media using multimodal interfaces (geo- services). To effectively deliver these services, research focused on innovative solutions to real-world problems in a number of disciplines including geo-location, mobile spatial interaction, location-based services, rich media interfaces and auditory user interfaces. My original contributions to knowledge are made in the areas of multimodal interaction underpinned by advances in geo-location technology and supported by the proliferation of mobile device technology into modern life. Accurate positioning is a known problem for location-based services, contributions in the area of mobile positioning demonstrate a hybrid positioning technology for mobile devices that uses terrestrial beacons to trilaterate position. Information overload is an active concern for location-based applications that struggle to manage large amounts of data, contributions in the area of egocentric visibility that filter data based on field-of-view demonstrate novel forms of multimodal input. One of the more pertinent characteristics of these applications is the delivery or output modality employed (auditory, visual or tactile). Further contributions in the area of multimodal content delivery are made, where multiple modalities are used to deliver information using graphical user interfaces, tactile interfaces and more notably auditory user interfaces. It is demonstrated how a combination of these interfaces can be used to synergistically deliver context sensitive rich media to users - in a responsive way - based on usage scenarios that consider the affordance of the device, the geographical position and bearing of the device and also the location of the device
    corecore