115 research outputs found
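Suomenkielisen geojäsentimen kehittäminen: kuinka hankkia sijaintitietoa jäsentelemättömistä tekstiaineistoista (Developing a Finnish-language geoparser: how to obtain location information from unstructured text data)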
Ever more data is produced and shared over the internet. The data come in varied forms, such as digital texts like web articles and social media posts, and they often have a spatial dimension. In texts, geospatial information is expressed through place names, but conventional GIS methods cannot handle information in its imprecise linguistic form. This has created a need to convert textual location information into an explicit form, coordinates. To solve the problem, geoparsers have been developed that recognize and locate place names in free text and that, when working correctly, could serve as a source of geodata in geographical research. Geoparsing has indeed been applied to fields ranging from disaster management to literary studies. In a significant share of geoparsing research the language of the study material has been English, and geoparsers themselves are language-specific; this leaves in the dark not only the observations from smaller languages that bear on geoparser development but also the views of the speakers of those languages.
In my master's thesis I seek to answer three research questions: What are the most advanced geoparsing methods? Which linguistic and geographical ambiguities make this multi-faceted problem harder to solve? And how can the reliability and usability of geoparsers be evaluated? In the applied part of the thesis I present Finger, a geoparser for the Finnish language, and describe its development and the evaluation of its performance. For the evaluation I created two test datasets, one consisting of Twitter posts and the other of news articles. The Finger geoparser, the test datasets, and the relevant program code are shared openly.
Geoparsing can be divided into two sub-tasks: recognizing place names in a stream of text and resolving each place name to the correct coordinate point, possibly from among several candidates. In both stages the newest methods rely on deep learning models and methods whose inputs are vectors such as word embeddings. Geoparser performance is tested on datasets in which the place names and their coordinates are known. The yardstick is equivalence for recognition and distance from the correct location for resolution.
Finger uses a place-name recognizer that builds on a Finnish BERT language model, and a straightforward database lookup to resolve place names. The software produces geodata structured in tabular form, containing the input texts and any place names recognized in them together with their coordinate locations. The test datasets differ in subject matter, but Finger performs roughly the same on both, and its performance is adequate relative to evaluations made on English-language datasets. Error analysis reveals several sources of error that stem from the inherent ambiguity of languages or of geographical reality, or that are caused by processing, such as lemmatization errors.
Every part of Finger can be improved, for instance by developing the linguistic processing further and by creating more comprehensive test datasets. Likewise, future geoparsers must be able to handle more complex linguistic and geographical forms of description than mere place names and coordinate points. In its current form Finger does not produce ready-made geodata that should be used uncritically. It is nevertheless a promising first step for Finnish-language geoparsers and a stepping stone for future applied research.
Ever more data is available and shared through the internet. These big data masses often have a spatial dimension and can take many forms, one of which is digital text, such as articles or social media posts. The geospatial links in these texts are made through place names, also called toponyms, but traditional GIS methods are unable to deal with the fuzzy linguistic information. This creates the need to transform the linguistic location information into an explicit coordinate form. Several geoparsers have been developed to recognize and locate toponyms in free-form texts: the task of these systems is to be a reliable source of location information. Geoparsers have been applied to topics ranging from disaster management to literary studies. The major language of study in geoparser research has been English, and geoparsers tend to be language-specific, which threatens to leave unexplored the experiences gained by studying smaller languages and those expressed in them.
This thesis seeks to answer three research questions related to geoparsing: What are the most advanced geoparsing methods? What linguistic and geographical features complicate this multi-faceted problem? And how to evaluate the reliability and usability of geoparsers? The major contributions of this work are an open-source geoparser for Finnish texts, Finger, and two test datasets, or corpora, for testing Finnish geoparsers. One of the datasets consists of tweets and the other of news articles. All of these resources, including the relevant code for acquiring the test data and evaluating the geoparser, are shared openly.
Geoparsing can be divided into two sub-tasks: recognizing toponyms amid text flows and resolving them to the correct coordinate location. Both tasks have seen a recent turn to deep learning methods and models, where the input texts are encoded as, for example, word embeddings. Geoparsers are evaluated against gold standard datasets where toponyms and their coordinates are marked. Performance is measured on equivalence and distance-based metrics for toponym recognition and resolution respectively.
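As a rough illustration of the two kinds of metric mentioned above, the following Python sketch scores toponym recognition by exact span match (precision, recall, F1) and toponym resolution by great-circle distance to the gold coordinates. The function names, the exact-match criterion, and the 161 km tolerance are assumptions made for illustration; the abstract does not state which specific metrics or thresholds the thesis uses.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def recognition_scores(predicted, gold):
    """Precision, recall and F1 over toponym spans, counting exact matches only.

    Spans are hashable items such as (start_offset, end_offset) tuples.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accuracy_within(pairs, tolerance_km=161.0):
    """Share of (predicted, gold) coordinate pairs within tolerance_km.

    The default tolerance is an assumed value, not one taken from the thesis.
    """
    if not pairs:
        return 0.0
    hits = sum(1 for pred, gold in pairs
               if haversine_km(*pred, *gold) <= tolerance_km)
    return hits / len(pairs)
```

Recognition and resolution are scored separately here, so a toponym that is recognized correctly but placed in the wrong location lowers only the distance-based score.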
Finger uses a toponym recognition classifier built on a Finnish BERT model and a simple gazetteer query to resolve the toponyms to coordinate points. The program outputs structured geodata, with input texts and the recognized toponyms and coordinate locations. While the datasets represent different text types in terms of formality and topics, there is little difference in performance when evaluating Finger against them. The overall performance is comparable to the performance of geoparsers of English texts. Error analysis reveals multiple error sources, caused either by the inherent ambiguity of the studied language and the geographical world or by the processing itself, for example by the lemmatizer.
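The two-stage pipeline described above can be sketched in Python as follows. This is not Finger's code: the recognizer is a dictionary-based stand-in for the BERT classifier, and `GAZETTEER`, `LEMMAS`, and all function names are hypothetical, introduced only to show how recognition, lemmatization, and gazetteer lookup fit together and where a lemmatization error would leave a toponym unresolved.

```python
# Hypothetical toy gazetteer: base form -> (latitude, longitude).
GAZETTEER = {
    "Helsinki": (60.1699, 24.9384),
    "Tampere": (61.4978, 23.7610),
}

# Hypothetical lemma table standing in for a real Finnish lemmatizer.
LEMMAS = {
    "Helsingissä": "Helsinki",   # inessive, "in Helsinki"
    "Tampereelle": "Tampere",    # allative, "to Tampere"
}


def recognize_toponyms(text):
    """Stand-in for the BERT-based recognizer: pick tokens known to the tables."""
    tokens = [token.strip(".,!?") for token in text.split()]
    return [t for t in tokens if t in LEMMAS or t in GAZETTEER]


def resolve(toponym):
    """Lemmatize the inflected form, then look the base form up in the gazetteer."""
    lemma = LEMMAS.get(toponym, toponym)
    return lemma, GAZETTEER.get(lemma)  # coordinates are None if lookup fails


def geoparse(text):
    """Return one table-like row per recognized toponym."""
    rows = []
    for toponym in recognize_toponyms(text):
        lemma, coordinates = resolve(toponym)
        rows.append({
            "text": text,
            "toponym": toponym,
            "lemma": lemma,
            "coordinates": coordinates,
        })
    return rows
```

Calling `geoparse("Asun Helsingissä ja matkustan Tampereelle.")` yields two rows, one per inflected place name. If an inflected form were missing from the lemma table, the lookup would use the inflected form itself and return no coordinates.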
Finger can be improved in multiple ways, such as refining how it analyzes texts and creating more comprehensive evaluation datasets. Similarly, the geoparsing task should move towards more complex linguistic and geographical descriptions than just toponyms and coordinate points. Finger is not, in its current state, a ready source of geodata. However, the system has the potential to be the first step for geoparsers for Finnish, and it can be a stepping stone for future applied research.
APREGOAR: Development of a geospatial database applied to local news in Lisbon
Project Work presented as the partial requirement for obtaining a Master's degree in Geographic Information Systems and Science
Digital news content holds valuable information, in unstructured text form, about the
location, timing, and nature of events. Several ongoing efforts already try to extract
event details from digital news sources, but often without the nuance needed to
represent accurately where things actually happen. Alternatively, journalists could
manually attach attributes to the events described in their articles at publication
time, improving the accuracy of, and confidence in, these spatial and temporal
attributes. These attributes would then be immediately available for assessing the
thematic, temporal, and spatial coverage of an agency's content, and would improve the
user's experience of exploring that content by providing additional dimensions to
filter on.
Although the technology for assigning geospatial and temporal dimensions for use in
consumer-facing applications is not new, it has yet to be applied to news at scale. In
addition, most existing systems support only a point definition of an article's
location, which may poorly represent the actual place or places of the events
described.
This work defines an open-source web application and an underlying spatial database
that support i) associating multiple polygons representing the place where each event
occurs, and the time frames associated with the events, in line with the traditional
thematic attributes attached to news articles; ii) contextualizing each article by
adding inline event maps that show readers where the article's events occur; and
iii) exploring the added corpora through thematic, spatial, and temporal filters that
display results in interactive coverage maps and lists of articles and events.
The project was applied to the Greater Lisbon area of Portugal. Beyond the
functionality described above, the project builds progressive gazetteers that can be
reused as place associations or for deeper meta-analysis of place as it is colloquially
perceived. It demonstrates how easily these additional dimensions can be incorporated
with high confidence in the accuracy of their definition, managed, and leveraged to
improve news agencies' content management, readers' understanding, and researchers'
exploration, or extracted for combination with other datasets to provide additional
insights.
There is valuable information in unstructured text format about the location, timing,
and nature of events available in digital news content. Several ongoing efforts already
attempt to extract event details from digital news sources, but often not with the
nuance needed to accurately represent where things actually happen. Alternatively,
journalists could manually associate attributes to events described in their articles while
publishing, improving accuracy and confidence in these spatial and temporal attributes.
These attributes could then be immediately available for evaluating thematic, temporal,
and spatial coverage of an agency’s content, as well as improve the user experience of
content exploration by providing additional dimensions that can be filtered.
Though the technology for assigning geospatial and temporal dimensions for use in
consumer-facing applications is not novel, it has yet to be applied at scale to
the news. Additionally, most existing systems only support a single point definition of
article locations, which may not well represent the actual place(s) of events described
within.
This work defines an open source web application and underlying spatial database
that supports i) the association of multiple polygons representing where each event
occurs, and the time frames associated with the events, in line with the traditional thematic
attributes associated with news articles; ii) the contextualization of each article via the
addition of inline event maps to clarify to readers where the events of the article occur;
and iii) the exploration of the added corpora via thematic, spatial, and temporal filters
that display results in interactive coverage maps and lists of articles and events.
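To make the combination of thematic, spatial, and temporal filtering concrete, here is a minimal Python sketch of an event record carrying several polygons and a time frame, with a filter over all three dimensions. The `Event` class, its field names, and `filter_events` are hypothetical and are not taken from the project; the spatial test is a plain ray-casting point-in-polygon check on simple polygons without holes, whereas a production system would more likely delegate this to the spatial database.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Event:
    title: str
    themes: set
    start: date
    end: date
    # Each polygon is a list of (lon, lat) vertices; an event may have several.
    polygons: list = field(default_factory=list)


def point_in_polygon(x, y, ring):
    """Ray casting: count crossings of a ray going in the +x direction."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's y level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_events(events, theme=None, point=None, period=None):
    """Keep events matching every filter that is given (None means no filter)."""
    results = []
    for event in events:
        if theme is not None and theme not in event.themes:
            continue
        if point is not None and not any(
            point_in_polygon(point[0], point[1], ring) for ring in event.polygons
        ):
            continue
        if period is not None:
            query_start, query_end = period
            # Keep only events whose time frame overlaps the query period.
            if event.end < query_start or event.start > query_end:
                continue
        results.append(event)
    return results
```

Because each event holds a list of polygons, an article about a parade route and a separate rally site could match a map query at either place, which a single point per article cannot express.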
The project was applied to the greater Lisbon area of Portugal. In addition to the
above functionality, this project builds progressive gazetteers that can be reused as place
associations, or for further meta analysis of place as it is colloquially understood. It
demonstrates the ease with which these additional dimensions may be incorporated with a
high confidence in definition accuracy, managed, and leveraged to improve news agency
content management, reader understanding, researcher exploration, or extracted for
combination with other datasets to provide additional insights.
Parsing Perceptions of Place: Locative and Textual Representations of Place Émilie-Gamelin on Twitter
We increasingly engage in geographies mediated by social media, which is changing how we experience and produce places. This raises questions about how ‘place’ is conceived and received in networked virtual spaces. Place has remained difficult to grasp in both geography and communications studies that utilize social media data. To attend to this, I first develop a conceptual framework that bridges the phenomenology of spatiality with the communication of place. I then present a case study of Place Émilie-Gamelin in Montreal: a plaza located atop the city’s busiest transit hub. Despite its geographic centrality, it is a liminal space appropriated by marginalized groups and contentious political movements. Since 2015, it has been subject to a city-led revitalization program with intentions of attracting party-goers and tourists. Using a communications geography framework, I collected a year’s worth of tweets, first, employing a filter to capture georeferenced tweets in and around the study site, and second, using the site’s toponyms to retrieve tweets through textual queries. To understand these representations, I coded them by relevance, theme and communicative function. Results showed a place evolving in scope, name and meaning, reflecting diverging flows and uses. I found that there were more textual connotations of the study site than there were geotweets, and that the former were more diverse in their representation of place. The thesis demonstrates how promotional content on Twitter should be more critically analyzed in concert with expressive and descriptive tweets and geotweets, and that this implies spatial ontologies and data collection methods that consider a place on social media as a discursive construction. This is especially so since Twitter has become increasingly ‘platial’ through internal changes and its entwinement with other social media platforms: changes which require consideration in all Twitter-based spatial and textual analyses. 
The study provides an updated perspective on Twitter’s use in the spatial humanities, GIScience and geography and contributes to the work of those interested in applying more nuanced cartographies of places.
FOSS4G 2016 Proceedings: Academic Program - selected papers and posters
This Conference Proceedings is a collection of selected papers and posters submitted to the Academic Program of the International Conference for Free and Open Source Software for Geospatial (FOSS4G 2016), 24th to 26th August 2016 in Bonn, Germany.
As in previous FOSS4G conferences at the national and international level, the academic papers and posters cover a wide range of topics, reflecting the contribution of academia to this field through the development of open source software components, the design of open standards, the proliferation of web-based solutions, the dissemination of the open principles important in science and education, and the collection and hosting of freely available geo-data.
Multimodal Content Delivery for Geo-services
This thesis describes a body of work carried out over several research projects in the area of multimodal interaction for location-based services. Research in this area has progressed from using simulated mobile environments to demonstrate the visual modality, to the ubiquitous delivery of rich media using multimodal interfaces (geo-services). To effectively deliver these services, research focused on innovative solutions to real-world problems in a number of disciplines including geo-location, mobile spatial interaction, location-based services, rich media interfaces and auditory user interfaces. My original contributions to knowledge are made in the areas of multimodal interaction underpinned by advances in geo-location technology and supported by the proliferation of mobile device technology into modern life. Accurate positioning is a known problem for location-based services; contributions in the area of mobile positioning demonstrate a hybrid positioning technology for mobile devices that uses terrestrial beacons to trilaterate position. Information overload is an active concern for location-based applications that struggle to manage large amounts of data; contributions in the area of egocentric visibility, which filter data based on field-of-view, demonstrate novel forms of multimodal input. One of the more pertinent characteristics of these applications is the delivery or output modality employed (auditory, visual or tactile). Further contributions in the area of multimodal content delivery are made, where multiple modalities are used to deliver information using graphical user interfaces, tactile interfaces and more notably auditory user interfaces. It is demonstrated how a combination of these interfaces can be used to synergistically deliver context-sensitive rich media to users, in a responsive way, based on usage scenarios that consider the affordance of the device, the geographical position and bearing of the device and also the location of the device.
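For readers unfamiliar with trilateration, the following Python sketch shows the basic idea in its simplest form: three beacons at known planar positions, each with a measured distance to the device, determine a single position. This is a textbook, noise-free, two-dimensional version with hypothetical names; the abstract does not describe how the thesis's hybrid positioning technology computes its fix, and real measurements would call for a least-squares solution over noisy ranges.

```python
def trilaterate(beacon1, beacon2, beacon3):
    """Locate a point from three (x, y, distance) beacons in a planar frame.

    Subtracting the first circle equation from the other two removes the
    quadratic terms and leaves a 2x2 linear system in x and y.
    """
    x1, y1, r1 = beacon1
    x2, y2, r2 = beacon2
    x3, y3, r3 = beacon3

    a = 2 * (x2 - x1)
    b = 2 * (y2 - y1)
    c = r1**2 - r2**2 - x1**2 + x2**2 - y1**2 + y2**2

    d = 2 * (x3 - x1)
    e = 2 * (y3 - y1)
    f = r1**2 - r3**2 - x1**2 + x3**2 - y1**2 + y3**2

    det = a * e - b * d
    if det == 0:
        # Collinear beacons cannot pin down a unique position.
        raise ValueError("beacons are collinear; position is ambiguous")

    # Cramer's rule for the 2x2 system.
    x = (c * e - b * f) / det
    y = (a * f - c * d) / det
    return x, y
```

With beacons at (0, 0), (10, 0), and (0, 10) and distances measured from the point (3, 4), the function recovers that point up to floating-point error.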
A Digital Periegesis: Implementing Spatial Research Infrastructures for Classical History and Archaeology
Over the past ten years the spatial turn in the humanities in Scandinavia has resulted in a growing number of infrastructural projects aimed at facilitating interdisciplinary research into spatial aspects of a rich variety of materials: place-names, early modern inventories and cadastral maps, medieval literature and art, as well as Viking-Age and medieval runic inscriptions, to name just a few. This intensive development has brought about a number of challenges, as these projects differ with regard to their agendas, setups, and customized approaches to data, theories, and methods.
This volume provides the research community with an opportunity to revisit traditional research questions in the context of new infrastructural environments. Although primarily aimed at medievalists and scholars of the early modern period, the volume offers a broader spatial and temporal scope with a contribution from classical studies. The classics have in many ways pioneered the application of digital methods to narrative spatial analysis and developed strong collaborative engagement with infrastructure, producing Pelagios, an ever-growing platform for a plethora of spatial databases and gazetteers, as well as Recogito, a digital annotation tool. These two successful examples show a pressing need for community building around SRIs for early modern and medieval Scandinavia to ensure sustainable design, long-term preservation, and further collaborative development.