
    Historical collaborative geocoding

    The latest developments in digitisation have provided large data sets that can increasingly easily be accessed and used. These data sets often contain indirect localisation information, such as historical addresses. Historical geocoding is the process of transforming this indirect localisation information into direct localisation that can be placed on a map, which enables spatial analysis and cross-referencing. Many efficient geocoders exist for current addresses, but they do not deal with the temporal aspect and are based on a strict hierarchy (..., city, street, house number) that is hard or impossible to use with historical data. Indeed, historical data are full of uncertainties (temporal aspect, semantic aspect, spatial precision, confidence in the historical source, ...) that cannot be resolved, as there is no way to go back in time to check. We propose an open-source, open-data, extensible solution for geocoding that is based on building gazetteers composed of geohistorical objects extracted from historical topographical maps. Once the gazetteers are available, geocoding a historical address is a matter of finding the geohistorical object in the gazetteers that best matches the historical address. The matching criteria are customisable and cover several dimensions (fuzzy semantic, fuzzy temporal, scale, spatial precision, ...). As the goal is to facilitate historical work, we also propose web-based user interfaces that help geocode addresses (individually or in batch mode) and display the results over current or historical topographical maps, so that they can be checked and collaboratively edited. The system has been tested on the city of Paris for the 19th-20th centuries; it shows a high return rate and is fast enough to be used interactively. Comment: working paper.
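
    The abstract describes matching a historical address against gazetteer entries along several fuzzy dimensions. The following is a minimal illustrative sketch of that idea, not the paper's actual implementation: it combines a fuzzy name similarity with a temporal-overlap score, and the field names, weights and tolerance are assumptions introduced here.

```python
# Illustrative sketch (not the paper's implementation): score gazetteer
# entries against a historical address by combining fuzzy name similarity
# with temporal overlap. Weights and field names are assumptions.
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class GeohistoricalObject:
    name: str            # e.g. a street name extracted from a historical map
    valid_from: int      # first year the object is attested on a map
    valid_to: int        # last year the object is attested on a map
    x: float             # map coordinates of the object
    y: float


def semantic_score(query: str, candidate: str) -> float:
    """Fuzzy string similarity in [0, 1]."""
    return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()


def temporal_score(query_year: int, obj: GeohistoricalObject, tolerance: int = 10) -> float:
    """1.0 inside the validity interval, decaying linearly outside it."""
    if obj.valid_from <= query_year <= obj.valid_to:
        return 1.0
    gap = min(abs(query_year - obj.valid_from), abs(query_year - obj.valid_to))
    return max(0.0, 1.0 - gap / tolerance)


def best_match(address: str, year: int, gazetteer: list[GeohistoricalObject],
               w_sem: float = 0.7, w_time: float = 0.3) -> GeohistoricalObject:
    """Return the gazetteer entry with the highest combined score."""
    return max(gazetteer,
               key=lambda o: w_sem * semantic_score(address, o.name)
                             + w_time * temporal_score(year, o))
```

    A real system would add the scale and spatial-precision dimensions mentioned in the abstract as further weighted terms of the same combined score.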

    Improving the geospatial consistency of digital libraries metadata

    Consistency is an essential aspect of metadata quality. Inconsistent metadata records are harmful: given a themed query, the set of retrieved metadata records would contain descriptions of unrelated or irrelevant resources, and may even omit some resources that are obviously relevant. This is even worse when the description of the location is inconsistent. Inconsistent spatial descriptions may yield invisible or hidden geographical resources that cannot be retrieved by means of spatially themed queries. Therefore, ensuring spatial consistency should be a primary goal when reusing, sharing and developing georeferenced digital collections. We present a methodology able to detect geospatial inconsistencies in metadata collections based on the combination of spatial ranking, reverse geocoding, geographic knowledge organization systems and information-retrieval techniques. This methodology has been applied to a collection of metadata records describing maps and atlases belonging to the Library of Congress. The proposed approach was able to automatically identify inconsistent metadata records (870 out of 10,575) and propose fixes to most of them (91.5%). These results support the ability of the proposed methodology to assess the impact of spatial inconsistency on the retrievability and visibility of metadata records and to improve their spatial consistency.
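
    One simple consistency check of the kind such a methodology could rely on is verifying that a record's coordinates fall inside the footprint of the place named in its metadata. The sketch below assumes a toy gazetteer of bounding boxes and invented record fields; it is not the paper's actual data model or pipeline.

```python
# Minimal sketch of a spatial consistency check: flag a record whose point
# coordinates fall outside the bounding box of the place named in its metadata.
# The gazetteer dictionary and its values are toy assumptions.
BBox = tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)

GAZETTEER: dict[str, BBox] = {
    "Paris": (2.224, 48.815, 2.470, 48.902),
    "Washington, D.C.": (-77.120, 38.791, -76.909, 38.996),
}


def is_spatially_consistent(place_name: str, lon: float, lat: float) -> bool:
    """Return False when the point lies outside the bounding box associated
    with the place named in the metadata record."""
    bbox = GAZETTEER.get(place_name)
    if bbox is None:
        return True  # unknown place: cannot judge, so do not flag
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


# Example: a map described as "Paris" but georeferenced near Washington is flagged.
print(is_spatially_consistent("Paris", -77.03, 38.90))  # False -> inconsistent
```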

    Improvements in the geocoding process in organizational environments

    Dissertation submitted for the degree of Master in Computer Engineering. Current geocoding technologies are only able to handle addresses that fit the general case for the location in question. Edge-case addresses are mostly ignored or wrongly geocoded, leading to imprecision and errors in the results obtained. To mitigate this problem, current geocoding services accompany their results with confidence values, but the values and scales used vary between services, are hard for users without domain knowledge to understand and, as we discovered, cannot truly be trusted. Novabase aims to offer organizations a geocoding service that improves the quality of the results obtained from mainstream geocoding services such as Google and Bing. The objective is to provide quality results in the cases where we can act and, when we cannot, to fall back on the results of other geocoding services. We intend to handle addresses in areas where results are of inferior quality, either because the areas are not fully covered by the services or because those services are not prepared to handle address formats that do not match the general case (one example is addresses numbered using "Lotes"). The geocoding is executed in two steps, as sketched below. The first matches the address against a knowledge base owned by the organization, whose quality we fully trust. If the knowledge base returns a valid result, it is output with maximum confidence. When it fails, we fall back to the mainstream geocoding services and use their results for the output.
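
    A hedged sketch of the two-step strategy just described: query the organization's trusted knowledge base first and fall back to a mainstream geocoder only when it fails. The `Geocoder` interface and the numeric confidence values are assumptions introduced here for illustration.

```python
# Sketch of knowledge-base-first geocoding with fallback to an external service.
# The Geocoder protocol and the placeholder confidence values are assumptions.
from typing import Optional, Protocol


class Geocoder(Protocol):
    def geocode(self, address: str) -> Optional[tuple[float, float]]: ...


def geocode_with_fallback(address: str,
                          knowledge_base: Geocoder,
                          external: Geocoder) -> tuple[Optional[tuple[float, float]], float]:
    """Return (coordinates, confidence): maximum confidence when the trusted
    knowledge base answers, a lower service-dependent confidence otherwise."""
    result = knowledge_base.geocode(address)
    if result is not None:
        return result, 1.0   # full trust in the organizational knowledge base
    result = external.geocode(address)
    return result, 0.5       # placeholder confidence for an external service
```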

    Wrapper Maintenance: A Machine Learning Approach

    The proliferation of online information sources has led to an increased use of wrappers for extracting data from Web sources. While most previous research has focused on the quick and efficient generation of wrappers, the development of tools for wrapper maintenance has received less attention. This is an important research problem because Web sources often change in ways that prevent the wrappers from extracting data correctly. We present an efficient algorithm that learns structural information about data from positive examples alone. We describe how this information can be used for two wrapper maintenance applications: wrapper verification and reinduction. The wrapper verification system detects when a wrapper is not extracting correct data, usually because the Web source has changed its format. The reinduction algorithm automatically recovers from changes in the Web source by identifying data on Web pages so that a new wrapper may be generated for this source. To validate our approach, we monitored 27 wrappers over a period of a year. The verification algorithm correctly discovered 35 of the 37 wrapper changes and made 16 mistakes, resulting in a precision of 0.73 and a recall of 0.95. We validated the reinduction algorithm on ten Web sources. We were able to successfully reinduce the wrappers, obtaining precision and recall values of 0.90 and 0.80 on the data extraction task.
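
    The core idea of verification, learning what correctly extracted data looks like from positive examples and flagging output that deviates, can be illustrated with a very reduced sketch. The structural signature and the 0.8 threshold below are assumptions made for the example and are far cruder than the algorithm in the paper.

```python
# Illustrative sketch of wrapper verification (not the paper's algorithm):
# learn coarse structural features from known-good extractions and flag newly
# extracted values whose structure was never seen during training.
import re


def structure(value: str) -> tuple[bool, bool, int]:
    """A crude structural signature: starts with a capital letter,
    contains digits, number of tokens."""
    return (value[:1].isupper(),
            bool(re.search(r"\d", value)),
            len(value.split()))


def learn_profile(positive_examples: list[str]) -> set[tuple[bool, bool, int]]:
    return {structure(v) for v in positive_examples}


def looks_correct(extracted: list[str], profile: set[tuple[bool, bool, int]]) -> bool:
    """Verification heuristic: most extracted values should match a learned signature."""
    hits = sum(structure(v) in profile for v in extracted)
    return hits / max(len(extracted), 1) > 0.8


prices = ["$12.99", "$7.50", "$120.00"]
profile = learn_profile(prices)
print(looks_correct(["$9.99", "$15.25"], profile))          # True: same structure
print(looks_correct(["Page not found", "Error"], profile))  # False: format changed
```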

    Automatic Generation of Geospatial Metadata for Web Resources

    Web resources that are not part of any Spatial Data Infrastructure can be an important source of information. However, the incorporation of Web resources within a Spatial Data Infrastructure requires a significant effort to create metadata. This work presents an extensible architecture for the automatic characterisation of Web resources and a strategy for assigning their geographic scope. The implemented prototype automatically generates geospatial metadata for Web pages. The metadata model conforms to the Common Element Set, a set of core properties encouraged by the OGC Catalogue Service Specification to permit the minimal implementation of a catalogue service independent of an application profile. The experiments performed consisted of creating metadata for the Web pages of providers of geospatial Web resources. The Web pages were gathered by a Web crawler focused on OGC Web Services. Manual review of the results has shown that the coverage estimation method applied produces acceptable results for more than 80% of the tested Web resources.
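
    One plausible way to estimate the geographic scope of a Web page, shown below only as a sketch (the concrete coverage estimation method in the work may differ), is to geocode the place names recognised in the page and take their bounding box as the coverage element of the generated metadata. The lookup table stands in for a real gazetteer or geocoder.

```python
# Sketch of geographic-scope assignment: derive a coverage bounding box from
# the place names found in a Web page. The coordinate table is a toy stand-in
# for a real gazetteer lookup.
from typing import Optional

PLACE_COORDS: dict[str, tuple[float, float]] = {   # (lon, lat), toy values
    "Zaragoza": (-0.889, 41.649),
    "Madrid": (-3.703, 40.417),
    "Barcelona": (2.170, 41.387),
}


def estimate_coverage(place_names: list[str]) -> Optional[tuple[float, float, float, float]]:
    """Return (min_lon, min_lat, max_lon, max_lat) for the recognised places,
    or None when no place name could be resolved."""
    points = [PLACE_COORDS[p] for p in place_names if p in PLACE_COORDS]
    if not points:
        return None
    lons, lats = zip(*points)
    return (min(lons), min(lats), max(lons), max(lats))


print(estimate_coverage(["Zaragoza", "Madrid", "Barcelona"]))
```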

    A large-scale study of fashion influencers on Twitter

    The rise of social media has changed the nature of the fashion industry. Influence is no longer concentrated in the hands of an elite few: social networks distribute power across a broad set of tastemakers; trends are driven both bottom-up and top-down; and designers, retailers, and consumers are regularly inundated with new styles and looks. This thesis presents a large-scale study of fashion influencers on Twitter and proposes a fashion graph visualization dashboard to explore the social interactions between these Twitter accounts. Leveraging a dataset of 11.5k Twitter fashion accounts, a content-based classifier was trained to predict which accounts are fashion-centric. With the classifier, I identified more than 300k fashion-related accounts through snowball crawling and then defined a stable group of 1,000 influencers as the fashion core. I further human-labeled these influencers’ Twitter accounts and mined their recent tweets. Finally, I built a fashion graph visualization dashboard that allows users to visualize the interactions and relationships between individuals, brands, and media influencers.
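
    The classifier-guided snowball crawl mentioned above can be summarised by the generic sketch below. The `is_fashion_account` and `get_followed_accounts` callables are assumptions standing in for the thesis's trained classifier and the Twitter API; the thesis's actual crawl logic and stopping criteria may differ.

```python
# Hedged sketch of snowball crawling filtered by a content classifier.
# The two callables are placeholders, not real Twitter API functions.
from collections import deque
from typing import Callable, Iterable


def snowball_crawl(seeds: Iterable[str],
                   is_fashion_account: Callable[[str], bool],
                   get_followed_accounts: Callable[[str], list[str]],
                   limit: int = 300_000) -> set[str]:
    """Breadth-first expansion from seed accounts, keeping only accounts the
    classifier labels as fashion-centric."""
    found: set[str] = set()
    queue = deque(seeds)
    while queue and len(found) < limit:
        account = queue.popleft()
        if account in found or not is_fashion_account(account):
            continue
        found.add(account)
        queue.extend(get_followed_accounts(account))
    return found
```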