1,007 research outputs found
Automating the production of map interfaces for digital collections using Google APIs
Journal article
Many digital libraries are interested in taking advantage of the GIS mapping capabilities provided by Google Maps and Google Earth. The Digital Ventures Division of the University of Utah J. Willard Marriott Library has successfully completed an innovative automated process in which descriptive metadata in the form of place names was used to determine latitude and longitude coordinates for digital collection items. By enhancing digital collection metadata in this fashion, hundreds of records were updated without data entry from project staff. This article will provide an overview of using the Google application programming interface (API) to return geographic coordinate data, the scripting process with XML digital collection data, and the use of online tools and Microsoft Excel to upload digital collection data to Google Earth and Google Maps. The ability to automate metadata changes opens up a variety of possibilities for digital library administrators and collection managers.
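The geocoding step this article describes can be illustrated with a short script. This is a minimal sketch, not the authors' actual workflow: it assumes the current Google Geocoding API web service and a placeholder API key, and simply maps a place-name string from a metadata record to coordinates.

```python
import requests

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

def geocode_place(place_name, api_key):
    """Look up a descriptive place name and return (latitude, longitude), or None."""
    resp = requests.get(GEOCODE_URL, params={"address": place_name, "key": api_key})
    resp.raise_for_status()
    results = resp.json().get("results", [])
    if not results:
        return None  # the place name could not be resolved
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]

# Example: enrich a batch of place-name metadata values with coordinates.
if __name__ == "__main__":
    API_KEY = "YOUR_API_KEY"  # placeholder; a real key is required
    for place in ["Salt Lake City, Utah", "Moab, Utah"]:
        print(place, geocode_place(place, API_KEY))
```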
A summary of geospatial initiatives in the University of Utah's Marriott Library
Journal article
Abstract: The Marriott Library's Geospatial Initiatives Committee consists of librarians and staff involved in projects designed to provide access to different library resources through geospatial interfaces. We are creating maps that link to resources in our digital collections, including the Western Soundscape Archives and historical photographs, and georeferencing scanned geological thesis maps so that they can be manipulated in Google Earth. The library's home page now has a clickable map for accessing digital collections by county, and we are working with a geography professor to create a "Historical GIS" that uses Sanborn fire insurance maps of Salt Lake City to recreate the downtown area as it appeared a century ago. To pull these various projects together, we set up a geospatial portal through CampusGuides. See: http://campusguides.lib.utah.edu/GI
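Where projects like these surface collection items in Google Earth, the items typically travel as KML. As a rough, hypothetical illustration (the record fields and the use of the simplekml package are assumptions, not the committee's documented tooling), a geocoded record can be turned into a placemark like this:

```python
import simplekml  # pip install simplekml

# Hypothetical geocoded collection records: (title, url, lat, lon).
records = [
    ("Historical photograph, Main Street", "https://example.org/item/1", 40.7608, -111.8910),
]

kml = simplekml.Kml()
for title, url, lat, lon in records:
    point = kml.newpoint(name=title, coords=[(lon, lat)])  # KML order is (lon, lat)
    point.description = url  # link back to the digital collection item
kml.save("collection_items.kml")  # open the file in Google Earth
```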
Text Mining with HathiTrust: Empowering Librarians to Support Digital Scholarship Research
This workshop will introduce attendees to text analysis research and the common methods and tools used in this emerging area of scholarship, with particular attention to the HathiTrust Research Center. The workshop's train-the-trainer curriculum will provide a framework for how librarians can support text data mining, as well as teach transferable skills useful for many other areas of digital scholarly inquiry. Topics include: an introduction to gathering, managing, analyzing, and visualizing textual data; hands-on experience with text analysis tools, including the HTRC's off-the-shelf algorithms and datasets, such as the HTRC Extracted Features; and using the command line to run basic text analysis processes. No experience necessary! Attendees must bring a laptop.
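For a flavor of the "basic text analysis processes" the workshop mentions, here is a generic word-frequency count. This is a minimal sketch, not HTRC-specific code; the filename is a placeholder:

```python
import re
from collections import Counter

# Count word frequencies in a plain-text file -- the kind of basic
# analysis the workshop runs from the command line.
with open("volume.txt", encoding="utf-8") as f:  # placeholder filename
    tokens = re.findall(r"[a-z]+", f.read().lower())

for word, count in Counter(tokens).most_common(10):
    print(f"{word}\t{count}")
```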
Enhancing Geospatial Data: Collecting and Visualising User-Generated Content Through Custom Toolkits and Cloud Computing Workflows
Through this thesis we test the hypothesis that, via a set of custom toolkits built on cloud computing, online user-generated content can be extracted from emerging large-scale data sets, allowing social scientists to collect, analyse, and visualise geospatial data. Using a custom-built suite of software, known as the ‘BigDataToolkit’, we examine the need for cloud computing and custom workflows to open up access to existing online data, as well as to set up processes for collecting new data. We examine the use of the toolkit to collect large amounts of data from various online sources, such as social media Application Programming Interfaces (APIs) and data stores, and to visualise the collected data in real time. Through the execution of these workflows, this thesis presents an implementation of a smart-collector framework that automates the collection process to significantly increase the amount of data that can be obtained from standard API endpoints. Through these interconnected methods and distributed collection workflows, the final system is able to collect and visualise more data in real time than the single-system collection processes used in traditional social media analysis. Aimed at researchers without a core understanding of the intricacies of computer science, this thesis provides a methodology that opens up new data sources not only to academics but also to wider participants, allowing the collection of user-generated geographic and textual content en masse. A series of case studies is provided, covering applications from a single researcher collecting data through to collection via televised media. These are examined in terms of the tools created and the opportunities opened up, allowing real-time analysis of data collected with the developed toolkit.
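At its core, the collection workflow reduces to repeatedly polling an API endpoint and walking its pagination. A minimal sketch, assuming a hypothetical JSON endpoint with a cursor parameter (the URL and field names are invented for illustration; the actual toolkit targets specific social media APIs):

```python
import requests

def collect_all(endpoint, params=None):
    """Walk a cursor-paginated JSON API and yield every item it returns."""
    params = dict(params or {})
    while True:
        resp = requests.get(endpoint, params=params)
        resp.raise_for_status()
        page = resp.json()
        yield from page.get("items", [])
        cursor = page.get("next_cursor")  # hypothetical pagination field
        if not cursor:
            break
        params["cursor"] = cursor

# Hypothetical usage: harvest geotagged posts mentioning a keyword.
# for post in collect_all("https://api.example.com/v1/search",
#                         {"q": "flood", "has_geo": "true"}):
#     print(post)
```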
Distributional semantic modeling: a revised technique to train term/word vector space models applying the ontology-related approach
We design a new technique for distributional semantic modeling with a neural-network-based approach to learning distributed term representations (or term embeddings) -- term vector space models as a result -- inspired by the recent ontology-related approach to the identification of terms (term extraction) and the relations between them (relation extraction), which uses different types of contextual knowledge (syntactic, terminological, semantic, etc.) and is called semantic pre-processing technology (SPT). Our method relies on automatic term extraction from natural language texts and the subsequent formation of problem-oriented or application-oriented (and deeply annotated) text corpora in which the fundamental entity is the term (including both non-compositional and compositional terms). This gives us the opportunity to change over from distributed word representations (word embeddings) to distributed term representations (term embeddings). This transition makes it possible to generate more accurate semantic maps of different subject domains (and of the relations between input terms, which is useful for exploring clusters and oppositions, or for testing hypotheses about them). The semantic map can be represented as a graph using Vec2graph, a Python library for visualizing word embeddings (term embeddings in our case) as dynamic and interactive graphs. The Vec2graph library coupled with term embeddings will not only improve accuracy in solving standard NLP tasks, but also update the conventional concept of automated ontology development. The main practical result of our work is a development kit (a set of toolkits exposed as web-service APIs and a web application) that provides all the routines necessary for the basic linguistic pre-processing and the semantic pre-processing of natural language texts in Ukrainian for subsequent training of term vector space models.
Comment: In English, 9 pages, 2 figures. Not published yet. Prepared for a special issue (UkrPROG 2020 conference) of the scientific journal "Problems in Programming" (Founder: National Academy of Sciences of Ukraine, Institute of Software Systems of NAS Ukraine).
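The shift from word embeddings to term embeddings can be approximated by collapsing extracted multi-word terms into single tokens before training an ordinary vector space model. A minimal sketch using gensim's Word2Vec (the toy corpus and the underscore-joining convention are illustrative assumptions; the paper's own pipeline uses its SPT term extraction):

```python
from gensim.models import Word2Vec  # pip install gensim

# Toy corpus in which extracted multi-word terms have already been
# collapsed into single tokens, so the model learns term embeddings.
corpus = [
    ["the", "neural_network", "learns", "term_embeddings"],
    ["a", "vector_space_model", "represents", "terms"],
    ["term_embeddings", "extend", "word_embeddings"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=3,
                 min_count=1, workers=1, seed=42)

# Each term, compositional or not, now has a single vector.
print(model.wv["term_embeddings"][:5])
print(model.wv.most_similar("term_embeddings", topn=2))
```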
Use existing data first: Reconcile metadata before creating new controlled vocabularies
Pre-print
The use of controlled vocabularies is essential in the creation of metadata for digital collections in order to provide consistency and ease of use for patrons and researchers. The University of Utah has been working to clean up metadata for digital collections to ensure that data adheres to best practices with the use of specific, controlled vocabularies. This has included a major data-cleanup project utilizing multiple approaches, including a vendor's authority control service, data reconciliation in OpenRefine, and the exploration of different tools used for the creation and maintenance of local controlled vocabularies.
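Reconciliation of the sort done interactively in OpenRefine can also be scripted against a reconciliation service endpoint directly. A minimal sketch of the Reconciliation Service API that OpenRefine speaks (the Wikidata endpoint is one public example; match handling is deliberately simplified here):

```python
import json
import requests

# Public Wikidata reconciliation endpoint (one example of many services).
ENDPOINT = "https://wikidata.reconci.link/en/api"

def reconcile(name):
    """Send one name to a Reconciliation Service API endpoint and
    return the list of candidate matches."""
    queries = json.dumps({"q0": {"query": name}})
    resp = requests.post(ENDPOINT, data={"queries": queries})
    resp.raise_for_status()
    return resp.json()["q0"]["result"]

for candidate in reconcile("Salt Lake City")[:3]:
    print(candidate["id"], candidate["name"], candidate["score"])
```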
Railroads and the Making of Modern America -- Tools for Spatio-Temporal Correlation, Analysis, and Visualization
This project aims to integrate large-scale data sources from the Digging into Data repositories with other types of relevant data on the railroad system, already assembled by the project directors. Our project seeks to develop useful tools for spatio-temporal visualization of these data and the relationships among them. Our interdisciplinary team includes computer science, history, and geography researchers. Because the railroad "system" and its spatio-temporal configuration appeared differently from locality to locality and region to region, we need to adjust how we "locate" and "see" the system. By applying data mining and pattern recognition techniques, software systems can be created that dynamically redefine the way spatial data are represented. Utilizing processes common to analysis in computer science, we propose to develop a software framework that allows these embedded concepts to be visualized and further studied.
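One common pattern-recognition building block for this kind of spatio-temporal work is density-based clustering of point data within a time slice. A hypothetical sketch (the coordinates and the choice of DBSCAN are illustrative, not the project's documented method):

```python
import numpy as np
from sklearn.cluster import DBSCAN  # pip install scikit-learn

# Hypothetical railroad station locations observed in one year: (lat, lon).
points_1870 = np.array([
    [41.25, -96.00], [41.26, -96.02], [41.24, -95.99],  # a dense local cluster
    [39.74, -104.98],                                    # an isolated outpost
])

# Group stations that lie within ~0.05 degrees of one another.
labels = DBSCAN(eps=0.05, min_samples=2).fit_predict(points_1870)
print(labels)  # e.g. [0 0 0 -1]; -1 marks noise / isolated points

# Re-running this per time slice shows how clusters emerge and shift over time.
```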
Enhancing marine industry risk management through semantic reconciliation of underwater IoT data streams
The “Rio+20” United Nations Conference on Sustainable Development (UNCSD) focused on the "Green economy" as the main concept for fighting poverty and achieving a sustainable way to feed the planet. For coastal countries, this concept translates into the "Blue economy", the sustainable exploitation of marine environments to fulfill humanity's needs for resources, energy, and food. This puts pressure on marine industries to better articulate their processes to gain and share knowledge of different marine habitats, to reevaluate the data value chains established in the past, and to support a data-fueled market that is only going to grow in the near future. The EXPOSURES project is working in conjunction with the SUNRISE project to establish a new marine information ecosystem and demonstrate how the ‘Internet of Things’ (IoT) can be exploited for marine applications. In particular, EXPOSURES engaged with the community of stakeholders in order to identify a new data value chain that includes IoT data providers, data analysts, and harbor authorities. Moreover, we integrated the key technological assets that couple OGC standards for raster data management and manipulation with semantic technologies to better manage data assets. This paper presents the identified data value chain along with the use cases for validating it, and the system developed to semantically reconcile and manage such data collections.
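A minimal sketch of the semantic side of such a system, using rdflib to lift an IoT sensor reading into RDF under a hypothetical vocabulary (the namespace and property names are invented for illustration; a real deployment would map to standard ontologies such as SSN/SOSA):

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

# Hypothetical marine-IoT vocabulary; a real system would reuse standard terms.
MARINE = Namespace("http://example.org/marine#")

g = Graph()
g.bind("marine", MARINE)

# One underwater sensor observation, reconciled into the shared graph.
obs = URIRef("http://example.org/obs/42")
g.add((obs, RDF.type, MARINE.Observation))
g.add((obs, MARINE.sensor, URIRef("http://example.org/sensor/buoy-7")))
g.add((obs, MARINE.temperature, Literal(12.8, datatype=XSD.decimal)))
g.add((obs, MARINE.timestamp, Literal("2015-06-01T12:00:00Z", datatype=XSD.dateTime)))

print(g.serialize(format="turtle"))
```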
Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction.
This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for performing data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather the large amounts of structured data continuously generated and disseminated by Web 2.0, Social Media, and Online Social Network users, offering unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential of cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed to work in a given domain in other domains.
Comment: Knowledge-Based Systems
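A minimal illustration of the wrapper-style extraction the survey covers, using requests and BeautifulSoup to pull structured records out of an HTML page (the URL and CSS selectors are placeholders):

```python
import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

URL = "https://example.com/products"  # placeholder target page

resp = requests.get(URL, timeout=10)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

# Extract one structured record per listing -- the classic wrapper pattern.
for item in soup.select("div.product"):          # placeholder CSS selectors
    name = item.select_one("h2.name")
    price = item.select_one("span.price")
    if name and price:
        print(name.get_text(strip=True), price.get_text(strip=True))
```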
Complete LibTech 2013 Print Program
PDF of the complete print program from the 2013 Library Technology Conference.