1,007 research outputs found
Automating the production of map interfaces for digital collections using Google APIs
Journal article
Many digital libraries are interested in taking advantage of the GIS mapping capabilities provided by Google Maps and Google Earth. The Digital Ventures Division of the University of Utah J. Willard Marriott Library has successfully completed an innovative automated process in which descriptive metadata in the form of place names was used to determine latitude and longitude coordinates for digital collection items. By enhancing digital collection metadata in this fashion, hundreds of records were updated without data entry from project staff. This article will provide an overview of using the Google application programming interface (API) to return geographic coordinate data, the scripting process with XML digital collection data, and the use of online tools and Microsoft Excel to upload digital collection data to Google Earth and Google Maps. The ability to automate metadata changes opens up a variety of possibilities for digital library administrators and collection managers.
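The geocoding step this article describes can be illustrated with a short script. This is a minimal sketch, not the authors' actual workflow: it assumes the current Google Geocoding API web service and a placeholder API key, and simply maps a place-name string from a metadata record to coordinates.

```python
import requests

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

def geocode_place(place_name, api_key):
    """Look up a descriptive place name and return (latitude, longitude), or None."""
    resp = requests.get(GEOCODE_URL, params={"address": place_name, "key": api_key})
    resp.raise_for_status()
    results = resp.json().get("results", [])
    if not results:
        return None  # the place name could not be resolved
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]

# Example: enrich a batch of place-name metadata values with coordinates.
if __name__ == "__main__":
    API_KEY = "YOUR_API_KEY"  # placeholder; a real key is required
    for place in ["Salt Lake City, Utah", "Moab, Utah"]:
        print(place, geocode_place(place, API_KEY))
```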
A summary of geospatial initiatives in the University of Utah's Marriott Library
Journal article
Abstract: The Marriott Library's Geospatial Initiatives Committee consists of librarians and staff involved in projects designed to provide access to different library resources through geospatial interfaces. We are creating maps that link to resources in our digital collections, including the Western Soundscape Archives and historical photographs, and georeferencing scanned geological thesis maps so that they can be manipulated in Google Earth. The library's home page now has a clickable map for accessing digital collections by county, and we are working with a geography professor to create a "Historical GIS" that uses Sanborn fire insurance maps of Salt Lake City to recreate the downtown area as it appeared a century ago. To pull these various projects together, we set up a geospatial portal through CampusGuides. See: http://campusguides.lib.utah.edu/GI
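Where projects like these surface collection items in Google Earth, the items typically travel as KML. As a rough, hypothetical illustration (the record fields and the use of the simplekml package are assumptions, not the committee's documented tooling), a geocoded record can be turned into a placemark like this:

```python
import simplekml  # pip install simplekml

# Hypothetical geocoded collection records: (title, url, lat, lon).
records = [
    ("Historical photograph, Main Street", "https://example.org/item/1", 40.7608, -111.8910),
]

kml = simplekml.Kml()
for title, url, lat, lon in records:
    point = kml.newpoint(name=title, coords=[(lon, lat)])  # KML order is (lon, lat)
    point.description = url  # link back to the digital collection item
kml.save("collection_items.kml")  # open the file in Google Earth
```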
Text Mining with HathiTrust: Empowering Librarians to Support Digital Scholarship Research
This workshop will introduce attendees to text analysis research and the common methods and tools used in this emerging area of scholarship, with particular attention to the HathiTrust Research Center. The workshop's train-the-trainer curriculum will provide a framework for how librarians can support text data mining, as well as teach transferable skills useful for many other areas of digital scholarly inquiry. Topics include: an introduction to gathering, managing, analyzing, and visualizing textual data; hands-on experience with text analysis tools, including the HTRC's off-the-shelf algorithms and datasets, such as the HTRC Extracted Features; and using the command line to run basic text analysis processes. No experience necessary! Attendees must bring a laptop.
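For a flavor of the "basic text analysis processes" the workshop mentions, here is a generic word-frequency count. This is a minimal sketch, not HTRC-specific code; the filename is a placeholder:

```python
import re
from collections import Counter

# Count word frequencies in a plain-text file -- the kind of basic
# analysis the workshop runs from the command line.
with open("volume.txt", encoding="utf-8") as f:  # placeholder filename
    tokens = re.findall(r"[a-z]+", f.read().lower())

for word, count in Counter(tokens).most_common(10):
    print(f"{word}\t{count}")
```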
Enhancing Geospatial Data: Collecting and Visualising User-Generated Content Through Custom Toolkits and Cloud Computing Workflows
Through this thesis we test the hypothesis that, via a set of custom toolkits built on cloud computing, online user-generated content can be extracted from emerging large-scale data sets, allowing social scientists to collect, analyse, and visualise geospatial data. Using a custom-built suite of software, known as the ‘BigDataToolkit’, we examine the need for cloud computing and custom workflows to open up access to existing online data, as well as to set up processes for collecting new data. We examine the use of the toolkit to collect large amounts of data from various online sources, such as social media Application Programming Interfaces (APIs) and data stores, and to visualise the collected data in real time. Through the execution of these workflows, this thesis presents an implementation of a smart-collector framework that automates the collection process to significantly increase the amount of data that can be obtained from standard API endpoints. Through these interconnected methods and distributed collection workflows, the final system is able to collect and visualise more data in real time than the single-system collection processes used in traditional social media analysis. Aimed at researchers without a core understanding of the intricacies of computer science, this thesis provides a methodology that opens up new data sources not only to academics but also to wider participants, allowing the collection of user-generated geographic and textual content en masse. A series of case studies is provided, covering applications from a single researcher collecting data through to collection via televised media. These are examined in terms of the tools created and the opportunities opened up, allowing real-time analysis of data collected with the developed toolkit.
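At its core, the collection workflow reduces to repeatedly polling an API endpoint and walking its pagination. A minimal sketch, assuming a hypothetical JSON endpoint with a cursor parameter (the URL and field names are invented for illustration; the actual toolkit targets specific social media APIs):

```python
import requests

def collect_all(endpoint, params=None):
    """Walk a cursor-paginated JSON API and yield every item it returns."""
    params = dict(params or {})
    while True:
        resp = requests.get(endpoint, params=params)
        resp.raise_for_status()
        page = resp.json()
        yield from page.get("items", [])
        cursor = page.get("next_cursor")  # hypothetical pagination field
        if not cursor:
            break
        params["cursor"] = cursor

# Hypothetical usage: harvest geotagged posts mentioning a keyword.
# for post in collect_all("https://api.example.com/v1/search",
#                         {"q": "flood", "has_geo": "true"}):
#     print(post)
```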
Distributional semantic modeling: a revised technique to train term/word vector space models applying the ontology-related approach
We design a new technique for distributional semantic modeling with a neural-network-based approach to learning distributed term representations (or term embeddings) -- term vector space models as a result -- inspired by the recent ontology-related approach to the identification of terms (term extraction) and the relations between them (relation extraction), which uses different types of contextual knowledge (syntactic, terminological, semantic, etc.) and is called semantic pre-processing technology (SPT). Our method relies on automatic term extraction from natural language texts and the subsequent formation of problem-oriented or application-oriented (and deeply annotated) text corpora in which the fundamental entity is the term (including both non-compositional and compositional terms). This gives us the opportunity to change over from distributed word representations (word embeddings) to distributed term representations (term embeddings). This transition makes it possible to generate more accurate semantic maps of different subject domains (and of the relations between input terms, which is useful for exploring clusters and oppositions, or for testing hypotheses about them). The semantic map can be represented as a graph using Vec2graph, a Python library for visualizing word embeddings (term embeddings in our case) as dynamic and interactive graphs. The Vec2graph library coupled with term embeddings will not only improve accuracy in solving standard NLP tasks, but also update the conventional concept of automated ontology development. The main practical result of our work is a development kit (a set of toolkits exposed as web-service APIs and a web application) that provides all the routines necessary for the basic linguistic pre-processing and the semantic pre-processing of natural language texts in Ukrainian for subsequent training of term vector space models.
Comment: In English, 9 pages, 2 figures. Not published yet. Prepared for a special issue (UkrPROG 2020 conference) of the scientific journal "Problems in Programming" (Founder: National Academy of Sciences of Ukraine, Institute of Software Systems of NAS Ukraine).
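The shift from word embeddings to term embeddings can be approximated by collapsing extracted multi-word terms into single tokens before training an ordinary vector space model. A minimal sketch using gensim's Word2Vec (the toy corpus and the underscore-joining convention are illustrative assumptions; the paper's own pipeline uses its SPT term extraction):

```python
from gensim.models import Word2Vec  # pip install gensim

# Toy corpus in which extracted multi-word terms have already been
# collapsed into single tokens, so the model learns term embeddings.
corpus = [
    ["the", "neural_network", "learns", "term_embeddings"],
    ["a", "vector_space_model", "represents", "terms"],
    ["term_embeddings", "extend", "word_embeddings"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=3,
                 min_count=1, workers=1, seed=42)

# Each term, compositional or not, now has a single vector.
print(model.wv["term_embeddings"][:5])
print(model.wv.most_similar("term_embeddings", topn=2))
```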
Use existing data first: Reconcile metadata before creating new controlled vocabularies
Pre-print
The use of controlled vocabularies is essential in the creation of metadata for digital collections in order to provide consistency and ease of use for patrons and researchers. The University of Utah has been working to clean up metadata for digital collections to ensure that data adheres to best practices with the use of specific, controlled vocabularies. This has included a major data-cleanup project utilizing multiple approaches, including a vendor's authority control service, data reconciliation in OpenRefine, and the exploration of different tools used for the creation and maintenance of local controlled vocabularies.
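Reconciliation of the sort done interactively in OpenRefine can also be scripted against a reconciliation service endpoint directly. A minimal sketch of the Reconciliation Service API that OpenRefine speaks (the Wikidata endpoint is one public example; match handling is deliberately simplified here):

```python
import json
import requests

# Public Wikidata reconciliation endpoint (one example of many services).
ENDPOINT = "https://wikidata.reconci.link/en/api"

def reconcile(name):
    """Send one name to a Reconciliation Service API endpoint and
    return the list of candidate matches."""
    queries = json.dumps({"q0": {"query": name}})
    resp = requests.post(ENDPOINT, data={"queries": queries})
    resp.raise_for_status()
    return resp.json()["q0"]["result"]

for candidate in reconcile("Salt Lake City")[:3]:
    print(candidate["id"], candidate["name"], candidate["score"])
```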
Railroads and the Making of Modern America -- Tools for Spatio-Temporal Correlation, Analysis, and Visualization
This project aims to integrate large-scale data sources from the Digging into Data repositories with other types of relevant data on the railroad system, already assembled by the project directors. Our project seeks to develop useful tools for spatio-temporal visualization of these data and the relationships among them. Our interdisciplinary team includes computer science, history, and geography researchers. Because the railroad "system" and its spatio-temporal configuration appeared differently from locality to locality and region to region, we need to adjust how we "locate" and "see" the system. By applying data mining and pattern recognition techniques, software systems can be created that dynamically redefine the way spatial data are represented. Utilizing processes common to analysis in computer science, we propose to develop a software framework that allows these embedded concepts to be visualized and further studied.
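One common pattern-recognition building block for this kind of spatio-temporal work is density-based clustering of point data within a time slice. A hypothetical sketch (the coordinates and the choice of DBSCAN are illustrative, not the project's documented method):

```python
import numpy as np
from sklearn.cluster import DBSCAN  # pip install scikit-learn

# Hypothetical railroad station locations observed in one year: (lat, lon).
points_1870 = np.array([
    [41.25, -96.00], [41.26, -96.02], [41.24, -95.99],  # a dense local cluster
    [39.74, -104.98],                                    # an isolated outpost
])

# Group stations that lie within ~0.05 degrees of one another.
labels = DBSCAN(eps=0.05, min_samples=2).fit_predict(points_1870)
print(labels)  # e.g. [0 0 0 -1]; -1 marks noise / isolated points

# Re-running this per time slice shows how clusters emerge and shift over time.
```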
Enhancing marine industry risk management through semantic reconciliation of underwater IoT data streams
The “Rio+20” United Nations Conference on Sustainable Development (UNCSD) focused on the "Green economy" as the main concept for fighting poverty and achieving a sustainable way to feed the planet. For coastal countries, this concept translates into the "Blue economy", the sustainable exploitation of marine environments to fulfill humanity's needs for resources, energy, and food. This puts pressure on marine industries to better articulate their processes to gain and share knowledge of different marine habitats, to reevaluate the data value chains established in the past, and to support a data-fueled market that is only going to grow in the near future. The EXPOSURES project is working in conjunction with the SUNRISE project to establish a new marine information ecosystem and demonstrate how the ‘Internet of Things’ (IoT) can be exploited for marine applications. In particular, EXPOSURES engaged with the community of stakeholders in order to identify a new data value chain that includes IoT data providers, data analysts, and harbor authorities. Moreover, we integrated the key technological assets that couple OGC standards for raster data management and manipulation with semantic technologies to better manage data assets. This paper presents the identified data value chain along with the use cases for validating it, and the system developed to semantically reconcile and manage such data collections.
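A minimal sketch of the semantic side of such a system, using rdflib to lift an IoT sensor reading into RDF under a hypothetical vocabulary (the namespace and property names are invented for illustration; a real deployment would map to standard ontologies such as SSN/SOSA):

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

# Hypothetical marine-IoT vocabulary; a real system would reuse standard terms.
MARINE = Namespace("http://example.org/marine#")

g = Graph()
g.bind("marine", MARINE)

# One underwater sensor observation, reconciled into the shared graph.
obs = URIRef("http://example.org/obs/42")
g.add((obs, RDF.type, MARINE.Observation))
g.add((obs, MARINE.sensor, URIRef("http://example.org/sensor/buoy-7")))
g.add((obs, MARINE.temperature, Literal(12.8, datatype=XSD.decimal)))
g.add((obs, MARINE.timestamp, Literal("2015-06-01T12:00:00Z", datatype=XSD.dateTime)))

print(g.serialize(format="turtle"))
```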
Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction.
This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for performing data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather the large amounts of structured data continuously generated and disseminated by Web 2.0, Social Media, and Online Social Network users, offering unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential of cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed to work in a given domain in other domains.
Comment: Knowledge-Based Systems
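A minimal illustration of the wrapper-style extraction the survey covers, using requests and BeautifulSoup to pull structured records out of an HTML page (the URL and CSS selectors are placeholders):

```python
import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

URL = "https://example.com/products"  # placeholder target page

resp = requests.get(URL, timeout=10)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

# Extract one structured record per listing -- the classic wrapper pattern.
for item in soup.select("div.product"):          # placeholder CSS selectors
    name = item.select_one("h2.name")
    price = item.select_one("span.price")
    if name and price:
        print(name.get_text(strip=True), price.get_text(strip=True))
```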
Complete LibTech 2013 Print Program
PDF of the complete print program from the 2013 Library Technology Conference.