71 research outputs found

    Hierarchical Geographical Identifiers As An Indexing Technique For Geographic Information Retrieval

    Location plays an ever-increasing role in modern web-based applications. Many of these applications leverage off-the-shelf search engine technology to provide interactive access to large collections of data. Unfortunately, these commodity search engines do not provide special support for location-based indexing and retrieval. Many applications overcome this constraint by applying geographic bounding boxes in conjunction with range queries. We propose an alternative technique based on geographic identifiers and suggest that it will yield faster query evaluation and higher search precision. Our experiment compared the two approaches by executing thousands of unique queries on a dataset with 1.8 million records. Based on the quantitative results obtained, our technique yielded drastic improvements in both query execution time and precision.
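    The abstract does not spell out the identifier scheme, but the general idea can be sketched with a hypothetical quadtree-path identifier standing in for the authors' hierarchical geographic identifiers: points that are spatially close share an identifier prefix, so a plain inverted index of string terms replaces numeric bounding-box range queries. All names below (`cell_id`, `add_record`, `query`) are illustrative, not from the paper.

```python
# Minimal sketch: hierarchical cell identifiers (quadtree paths) as index terms.
# A coarser query is just a shorter prefix of the same identifier string.

def cell_id(lat, lon, depth=8):
    """Encode a point as a quadtree path string, e.g. '32001231'.
    Each character refines the cell; shared prefixes imply spatial containment."""
    s, w, n, e = -90.0, -180.0, 90.0, 180.0
    path = []
    for _ in range(depth):
        mid_lat = (s + n) / 2
        mid_lon = (w + e) / 2
        quad = 0
        if lat >= mid_lat:
            quad |= 2
            s = mid_lat
        else:
            n = mid_lat
        if lon >= mid_lon:
            quad |= 1
            w = mid_lon
        else:
            e = mid_lon
        path.append(str(quad))
    return "".join(path)

index = {}

def add_record(rec_id, lat, lon):
    """Index the record under every prefix so coarser queries also match."""
    cid = cell_id(lat, lon)
    for i in range(1, len(cid) + 1):
        index.setdefault(cid[:i], set()).add(rec_id)

def query(lat, lon, depth):
    """All records whose cell shares the first `depth` identifier characters."""
    return index.get(cell_id(lat, lon)[:depth], set())
```

    Query evaluation is then a single exact-term lookup, which is what commodity search engines are optimized for, instead of two numeric range intersections.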

    Symbolic and Visual Retrieval of Mathematical Notation using Formula Graph Symbol Pair Matching and Structural Alignment

    Large data collections containing millions of math formulae in different formats are available on-line. Retrieving math expressions from these collections is challenging. We propose a framework for retrieval of mathematical notation that uses symbol pairs extracted from visual and semantic representations of mathematical expressions, first in the symbolic domain for retrieval of text documents. We further adapt our model for retrieval of mathematical notation in images and lecture videos. Graph-based representations are used in each modality to describe math formulas. For symbolic formula retrieval, where the structure is known, we use symbol layout trees and operator trees. For image-based formula retrieval, since the structure is unknown, we use a more general Line of Sight graph representation. Paths in these graphs define symbol pair tuples that are used as the entries of our inverted index of mathematical notation. Our retrieval framework uses a three-stage approach: a fast selection of candidates in the first stage; a more detailed matching algorithm with similarity metric computation in the second stage; and, when relevance assessments are available, an optional third stage that uses linear regression over multiple similarity scores to estimate relevance for final re-ranking. Our model has been evaluated using large collections of documents, and preliminary results are presented for videos and cross-modal search. The proposed framework can be adapted to other domains, such as chemistry or technical diagrams, where two visually similar elements from a collection are usually related to each other.
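    The first stage described above, candidate selection from an inverted index keyed by symbol pairs, can be sketched as follows. This is an assumed simplification: the tree encoding, the pair extraction from root-to-leaf paths, and the function names (`symbol_pairs`, `add_formula`, `candidates`) are invented for illustration and omit the paper's similarity metrics and re-ranking stages.

```python
# Hypothetical sketch of stage-1 retrieval: formulas are indexed by
# (ancestor, descendant) symbol pairs taken from paths in a symbol layout tree.
from collections import defaultdict
from itertools import combinations

def symbol_pairs(tree):
    """tree: (symbol, [children]). Return the set of ordered symbol pairs
    occurring along each root-to-leaf path."""
    def paths(node, prefix):
        sym, children = node
        prefix = prefix + [sym]
        if not children:
            yield prefix
        for child in children:
            yield from paths(child, prefix)
    pairs = set()
    for p in paths(tree, []):
        pairs.update(combinations(p, 2))
    return pairs

index = defaultdict(set)  # symbol pair -> formula ids

def add_formula(fid, tree):
    for pair in symbol_pairs(tree):
        index[pair].add(fid)

def candidates(query_tree):
    """Fast candidate selection: rank formulas by shared symbol pairs."""
    scores = defaultdict(int)
    for pair in symbol_pairs(query_tree):
        for fid in index[pair]:
            scores[fid] += 1
    return sorted(scores, key=scores.get, reverse=True)
```

    The point of the pair decomposition is that partial structural matches still retrieve a formula, which the later, more expensive matching stages can then score precisely.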

    GeoAnnotator: A Collaborative Semi-Automatic Platform for Constructing Geo-Annotated Text Corpora

    Ground-truth datasets are essential for the training and evaluation of any automated algorithm. As such, gold-standard annotated corpora underlie most advances in natural language processing (NLP). However, only a few relatively small (geo-)annotated datasets are available for geoparsing, i.e., the automatic recognition and geolocation of place references in unstructured text. The creation of geoparsing corpora that include both the recognition of place names in text and the matching of those names to toponyms in a geographic gazetteer (a process we call geo-annotation) is a laborious, time-consuming and expensive task. The field lacks efficient geo-annotation tools to support corpus building, as well as design guidelines for the development of such tools. Here, we present the iterative design of GeoAnnotator, a web-based, semi-automatic and collaborative visual analytics platform for geo-annotation. GeoAnnotator facilitates collaborative, multi-annotator creation of large corpora of geo-annotated text by providing computationally generated pre-annotations that can be improved by human annotators. The resulting corpora can be used to improve and benchmark geoparsing algorithms as well as various other spatial-language-related methods. Further, the iterative design process and the resulting design decisions can be applied to annotation platforms tailored for other application domains of NLP.

    Helmholtz Portfolio Theme Large-Scale Data Management and Analysis (LSDMA)

    The Helmholtz Association funded the "Large-Scale Data Management and Analysis" portfolio theme from 2012 to 2016. Four Helmholtz centres, six universities, and one further research institution in Germany joined forces to enable data-intensive science by optimising data life cycles in selected scientific communities. In our Data Life Cycle Labs, data experts performed joint R&D together with scientific communities. The Data Services Integration Team focused on generic solutions applied by several communities.

    Enabling collaborative numerical modeling in earth sciences using knowledge infrastructure

    Knowledge Infrastructure is an intellectual framework for creating, sharing, and distributing knowledge. In this paper, we use Knowledge Infrastructure to address common barriers to entry to numerical modeling in the Earth sciences: computational modeling education, replicating published model results, and reusing published models to extend research. We outline six critical functional requirements: 1) workflows designed for new users; 2) a community-supported collaborative web platform; 3) distributed data storage; 4) a software environment; 5) a personalized cloud-based high-performance computing platform; and 6) a standardized open-source modeling framework. Our methods meet these functional requirements by providing three interactive computational narratives for hands-on, problem-based research that demonstrate how to use Landlab on HydroShare. Landlab is an open-source toolkit for building, coupling, and exploring two-dimensional numerical models. HydroShare is an online collaborative environment for the sharing of data and models. We describe the methods we are using to accelerate knowledge development by providing a suite of modular and interoperable process components that allows students, domain experts, collaborators, researchers, and sponsors to learn by exploring shared data and modeling resources. The system is designed to support uses along the continuum from fully developed modeling applications to prototypes of research software tools.

    Report of the Stanford Linked Data Workshop

    No full text
    The Stanford University Libraries and Academic Information Resources (SULAIR), with the Council on Library and Information Resources (CLIR), conducted a week-long workshop on the prospects for a large-scale, multi-national, multi-institutional prototype of a Linked Data environment for discovery of and navigation among the rapidly, chaotically expanding array of academic information resources. As preparation for the workshop, CLIR sponsored a survey by Jerry Persons, Chief Information Architect emeritus of SULAIR, that was published originally for workshop participants as background to the workshop and is now publicly available. The original intention of the workshop was to devise a plan for such a prototype. However, such was the diversity of knowledge, experience, and views of the potential of Linked Data approaches that the workshop participants turned to two more fundamental goals: building common understanding and enthusiasm on the one hand, and identifying opportunities and challenges to be confronted in the preparation of the intended prototype and its operation on the other. In pursuit of those objectives, the workshop participants produced:
    1. a value statement addressing the question of why a Linked Data approach is worth prototyping;
    2. a manifesto for Linked Libraries (and Museums and Archives and …);
    3. an outline of the phases in a life cycle of Linked Data approaches;
    4. a prioritized list of known issues in generating, harvesting & using Linked Data;
    5. a workflow with notes for converting library bibliographic records and other academic metadata to URIs;
    6. examples of potential “killer apps” using Linked Data; and
    7. a list of next steps and potential projects.
    This report includes a summary of the workshop agenda, a chart showing the use of Linked Data in cultural heritage venues, and short biographies and statements from each of the participants.

    A Web GIS-based Integration of 3D Digital Models with Linked Open Data for Cultural Heritage Exploration

    This PhD project explores how geospatial semantic web concepts, 3D web-based visualisation, digital interactive maps, and cloud computing could be integrated to enhance digital cultural heritage exploration; to offer long-term archiving and dissemination of 3D digital cultural heritage models; and to better interlink heterogeneous and sparse cultural heritage data. The research findings were disseminated via four peer-reviewed journal articles and a conference article presented at the GISTAM 2020 conference (which received the ‘Best Student Paper Award’).

    Development of implementation methods of water quality trading policy: using the Hydrological Simulation Program-FORTRAN (HSPF)

    The entire thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file; a non-technical public abstract appears in the public.pdf file. Title from PDF of title page (University of Missouri--Columbia, viewed on August 17, 2010). Thesis advisor: Dr. Kathleen M. Trauth. Vita. Ph.D. University of Missouri--Columbia 2010. The recently promulgated water quality trading (WQT) policy is an innovative approach for achieving water quality standards with flexibility and economic efficiency. The policy allows for the trading of point and nonpoint source pollutant discharges between different locations within a watershed, as long as water quality standards are not violated along the stream. Many pilot programs and projects have generated useful information on how to implement water quality trading, but the number of actual trades is relatively small. The difficulty in determining the equivalency of trading locations and the uncertainty of nonpoint source pollutant concentrations in streams hinder the implementation of the trading program. The Hydrological Simulation Program-FORTRAN (HSPF) was used to estimate the hydrology and sediment loading throughout the Brush Creek, MO watershed for future land use development scenarios at upstream and downstream locations. Brush Creek does not have a suitable monitoring station for calibration and validation of the watershed model. Thus, the Meramec River watershed, which drains to the Meramec River near Eureka, MO station (07019000), was selected for input parameter calibration, because that watershed contains the Brush Creek watershed as a subbasin. The development scenarios considered include upstream and downstream development from agricultural, forest, and range land to urbanized development with 25, 50, and 75 percent impervious surface, through manually modified land use maps. Restoration scenarios would return agricultural areas to range land in both the upstream and downstream locations. Their hydrologic and sediment impacts at the outlet of the watershed were simulated in order to provide an estimate of how this particular land use change might be incorporated into a water quality trade. After sediment calculations for 20 different scenarios were performed in HSPF, equivalent acreages for sediment generation between upstream and downstream locations were developed as potential water quality trading units. Recommended equivalent acreages for the nonimpaired and impaired stream cases were provided as references for a trading program manager in order to implement the water quality trading policy. Includes bibliographical references.
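    The thesis's calibrated HSPF loads are not reproduced in this abstract, but the notion of an equivalent acreage as a trading unit can be illustrated with invented numbers. The function name and the per-acre yield figures below are hypothetical, not taken from the thesis.

```python
# Illustrative only: equating sediment loads between two trading locations.
# Yields are in tons/acre/year; the values used in tests are invented.

def equivalent_acres(upstream_yield, downstream_yield, downstream_acres):
    """Acres of upstream land-use change that generate the same total
    sediment load as `downstream_acres` of downstream change."""
    total_load = downstream_yield * downstream_acres  # tons/year downstream
    return total_load / upstream_yield                # acres upstream
```

    In words: if upstream land sheds twice the sediment per acre, half as many upstream acres balance a given downstream change, which is the kind of ratio a trading program manager would apply.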
