1,932 research outputs found

    Techniques for creating ground-truthed sketch corpora

    The problem of recognizing handwritten mathematics notation has been studied for over forty years with little practical success. The poor performance of math recognition systems is due, at least in part, to a lack of realistic data for use in training recognition systems and evaluating their accuracy. In fields for which such data is available, such as face and voice recognition, the data, along with objectively evaluated recognition contests, has contributed to the rapid advancement of the state of the art. This thesis proposes a method for constructing data corpora not only for handwritten math recognition, but for sketch recognition in general. The method consists of automatically generating template expressions, transcribing these expressions by hand, and automatically labelling them with ground truth. This approach is motivated by practical considerations and is shown to be more extensible and objective than other potential methods. We introduce a grammar-based approach for the template generation task. In this approach, random derivations in a context-free grammar are controlled so as to generate math expressions for transcription. The generation process may be controlled in terms of expression size and distribution over mathematical semantics. Finally, we present a novel ground-truthing method based on matching terminal symbols in grammar derivations to recognized symbols. The matching is produced by a best-first search through symbol recognition results. Experiments show that this method is highly accurate but rejects many of its inputs.
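
One way to picture the grammar-based template generation described in this abstract is the sketch below. It is not the thesis's implementation: the toy grammar, the `derive` and `generate_template` names, and the depth budget used to bound expression size are all assumptions made for illustration, and control over the distribution of mathematical semantics is left out.

```python
import random

# Hypothetical toy grammar (an assumption, not taken from the thesis).
# Each nonterminal maps to a list of productions; the FIRST production of
# every nonterminal is non-recursive in the sense that always choosing it
# leads to a terminal, which guarantees termination once the budget is spent.
GRAMMAR = {
    "EXPR": [["TERM"], ["EXPR", "+", "TERM"], ["EXPR", "-", "TERM"]],
    "TERM": [["ATOM"], ["TERM", "*", "ATOM"]],
    "ATOM": [["x"], ["y"], ["2"], ["(", "EXPR", ")"]],
}


def derive(symbol, budget, rng):
    """Expand `symbol` by a random derivation and return a list of terminals.

    While `budget` is positive, a production is chosen uniformly at random.
    Once it reaches zero, the first production is always taken, so the
    derivation is forced to close off and the expression size stays bounded.
    """
    if symbol not in GRAMMAR:
        return [symbol]  # terminal symbol: emit as-is
    productions = GRAMMAR[symbol]
    production = rng.choice(productions) if budget > 0 else productions[0]
    tokens = []
    for child in production:
        tokens.extend(derive(child, budget - 1, rng))
    return tokens


def generate_template(max_depth, seed=None):
    """Return one random template expression as a space-separated string."""
    rng = random.Random(seed)
    return " ".join(derive("EXPR", max_depth, rng))


if __name__ == "__main__":
    for depth in (0, 3, 6):
        print(depth, generate_template(depth, seed=42))
```

A larger `max_depth` allows longer expressions on average; a generator that also controlled the mix of mathematical content would need weighted production choices, which this sketch does not attempt.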

    Digital Libraries, Intelligent Data Analytics, and Augmented Description: A Demonstration Project

    From July 16 to November 8, 2019, the Aida digital libraries research team at the University of Nebraska-Lincoln collaborated with the Library of Congress on “Digital Libraries, Intelligent Data Analytics, and Augmented Description: A Demonstration Project.” This demonstration project sought to (1) develop and investigate the viability and feasibility of textual and image-based data analytics approaches to support and facilitate discovery; (2) understand technical tools and requirements for the Library of Congress to improve access and discovery of its digital collections; and (3) enable the Library of Congress to plan for future possibilities. In pursuit of these goals, we focused our work around two areas: extracting and foregrounding visual content from Chronicling America (chroniclingamerica.loc.gov) and applying a series of image processing and machine learning methods to minimally processed manuscript collections featured in By the People (crowd.loc.gov). We undertook a series of explorations and investigated a range of issues and challenges related to machine learning and the Library’s collections. This final report details the explorations, addresses social and technical challenges that arose in the explorations and that are critical context for the development of machine learning in the cultural heritage sector, and makes several recommendations to the Library of Congress as it plans for future possibilities. We propose two top-level recommendations. First, the Library should focus the weight of its machine learning efforts and energies on social and technical infrastructures for the development of machine learning in cultural heritage organizations, research libraries, and digital libraries. Second, we recommend that the Library invest in continued, ongoing, intentional explorations and investigations of particular machine learning applications to its collections.
Both of these top-level recommendations map to the three goals of the Library’s 2019 digital strategy. Within each top-level recommendation, we offer three more concrete, short- and medium-term recommendations. They include, under social and technical infrastructures: (1) Develop a statement of values or principles that will guide how the Library of Congress pursues the use, application, and development of machine learning for cultural heritage. (2) Create and scope a machine learning roadmap for the Library that looks both internally to the Library of Congress and its needs and goals and externally to the larger cultural heritage and other research communities. (3) Focus efforts on developing ground truth sets and benchmarking data and making these easily available. Nested under the recommendation to support ongoing explorations and investigations, we recommend that the Library: (4) Join the Library of Congress’s emergent efforts in machine learning with its existing expertise and leadership in crowdsourcing. Combine these areas as “informed crowdsourcing” as appropriate. (5) Sponsor challenges for teams to create additional metadata for digital collections in the Library of Congress. As part of these challenges, require teams to engage across a range of social and technical questions and problem areas. (6) Continue to create and support opportunities for researchers to partner in substantive ways with the Library of Congress on machine learning explorations. Each of these recommendations speaks to the investigation and challenge areas identified by Thomas Padilla in Responsible Operations: Data Science, Machine Learning, and AI in Libraries. This demonstration project, via its explorations, discussion, and recommendations, shows the potential of machine learning toward a variety of goals and use cases, and it argues that the technology itself will not be the hardest part of this work.
The hardest part will be the myriad challenges to undertaking this work in ways that are socially and culturally responsible, while also upholding responsibility to make the Library of Congress’s materials available in timely and accessible ways. Fortunately, the Library of Congress is in a remarkable position to advance machine learning for cultural heritage organizations, through its size, the diversity of its collections, and its commitment to digital strategy.

    An "All Hands" Call to the Social Science Community: Establishing a Community Framework for Complexity Modeling Using Agent Based Models and Cyberinfrastructure

    To date, many communities of practice (CoPs) in the social sciences have been struggling with how to deal with rapidly growing bodies of information. Many CoPs across broad disciplines have turned to community frameworks for complexity modeling (CFCMs), but this strategy has been slow to be discussed, let alone adopted, by the social sciences communities of practice (SS-CoPs). In this paper we urge the SS-CoPs to develop and establish a CFCM for the social sciences now, for two major reasons: the rapid acquisition of data and the emergence of critical cybertools which can facilitate agent-based, spatially-explicit models. The goal of this paper is not to prescribe how a CFCM might be set up but to suggest of what components it might consist and what its advantages would be. Agent-based models (ABMs) serve the establishment of a CFCM because they allow robust and diverse inputs and are amenable to output-driven modifications. In other words, as phenomena are resolved by an SS-CoP it is possible to adjust and refine ABMs (and their predictive ability) as a recursive and collective process. Existing and emerging cybertools such as computer networks, digital data collections and advances in programming languages mean the SS-CoP must now carefully consider committing the human organization to enabling a cyberinfrastructure tool. The combination of technologies with human interfaces can allow scenarios to be incorporated through 'if' 'then' rules and provide a powerful basis for addressing the dynamics of coupled and complex social ecological systems (cSESs). The need for social scientists to be more engaged participants in the growing challenges of characterizing chaotic, self-organizing social systems and predicting emergent patterns makes the application of ABMs timely.
The enabling of an SS-CoP CFCM human-cyberinfrastructure represents an unprecedented opportunity to synthesize, compare and evaluate diverse sociological phenomena as a cohesive and recursive community-driven process.
Keywords: Community-Based Complex Models, Mathematics, Social Sciences

    Reproducibility and Replicability in Unmanned Aircraft Systems and Geographic Information Science

    Multiple scientific disciplines face a so-called crisis of reproducibility and replicability (R&R) in which the validity of methodologies is questioned due to an inability to confirm experimental results. Trust in information technology (IT)-intensive workflows within geographic information science (GIScience), remote sensing, and photogrammetry depends on solutions to R&R challenges affecting multiple computationally driven disciplines. To date, there have only been very limited efforts to overcome R&R-related issues in remote sensing workflows in general, let alone those tied to disruptive technologies such as unmanned aircraft systems (UAS) and machine learning (ML). To accelerate an understanding of this crisis, a review was conducted to identify the issues preventing R&R in GIScience. Key barriers included: (1) awareness of time and resource requirements, (2) accessibility of provenance, metadata, and version control, (3) conceptualization of geographic problems, and (4) geographic variability between study areas. As a case study, a replication of a GIScience workflow utilizing YOLOv3 algorithms to identify objects in UAS imagery was attempted. Although the source data and workflow steps were accessible, the lack of access to provenance and metadata for each small step of the work prevented a successful replication. Finally, a novel method for provenance generation was proposed to address these issues. It was found that artificial intelligence (AI) could be used to quickly create robust provenance records for workflows that do not exceed time and resource constraints and provide the information needed to replicate work. Such information can bolster trust in scientific results and provide access to cutting edge technology that can improve everyday life.

    Challenges in Annotation of useR Data for UbiquitOUs Systems: Results from the 1st ARDUOUS Workshop

    Labelling user data is a central part of the design and evaluation of pervasive systems that aim to support the user through situation-aware reasoning. It is essential both in designing and training the system to recognise and reason about the situation, either through the definition of a suitable situation model in knowledge-driven applications, or through the preparation of training data for learning tasks in data-driven models. Hence, the quality of annotations can have a significant impact on the performance of the derived systems. Labelling is also vital for validating and quantifying the performance of applications. In particular, comparative evaluations require the production of benchmark datasets based on high-quality and consistent annotations. With pervasive systems relying increasingly on large datasets for designing and testing models of users' activities, the process of data labelling is becoming a major concern for the community. In this work we present a qualitative and quantitative analysis of the challenges associated with annotation of user data and possible strategies towards addressing these challenges. The analysis was based on the data gathered during the 1st International Workshop on Annotation of useR Data for UbiquitOUs Systems (ARDUOUS) and consisted of brainstorming as well as annotation and questionnaire data gathered during the talks, poster session, live annotation session, and discussion session

    Benthic habitat mapping in coastal waters of south–east Australia

    The Victorian Marine Mapping Project will improve knowledge on the location, spatial distribution, condition and extent of marine habitats and associated biodiversity in Victorian State waters. This information will guide informed decision making, enable priority setting, and assist in targeted natural resource management planning. This project entails benthic habitat mapping over 500 square kilometers of Victorian State waters using multibeam sonar, towed video and image classification techniques. Information collected includes seafloor topography, seafloor softness and hardness (reflectivity), and information on geology and benthic flora and fauna assemblages collectively comprising habitat. Computerized semi-automated classification techniques are also being developed to provide a cost effective approach to rapid mapping and assessment of coastal habitats. Habitat mapping is important for understanding and communicating the distribution of natural values within the marine environment. The coastal fringe of Victoria encompasses a rich and diverse ecosystem representative of coastal waters of South-east Australia. To date, extensive knowledge of these systems is limited due to the lack of available data. Knowledge of the distribution and extent of habitat is required to target management activities most effectively, and provide the basis to monitor and report on their status in the future.

    Understanding citizen science and environmental monitoring: final report on behalf of UK Environmental Observation Framework

    Citizen science can broadly be defined as the involvement of volunteers in science. Over the past decade there has been a rapid increase in the number of citizen science initiatives. The breadth of environmental-based citizen science is immense. Citizen scientists have surveyed for and monitored a broad range of taxa, and also contributed data on weather and habitats reflecting an increase in engagement with a diverse range of observational science. Citizen science has taken many varied approaches from citizen-led (co-created) projects with local community groups to, more commonly, scientist-led mass participation initiatives that are open to all sectors of society. Citizen science provides an indispensable means of combining environmental research with environmental education and wildlife recording. Here we provide a synthesis of extant citizen science projects using a novel cross-cutting approach to objectively assess understanding of citizen science and environmental monitoring including: 1. Brief overview of knowledge on the motivations of volunteers. 2. Semi-systematic review of environmental citizen science projects in order to understand the variety of extant citizen science projects. 3. Collation of detailed case studies on a selection of projects to complement the semi-systematic review. 4. Structured interviews with users of citizen science and environmental monitoring data focussing on policy, in order to more fully understand how citizen science can fit into policy needs. 5. Review of technology in citizen science and an exploration of future opportunities

    PICES Press, Vol. 21, No. 2, Summer 2013

    • The 2013 Inter-sessional Science Board Meeting: A Note from the Science Board Chairman (pp. 1-4)
    • ICES/PICES Workshop on Global Assessment of the Implications of Climate Change on the Spatial Distribution of Fish and Fisheries (pp. 5-8)
    • PICES participates in a Convention on Biological Diversity Regional Workshop (pp. 9-11)
    • Social and Economic Indicators for Status and Change within North Pacific Ecosystems (pp. 12-13)
    • The Fourth International Jellyfish Bloom Symposium (pp. 14-15)
    • Workshop on Radionuclide Science and Environmental Quality in the North Pacific (pp. 16-17)
    • PICES-MAFF Project on Marine Ecosystem Health and Human Well-Being: Indonesia Workshop (pp. 18-19)
    • Socioeconomic Indicators for United States Fisheries and Fishing Communities (pp. 20-23)
    • Harmful Algal Blooms in a Changing World (pp. 24-25, 27)
    • Enhancing Scientific Cooperation between PICES and NPAFC (pp. 26-27)
    • Workshop on Marine Biodiversity Conservation and Marine Protected Areas in the Northwest Pacific (pp. 28-29)
    • The State of the Western North Pacific in the Second Half of 2012 (pp. 30-31)
    • Stuck in Neutral in the Northeast Pacific Ocean (pp. 32-33)
    • The Bering Sea: Current Status and Recent Trends (pp. 34-36)
    • For your Bookshelf (p. 37)
    • Howard Freeland takes home Canadian awards (p. 38)

    A PHYSIOCRATIC SYSTEMS FRAMEWORK FOR OPEN SOURCE AGRICULTURAL RESEARCH AND DEVELOPMENT

    This dissertation presents a new participatory approach to agricultural research and development. It surveys the biological, sociological, economic, and technical landscape and proposes a framework for adaptive management based on the 18th century Physiocratic school of land-based economics. Industrial specialization and heavy emphasis on deductive approaches to science have contributed to the disconnection of large portions of the population from natural systems. Conventional agriculture and agricultural research methods following this pattern have created expensive social, environmental, and economic external costs, while adaptive management and resilient agricultural systems have been hindered by the cost and complexity of quantifying environmental services. However, the convergence of low cost computing, sensors, memory, and resulting data analytic methods, combined with new collaborative tools and social media, have created an exciting open source environment with the potential to engage more people in analyzing and managing our natural environment