4,655 research outputs found

    Sciunits: Reusable Research Objects

    Full text link
    Science is conducted collaboratively, often requiring knowledge sharing about computational experiments. When experiments include only datasets, they can be shared using Uniform Resource Identifiers (URIs) or Digital Object Identifiers (DOIs). An experiment, however, seldom includes only datasets, but more often includes software, its past execution, provenance, and associated documentation. The Research Object has recently emerged as a comprehensive and systematic method for aggregation and identification of diverse elements of computational experiments. While a necessary method, mere aggregation is not sufficient for the sharing of computational experiments. Other users must be able to easily recompute on these shared research objects. In this paper, we present the sciunit, a reusable research object in which aggregated content is recomputable. We describe a Git-like client that efficiently creates, stores, and repeats sciunits. We show through analysis that sciunits repeat computational experiments with minimal storage and processing overhead. Finally, we provide an overview of sharing and reproducible cyberinfrastructure based on sciunits gaining adoption in the domain of geosciences

    The preliminary SOL (Sizing and Optimization Language) reference manual

    Get PDF
    The Sizing and Optimization Language, SOL, a high-level special-purpose computer language has been developed to expedite application of numerical optimization to design problems and to make the process less error-prone. This document is a reference manual for those wishing to write SOL programs. SOL is presently available for DEC VAX/VMS systems. A SOL package is available which includes the SOL compiler and runtime library routines. An overview of SOL appears in NASA TM 100565

    Automatic summarization of rushes video using bipartite graphs

    Get PDF
    In this paper we present a new approach for automatic summarization of rushes, or unstructured video. Our approach is composed of three major steps. First, based on shot and sub-shot segmentations, we filter sub-shots with low information content not likely to be useful in a summary. Second, a method using maximal matching in a bipartite graph is adapted to measure similarity between the remaining shots and to minimize inter-shot redundancy by removing repetitive retake shots common in rushes video. Finally, the presence of faces and motion intensity are characterised in each sub-shot. A measure of how representative the sub-shot is in the context of the overall video is then proposed. Video summaries composed of keyframe slideshows are then generated. In order to evaluate the effectiveness of this approach we re-run the evaluation carried out by TRECVid, using the same dataset and evaluation metrics used in the TRECVid video summarization task in 2007 but with our own assessors. Results show that our approach leads to a significant improvement on our own work in terms of the fraction of the TRECVid summary ground truth included and is competitive with the best of other approaches in TRECVid 2007

    Scalable Multi-document Summarization Using Natural Language Processing

    Get PDF
    In this age of Internet, Natural Language Processing (NLP) techniques are the key sources for providing information required by users. However, with the extensive usage of available data, a secondary level of wrappers that interact with NLP tools have become necessary. These tools must extract a concise summary from the primary data set retrieved. The main reason for using text summarization techniques is to obtain this secondary level of information. Text summarization using NLP techniques is an interesting area of research with various implications for information retrieval. This report deals with the use of Latent Semantic Analysis (LSA) for generic text summarization and compares it with other models available. It proposes text summarization using LDS in conjunction with open-source NLP frameworks such as Mahout and Lucene. The LSA algorithm can be scaled to multiple large-sized documents using these framworks. The performance of this algorithm is then compared with other models commonly used for summarization and Recall-Oriented Understudy of Gisting Evaluation (ROUGE) scores. This project implements a text summarization framework, which uses available open-source tools and cloud resources to summarize documents from many languages such as, in the case of this study, English and Hindi

    EcoGIS – GIS tools for ecosystem approaches to fisheries management

    Get PDF
    Executive Summary: The EcoGIS project was launched in September 2004 to investigate how Geographic Information Systems (GIS), marine data, and custom analysis tools can better enable fisheries scientists and managers to adopt Ecosystem Approaches to Fisheries Management (EAFM). EcoGIS is a collaborative effort between NOAA’s National Ocean Service (NOS) and National Marine Fisheries Service (NMFS), and four regional Fishery Management Councils. The project has focused on four priority areas: Fishing Catch and Effort Analysis, Area Characterization, Bycatch Analysis, and Habitat Interactions. Of these four functional areas, the project team first focused on developing a working prototype for catch and effort analysis: the Fishery Mapper Tool. This ArcGIS extension creates time-and-area summarized maps of fishing catch and effort from logbook, observer, or fishery-independent survey data sets. Source data may come from Oracle, Microsoft Access, or other file formats. Feedback from beta-testers of the Fishery Mapper was used to debug the prototype, enhance performance, and add features. This report describes the four priority functional areas, the development of the Fishery Mapper tool, and several themes that emerged through the parallel evolution of the EcoGIS project, the concept and implementation of the broader field of Ecosystem Approaches to Management (EAM), data management practices, and other EAM toolsets. In addition, a set of six succinct recommendations are proposed on page 29. One major conclusion from this work is that there is no single “super-tool” to enable Ecosystem Approaches to Management; as such, tools should be developed for specific purposes with attention given to interoperability and automation. Future work should be coordinated with other GIS development projects in order to provide “value added” and minimize duplication of efforts. In addition to custom tools, the development of cross-cutting Regional Ecosystem Spatial Databases will enable access to quality data to support the analyses required by EAM. GIS tools will be useful in developing Integrated Ecosystem Assessments (IEAs) and providing pre- and post-processing capabilities for spatially-explicit ecosystem models. Continued funding will enable the EcoGIS project to develop GIS tools that are immediately applicable to today’s needs. These tools will enable simplified and efficient data query, the ability to visualize data over time, and ways to synthesize multidimensional data from diverse sources. These capabilities will provide new information for analyzing issues from an ecosystem perspective, which will ultimately result in better understanding of fisheries and better support for decision-making. (PDF file contains 45 pages.

    Interactive Search and Exploration in Online Discussion Forums Using Multimodal Embeddings

    Get PDF
    In this paper we present a novel interactive multimodal learning system, which facilitates search and exploration in large networks of social multimedia users. It allows the analyst to identify and select users of interest, and to find similar users in an interactive learning setting. Our approach is based on novel multimodal representations of users, words and concepts, which we simultaneously learn by deploying a general-purpose neural embedding model. We show these representations to be useful not only for categorizing users, but also for automatically generating user and community profiles. Inspired by traditional summarization approaches, we create the profiles by selecting diverse and representative content from all available modalities, i.e. the text, image and user modality. The usefulness of the approach is evaluated using artificial actors, which simulate user behavior in a relevance feedback scenario. Multiple experiments were conducted in order to evaluate the quality of our multimodal representations, to compare different embedding strategies, and to determine the importance of different modalities. We demonstrate the capabilities of the proposed approach on two different multimedia collections originating from the violent online extremism forum Stormfront and the microblogging platform Twitter, which are particularly interesting due to the high semantic level of the discussions they feature
    corecore