
    Extending Science Gateway Frameworks to Support Big Data Applications in the Cloud

    Cloud computing offers the massive scalability and elasticity required by many scientific and commercial applications. Combining the computational and data handling capabilities of clouds with parallel processing also has the potential to tackle Big Data problems efficiently. Science gateway frameworks and workflow systems enable application developers to implement complex applications and make these available for end-users via simple graphical user interfaces. The integration of such frameworks with Big Data processing tools on the cloud opens new opportunities for application developers. This paper investigates how workflow systems and science gateways can be extended with Big Data processing capabilities. A generic approach based on infrastructure-aware workflows is suggested, and a proof of concept is implemented based on the WS-PGRADE/gUSE science gateway framework and its integration with the Hadoop parallel data processing solution based on the MapReduce paradigm in the cloud. The provided analysis demonstrates that the methods described to integrate Big Data processing with workflows and science gateways work well in different cloud infrastructures and application scenarios, and can be used to create massively parallel applications for scientific analysis of Big Data.

    Usage Bibliometrics

    Scholarly usage data provides unique opportunities to address the known shortcomings of citation analysis. However, the collection, processing and analysis of usage data remains an area of active research. This article provides a review of the state of the art in usage-based informetrics, i.e. the use of usage data to study the scholarly process. Comment: Publisher's PDF (by permission). Publisher web site: books.infotoday.com/asist/arist44.shtm

    Using Workflows to Explore and Optimise Named Entity Recognition for Chemistry

    Chemistry text mining tools should be interoperable and adaptable regardless of system-level implementation, installation or even programming issues. We aim to abstract the functionality of these tools from the underlying implementation via reconfigurable workflows for automatically identifying chemical names. To achieve this, we refactored OSCAR, an established named entity recogniser in the chemistry domain, and studied the impact of each component on the net performance. We developed two reconfigurable workflows from OSCAR using an interoperable text mining framework, U-Compare. These workflows can be altered using the drag-and-drop mechanism of the U-Compare graphical user interface. They also provide a platform to study the relationship between text mining components such as tokenisation and named entity recognition (using maximum entropy Markov model (MEMM) and pattern recognition based classifiers). Results indicate that, for chemistry in particular, eliminating noise generated by tokenisation techniques leads to slightly better named entity recognition (NER) accuracy than the alternatives. Poor tokenisation translates into poorer input to the classifier components, which in turn leads to an increase in Type I or Type II errors, thus lowering the overall performance. On the Sciborg corpus, the workflow-based system, which uses a new tokeniser whilst retaining the same MEMM component, increases the F-score from 82.35% to 84.44%. On the PubMed corpus, it recorded an F-score of 84.84% as against 84.23% by OSCAR.

    Wrapping Web Pages into XML Documents


    Referential actions as logical rules

    Referential actions are specialized triggers used to automatically maintain referential integrity. While their local behavior can be grasped easily, it is far from clear what the combined effect of a set of referential actions, i.e., their global semantics, should be. For example, different execution orders may lead to ambiguities in determining the final set of updates to be applied. To resolve these problems, we propose an abstract logical framework for rule-based maintenance of referential integrity: First, we identify desirable abstract properties like admissibility of updates which lead to a non-constructive global semantics of referential actions. We obtain a constructive definition by formalizing a set of referential actions RA as logical rules, and show that the declarative semantics of the resulting logic program P_RA captures the intended abstract semantics: The well-founded model of P_RA yields a unique set of updates, which is a safe, sceptical approximation of the set of all maximal admissible updates; the third truth-value undefined is assigned to all controversial updates. Finally, we show how to obtain a characterization of all maximal admissible subsets of a given set of updates using certain maximal stable models.