48 research outputs found

    The Sea of Stuff: a model to manage shared mutable data in a distributed environment

    Managing data is one of the main challenges in distributed systems and computer science in general. Data is created, shared, and managed across heterogeneous distributed systems of users, services, applications, and devices without a clear and comprehensive data model. This technological fragmentation and lack of a common data model result in a poor understanding of what data is, how it evolves over time, how it should be managed in a distributed system, and how it should be protected and shared. From a user perspective, for example, backing up data over multiple devices is a hard and error-prone process, and synchronising data with a cloud storage service can result in conflicts and unpredictable behaviours. This thesis identifies three challenges in data management: (1) how to extend the current data abstractions so that content, for example, is accessible irrespective of its location, versionable, and easy to distribute; (2) how to enable transparent data storage relative to locations, users, applications, and services; and (3) how to allow data owners to protect data against malicious users and automatically control content over a distributed system. These challenges are studied in detail in relation to the current state of the art and addressed throughout the rest of the thesis. The artefact of this work is the Sea of Stuff (SOS), a generic data model of immutable self-describing location-independent entities that allow the construction of a distributed system where data is accessible and organised irrespective of its location, easy to protect, and can be automatically managed according to a set of user-defined rules. The evaluation of this thesis demonstrates the viability of the SOS model for managing data in a distributed system and using user-defined rules to automatically manage data across multiple nodes. "This work was supported by Adobe Systems, Inc. and EPSRC [grant number EP/M506631/1]" - from the Acknowledgements page

    ImageJ2: ImageJ for the next generation of scientific image data

    ImageJ is an image analysis program extensively used in the biological sciences and beyond. Due to its ease of use, recordable macro language, and extensible plug-in architecture, ImageJ enjoys contributions from non-programmers, amateur programmers, and professional developers alike. Enabling such a diversity of contributors has resulted in a large community that spans the biological and physical sciences. However, a rapidly growing user base, diverging plugin suites, and technical limitations have revealed a clear need for a concerted software engineering effort to support emerging imaging paradigms, to ensure the software's ability to handle the requirements of modern science. Due to these new and emerging challenges in scientific imaging, ImageJ is at a critical development crossroads. We present ImageJ2, a total redesign of ImageJ offering a host of new functionality. It separates concerns, fully decoupling the data model from the user interface. It emphasizes integration with external applications to maximize interoperability. Its robust new plugin framework allows everything from image formats, to scripting languages, to visualization to be extended by the community. The redesigned data model supports arbitrarily large, N-dimensional datasets, which are increasingly common in modern image acquisition. Despite the scope of these changes, backwards compatibility is maintained such that this new functionality can be seamlessly integrated with the classic ImageJ interface, allowing users and developers to migrate to these new methods at their own pace. ImageJ2 provides a framework engineered for flexibility, intended to support these requirements as well as accommodate future needs

    Internet based molecular collaborative and publishing tools

    The scientific electronic publishing model has hitherto been an Internet-based delivery of electronic articles that are essentially replicas of their paper counterparts. They contain little in the way of added semantics that may better expose the science, assist the peer review process and facilitate follow-on collaborations, even though the enabling technologies have been around for some time and are mature. This thesis will examine the evolution of chemical electronic publishing over the past 15 years. It will illustrate, with the help of two frameworks, how publishers should be exploiting technologies to improve the semantics of chemical journal articles, namely their value-added features and relationships with other chemical resources on the Web. The first framework is an early exemplar of structured and scalable electronic publishing where a Web content management system and a molecular database are integrated. It employs a test bed of articles from several RSC journals and supporting molecular coordinate and connectivity information. The value of converting 3D molecular expressions in chemical file formats, such as the MOL file, into more generic 3D graphics formats, such as Web3D, is assessed. This exemplar highlights the use of metadata management for bidirectional hyperlink maintenance in electronic publishing. The second framework repurposes this metadata management concept into a Semantic Web application called SemanticEye. SemanticEye demonstrates how relationships between chemical electronic articles and other chemical resources are established. It adapts the successful semantic model used for digital music metadata management by popular applications such as iTunes. Globally unique identifiers enable relationships to be established between articles and other resources on the Web and SemanticEye implements two: the Digital Object Identifier (DOI) for articles and the IUPAC International Chemical Identifier (InChI) for molecules.
SemanticEye’s potential as a framework for seeding collaborations between researchers, who have hitherto never met, is explored using FOAF, the friend-of-a-friend Semantic Web standard for social networks

    Connected Information Management

    Society is currently inundated with more information than ever, making efficient management a necessity. Alas, most current information management suffers from several levels of disconnectedness: applications partition data into segregated islands; small notes don't fit into traditional application categories; navigating the data is different for each kind of data; and data is either available at a certain computer or only online, but rarely both. Connected information management (CoIM) is an approach to information management that avoids these forms of disconnectedness. The core idea of CoIM is to keep all information in a central repository, with generic means for organization such as tagging. The heterogeneity of data is taken into account by offering specialized editors. The central repository eliminates the islands of application-specific data and is formally grounded by a CoIM model. The foundation for structured data is an RDF repository. The RDF editing meta-model (REMM) enables form-based editing of this data, similar to database applications such as MS Access. Further kinds of data are supported by extending RDF, as follows. Wiki text is stored as RDF and can both contain structured text and be combined with structured data. Files are also supported by the CoIM model and are kept externally. Notes can be quickly captured and annotated with meta-data. Generic means for organization and navigation apply to all kinds of data. Ubiquitous availability of data is ensured via two CoIM implementations, the web application HYENA/Web and the desktop application HYENA/Eclipse. All data can be synchronized between these applications. The applications were used to validate the CoIM ideas

    A Model for Managing Information Flow on the World Wide Web

    Metadata merged with duplicate record (http://hdl.handle.net/10026.1/330) on 20.12.2016 by CS (TIS). This is a digitised version of a thesis that was deposited in the University Library. This thesis considers the nature of information management on the World Wide Web. The web has evolved into a global information system that is completely unregulated, permitting anyone to publish whatever information they wish. However, this information is almost entirely unmanaged, which, together with the enormous number of users who access it, places enormous strain on the web's architecture. This has led to the exposure of inherent flaws, which reduce its effectiveness as an information system. The thesis presents a thorough analysis of the state of this architecture, and identifies three flaws that could render the web unusable: link rot; a shrinking namespace; and the inevitable increase of noise in the system. A critical examination of existing solutions to these flaws is provided, together with a discussion on why the solutions have not been deployed or adopted. The thesis determines that they have failed to take into account the nature of the information flow between information provider and consumer, or the open philosophy of the web. The overall aim of the research has therefore been to design a new solution to these flaws in the web, based on a greater understanding of the nature of the information that flows upon it. The realization of this objective has included the development of a new model for managing information flow on the web, which is used to develop a solution to the flaws. The solution comprises three new additions to the web's architecture: a temporal referencing scheme; an Oracle Server Network for more effective web browsing; and a Resource Locator Service, which provides automatic transparent resource migration.
The thesis describes their design and operation, and presents the concept of the Request Router, which provides a new way of integrating such distributed systems into the web's existing architecture without breaking it. The design of the Resource Locator Service, including the development of new protocols for resource migration, is covered in great detail, and a prototype system that has been developed to prove the effectiveness of the design is presented. The design is further validated by comprehensive performance measurements of the prototype, which show that it will scale to manage a web whose size is orders of magnitude greater than it is today

    Collaboratory for Multi-scale Chemical Science DOE grant FG02-01ER25444

    Motivation for the Project: Progress on the many multi-scale problems in the chemical sciences is significantly hindered by the difficulties researchers working at each scale have in accessing and translating the best available information and methods from the other scales. Very often there are "gaps" between scales which cannot be bridged at present, often because there is an unresolved technical or mathematical issue in addition to the pervasive lack of translation software and problems with connecting the mismatched data models used at each scale. Problems are particularly severe for complex systems involving combustion and pyrolysis chemistry. For example, simulations used to design high-efficiency, low-emission homogeneous-charge compression-ignition (HCCI) engines typically contain thousands of different chemical species and reactions. The engine designer running the macroscopic simulation is typically not an expert in chemistry (the macroscopic engine scale is quite complicated enough), so he or she needs all the important microscopic chemical details to be handled more or less automatically by software, and in a way that the chemistry models can be easily updated as additional information becomes available. All these microscopic chemistry details must be documented electronically in a way that is easily visible to the chemistry community, and these chemistry databases must be extensible, to make it practical to capture the benefits of the very large, but also very thinly spread (i.e. each chemist is expert in only a few types of molecules and reactions, under a limited range of conditions), expertise in the chemistry community. The numerical methods used by the engine designer were not designed to handle all this chemical detail, so intermediate preprocessing model-reduction software is needed to reduce the size of the chemical model.
It is crucial that the approximation errors introduced in this step be properly controlled, so we do not lose significant accuracy in the final simulation results. Again, all the assumptions and calculations involved in this model-reduction process need to be documented, to facilitate future progress and to allow the engine model to be updated as more information on the combustion chemistry becomes available