The Sea of Stuff: a model to manage shared mutable data in a distributed environment
Managing data is one of the main challenges in distributed systems and in computer science in general. Data is created, shared, and managed across heterogeneous distributed systems of users, services, applications, and devices without a clear and comprehensive data model. This technological fragmentation and lack of a common data model result in a poor understanding of what data is, how it evolves over time, how it should be managed in a distributed system, and how it should be protected and shared. From a user perspective, for example, backing up data across multiple devices is a hard and error-prone process, and synchronising data with a cloud storage service can result in conflicts and unpredictable behaviour.
This thesis identifies three challenges in data management:
(1) how to extend the current data abstractions so that content, for example, is accessible irrespective of its location, versionable, and easy to distribute;
(2) how to enable transparent data storage relative to locations, users, applications, and services;
and (3) how to allow data owners to protect data against malicious users and automatically control content over a distributed system. These challenges are studied in detail in relation to the current state of the art and addressed throughout the rest of the thesis.
The artefact of this work is the Sea of Stuff (SOS), a generic data model of immutable, self-describing, location-independent entities that allows the construction of a distributed system where data is accessible and organised irrespective of its location, is easy to protect, and can be managed automatically according to a set of user-defined rules.
The evaluation of this thesis demonstrates the viability of the SOS model for managing data in a distributed system and of using user-defined rules to automatically manage data across multiple nodes. This work was supported by Adobe Systems, Inc. and EPSRC [grant number EP/M506631/1].
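The core idea of immutable, self-describing, location-independent entities can be illustrated with a minimal sketch in which an entity's identity is derived from its content rather than its storage location. The function and field names below are illustrative, not taken from the SOS implementation.

```python
import hashlib

def make_entity(content: bytes, entity_type: str) -> dict:
    """Create an immutable, self-describing entity whose identity is
    derived from its content, not from where it is stored."""
    guid = hashlib.sha256(content).hexdigest()
    return {
        "guid": guid,         # location-independent, content-derived identifier
        "type": entity_type,  # self-describing metadata
        "size": len(content),
    }

# The same content yields the same identity on any node, so replicas
# can be located and verified irrespective of where they are stored.
a = make_entity(b"hello world", "atom")
b = make_entity(b"hello world", "atom")
assert a["guid"] == b["guid"]
```

Because identity is a pure function of content, versioning reduces to creating a new entity for each change, and integrity checking to re-hashing.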
ImageJ2: ImageJ for the next generation of scientific image data
ImageJ is an image analysis program extensively used in the biological
sciences and beyond. Due to its ease of use, recordable macro language, and
extensible plug-in architecture, ImageJ enjoys contributions from
non-programmers, amateur programmers, and professional developers alike.
Enabling such a diversity of contributors has resulted in a large community
that spans the biological and physical sciences. However, a rapidly growing
user base, diverging plugin suites, and technical limitations have revealed a
clear need for a concerted software engineering effort to support emerging
imaging paradigms, to ensure the software's ability to handle the requirements
of modern science. Due to these new and emerging challenges in scientific
imaging, ImageJ is at a critical development crossroads.
We present ImageJ2, a total redesign of ImageJ offering a host of new
functionality. It separates concerns, fully decoupling the data model from the
user interface. It emphasizes integration with external applications to
maximize interoperability. Its robust new plugin framework allows everything
from image formats, to scripting languages, to visualization to be extended by
the community. The redesigned data model supports arbitrarily large,
N-dimensional datasets, which are increasingly common in modern image
acquisition. Despite the scope of these changes, backwards compatibility is
maintained such that this new functionality can be seamlessly integrated with
the classic ImageJ interface, allowing users and developers to migrate to these
new methods at their own pace. ImageJ2 provides a framework engineered for
flexibility, intended to support these requirements as well as accommodate
future needs.
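The extension-point idea behind such a plugin framework can be sketched generically: plugins register under a named extension point, and the core discovers them without knowing their concrete types. This is a hypothetical sketch in Python, not ImageJ2's actual Java API.

```python
# Minimal extension-point registry: the core defines named extension
# points; community plugins register under them and are discovered
# dynamically, so formats, scripting languages, and visualizers can all
# be added without modifying the core.
class PluginRegistry:
    def __init__(self):
        self._plugins = {}  # extension point name -> list of plugins

    def register(self, extension_point: str, plugin):
        self._plugins.setdefault(extension_point, []).append(plugin)

    def plugins_for(self, extension_point: str):
        return list(self._plugins.get(extension_point, []))

registry = PluginRegistry()
registry.register("io.format", lambda path: path.endswith(".tif"))
registry.register("viz", lambda data: f"rendering {len(data)} voxels")

# The core iterates over whatever the community has contributed.
readers = registry.plugins_for("io.format")
assert readers[0]("cells.tif")
```

Decoupling the registry from concrete plugin types is what lets the data model, user interface, and community extensions evolve independently.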
Internet based molecular collaborative and publishing tools
The scientific electronic publishing model has hitherto been an Internet-based delivery of electronic articles that are essentially replicas of their paper counterparts. They contain little in the way of added semantics that might better expose the science, assist the peer-review process, and facilitate follow-on collaborations, even though the enabling technologies have been around for some time and are mature. This thesis will examine the evolution of chemical electronic publishing over the past 15 years. It will illustrate, with the help of two frameworks, how publishers should be exploiting technologies to improve the semantics of chemical journal articles, namely their value-added features and relationships with other chemical resources on the Web.
The first framework is an early exemplar of structured and scalable electronic publishing where a Web content management system and a molecular database are integrated. It employs a test bed of articles from several RSC journals and supporting molecular coordinate and connectivity information. The value of converting 3D molecular expressions in chemical file formats, such as the MOL file, into more generic 3D graphics formats, such as Web3D, is assessed. This exemplar highlights the use of metadata management for bidirectional hyperlink maintenance in electronic publishing.
The second framework repurposes this metadata management concept into a Semantic Web application called SemanticEye. SemanticEye demonstrates how relationships between chemical electronic articles and other chemical resources are established. It adapts the successful semantic model used for digital music metadata management by popular applications such as iTunes. Globally unique identifiers enable relationships to be established between articles and other resources on the Web, and SemanticEye implements two: the Digital Object Identifier (DOI) for articles and the IUPAC International Chemical Identifier (InChI) for molecules. SemanticEye's potential as a framework for seeding collaborations between researchers who have hitherto never met is explored using FOAF, the friend-of-a-friend Semantic Web standard for social networks.
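The identifier-based linking just described can be sketched as a tiny triple store: shared globally unique identifiers are what let independently published resources be joined. The DOI and InChI values below are illustrative examples, not data from the thesis.

```python
# Hypothetical sketch of identifier-based linking in the spirit of
# SemanticEye: articles (identified by DOI) are related to molecules
# (identified by InChI) as subject-predicate-object triples.
triples = set()

def relate(article_doi: str, molecule_inchi: str):
    """Record an (article, mentions, molecule) relationship."""
    triples.add((f"doi:{article_doi}", "mentions", f"inchi:{molecule_inchi}"))

# Illustrative identifiers; InChI=1S/H2O/h1H2 is the standard InChI for water.
relate("10.1000/example.123", "InChI=1S/H2O/h1H2")

# Any other resource on the Web that uses the same InChI for water can
# now be joined to this article through the shared identifier.
assert ("doi:10.1000/example.123", "mentions",
        "inchi:InChI=1S/H2O/h1H2") in triples
```

The design choice mirrors the digital-music case: once every party agrees on one global key per entity, linking becomes a set intersection rather than a matching problem.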
Connected Information Management
Society is currently inundated with more information than ever, making efficient management a necessity. Alas, most current information management suffers from several forms of disconnectedness: applications partition data into segregated islands; small notes don't fit into traditional application categories; navigating the data is different for each kind of data; and data is either available on a certain computer or only online, but rarely both. Connected information management (CoIM) is an approach to information management that avoids these forms of disconnectedness. The core idea of CoIM is to keep all information in a central repository, with generic means of organization such as tagging. The heterogeneity of data is taken into account by offering specialized editors.
The central repository eliminates the islands of application-specific data and is formally
grounded by a CoIM model. The foundation for structured data is an RDF repository.
The RDF editing meta-model (REMM) enables form-based editing of this data,
similar to database applications such as MS Access. Further kinds of data are supported
by extending RDF, as follows. Wiki text is stored as RDF and can both contain
structured text and be combined with structured data. Files are also supported by the
CoIM model and are kept externally. Notes can be quickly captured and annotated with
meta-data. Generic means for organization and navigation apply to all kinds of data.
Ubiquitous availability of data is ensured via two CoIM implementations, the web application
HYENA/Web and the desktop application HYENA/Eclipse. All data can be
synchronized between these applications. The applications were used to validate the
CoIM ideas.
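The central-repository idea can be sketched minimally: all items, whatever their kind, live in one store and share generic organization such as tagging. Class and method names here are illustrative, not from HYENA.

```python
# Minimal sketch of the CoIM core idea: one repository holds
# heterogeneous items (notes, files, structured data), and a generic
# mechanism (tagging) organizes and navigates them uniformly.
class Repository:
    def __init__(self):
        self._items = {}  # item id -> {"kind": ..., "tags": set()}

    def add(self, item_id: str, kind: str):
        self._items[item_id] = {"kind": kind, "tags": set()}

    def tag(self, item_id: str, tag: str):
        self._items[item_id]["tags"].add(tag)

    def find_by_tag(self, tag: str):
        # One navigation mechanism spans all kinds of data.
        return sorted(i for i, v in self._items.items() if tag in v["tags"])

repo = Repository()
repo.add("note-1", "note")
repo.add("paper.pdf", "file")
repo.tag("note-1", "project-x")
repo.tag("paper.pdf", "project-x")
assert repo.find_by_tag("project-x") == ["note-1", "paper.pdf"]
```

A note and a file land in the same query result: the islands of application-specific data disappear because organization lives in the repository, not in the applications.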
A Model for Managing Information Flow on the World Wide Web
This thesis considers the nature of information management on the World Wide Web. The
web has evolved into a global information system that is completely unregulated, permitting
anyone to publish whatever information they wish. However, this information is almost
entirely unmanaged, which, together with the vast number of users who access it, places
enormous strain on the web's architecture. This has exposed inherent flaws,
which reduce its effectiveness as an information system.
The thesis presents a thorough analysis of the state of this architecture, and identifies three
flaws that could render the web unusable: link rot; a shrinking namespace; and the inevitable
increase of noise in the system. A critical examination of existing solutions to these flaws is
provided, together with a discussion on why the solutions have not been deployed or adopted.
The thesis determines that they have failed to take into account the nature of the information
flow between information provider and consumer, or the open philosophy of the web. The
overall aim of the research has therefore been to design a new solution to these flaws in the
web, based on a greater understanding of the nature of the information that flows upon it.
The realization of this objective has included the development of a new model for managing
information flow on the web, which is used to develop a solution to the flaws. The solution
comprises three new additions to the web's architecture: a temporal referencing scheme; an
Oracle Server Network for more effective web browsing; and a Resource Locator Service,
which provides automatic transparent resource migration. The thesis describes their design
and operation, and presents the concept of the Request Router, which provides a new way of
integrating such distributed systems into the web's existing architecture without breaking it.
The design of the Resource Locator Service, including the development of new protocols for
resource migration, is covered in great detail, and a prototype system that has been developed
to prove the effectiveness of the design is presented. The design is further validated by
comprehensive performance measurements of the prototype, which show that it will scale to
manage a web whose size is orders of magnitude greater than today's.
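The transparent-migration idea behind a Resource Locator Service can be sketched as an indirection layer: clients resolve a stable identifier through the service, so a resource can move without breaking existing references. This is a hypothetical sketch; the names are not from the thesis's protocols.

```python
# Minimal sketch of locator-based indirection against link rot:
# references hold a stable identifier, and the service maps it to the
# resource's current location, so migration is invisible to clients.
class ResourceLocator:
    def __init__(self):
        self._locations = {}  # stable id -> current URL

    def publish(self, resource_id: str, url: str):
        self._locations[resource_id] = url

    def migrate(self, resource_id: str, new_url: str):
        """Move the resource; the stable identifier is unchanged."""
        self._locations[resource_id] = new_url

    def resolve(self, resource_id: str) -> str:
        return self._locations[resource_id]

rls = ResourceLocator()
rls.publish("res:42", "http://old-host.example/page")
rls.migrate("res:42", "http://new-host.example/page")

# Old references to "res:42" still resolve after the move.
assert rls.resolve("res:42") == "http://new-host.example/page"
```

The extra resolution step is the price of the indirection; the thesis's performance measurements address exactly whether such a service scales.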
Collaboratory for Multi-scale Chemical Science DOE grant FG02-01ER25444
Motivation for the Project: Progress on the many multi-scale problems in the chemical sciences is significantly hindered by the difficulties researchers working at each scale have in accessing and translating the best available information and methods from the other scales. Very often there are "gaps" between scales which cannot be bridged at present, often because there is an unresolved technical or mathematical issue, in addition to the pervasive lack of translation software and problems with connecting the mismatched data models used at each scale. Problems are particularly severe for complex systems involving combustion and pyrolysis chemistry. For example, simulations used to design high-efficiency, low-emission homogeneous-charge compression-ignition (HCCI) engines typically contain thousands of different chemical species and reactions. The engine designer running the macroscopic simulation is typically not an expert in chemistry (the macroscopic engine scale is quite complicated enough), so he or she needs all the important microscopic chemical details to be handled more or less automatically by software, and in a way that the chemistry models can be easily updated as additional information becomes available. All these microscopic chemistry details must be documented electronically in a way that is easily visible to the chemistry community, and these chemistry databases must be extensible, to make it practical to capture the benefits of the very large, but also very thinly spread, expertise in the chemistry community (each chemist is expert in only a few types of molecules and reactions, under a limited range of conditions). The numerical methods used by the engine designer were not designed to handle all this chemical detail, so intermediate preprocessing model-reduction software is needed to reduce the size of the chemical model.
It is crucial that the approximation errors introduced in this step be properly controlled, so that we do not lose significant accuracy in the final simulation results. Again, all the assumptions and calculations involved in this model-reduction process need to be documented, to facilitate future progress and to allow the engine model to be updated as more information on the combustion chemistry becomes available.
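The model-reduction step can be illustrated with a toy sketch: drop species whose estimated contribution falls below a tolerance, so the discarded weight bounds the error introduced. The importance scores below are made up for illustration; real methods derive them from sensitivity or reaction-flux analysis.

```python
# Toy mechanism reduction: keep only species whose importance score
# exceeds a user-chosen tolerance, shrinking the model the macroscopic
# simulation must integrate while controlling the approximation error.
def reduce_mechanism(importance: dict, tol: float) -> list:
    """Return the (sorted) species whose importance is at least tol."""
    return sorted(s for s, w in importance.items() if w >= tol)

# Illustrative scores only, not real sensitivities.
full = {"CH4": 0.9, "OH": 0.8, "CH3OO": 0.001, "C2H5": 0.3}
reduced = reduce_mechanism(full, tol=0.01)
assert reduced == ["C2H5", "CH4", "OH"]
```

Documenting the tolerance and the discarded species is exactly the kind of assumption-tracking the paragraph above calls for, since it lets the reduced model be revisited when better chemistry data arrives.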
Collaborating for Multi-Scale Chemical Science
Advanced model reduction methods were developed and integrated into the CMCS multiscale chemical science simulation software. The new technologies were used to simulate HCCI engines and burner flames with exceptional fidelity.