General pilot model and use case definition
This report describes the concepts and elements of the General Model of E-ARK pilot site activities
Open archival information systems for database preservation
Integrated master's thesis. Informatics and Computing Engineering. Universidade do Porto. Faculdade de Engenharia. 201
Recommended Practices and Final Public Report on Pilots
This report summarizes pilot activities, achievements and best practice recommendations using the following chapter structure:
Chapter 1 - This introductory chapter.
Chapter 2 - Planning and executing the E-ARK pilots: Summary of all pilot-related activities in the 3 years of the pilot, from planning to evaluation.
Chapter 3 - Pilot overview: A brief overview of the full-scale and additional pilots.
Chapter 4 - Pilot report: Summary of the pilot execution and results with recommended practices and further development recommendations. The chapter consists of the following sections for each full-scale pilot:
Pilot scenario details
Execution report
Changes to previous plans
Feedback report, and
Recommended practices and lessons learnt.
Chapter 4 ends with an overview of the external evaluations performed by non-E-ARK member organizations.
Chapter 5 - Pilot evaluation: Evaluation of the full-scale pilot against project objectives and success criteria.
Chapter 6 - Referenced documents and web pages
Appendix 1 – Extract from E-ARK Description of Wor
E‐ARK Search, Access and Display Interfaces
The aim of this report is to describe the “Search, Access and Display Interfaces” that have been developed in the Access component of the E-ARK project.
The deliverable associated with this report is mainly a software deliverable and therefore this document provides only underpinning descriptions of and links to the software itself.
The tools that are described and provided allow Consumers (i.e., end-users and archivists) to:
1. Search and order records (primarily end-users, but also archivists)
2. Manage orders of records and manage the records themselves, including the AIP to DIP conversion (archivists only)
3. Access ordered records as DIPs (primarily end-users, but also archivists)
In addition to the introductory remarks in chapter 2, the functionality of the tools that allow the Consumers to search, manage, and access records is described in chapter 3. After the description of each tool, links are provided to code and documentation
Report on the “Digital Preservation - The Planets Way” Workshop
A report on the Planets “Digital Preservation – The Planets Way” workshop, which took place on June 22-24, 2009 at the Royal Library, Copenhagen, Denmark. The workshop brought together representatives from archives, libraries, museums, academia, media and other institutions to consider the activities necessary to maintain content in the long term and to establish the methodologies and software tools developed by the EU-funded PLANETS project as a potential solution for preservation concerns. The event was the first of a series of three-day workshops that the Planets (Preservation and Long-term Access through NETworked Services) project is organizing across Europe during 2009-2010
Relational databases digital preservation
Doctoral thesis - Doctoral Programme in Informatics
With the expansion and growth of information technologies, much of human knowledge is now recorded on digital media. This shift began in the 20th century, has continued ever since, and seems irreversible. This paradigm brings scenarios where humans need mediators to understand digital information: computer platforms. These platforms are constantly changing and evolving, and nothing can guarantee continued access to digital artifacts in their absence. A new problem arises in the digital universe: Digital Preservation. Huge volumes of information are stored digitally, in a panoply of different classes, formats and types of digital objects. Our work addresses the problem of Digital Preservation and focuses on the logical and conceptual models within a specific class of digital objects: Relational Databases. This family of digital objects is used by organizations to record the data produced daily by their information systems, at the operational level or others. These structures are complex, and the relational database software that supports them may differ from one organization to another. It can be proprietary, free or open source.
Previously, a neutral format, Database Markup Language (DBML), was adopted to pursue the goal of platform independence and to standardize the format used in the digital preservation of relational databases. This format is able to describe both data and structure (logical model). The key strategies we are adopting are migration and normalization with refreshment. From our first approach, we evolved the work on the preservation of relational databases to focus on the conceptual model of the database. The conceptual model corresponds to the ideas and concepts that form the basis of the designed and/or modeled database, conceived to support a certain information system. We are referring to the semantics of the database and considering it an important preservation "property".
For the representation of this higher layer of abstraction present in databases, we use an ontology-based approach. At this higher abstraction level there is inherent knowledge associated with the database semantics, which we tentatively represent using Web Ontology Language (OWL). From the initial prototype, we developed a framework (supported by case studies) and established a mapping algorithm for the conversion between the database and OWL. The ontology approach is adopted to formalize the knowledge associated with the conceptual model of the database and also as a methodology to create an abstract representation of it. The system is based on the functional axes (ingestion, administration, dissemination and preservation) of the Open Archival Information System (OAIS) reference model and its information packages, where we include the two levels/layers of abstraction within the digital objects that are the subject of our research: Relational Databases.
The framework offers a set of web interfaces where it is possible to migrate a database into normalized and neutral formats (DBML + OWL) and perform some minor administration tasks on the repository. The system also enables navigation or browsing through the database (concepts) without losing technical details of the database relational model. The end consumers will have at their disposal a broad overview of the preserved object: a) the lower-level data and structure of the relational database logical model and b) the higher-level semantics and knowledge of the database conceptual model.
Considering the unpredictable future access to the content and structure of a preserved database, our preservation policy tries to capture the significant properties of databases that should enable the future interpretability and understanding of the digital object.
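The abstract above mentions a mapping between a relational database and OWL but does not describe the algorithm. As a rough illustration only, the sketch below shows one common style of mapping: tables become OWL classes, foreign-key columns become object properties, and other columns become datatype properties. The `Table` and `Column` types, the `schema_to_owl` function, the `ex:` prefix and the SQL-to-XSD type table are all hypothetical names introduced here; this is not the thesis's mapping algorithm, and the Turtle prefix declarations are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    name: str
    sql_type: str
    references: Optional[str] = None  # referenced table name, if a foreign key


@dataclass
class Table:
    name: str
    columns: List[Column] = field(default_factory=list)


# Assumed, deliberately tiny SQL-to-XSD type table; unknown types fall back to string.
XSD_TYPES = {"INTEGER": "xsd:integer", "VARCHAR": "xsd:string", "DATE": "xsd:date"}


def schema_to_owl(tables: List[Table], prefix: str = "ex") -> List[str]:
    """Return Turtle-style statements describing the schema (prefixes omitted)."""
    statements = []
    for table in tables:
        cls = f"{prefix}:{table.name}"
        statements.append(f"{cls} a owl:Class .")
        for col in table.columns:
            prop = f"{prefix}:{table.name}_{col.name}"
            if col.references:
                # Foreign key: link two classes with an object property.
                rng = f"{prefix}:{col.references}"
                kind = "owl:ObjectProperty"
            else:
                # Plain column: attach a literal value with a datatype property.
                rng = XSD_TYPES.get(col.sql_type.upper(), "xsd:string")
                kind = "owl:DatatypeProperty"
            statements.append(
                f"{prop} a {kind} ; rdfs:domain {cls} ; rdfs:range {rng} ."
            )
    return statements


example_schema = [
    Table("Author", [Column("name", "VARCHAR")]),
    Table("Book", [Column("title", "VARCHAR"),
                   Column("author_id", "INTEGER", references="Author")]),
]
owl_statements = schema_to_owl(example_schema)
```

Printing `owl_statements` line by line gives one class statement per table and one property statement per column. A real mapping would also need to handle keys, constraints and naming collisions, which this sketch ignores.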
Web Archive Services Framework for Tighter Integration Between the Past and Present Web
Web archives have contained the cultural history of the web for many years, but they still have a limited capability for access. Most web archiving research has focused on crawling and preservation activities, with little focus on delivery methods. The current access methods are tightly coupled with web archive infrastructure, hard to replicate or integrate with other web archives, and do not cover all the users' needs. In this dissertation, we focus on the access methods for archived web data to enable users, third-party developers, researchers, and others to gain knowledge from the web archives. We build ArcSys, a new service framework that extracts, preserves, and exposes APIs for the web archive corpus. The dissertation introduces a novel categorization technique to divide the archived corpus into four levels. For each level, we propose suitable services and APIs that enable both users and third-party developers to build new interfaces. The first level is the content level, which extracts the content from the archived web data. We develop ArcContent to expose the web archive content processed through various filters. The second level is the metadata level; we extract the metadata from the archived web data and make it available to users. We implement two services, ArcLink for the temporal web graph and ArcThumb for optimizing thumbnail creation in the web archives. The third level is the URI level, which focuses on using the URI HTTP redirection status to enhance the user query. Finally, the highest level in the web archiving service framework pyramid is the archive level. At this level, we define the web archive by the characteristics of its corpus and build Web Archive Profiles. The profiles are used by the Memento Aggregator for query optimization
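The abstract does not say which corpus characteristics a Web Archive Profile records or how an aggregator uses them. As a minimal sketch, assuming a profile is simply the set of top-level domains an archive holds, profile-based query optimization could mean forwarding a lookup only to archives whose profile matches the requested URI. The archive names, the `PROFILES` table and both function names are hypothetical and are not part of ArcSys or the Memento Aggregator.

```python
from urllib.parse import urlsplit

# Hypothetical profiles: archive name -> top-level domains present in its corpus.
PROFILES = {
    "archive-a": {"uk", "com"},
    "archive-b": {"pt"},
    "archive-c": {"com", "org"},
}


def tld_of(uri: str) -> str:
    """Return the last label of the URI's host name, lowercased."""
    host = urlsplit(uri).hostname or ""
    return host.rsplit(".", 1)[-1].lower()


def candidate_archives(uri: str, profiles: dict) -> list:
    """List the archives whose profile suggests they may hold the URI."""
    tld = tld_of(uri)
    return sorted(name for name, tlds in profiles.items() if tld in tlds)


matches = candidate_archives("http://www.example.com/page", PROFILES)
```

Under these assumptions, a lookup for a `.com` URI is sent to two of the three archives rather than all of them, which is the kind of saving that profile-based routing aims for.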