
    Extracting, Transforming and Archiving Scientific Data

It is becoming common to archive research datasets that are not only large but also numerous. In addition, their corresponding metadata and the software required to analyse or display them need to be archived. Yet the manual curation of research data can be difficult and expensive, particularly in very large digital repositories, hence the importance of models and tools for automating digital curation tasks. The automation of these tasks faces three major challenges: (1) research data and data sources are highly heterogeneous, (2) future research needs are difficult to anticipate, (3) data is hard to index. To address these problems, we propose the Extract, Transform and Archive (ETA) model for managing and mechanizing the curation of research data. Specifically, we propose a scalable strategy for addressing the research-data problem, ranging from the extraction of legacy data to its long-term storage. We review some existing solutions and propose novel avenues of research.
Comment: 8 pages, Fourth Workshop on Very Large Digital Libraries, 201
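The abstract only names the ETA stages; as a hedged illustration, a minimal extract-transform-archive pipeline in Python might look like the sketch below. The function names, the checksum-keyed JSON layout, and the input file are all invented for the example, not taken from the paper.

```python
import hashlib
import json
import pathlib

# Hypothetical ETA pipeline sketch: stage names and on-disk layout are
# illustrative, not the paper's actual design.

def extract_legacy(path: pathlib.Path) -> bytes:
    """Extract: read a legacy dataset as raw bytes."""
    return path.read_bytes()

def transform(raw: bytes) -> dict:
    """Transform: normalise into an archival record with minimal metadata."""
    return {
        "sha256": hashlib.sha256(raw).hexdigest(),  # fixity check for long-term storage
        "size": len(raw),
        "payload": raw.decode("utf-8", errors="replace"),
    }

def archive(record: dict, store: pathlib.Path) -> pathlib.Path:
    """Archive: write the record, keyed by its checksum, for later indexing."""
    store.mkdir(parents=True, exist_ok=True)
    target = store / f"{record['sha256']}.json"
    target.write_text(json.dumps(record, indent=2))
    return target

if __name__ == "__main__":
    src = pathlib.Path("legacy_dataset.csv")  # assumed input file for the example
    archive(transform(extract_legacy(src)), pathlib.Path("archive_store"))
```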

    Web Data Extraction, Applications and Techniques: A Survey

Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for performing data analysis in Business and Competitive Intelligence systems, as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather the large amounts of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential for cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed to work in a given domain in other domains.
Comment: Knowledge-based System
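The survey itself stays at the level of techniques; as a deliberately simplistic illustration of what a hand-written extraction wrapper looks like, the following Python sketch pulls one field out of a repeated HTML pattern. The page structure and class names are invented for the example; real systems typically induce such rules rather than hard-coding them.

```python
from html.parser import HTMLParser

# Minimal wrapper-style extractor (illustrative only): collects the text of
# every <span class="price"> element from a product-listing-like page.

class PriceExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())

html = ('<ul><li><span class="price">9.99</span></li>'
        '<li><span class="price">14.50</span></li></ul>')
parser = PriceExtractor()
parser.feed(html)
print(parser.prices)  # ['9.99', '14.50']
```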

    Invest to Save: Report and Recommendations of the NSF-DELOS Working Group on Digital Archiving and Preservation

Digital archiving and preservation are important areas for research and development, but there is no agreed-upon set of priorities or coherent plan for research in this area. Research projects in this area tend to be small and driven by particular institutional problems or concerns. As a consequence, proposed solutions from experimental projects and prototypes tend not to scale to millions of digital objects, nor do the results from disparate projects readily build on each other. It is also unclear whether it is worthwhile to seek general solutions or whether different strategies are needed for different types of digital objects and collections. The lack of coordination in both research and development means that there are some areas where researchers are reinventing the wheel while other areas are neglected. Digital archiving and preservation is thus an area that will benefit from an exercise in analysis, priority setting, and planning for future research. The Working Group aims to survey current research activities, identify gaps, and develop a white paper proposing future research directions in the area of digital preservation. Some of the potential areas for research include repository architectures and interoperability among digital archives; automated tools for capture, ingest, and normalization of digital objects; and harmonization of preservation formats and metadata. There are also opportunities for the development of commercial products in the areas of mass storage systems, repositories and repository management systems, and data management software and tools.

    The Reality of the Application of e-DMS in Governmental Institutions - an Empirical Study on the PPA

The research aims to identify the status of the application of an electronic document management system (e-DMS) in governmental institutions; the study was applied to the Palestinian Pension Agency (PPA). The population of the study comprises all employees of the Palestinian Pension Agency. To achieve the objectives of the study, the researchers used a descriptive and analytical approach, through which they describe the phenomenon under study, analyze the data, and examine the relationships between its components and the views expressed about it. A census method was used owing to the small size of the study population and the ease of access to the target group: (108) questionnaires were distributed to all members of the study population, of whom (65) were employees in the Gaza Strip and (43) in the West Bank, and all questionnaires were recovered. The study found no statistically significant differences in respondents' views of the reality of the application of the electronic document management system attributable to age, to the nature of the job, or to specialization. It did find statistically significant differences attributable to qualification, in favour of respondents holding a Bachelor degree, and to years of experience, in favour of respondents with 11-15 years of experience. The study closes with a set of recommendations, including: the need to establish a general management of electronic documents within the organizational structure, responsible for all related technical processes and staffed with scientifically qualified persons in the field of electronic document management; and the need to give attention to developing strategic plans, policies and mechanisms of action commensurate with the electronic document management system.
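The abstract does not state which statistical test was used to detect these group differences; a common choice for comparing questionnaire scores across more than two groups is a one-way ANOVA, sketched below on simulated scores (not the study's data).

```python
import numpy as np
from scipy import stats

# Hypothetical illustration of testing for group differences, as in the
# study's comparisons by qualification; the scores below are simulated.

rng = np.random.default_rng(0)
bachelor = rng.normal(4.0, 0.5, 60)  # mean questionnaire scores, Bachelor holders
diploma  = rng.normal(3.7, 0.5, 30)
master   = rng.normal(3.8, 0.5, 18)

f_stat, p_value = stats.f_oneway(bachelor, diploma, master)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# p < 0.05 would indicate a statistically significant difference between groups
```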

    Harnessing data flow and modelling potentials for sustainable development

Tackling some of the global challenges relating to health, poverty, business and the environment is known to be heavily dependent on the flow and utilisation of data. However, while enhancements in data generation, storage, modelling, dissemination and the related integration of global economies and societies are fast transforming the way we live and interact, the resulting dynamic, globalised information society remains digitally divided. On the African continent, in particular, the division has resulted in a gap between knowledge generation and its transformation into tangible products and services, which Kirsop and Chan (2005) attribute to a broken information flow. This paper proposes some fundamental approaches for a sustainable transformation of data into knowledge for the purpose of improving people's quality of life. Its main strategy is based on a generic data-sharing model that provides access to data for data-utilising and data-generating entities in a multidisciplinary environment. It highlights the great potential of unsupervised and supervised modelling in tackling the typically predictive-in-nature challenges we face. Using both simulated and real data, the paper demonstrates how some of the key parameters may be generated and embedded in models to enhance their predictive power and reliability. Its main outcomes include a proposed implementation framework setting the scene for the creation of decision support systems capable of addressing the key issues in society. It is expected that a sustainable data flow will forge synergies between the private sector and academic and research institutions within and between countries. It is also expected that the paper's findings will help in the design and development of knowledge extraction from data in the wake of cloud computing and, hence, contribute towards the improvement of people's overall quality of life. To avoid high implementation costs, selected open-source tools are recommended for developing and sustaining the system.
Key words: Cloud Computing, Data Mining, Digital Divide, Globalisation, Grid Computing, Information Society, KTP, Predictive Modelling and STI
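As a hedged sketch of the modelling idea described above, the following Python example derives a parameter with an unsupervised model (cluster membership) and embeds it as a feature in a supervised predictor, using simulated data; the specific algorithms and figures are illustrative, not the paper's.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Simulated data standing in for the paper's examples.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Unsupervised step: generate a new parameter (cluster membership) from the data.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
X_aug = np.column_stack([X, clusters])  # embed the derived parameter as a feature

# Supervised step: fit a predictive model on the augmented features.
X_tr, X_te, y_tr, y_te = train_test_split(X_aug, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"held-out accuracy: {model.score(X_te, y_te):.2f}")
```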

    Communication and re-use of chemical information in bioscience.

The current methods of publishing chemical information in bioscience articles are analysed. Using three papers as use-cases, it is shown that conventional methods relying on human procedures, including cut-and-paste, are time-consuming and introduce errors. The meaning of chemical terms and the identity of compounds is often ambiguous, and valuable experimental data such as spectra and computational results are almost always omitted. We describe a proof-of-concept Open XML architecture which addresses these concerns. Compounds are identified through explicit connection tables or links to persistent Open resources such as PubChem. It is argued that if publishers adopt these tools and protocols, the quality and quantity of chemical information available to bioscientists will increase, and authors, publishers and readers will find the process cost-effective.
An article submitted to BioMed Central Bioinformatics, created on request with their Publicon system. The transformed manuscript is archived as PDF. Although it has been through the publisher's system, this is purely automatic and the contents are those of a pre-refereed preprint. The formatting is provided by the system, and tables and figures appear at the end. An accompanying submission, http://www.dspace.cam.ac.uk/handle/1810/34580, describes the rationale and cultural aspects of publishing, abstracting and aggregating chemical information. BMC is an Open Access publisher and we emphasize that all content is re-usable under a Creative Commons License
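The paper's actual schema is not reproduced here; as a hypothetical sketch of the idea of identifying a compound via a link to a persistent Open resource, a minimal XML record might be generated like this (element and attribute names are invented; the described architecture also uses explicit connection tables):

```python
import xml.etree.ElementTree as ET

# Illustrative only: a minimal XML compound record linking a trivial name to a
# persistent PubChem identifier. The schema is invented for this sketch.

compound = ET.Element("compound")
ET.SubElement(compound, "name").text = "caffeine"
ET.SubElement(compound, "identifier",
              {"source": "PubChem", "cid": "2519"})  # persistent Open resource

print(ET.tostring(compound, encoding="unicode"))
# <compound><name>caffeine</name><identifier source="PubChem" cid="2519" /></compound>
```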

    Secondary Analysis of Archived Data


    Nanoinformatics: developing new computing applications for nanomedicine

Nanoinformatics has recently emerged to address the need for computing applications at the nano level. In this regard, the authors have participated in various initiatives to identify its concepts, foundations and challenges. While nanomaterials open up the possibility of developing new devices in many industrial and scientific areas, they also offer breakthrough perspectives for the prevention, diagnosis and treatment of diseases. In this paper, we analyze the different aspects of nanoinformatics and suggest five research topics to help catalyze new research and development in the area, particularly focused on nanomedicine. We also encompass the use of informatics to further the biological and clinical applications of basic research in nanoscience and nanotechnology, and the related concept of an extended "nanotype" to coalesce information related to nanoparticles. We suggest how nanoinformatics could accelerate developments in nanomedicine, similarly to what happened with the Human Genome and other -omics projects, on issues like exchanging modeling and simulation methods and tools, linking toxicity information to clinical and personal databases, or developing new approaches for scientific ontologies, among many others.
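The "nanotype" is left at the conceptual level in the paper; purely as a hypothetical sketch, a record coalescing nanoparticle information with links to toxicity and clinical data might be modelled like this (all field names are invented):

```python
from dataclasses import dataclass, field

# Purely hypothetical sketch of a "nanotype" record coalescing nanoparticle
# information; the fields are invented, not drawn from the paper.

@dataclass
class Nanotype:
    name: str
    core_material: str
    diameter_nm: float
    toxicity_refs: list[str] = field(default_factory=list)  # links to toxicity data
    clinical_refs: list[str] = field(default_factory=list)  # links to clinical records

particle = Nanotype("AuNP-15", core_material="gold", diameter_nm=15.0)
particle.toxicity_refs.append("doi:10.0000/example")  # placeholder reference
print(particle)
```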