
    Extracting, Transforming and Archiving Scientific Data

It is becoming common to archive research datasets that are not only large but also numerous. In addition, their corresponding metadata and the software required to analyse or display them need to be archived. Yet the manual curation of research data can be difficult and expensive, particularly in very large digital repositories, hence the importance of models and tools for automating digital curation tasks. The automation of these tasks faces three major challenges: (1) research data and data sources are highly heterogeneous, (2) future research needs are difficult to anticipate, (3) data is hard to index. To address these problems, we propose the Extract, Transform and Archive (ETA) model for managing and mechanizing the curation of research data. Specifically, we propose a scalable strategy for addressing the research-data problem, ranging from the extraction of legacy data to its long-term storage. We review some existing solutions and propose novel avenues of research.
Comment: 8 pages, Fourth Workshop on Very Large Digital Libraries, 201
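The abstract only names the ETA stages; as a hedged illustration, a minimal extract-transform-archive pipeline in Python might look like the sketch below. The function names, the checksum-keyed JSON layout, and the input file are all invented for the example, not taken from the paper.

```python
import hashlib
import json
import pathlib

# Hypothetical ETA pipeline sketch: stage names and on-disk layout are
# illustrative, not the paper's actual design.

def extract_legacy(path: pathlib.Path) -> bytes:
    """Extract: read a legacy dataset as raw bytes."""
    return path.read_bytes()

def transform(raw: bytes) -> dict:
    """Transform: normalise into an archival record with minimal metadata."""
    return {
        "sha256": hashlib.sha256(raw).hexdigest(),  # fixity check for long-term storage
        "size": len(raw),
        "payload": raw.decode("utf-8", errors="replace"),
    }

def archive(record: dict, store: pathlib.Path) -> pathlib.Path:
    """Archive: write the record, keyed by its checksum, for later indexing."""
    store.mkdir(parents=True, exist_ok=True)
    target = store / f"{record['sha256']}.json"
    target.write_text(json.dumps(record, indent=2))
    return target

if __name__ == "__main__":
    src = pathlib.Path("legacy_dataset.csv")  # assumed input file for the example
    archive(transform(extract_legacy(src)), pathlib.Path("archive_store"))
```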

    Web Data Extraction, Applications and Techniques: A Survey

Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for performing data analysis in Business and Competitive Intelligence systems, as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather the large amounts of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential for cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed to work in a given domain in other domains.
Comment: Knowledge-based System
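The survey itself stays at the level of techniques; as a deliberately simplistic illustration of what a hand-written extraction wrapper looks like, the following Python sketch pulls one field out of a repeated HTML pattern. The page structure and class names are invented for the example; real systems typically induce such rules rather than hard-coding them.

```python
from html.parser import HTMLParser

# Minimal wrapper-style extractor (illustrative only): collects the text of
# every <span class="price"> element from a product-listing-like page.

class PriceExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())

html = ('<ul><li><span class="price">9.99</span></li>'
        '<li><span class="price">14.50</span></li></ul>')
parser = PriceExtractor()
parser.feed(html)
print(parser.prices)  # ['9.99', '14.50']
```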

    Invest to Save: Report and Recommendations of the NSF-DELOS Working Group on Digital Archiving and Preservation

Digital archiving and preservation are important areas for research and development, but there is no agreed-upon set of priorities or coherent plan for research in this area. Research projects in this area tend to be small and driven by particular institutional problems or concerns. As a consequence, proposed solutions from experimental projects and prototypes tend not to scale to millions of digital objects, nor do the results from disparate projects readily build on each other. It is also unclear whether it is worthwhile to seek general solutions or whether different strategies are needed for different types of digital objects and collections. The lack of coordination in both research and development means that there are some areas where researchers are reinventing the wheel while other areas are neglected. Digital archiving and preservation is thus an area that will benefit from an exercise in analysis, priority setting, and planning for future research. The Working Group aims to survey current research activities, identify gaps, and develop a white paper proposing future research directions in the area of digital preservation. Some of the potential areas for research include repository architectures and interoperability among digital archives; automated tools for capture, ingest, and normalization of digital objects; and harmonization of preservation formats and metadata. There are also opportunities for the development of commercial products in the areas of mass storage systems, repositories and repository management systems, and data management software and tools.

    The Reality of the Application of e-DMS in Governmental Institutions - an Empirical Study on the PPA

The research aims to identify the status of the application of an electronic document management system (e-DMS) in governmental institutions; the study was applied to the Palestinian Pension Agency (PPA). The population of the study comprises all employees of the Palestinian Pension Agency. To achieve the objectives of the study, the researchers used a descriptive and analytical approach, through which they describe the phenomenon under study, analyze the data, and examine the relationships between its components and the views expressed about it. A census method was used owing to the small size of the study population and the ease of access to the target group: (108) questionnaires were distributed to all members of the study population, of whom (65) were employees in the Gaza Strip and (43) in the West Bank, and all questionnaires were recovered. The study found no statistically significant differences in respondents' views of the reality of the application of the electronic document management system attributable to age, to the nature of the job, or to specialization. It did find statistically significant differences attributable to qualification, in favour of respondents holding a Bachelor degree, and to years of experience, in favour of respondents with 11-15 years of experience. The study closes with a set of recommendations, including: the need to establish a general management of electronic documents within the organizational structure, responsible for all related technical processes and staffed with scientifically qualified persons in the field of electronic document management; and the need to give attention to developing strategic plans, policies and mechanisms of action commensurate with the electronic document management system.
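The abstract does not state which statistical test was used to detect these group differences; a common choice for comparing questionnaire scores across more than two groups is a one-way ANOVA, sketched below on simulated scores (not the study's data).

```python
import numpy as np
from scipy import stats

# Hypothetical illustration of testing for group differences, as in the
# study's comparisons by qualification; the scores below are simulated.

rng = np.random.default_rng(0)
bachelor = rng.normal(4.0, 0.5, 60)  # mean questionnaire scores, Bachelor holders
diploma  = rng.normal(3.7, 0.5, 30)
master   = rng.normal(3.8, 0.5, 18)

f_stat, p_value = stats.f_oneway(bachelor, diploma, master)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# p < 0.05 would indicate a statistically significant difference between groups
```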

    Harnessing data flow and modelling potentials for sustainable development

Tackling some of the global challenges relating to health, poverty, business and the environment is known to be heavily dependent on the flow and utilisation of data. However, while enhancements in data generation, storage, modelling, dissemination and the related integration of global economies and societies are fast transforming the way we live and interact, the resulting dynamic, globalised information society remains digitally divided. On the African continent, in particular, the division has resulted in a gap between knowledge generation and its transformation into tangible products and services, which Kirsop and Chan (2005) attribute to a broken information flow. This paper proposes some fundamental approaches for a sustainable transformation of data into knowledge for the purpose of improving people's quality of life. Its main strategy is based on a generic data-sharing model that provides access to data for data-utilising and data-generating entities in a multidisciplinary environment. It highlights the great potential of unsupervised and supervised modelling in tackling the typically predictive-in-nature challenges we face. Using both simulated and real data, the paper demonstrates how some of the key parameters may be generated and embedded in models to enhance their predictive power and reliability. Its main outcomes include a proposed implementation framework setting the scene for the creation of decision support systems capable of addressing the key issues in society. It is expected that a sustainable data flow will forge synergies between the private sector and academic and research institutions within and between countries. It is also expected that the paper's findings will help in the design and development of knowledge extraction from data in the wake of cloud computing and, hence, contribute towards the improvement of people's overall quality of life. To avoid high implementation costs, selected open-source tools are recommended for developing and sustaining the system.
Key words: Cloud Computing, Data Mining, Digital Divide, Globalisation, Grid Computing, Information Society, KTP, Predictive Modelling and STI
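As a hedged sketch of the modelling idea described above, the following Python example derives a parameter with an unsupervised model (cluster membership) and embeds it as a feature in a supervised predictor, using simulated data; the specific algorithms and figures are illustrative, not the paper's.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Simulated data standing in for the paper's examples.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Unsupervised step: generate a new parameter (cluster membership) from the data.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
X_aug = np.column_stack([X, clusters])  # embed the derived parameter as a feature

# Supervised step: fit a predictive model on the augmented features.
X_tr, X_te, y_tr, y_te = train_test_split(X_aug, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"held-out accuracy: {model.score(X_te, y_te):.2f}")
```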

    Communication and re-use of chemical information in bioscience.

The current methods of publishing chemical information in bioscience articles are analysed. Using three papers as use-cases, it is shown that conventional methods relying on human procedures, including cut-and-paste, are time-consuming and introduce errors. The meaning of chemical terms and the identity of compounds is often ambiguous, and valuable experimental data such as spectra and computational results are almost always omitted. We describe a proof-of-concept Open XML architecture which addresses these concerns. Compounds are identified through explicit connection tables or links to persistent Open resources such as PubChem. It is argued that if publishers adopt these tools and protocols, the quality and quantity of chemical information available to bioscientists will increase, and authors, publishers and readers will find the process cost-effective.
An article submitted to BioMed Central Bioinformatics, created on request with their Publicon system. The transformed manuscript is archived as PDF. Although it has been through the publisher's system, this is purely automatic and the contents are those of a pre-refereed preprint. The formatting is provided by the system, and tables and figures appear at the end. An accompanying submission, http://www.dspace.cam.ac.uk/handle/1810/34580, describes the rationale and cultural aspects of publishing, abstracting and aggregating chemical information. BMC is an Open Access publisher and we emphasize that all content is re-usable under a Creative Commons License
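The paper's actual schema is not reproduced here; as a hypothetical sketch of the idea of identifying a compound via a link to a persistent Open resource, a minimal XML record might be generated like this (element and attribute names are invented; the described architecture also uses explicit connection tables):

```python
import xml.etree.ElementTree as ET

# Illustrative only: a minimal XML compound record linking a trivial name to a
# persistent PubChem identifier. The schema is invented for this sketch.

compound = ET.Element("compound")
ET.SubElement(compound, "name").text = "caffeine"
ET.SubElement(compound, "identifier",
              {"source": "PubChem", "cid": "2519"})  # persistent Open resource

print(ET.tostring(compound, encoding="unicode"))
# <compound><name>caffeine</name><identifier source="PubChem" cid="2519" /></compound>
```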

    Secondary Analysis of Archived Data


    Nanoinformatics: developing new computing applications for nanomedicine

Nanoinformatics has recently emerged to address the need for computing applications at the nano level. In this regard, the authors have participated in various initiatives to identify its concepts, foundations and challenges. While nanomaterials open up the possibility of developing new devices in many industrial and scientific areas, they also offer breakthrough perspectives for the prevention, diagnosis and treatment of diseases. In this paper, we analyze the different aspects of nanoinformatics and suggest five research topics to help catalyze new research and development in the area, particularly focused on nanomedicine. We also encompass the use of informatics to further the biological and clinical applications of basic research in nanoscience and nanotechnology, and the related concept of an extended "nanotype" to coalesce information related to nanoparticles. We suggest how nanoinformatics could accelerate developments in nanomedicine, similarly to what happened with the Human Genome and other -omics projects, on issues like exchanging modeling and simulation methods and tools, linking toxicity information to clinical and personal databases, or developing new approaches for scientific ontologies, among many others.
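The "nanotype" is left at the conceptual level in the paper; purely as a hypothetical sketch, a record coalescing nanoparticle information with links to toxicity and clinical data might be modelled like this (all field names are invented):

```python
from dataclasses import dataclass, field

# Purely hypothetical sketch of a "nanotype" record coalescing nanoparticle
# information; the fields are invented, not drawn from the paper.

@dataclass
class Nanotype:
    name: str
    core_material: str
    diameter_nm: float
    toxicity_refs: list[str] = field(default_factory=list)  # links to toxicity data
    clinical_refs: list[str] = field(default_factory=list)  # links to clinical records

particle = Nanotype("AuNP-15", core_material="gold", diameter_nm=15.0)
particle.toxicity_refs.append("doi:10.0000/example")  # placeholder reference
print(particle)
```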