3,902 research outputs found

    Towards information profiling: data lake content metadata management

    Get PDF
    There is currently a burst of Big Data (BD) processed and stored in huge raw data repositories, commonly called Data Lakes (DL). These BD require new techniques of data integration and schema alignment in order to make the data usable by its consumers and to discover the relationships linking their content. This can be provided by metadata services which discover and describe their content. However, there is currently a lack of a systematic approach for such kind of metadata discovery and management. Thus, we propose a framework for the profiling of informational content stored in the DL, which we call information profiling. The profiles are stored as metadata to support data analysis. We formally define a metadata management process which identifies the key activities required to effectively handle this.We demonstrate the alternative techniques and performance of our process using a prototype implementation handling a real-life case-study from the OpenML DL, which showcases the value and feasibility of our approach.Peer ReviewedPostprint (author's final draft

    Towards improving web service repositories through semantic web techniques

    Get PDF
    The success of the Web services technology has brought topicsas software reuse and discovery once again on the agenda of software engineers. While there are several efforts towards automating Web service discovery and composition, many developers still search for services via online Web service repositories and then combine them manually. However, from our analysis of these repositories, it yields that, unlike traditional software libraries, they rely on little metadata to support service discovery. We believe that the major cause is the difficulty of automatically deriving metadata that would describe rapidly changing Web service collections. In this paper, we discuss the major shortcomings of state of the art Web service repositories and, as a solution, we report on ongoing work and ideas on how to use techniques developed in the context of the Semantic Web (ontology learning, mapping, metadata based presentation) to improve the current situation

    BlogForever D2.6: Data Extraction Methodology

    Get PDF
    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform

    Automatic annotation of bioinformatics workflows with biomedical ontologies

    Full text link
    Legacy scientific workflows, and the services within them, often present scarce and unstructured (i.e. textual) descriptions. This makes it difficult to find, share and reuse them, thus dramatically reducing their value to the community. This paper presents an approach to annotating workflows and their subcomponents with ontology terms, in an attempt to describe these artifacts in a structured way. Despite a dearth of even textual descriptions, we automatically annotated 530 myExperiment bioinformatics-related workflows, including more than 2600 workflow-associated services, with relevant ontological terms. Quantitative evaluation of the Information Content of these terms suggests that, in cases where annotation was possible at all, the annotation quality was comparable to manually curated bioinformatics resources.Comment: 6th International Symposium on Leveraging Applications (ISoLA 2014 conference), 15 pages, 4 figure

    Development of an ontology for aerospace engine components degradation in service

    Get PDF
    This paper presents the development of an ontology for component service degradation. In this paper, degradation mechanisms in gas turbine metallic components are used for a case study to explain how a taxonomy within an ontology can be validated. The validation method used in this paper uses an iterative process and sanity checks. Data extracted from on-demand textual information are filtered and grouped into classes of degradation mechanisms. Various concepts are systematically and hierarchically arranged for use in the service maintenance ontology. The allocation of the mechanisms to the AS-IS ontology presents a robust data collection hub. Data integrity is guaranteed when the TO-BE ontology is introduced to analyse processes relative to various failure events. The initial evaluation reveals improvement in the performance of the TO-BE domain ontology based on iterations and updates with recognised mechanisms. The information extracted and collected is required to improve service k nowledge and performance feedback which are important for service engineers. Existing research areas such as natural language processing, knowledge management, and information extraction were also examined

    Semantic Technologies for Manuscript Descriptions — Concepts and Visions

    Get PDF
    The contribution at hand relates recent developments in the area of the World Wide Web to codicological research. In the last number of years, an informational extension of the internet has been discussed and extensively researched: the Semantic Web. It has already been applied in many areas, including digital information processing of cultural heritage data. The Semantic Web facilitates the organisation and linking of data across websites, according to a given semantic structure. Software can then process this structural and semantic information to extract further knowledge. In the area of codicological research, many institutions are making efforts to improve the online availability of handwritten codices. If these resources could also employ Semantic Web techniques, considerable research potential could be unleashed. However, data acquisition from less structured data sources will be problematic. In particular, data stemming from unstructured sources needs to be made accessible to SemanticWeb tools through information extraction techniques. In the area of museum research, the CIDOC Conceptual Reference Model (CRM) has been widely examined and is being adopted successfully. The CRM translates well to Semantic Web research, and its concentration on contextualization of objects could support approaches in codicological research. Further concepts for the creation and management of bibliographic coherences and structured vocabularies related to the CRM will be considered in this chapter. Finally, a user scenario showing all processing steps in their context will be elaborated on

    Ontology-Based Recommendation of Editorial Products

    Get PDF
    Major academic publishers need to be able to analyse their vast catalogue of products and select the best items to be marketed in scientific venues. This is a complex exercise that requires characterising with a high precision the topics of thousands of books and matching them with the interests of the relevant communities. In Springer Nature, this task has been traditionally handled manually by publishing editors. However, the rapid growth in the number of scientific publications and the dynamic nature of the Computer Science landscape has made this solution increasingly inefficient. We have addressed this issue by creating Smart Book Recommender (SBR), an ontology-based recommender system developed by The Open University (OU) in collaboration with Springer Nature, which supports their Computer Science editorial team in selecting the products to market at specific venues. SBR recommends books, journals, and conference proceedings relevant to a conference by taking advantage of a semantically enhanced representation of about 27K editorial products. This is based on the Computer Science Ontology, a very large-scale, automatically generated taxonomy of research areas. SBR also allows users to investigate why a certain publication was suggested by the system. It does so by means of an interactive graph view that displays the topic taxonomy of the recommended editorial product and compares it with the topic-centric characterization of the input conference. An evaluation carried out with seven Springer Nature editors and seven OU researchers has confirmed the effectiveness of the solution

    Nanoinformatics: developing new computing applications for nanomedicine

    Get PDF
    Nanoinformatics has recently emerged to address the need of computing applications at the nano level. In this regard, the authors have participated in various initiatives to identify its concepts, foundations and challenges. While nanomaterials open up the possibility for developing new devices in many industrial and scientific areas, they also offer breakthrough perspectives for the prevention, diagnosis and treatment of diseases. In this paper, we analyze the different aspects of nanoinformatics and suggest five research topics to help catalyze new research and development in the area, particularly focused on nanomedicine. We also encompass the use of informatics to further the biological and clinical applications of basic research in nanoscience and nanotechnology, and the related concept of an extended ?nanotype? to coalesce information related to nanoparticles. We suggest how nanoinformatics could accelerate developments in nanomedicine, similarly to what happened with the Human Genome and other -omics projects, on issues like exchanging modeling and simulation methods and tools, linking toxicity information to clinical and personal databases or developing new approaches for scientific ontologies, among many others
    • 

    corecore