
    PERICLES Deliverable 4.3: Content Semantics and Use Context Analysis Techniques

    The current deliverable summarises the work conducted within task T4.3 of WP4, focusing on the extraction and subsequent analysis of semantic information from digital content, which is imperative for its preservability. More specifically, the deliverable defines content semantic information from a visual and textual perspective, explains how this information can be exploited in long-term digital preservation and proposes novel approaches for extracting this information in a scalable manner. Additionally, the deliverable discusses novel techniques for retrieving and analysing the context of use of digital objects. Although this topic has not been extensively studied in the existing literature, we believe use context is vital for augmenting the semantic information and maintaining the usability and preservability of digital objects, as well as their ability to be accurately interpreted as initially intended.

    Technologies and Applications for Big Data Value

    This open access book explores cutting-edge solutions and best practices for big data and data-driven AI applications for the data-driven economy. It provides the reader with a basis for understanding how technical issues can be overcome to offer real-world solutions to major industrial areas. The book starts with an introductory chapter that provides an overview of the book by positioning the following chapters in terms of their contributions to technology frameworks which are key elements of the Big Data Value Public-Private Partnership and the upcoming Partnership on AI, Data and Robotics. The remainder of the book is then arranged in two parts. The first part “Technologies and Methods” contains horizontal contributions of technologies and methods that enable data value chains to be applied in any sector. The second part “Processes and Applications” details experience reports and lessons from using big data and data-driven approaches in processes and applications. Its chapters are co-authored with industry experts and cover domains including health, law, finance, retail, manufacturing, mobility, and smart cities. Contributions emanate from the Big Data Value Public-Private Partnership and the Big Data Value Association, which have acted as the European data community's nucleus to bring together businesses with leading researchers to harness the value of data to benefit society, business, science, and industry. The book is of interest to two primary audiences: first, undergraduate and postgraduate students and researchers in various fields, including big data, data science, data engineering, and machine learning and AI; second, practitioners and industry experts engaged in data-driven systems, software design and deployment projects who are interested in employing these advanced methods to address real-world problems.

    Platforms for deployment of scalable on- and off-line data analytics.

    The ability to exploit the intelligence concealed in bulk data to generate actionable insights is increasingly providing competitive advantages to businesses, government agencies, and charitable organisations. The burgeoning field of Data Science, and its related applications in the field of Data Analytics, finds broader applicability with each passing year. This expansion of users and applications is matched by an explosion in tools, platforms, and techniques designed to exploit more types of data in larger volumes, with more techniques, and at higher frequencies than ever before. This diversity in platforms and tools presents a new challenge for organisations aiming to integrate Data Science into their daily operations. Designing an analytic for a particular platform necessarily involves “lock-in” to that specific implementation – there are few opportunities for algorithmic portability. It is increasingly challenging to find engineers with experience in the diverse suite of tools available as well as an understanding of the precise details of the domain in which they work: the semantics of the data, the nature of queries and analyses to be executed, and the interpretation and presentation of results. The work presented in this thesis addresses these challenges by introducing a number of techniques to facilitate the creation of analytics for equivalent deployment across a variety of runtime frameworks and capabilities. In the first instance, this capability is demonstrated using the first Domain Specific Language and associated runtime environments to target multiple best-in-class frameworks for data analysis from the streaming and off-line paradigms. This capability is extended with a new approach to modelling analytics based around a semantically rich type system. An analytic planner using this model is detailed, thus empowering domain experts to build their own scalable analyses, without any specific programming or distributed systems knowledge.
    This planning technique is used to assemble complex ensembles of hybrid analytics: automatically applying multiple frameworks in a single workflow. Finally, this thesis demonstrates a novel approach to the speculative construction, compilation, and deployment of analytic jobs based around the observation of user interactions with an analytic planning system.
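    The portability idea above (one analytic description, many runtimes) can be sketched in miniature. All names below are illustrative and are not the thesis's actual DSL; the sketch only shows how a single runtime-agnostic definition can be dispatched to either a batch or a streaming execution style:

```python
# Hypothetical sketch: one analytic definition, planned onto two
# execution styles. Names are illustrative, not from the thesis.

from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class Analytic:
    """A runtime-agnostic description of a computation."""
    name: str
    transform: Callable[[Iterable[int]], Iterable[int]]

def plan(analytic: Analytic, backend: str, data: Iterable[int]) -> list[int]:
    """Dispatch the same analytic to different execution styles."""
    if backend == "batch":
        # materialise the whole result at once
        return list(analytic.transform(data))
    if backend == "streaming":
        # feed one record at a time through the same transform
        return [next(iter(analytic.transform([x]))) for x in data]
    raise ValueError(f"unknown backend: {backend}")

double = Analytic("double", lambda xs: (x * 2 for x in xs))
print(plan(double, "batch", [1, 2, 3]))      # [2, 4, 6]
print(plan(double, "streaming", [1, 2, 3]))  # [2, 4, 6]
```

    Both backends produce the same result from the same definition, which is the property that makes equivalent deployment across frameworks possible.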


    Adaptivity of 3D web content in web-based virtual museums: a quality of service and quality of experience perspective

    The 3D Web emerged as an agglomeration of technologies that brought the third dimension to the World Wide Web. Its forms span from systems with limited 3D capabilities to complete and complex Web-Based Virtual Worlds. The advent of the 3D Web provided great opportunities to museums by giving them an innovative medium to disseminate collections' information and associated interpretations in the form of digital artefacts and virtual reconstructions, leading to a revolutionary new way of curating, preserving and disseminating cultural heritage, and thereby reaching a wider audience. This audience consumes 3D Web material on a myriad of devices (mobile devices, tablets and personal computers) and network regimes (WiFi, 4G, 3G, etc.). Choreographing and presenting 3D Web components across all these heterogeneous platforms and network regimes presents a significant challenge yet to be overcome. The challenge is to achieve a good user Quality of Experience (QoE) across all these platforms. This means that different levels of fidelity of media may be appropriate; therefore, servers hosting those media types need to adapt to the capabilities of a wide range of networks and devices. To achieve this, the research contributes the design and implementation of Hannibal, an adaptive QoS- and QoE-aware engine that allows Web-Based Virtual Museums to deliver the best possible user experience across those platforms. In order to ensure effective adaptivity of 3D content, this research furthers the understanding of the 3D Web in terms of Quality of Service (QoS), through empirical investigations of how 3D Web components perform and where their bottlenecks lie, and in terms of QoE, by studying the subjective perception of the fidelity of 3D Digital Heritage artefacts. The results of these experiments led to the design and implementation of Hannibal.
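    As a rough illustration of the kind of QoS-driven adaptation described above, the sketch below picks a level of detail for a 3D artefact from simple network and device inputs. The thresholds, tier names and fidelity labels are assumptions for illustration only, not Hannibal's actual logic:

```python
# Illustrative sketch only: thresholds and names are assumptions,
# not taken from the thesis or from Hannibal itself.

def select_fidelity(bandwidth_mbps: float, gpu_tier: str) -> str:
    """Pick a level of detail (LOD) for a 3D artefact from simple
    QoS inputs: available bandwidth and a coarse device tier."""
    if bandwidth_mbps < 2 or gpu_tier == "low":
        return "low-poly"       # small mesh, compressed textures
    if bandwidth_mbps < 10 or gpu_tier == "mid":
        return "medium-poly"
    return "high-poly"          # full-resolution mesh and textures

print(select_fidelity(1.5, "high"))   # low-poly: 3G-like network
print(select_fidelity(50, "high"))    # high-poly: WiFi + capable GPU
```

    A real engine would also weigh QoE evidence, such as how strongly users actually perceive the fidelity differences measured in the subjective studies.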

    Development of biological data integration methods using Elasticsearch

    In biology, data appear at all stages of a project, from study preparation to publication of results. However, many aspects limit their use. The volume, speed of production and variety of the data produced have brought biology into an era dominated by the phenomenon of "Big Data". Since 1980, in order to organise the data being generated, the scientific community has produced numerous data repositories. These repositories can contain data on various biological elements such as genes, transcripts, proteins and metabolites, but also on other concepts such as toxins, biological vocabulary and scientific publications. Storing all of these data requires robust and durable hardware and software infrastructures. To date, owing to the diversity of biology and of the computer architectures in use, there is still no centralised repository containing all the public databases in biology. The many existing repositories are scattered and generally self-managed by the research teams that published them. With the rapid evolution of information technology, data-sharing interfaces have also evolved, from file transfer protocols to data query interfaces. As a result, access to the data dispersed across the many repositories is disparate. This diversity of access requires the support of automation tools for data retrieval.
    When multiple data sources are required in a study, the data flow follows several steps, the first of which is data integration: combining multiple data sources under a unified access interface. It is followed by various forms of exploitation, such as exploration through scripts or visualisations, transformation and analysis. The literature reports numerous initiatives for computerised systems for sharing and standardising data. However, the complexity induced by these multiple systems continues to constrain the dissemination of biological data. Indeed, the ever-increasing production of data, its management and multiple technical aspects hinder researchers who want to exploit these data and make them available. The hypothesis tested in this thesis is that the wide exploitation of data can be modernised with recent tools and methods, in particular a tool named Elasticsearch.
    This tool should fill the needs already identified in the literature, but should also allow more recent considerations to be addressed, such as easy data sharing. The construction of an architecture based on this data management tool allows data to be shared according to interoperability standards. Data dissemination according to these standards can be applied to biological data mining operations as well as to data transformation and analysis. The results presented in my thesis are based on tools that can be used by all researchers, in biology but also in other fields. However, applying and testing them in these other fields remains to be done in order to identify their limits more precisely.
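    The unified-access idea can be sketched without a running cluster: heterogeneous biological records are normalised into one document shape, prepared in bulk-API style, and queried through a single interface. All field names and identifiers here are illustrative; a real deployment would send these actions to an Elasticsearch cluster via its client library:

```python
# Sketch of the integration idea: several biological sources are
# normalised into one document shape and served through one query
# interface. Field names and IDs are illustrative assumptions.

records = [
    {"source": "genes",    "id": "BRCA1",  "species": "Homo sapiens"},
    {"source": "proteins", "id": "P38398", "species": "Homo sapiens"},
    {"source": "toxins",   "id": "TX001",  "species": "Bothrops atrox"},
]

def bulk_actions(index: str, docs):
    """Yield action/document pairs in the shape expected by a bulk API:
    a metadata line, then the document itself."""
    for doc in docs:
        yield {"index": {"_index": index, "_id": f'{doc["source"]}:{doc["id"]}'}}
        yield doc

def search(docs, field, value):
    """A stand-in for a term query: one interface over all sources."""
    return [d for d in docs if d.get(field) == value]

actions = list(bulk_actions("biology", records))
print(len(actions))  # 6: one metadata line + one document per record
print([d["id"] for d in search(records, "species", "Homo sapiens")])
```

    Because every source is reduced to the same document shape, a single query answers questions that would otherwise require visiting each repository separately.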

    Ontology Ranking: Finding the Right Ontologies on the Web

    Ontology search, the process of finding ontologies or ontological terms that match user-defined queries in an ontology collection, is an important task that facilitates ontology reuse in ontology engineering. Ontology reuse is desirable to avoid the tedious process of building an ontology from scratch and to limit the design of several competing ontologies that represent similar knowledge. Since many organisations in both the private and public sectors are publishing their data in RDF, they increasingly need to find or design ontologies for data annotation and/or integration. In general, multiple ontologies represent a given domain; therefore, finding the best matching ontologies or their terms is required to facilitate manual or dynamic ontology selection for both ontology design and data annotation. Ranking is a crucial component of the ontology retrieval process, which aims to list the ‘relevant’ ontologies or their terms as high as possible in the search results in order to reduce human intervention. Most existing ontology ranking techniques inherit one or more information retrieval ranking parameter(s). They linearly combine the values of these parameters for each ontology to compute a relevance score against a user query and rank the results in descending order of relevance score. A significant aspect of achieving an effective ontology ranking model is to develop novel metrics and dynamic techniques that can optimise the relevance score of the most relevant ontology for a user query. In this thesis, we present extensive research in ontology retrieval and ranking, in which several research gaps in the existing literature are identified and addressed. First, we begin the thesis with a review of the literature and propose a taxonomy of Semantic Web data (i.e., ontologies and linked data) retrieval approaches, which allows us to identify potential research directions in the field.
    In the remainder of the thesis, we address several of the identified shortcomings in the ontology retrieval domain. We develop a framework for the empirical and comparative evaluation of different ontology ranking solutions, which has not been studied in the literature so far. Second, we propose an effective relationship-based concept retrieval framework and a concept ranking model based on a learning-to-rank approach, which addresses the limitations of the existing linear ranking models. Third, we propose RecOn, a framework that helps users find the ontologies that best match a multi-keyword query. There, the relevance score of an ontology to the query is computed by formulating and solving the ontology recommendation problem as a linear optimisation problem. Finally, the thesis also reports on an extensive comparative evaluation of our proposed solutions against several other state-of-the-art techniques using real-world ontologies. This thesis will be useful for researchers and practitioners interested in ontology search, and as a methodological and performance benchmark for ranking approaches to ontology search.
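    The linear ranking models criticised above combine several retrieval signals into one score per ontology. A minimal sketch of that baseline follows; the weights and feature names are illustrative assumptions, not the thesis's actual signals:

```python
# Minimal sketch of a linear ontology-ranking baseline.
# Weights and feature names are illustrative assumptions.

def relevance(features: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted linear combination of per-ontology ranking signals."""
    return sum(weights[k] * features.get(k, 0.0) for k in weights)

weights = {"tf_idf": 0.5, "popularity": 0.3, "coverage": 0.2}
ontologies = {
    "onto_a": {"tf_idf": 0.9, "popularity": 0.2, "coverage": 0.8},
    "onto_b": {"tf_idf": 0.4, "popularity": 0.9, "coverage": 0.5},
}
ranked = sorted(ontologies, key=lambda o: relevance(ontologies[o], weights),
                reverse=True)
print(ranked)  # ['onto_a', 'onto_b']: 0.67 vs 0.57
```

    A learning-to-rank approach replaces the fixed weight vector with a model trained on relevance judgements, which is the limitation of this baseline that the thesis targets.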

    Acta Polytechnica Hungarica 2017
