296 research outputs found
Scalable Algorithms for Semantic Web Data Management on Cloud Platforms
In order to build smart systems, where machines are able to reason like humans, data with semantics is a major requirement. This need led to the advent of the Semantic Web, which proposes standard ways of representing and querying data with semantics. RDF is the prevalent data model used to describe web resources, and SPARQL is the query language for expressing queries over RDF data. Being able to store and query data with semantics triggered the development of many RDF data management systems. The rapid evolution of the Semantic Web prompted the shift from centralized data management systems to distributed ones. The first such systems relied on P2P and client-server architectures, while recently the focus has moved to cloud computing. Cloud computing environments have strongly impacted research and development in distributed software platforms. Cloud providers offer distributed, shared-nothing infrastructures that may be used for data storage and processing. The main features of cloud computing are scalability, fault tolerance, and elastic allocation of computing and storage resources following the needs of the users. This thesis investigates the design and implementation of scalable algorithms and systems for cloud-based Semantic Web data management. In particular, we study the performance and cost of exploiting commercial cloud infrastructures to build Semantic Web data repositories, and the optimization of SPARQL queries for massively parallel frameworks. First, we introduce the basic concepts of the Semantic Web and the main components and frameworks interacting in massively parallel cloud-based systems. In addition, we provide an extended overview of existing RDF data management systems in centralized and distributed settings, with emphasis on the critical concepts of storage, indexing, query optimization, and infrastructure. Second, we present AMADA, an architecture for RDF data management using public cloud infrastructures.
We follow the Software as a Service (SaaS) model, where the complete platform runs in the cloud and appropriate APIs are provided to end users for storing and retrieving RDF data. We explore various storage and querying strategies, revealing their pros and cons with respect to performance and also to monetary cost, an important new dimension to consider in public cloud services. Finally, we present CliqueSquare, a distributed RDF data management system built on top of Hadoop, incorporating a novel optimization algorithm that is able to produce massively parallel plans for SPARQL queries. We present a family of optimization algorithms, relying on n-ary (star) equality joins to build flat plans, and compare their ability to find the flattest plans possible. Inspired by existing partitioning and indexing techniques, we present a generic storage strategy suitable for storing RDF data in HDFS (Hadoop's Distributed File System). Our experimental results validate the efficiency and effectiveness of the optimization algorithm, and demonstrate the overall performance of the system.
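The n-ary (star) equality joins mentioned above can be illustrated with a minimal pure-Python sketch: several triple patterns sharing one subject variable are matched against a triple set in a single n-ary join step rather than a chain of binary joins. The triples, predicate names, and the `star_join` helper are illustrative assumptions, not CliqueSquare's actual implementation.

```python
from collections import defaultdict
from itertools import product

# Toy RDF-like triples (subject, predicate, object); purely illustrative.
triples = [
    ("alice", "type", "Person"),
    ("alice", "worksAt", "Inria"),
    ("alice", "knows", "bob"),
    ("bob",   "type", "Person"),
    ("bob",   "worksAt", "LIG"),
]

def star_join(patterns, data):
    """Evaluate a star query: one pattern per predicate, all sharing the
    same subject variable, joined in a single n-ary step."""
    # Index the data by predicate.
    by_pred = defaultdict(list)
    for s, p, o in data:
        by_pred[p].append((s, o))
    # Collect, per subject, the object bindings for each pattern slot.
    per_subject = defaultdict(dict)
    for i, pred in enumerate(patterns):
        for s, o in by_pred[pred]:
            per_subject[s].setdefault(i, []).append(o)
    # A subject survives only if every branch of the star matched.
    results = []
    for s, slots in sorted(per_subject.items()):
        if len(slots) == len(patterns):
            for combo in product(*(slots[i] for i in range(len(patterns)))):
                results.append((s,) + combo)
    return results

# A 2-branch star on subject ?x:  ?x type ?t . ?x worksAt ?w
print(star_join(["type", "worksAt"], triples))
```

Because all equality conditions share one join key (the subject), the whole star collapses into a single flat operator, which is the intuition behind the flat plans the thesis targets.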
Digital Preservation, Archival Science and Methodological Foundations for Digital Libraries
Digital libraries, whether commercial, public or personal, lie at the heart of the information society. Yet, research into their long-term viability and the meaningful accessibility of their contents remains in its infancy. In general, as we have pointed out elsewhere, "after more than twenty years of research in digital curation and preservation the actual theories, methods and technologies that can either foster or ensure digital longevity remain startlingly limited." Research led by DigitalPreservationEurope (DPE) and the Digital Preservation Cluster of DELOS has allowed us to refine the key research challenges (theoretical, methodological and technological) that need attention from researchers in digital libraries during the coming five to ten years, if we are to ensure that the materials held in our emerging digital libraries remain sustainable, authentic, accessible and understandable over time. Building on this work, and taking the theoretical framework of archival science as bedrock, this paper investigates digital preservation and its foundational role if digital libraries are to have long-term viability at the centre of the global information society.
On mining complex sequential data by means of FCA and pattern structures
Nowadays, data sets are available in very complex and heterogeneous forms. Mining such data collections is essential to support many real-world applications, ranging from healthcare to marketing. In this work, we focus on the analysis of "complex" sequential data by means of interesting sequential patterns. We approach the problem using the elegant mathematical framework of Formal Concept Analysis (FCA) and its extension based on "pattern structures". Pattern structures are used for mining complex data (such as sequences or graphs) and are based on a subsumption operation, which in our case is defined with respect to the partial order on sequences. We show how pattern structures, along with projections (i.e., data reductions of sequential structures), are able to enumerate more meaningful patterns and increase the computational efficiency of the approach. Finally, we show the applicability of the presented method for discovering and analyzing interesting patient patterns from a French healthcare data set on cancer. The quantitative and qualitative results (with annotations and analysis from a physician) are reported in this use case, which is the main motivation for this work.
Keywords: data mining; formal concept analysis; pattern structures; projections; sequences; sequential data.
Comment: An accepted publication in the International Journal of General Systems. The paper was created in the wake of the conference on Concept Lattices and their Applications (CLA'2013). 27 pages, 9 figures, 3 tables.
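The FCA machinery the abstract builds on rests on two derivation operators over a binary object-attribute context; a formal concept is a pair of sets that each operator maps onto the other. A minimal sketch follows, on a tiny made-up context (the patient/treatment names are illustrative placeholders, not data from the paper's healthcare use case).

```python
# Toy formal context: objects (patients) -> sets of attributes (treatments).
# Entirely illustrative; not taken from the paper's data set.
context = {
    "p1": {"chemo", "surgery"},
    "p2": {"chemo", "radiation"},
    "p3": {"chemo", "surgery", "radiation"},
}

def intent(objects):
    """Derivation on objects: attributes shared by ALL given objects."""
    attr_sets = [context[o] for o in objects]
    return set.intersection(*attr_sets) if attr_sets else set()

def extent(attributes):
    """Derivation on attributes: objects possessing ALL given attributes."""
    return {o for o, attrs in context.items() if attributes <= attrs}

# (A, B) is a formal concept exactly when extent(B) == A and intent(A) == B.
A = extent({"chemo", "surgery"})   # objects with both treatments
B = intent(A)                      # attributes they all share
print(sorted(A), sorted(B))
```

Pattern structures generalize exactly this scheme: the set intersection in `intent` is replaced by a similarity (meet) operation on richer descriptions, such as the common-subsequence-based meet the paper defines on sequences.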
The Evolution of Local Government Electronic Services in Victoria
There is increasing interest amongst governments around the world in the potential for delivering government services on the WWW. While there are examples of substantial transition to electronic service delivery in some sectors in some countries, it is generally accepted that most government services have failed to evolve beyond enhanced information-based web pages. This is particularly true of the local government sector in Australia, despite numerous policies and hopeful deadlines imposed by state and federal governments. This research in progress studied the current status of local government electronic service delivery. The results revealed little progress in the transition to electronic service delivery in most areas of local government. In an effort to enhance their websites, local governments have started to creep into the areas of e-Democracy and e-Governance, suggesting that a linear maturity model of e-Service delivery may not be appropriate for the local government sector.
LoKit (revisited): A Toolkit for Building Distributed Collaborative Applications
LoKit is a toolkit based on the coordination language LO. It allows building distributed collaborative applications by providing a set of generic tools. This paper briefly introduces the concept of the toolkit, presents a subset of the LoKit tools, and finally demonstrates its power by discussing a sample application built with the toolkit.
Comment: 20 pages, 3 figures, 1 table. This paper is a reprint of an unpublished report on the occasion of the (fictitious) 30th anniversary of the Xerox Research Centre Europe, now Naver Labs, Grenoble, France.
- …