
    Modeling temporal dimensions of semistructured data

    In this paper we propose an approach to correctly manage valid-time semantics for semistructured temporal clinical information. In particular, we use a graph-based data model to represent radiological clinical data, focusing on the patient model of the well-known DICOM standard, and define the set of (graphical) constraints needed to guarantee that the history of the given application domain is consistent.

    Tracking Data Provenance of Archaeological Temporal Information in Presence of Uncertainty

    The interpretation process is one of the main tasks performed by archaeologists who, starting from ground data about evidences and findings, incrementally derive knowledge about ancient objects or events. Very often more than one archaeologist contributes at different time instants to discovering details about the same finding; thus, it is important to keep track of the history and provenance of the overall knowledge discovery process. To this aim, we propose a model and a set of derivation rules for tracking and refining data provenance during the archaeological interpretation process. In particular, among all the possible interpretation activities, we concentrate on the dating that archaeologists perform to assign one or more time intervals to a finding in order to define its lifespan on the temporal axis. In this context, we propose a framework to represent and derive updated provenance data about temporal information after the mentioned derivation process. Archaeological data, and in particular their temporal dimension, are typically vague, since many different interpretations can coexist; thus, we use Fuzzy Logic to assign a degree of confidence to values and Fuzzy Temporal Constraint Networks to model relationships between the dating of different findings represented as a graph-based dataset. The derivation rules used to infer more precise temporal intervals are enriched to also manage provenance information and its updates after a derivation step. A MapReduce version of the path consistency algorithm is also proposed to improve the efficiency of the refining process on big graph-based datasets.

    A graph-based meta-model for heterogeneous data management

    The wave of interest in data-centric applications has spawned a wide variety of data models, making it extremely difficult to evaluate, integrate or access them in a uniform way. Moreover, many recent models are too specific to allow immediate comparison with the others and do not easily support incremental model design. In this paper, we introduce GSMM, a meta-model based on the use of a generic graph that can be instantiated to a concrete data model by simply providing values for a restricted set of parameters and some high-level constraints, themselves represented as graphs. In GSMM, the concept of data schema is replaced by that of constraint, which allows the designer to impose structural restrictions on data in a very flexible way. GSMM includes GSL, a graph-based language for expressing queries and constraints that, besides being applicable to data represented in GSMM, can in principle be specialised and used for existing models for which no language was defined. We show some sample applications of GSMM for deriving and comparing classical data models like the relational model, plain XML data, XML Schema, and time-varying semistructured data. We also show how GSMM can represent more recent modelling proposals: triple stores, the BigTable model and Neo4j, a graph-based model for NoSQL data. A prototype showing the potential of the approach is also described.

    Operational and abstract semantics of the query language G-Log

    The amount and variety of data available electronically have dramatically increased in the last decade; however, data and documents are stored in different ways and do not usually show their internal structure. In order to take full advantage of the topological structure of digital documents, and in particular web sites, their hierarchical organization should be exploited by introducing a notion of query similar to the one used in database systems. A good approach, in that respect, is the one provided by graphical query languages, originally designed to model object bases and later proposed for semistructured data, like G-Log. The aim of this paper is to provide suitable graph-based semantics to this language, supporting both data structure variability and topological similarity between queries and document structures. A suite of operational semantics based on the notion of bisimulation is introduced both at the concrete level (instances) and at the abstract level (schemata), giving rise to a semantic framework that benefits from the cross-fertilisation of tools originally designed in quite different research areas (databases, concurrency, static analysis).

    Semi-automatic support for evolving functional dependencies

    During the life of a database, systematic and frequent violations of a given constraint may suggest that the represented reality is changing and thus the constraint should evolve with it. In this paper we propose a method and a tool to (i) find the functional dependencies that are violated by the current data, and (ii) support their evolution when it is necessary to update them. The method relies on the use of confidence, as a measure that is associated with each dependency and allows us to understand "how far" the dependency is from correctly describing the current data; and of goodness, as a measure of balance between the data satisfying the antecedent of the dependency and those satisfying its consequent. Our method compares favorably with literature that approaches the same problem in a different way, and performs effectively and efficiently as shown by our tests on both real and synthetic databases.
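As a rough illustration of the confidence measure, the sketch below computes the confidence of a functional dependency X → Y as the largest fraction of tuples that can be retained so that the dependency holds: within each group of tuples agreeing on X, only the most frequent Y value survives. This is a common way to quantify "how far" the data are from satisfying an FD; the exact definition used in the paper may differ.

```python
from collections import Counter, defaultdict

def fd_confidence(rows, lhs, rhs):
    """Confidence of the functional dependency lhs -> rhs over rows.

    rows: list of dicts (tuples of the relation).
    lhs, rhs: lists of attribute names.
    Returns a value in (0, 1]; 1.0 means the FD holds exactly.
    """
    groups = defaultdict(Counter)
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        groups[x][y] += 1
    # For each lhs-group, only the tuples carrying the majority rhs
    # value can be kept without violating the dependency.
    kept = sum(counter.most_common(1)[0][1] for counter in groups.values())
    return kept / len(rows)
```

A dependency whose confidence drifts steadily downward over time is exactly the signal that the represented reality has changed and the constraint should evolve.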

    CoPart: a context-based partitioning technique for big data

    The MapReduce programming paradigm is frequently used to process and analyse huge amounts of data. This paradigm relies on the ability to apply the same operation in parallel on independent chunks of data. The consequence is that overall performance greatly depends on the way data are partitioned among the various computation nodes. The default partitioning technique, provided by systems like Hadoop or Spark, basically performs a random subdivision of the input records, without considering their nature and the correlations between them. Even if such an approach can be appropriate in the simplest case, where all the input records always have to be analyzed, it becomes a limitation for more sophisticated analyses, in which correlations between records can be exploited to preliminarily prune unnecessary computations. In this paper we design a context-based multi-dimensional partitioning technique, called COPART, which takes care of data correlation in order to determine how records are subdivided among splits (i.e., units of work assigned to a computation node). More specifically, it considers not only the correlation of data w.r.t. contextual attributes, but also the distribution of each contextual dimension in the dataset. We experimentally compare our approach with existing ones, considering both quality criteria and query execution times.
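To give the intuition behind context-based partitioning, here is a minimal sketch that assigns records to splits by binning their contextual attributes into a grid, so that records close in the context space land in the same split and a query with a contextual predicate can prune whole splits. The real COPART technique also accounts for the distribution of each dimension; this sketch uses equal-width bins for brevity, and all names are illustrative.

```python
def grid_partition(records, dims, bins):
    """Partition records by binning contextual attributes.

    records: list of dicts; dims: contextual attribute names;
    bins: number of equal-width bins per dimension.
    Returns a dict mapping a grid cell (tuple of bin indices) to the
    records falling into that cell.
    """
    bounds = {d: (min(r[d] for r in records), max(r[d] for r in records))
              for d in dims}
    splits = {}
    for r in records:
        key = []
        for d in dims:
            lo, hi = bounds[d]
            width = (hi - lo) / bins or 1  # avoid a zero-width dimension
            # Clamp the top edge into the last bin.
            key.append(min(int((r[d] - lo) / width), bins - 1))
        splits.setdefault(tuple(key), []).append(r)
    return splits
```

With such a layout, a query restricted to one region of the context space reads only the splits whose grid cells intersect it, instead of scanning a random subdivision of the whole input.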

    Tracking social provenance in chains of retweets

    In the era of massive sharing of information, the term social provenance is used to denote the ownership, source or origin of a piece of information which has been propagated through social media. Tracking the provenance of information is becoming increasingly important as social platforms acquire more relevance as a source of news. In this scenario, Twitter is considered one of the most important social networks for information sharing and dissemination, which can be accelerated through the use of retweets and quotes. However, the Twitter API does not provide a complete tracking of retweet chains, since only the connection between a retweet and the original post is stored, while all the intermediate connections are lost. This can limit the ability to track the diffusion of information, as well as the estimation of the importance of specific users, who can rapidly become influencers in the news dissemination. This paper proposes an innovative approach for rebuilding the possible chains of retweets while also providing an estimation of the contribution given by each user to the information spread. For this purpose, we define the concept of Provenance Constraint Network and a modified version of the Path Consistency Algorithm. An application of the proposed technique to a real-world dataset is presented at the end of the paper.
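The starting point of any chain reconstruction can be sketched as follows: since the API links every retweet only to the original post, each retweet may plausibly descend from the original or from any earlier retweet of it. The sketch below enumerates these candidate parents from timestamps; it is a hypothetical simplification of the input to the constraint-based refinement the paper describes, not the paper's algorithm itself.

```python
def candidate_parents(original_id, retweets):
    """Enumerate the possible parent of each retweet of one post.

    retweets: list of (tweet_id, timestamp) pairs, all pointing to the
    same original post, as returned by the API.
    Returns a dict: tweet_id -> list of tweet_ids it may descend from
    (the original, or any retweet posted earlier).
    """
    ordered = sorted(retweets, key=lambda rt: rt[1])  # chronological order
    parents = {}
    for i, (tid, _) in enumerate(ordered):
        parents[tid] = [original_id] + [t for t, _ in ordered[:i]]
    return parents
```

Each entry defines the domain of a variable in a Provenance Constraint Network; propagation over those domains (e.g. via a path-consistency-style algorithm) then narrows the plausible chains.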

    A Context-Aware Recommendation System with a Crowding Forecaster

    Recommendation systems (RSs) have grown in popularity in recent years. Many big IT companies, like Google, Amazon and Netflix, have a RS at the core of their business. In this paper, we propose a modular platform for enhancing a RS for the tourism domain with a crowding forecaster, which is able to produce an estimation of the current and future occupation of different Points of Interest (PoIs) by also taking contextual aspects into consideration. The main advantages of the proposed system are its modularity and its ability to be easily tailored to different application domains. Moreover, the use of standard and pluggable components allows the system to be integrated in different application scenarios.

    Promoting sustainable tourism by recommending sequences of attractions with deep reinforcement learning

    Developing Recommender Systems (RSs) is particularly interesting in the tourism domain, where one or more attractions have to be suggested to users based on preferences, contextual dimensions, and several other constraints. RSs usually rely on the availability of a vast amount of historical information about users' past activities. However, this is not usually the case in the tourism domain, where acquiring complete and accurate information about the user's behavior is complex, and providing personalized suggestions is often practically impossible. Moreover, even though most available Touristic RSs (T-RSs) are user-focused, the touristic domain also requires the development of systems that can promote a more sustainable form of tourism. The concept of sustainable tourism covers many aspects, from economic, social, and environmental issues to the attention to improving tourists' experience and the needs of host communities. In this regard, one of the most important aspects is the prevention of overcrowded situations in attractions or locations (over-tourism). For this reason, this paper proposes a different kind of T-RS, which focuses more on the tourists' impact on the destinations, trying to improve their experiences by offering better visit conditions. Moreover, instead of suggesting the next Point of Interest (PoI) to visit in a given situation, it provides a suggestion about a complete sequence of PoIs (a tourist itinerary) that covers an entire day or vacation period. The proposed technique is based on the application of Deep Reinforcement Learning, where the tourist's reward depends on the specific spatial and temporal context in which the itinerary has to be performed. The solution has been evaluated on a real-world dataset of visits conducted by tourists in Verona (Italy) from 2014 to 2023 and compared with three baselines.
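The kind of context-dependent reward described above can be sketched as follows: the reward of visiting a PoI combines its intrinsic interest with a penalty for the forecast crowding at the visit time, so that a policy trained to maximise it steers tourists away from over-tourism hotspots. All names, weights and the linear form are illustrative assumptions, not the paper's actual reward.

```python
def itinerary_reward(itinerary, interest, crowding, alpha=0.5):
    """Context-dependent reward of a full itinerary.

    itinerary: list of (poi, hour) visit steps;
    interest: poi -> intrinsic interest score in [0, 1];
    crowding: (poi, hour) -> forecast occupancy in [0, 1];
    alpha: weight of the crowding penalty.
    """
    return sum(interest[poi] - alpha * crowding[(poi, hour)]
               for poi, hour in itinerary)
```

In a Deep Reinforcement Learning setting, a term of this shape would be the per-step reward, so that the learned policy trades off attraction quality against the spatio-temporal crowding context of the whole sequence.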