
    Robust Group Linkage

    We study the problem of group linkage: linking records that refer to entities in the same group. Applications include finding businesses in the same chain, conference attendees from the same affiliation, and players from the same team. Group linkage faces challenges not present in traditional record linkage. First, although members of the same group can share similar global values of an attribute, they represent different entities and so can also have distinct local values for the same or different attributes, which requires high tolerance for value diversity. Second, groups can be huge (tens of thousands of records), which requires high scalability even after good blocking strategies are applied. We present a two-stage algorithm: the first stage identifies cores containing records that are very likely to belong to the same group, while being robust to possibly erroneous values; the second stage collects strong evidence from the cores and leverages it to merge more records into the same group, while tolerating differences in local values of an attribute. Experimental results show the high effectiveness and efficiency of our algorithm on various real-world data sets.
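
    The two-stage idea can be illustrated with a small Python sketch. It is not the authors' algorithm: the core rule (records agreeing on at least `min_shared` attribute values), the evidence rule (values held by at least `min_support` of a core's members), and all function names are hypothetical simplifications. The pairwise comparison is also quadratic, so a blocking step would be needed at the scale the abstract describes.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]


def find(parent: List[int], i: int) -> int:
    # Union-find root lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def build_cores(records: List[Record], min_shared: int = 2) -> List[Set[int]]:
    """Stage 1 (hypothetical rule): link records that agree on >= min_shared attribute values."""
    parent = list(range(len(records)))
    for i in range(len(records)):
        for j in range(i + 1, len(records)):  # quadratic; real systems block first
            shared = sum(1 for a, v in records[i].items() if records[j].get(a) == v)
            if shared >= min_shared:
                parent[find(parent, i)] = find(parent, j)
    components: Dict[int, Set[int]] = defaultdict(set)
    for i in range(len(records)):
        components[find(parent, i)].add(i)
    # Only multi-record components become cores; singletons are handled in stage 2.
    return [c for c in components.values() if len(c) > 1]


def core_evidence(records: List[Record], core: Set[int], min_support: float = 0.6) -> Set[Tuple[str, str]]:
    """Strong evidence: (attribute, value) pairs shared by >= min_support of a core's members."""
    counts = Counter((a, v) for i in core for a, v in records[i].items())
    return {pair for pair, c in counts.items() if c / len(core) >= min_support}


def group_linkage(records: List[Record], min_shared: int = 2, min_match: int = 1) -> List[Set[int]]:
    cores = build_cores(records, min_shared)
    evidence = [core_evidence(records, c) for c in cores]
    groups: List[Set[int]] = [set(c) for c in cores]
    assigned = set().union(*cores)
    for i, rec in enumerate(records):
        if i in assigned:
            continue
        # Score each core by how many of its strong (global) values the record shares;
        # the record's other values are treated as local and do not count against it.
        scores = [len(ev & set(rec.items())) for ev in evidence]
        if scores and max(scores) >= min_match:
            groups[scores.index(max(scores))].add(i)
        else:
            groups.append({i})  # no core matches: the record starts its own group
    return groups


records = [
    {"name": "Acme Coffee", "phone": "555-0100", "city": "Austin"},
    {"name": "Acme Coffee", "phone": "555-0100", "city": "Dallas"},
    {"name": "Acme Coffee", "phone": "555-0199", "city": "Houston"},
    {"name": "Bolt Books", "phone": "555-0200", "city": "Austin"},
]
print(group_linkage(records))  # [{0, 1, 2}, {3}]
```

    In the toy data, the first two records share name and phone and form a core. The third shares only the name, which is strong evidence for that core, so it joins despite its different (local) phone and city; the fourth matches no core evidence and stays alone.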

    Towards Distributed BPEL Orchestrations

    Web services are emerging as the technology for integrating highly heterogeneous systems. BPEL, the standard technology for composing services, assumes a single "orchestrator" that controls the execution flow and coordinates the interactions with selected services. This centralized approach simplifies coordination among components, but it is also too heavy a constraint. To address this, the paper introduces the idea of distributed orchestrations and presents a proposal to couple BPEL and distributed execution in mobile settings. The approach, exemplified on a simple case study, transforms a centralized BPEL process into a set of coordinated processes. An explicit meta-model and graph transformation supply the formal grounding to obtain a set of related processes and to add the communication infrastructure among the newly created processes. The paper also presents a communication infrastructure based on tuple spaces to make the different orchestrators interact in mobile contexts. Keywords: WS-BPEL, Grap
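
    The coordination style of a tuple-space communication layer can be sketched with a minimal Linda-style tuple space in Python. This is an in-memory, single-process illustration under stated assumptions: `TupleSpace`, `out`, `rd`, `take`, the `None` wildcard, and the "shipping" orchestrator are hypothetical names (`take` stands in for Linda's `in`, which is a Python keyword). The paper's infrastructure targets distributed orchestrators on mobile devices, which this sketch does not model.

```python
import threading
from typing import Any, List, Optional, Tuple

Tup = Tuple[Any, ...]


class TupleSpace:
    """In-memory Linda-style tuple space (hypothetical API: out / rd / take)."""

    def __init__(self) -> None:
        self._tuples: List[Tup] = []
        self._cond = threading.Condition()

    @staticmethod
    def _matches(template: Tup, tup: Tup) -> bool:
        # A None field in the template is a wildcard; every other field must be equal.
        return len(template) == len(tup) and all(t is None or t == v for t, v in zip(template, tup))

    def _find(self, template: Tup) -> Optional[int]:
        for i, tup in enumerate(self._tuples):
            if self._matches(template, tup):
                return i
        return None

    def out(self, tup: Tup) -> None:
        """Write a tuple and wake any process blocked on a matching template."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def rd(self, template: Tup, timeout: Optional[float] = None) -> Optional[Tup]:
        """Return (without removing) a matching tuple, blocking up to timeout seconds."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._find(template) is not None, timeout):
                return None
            return self._tuples[self._find(template)]

    def take(self, template: Tup, timeout: Optional[float] = None) -> Optional[Tup]:
        """Remove and return a matching tuple (Linda's `in`), blocking up to timeout seconds."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._find(template) is not None, timeout):
                return None
            return self._tuples.pop(self._find(template))


# Usage: a hypothetical "shipping" orchestrator serves one request from another orchestrator.
space = TupleSpace()


def shipping_orchestrator() -> None:
    request = space.take(("invoke", "shipping", None), timeout=5)
    if request is not None:
        space.out(("reply", "shipping", request[2], "scheduled"))


worker = threading.Thread(target=shipping_orchestrator)
worker.start()
space.out(("invoke", "shipping", 42))
reply = space.take(("reply", "shipping", 42, None), timeout=5)
worker.join()
print(reply)  # ('reply', 'shipping', 42, 'scheduled')
```

    Because the two orchestrators exchange only tuples through the shared space, neither needs the other's address, and a request can be written before its consumer is ready to read it; this decoupling in space and time is what makes tuple spaces attractive when devices connect intermittently.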

    An Ontology Based Approach to Data Quality Initiatives Cost-Benefit Evaluation

    In order to achieve higher data quality targets, organizations need to identify the data quality dimensions that are affected by poor quality, assess them, and evaluate which improvement techniques are suitable to apply. The data quality literature provides methodologies that support complete data quality management through guidelines that organizations should contextualize and apply to their scenario. Only a few methodologies use cost-benefit analysis as a tool to evaluate the feasibility of a data quality improvement project. In this paper, we present an ontological description of cost-benefit analysis that includes the most important contributions already proposed in the literature. Using ontologies improves knowledge of the domain by identifying the interdependencies between costs and benefits, and enables several kinds of complex evaluation. The feasibility and usefulness of the proposed ontology-based tool have been tested through a real case study.

    A capacity and value based model for data architectures adopting integration technologies

    The paper discusses two concepts that have been associated with various approaches to data and information, namely capacity and value, focusing on database architectures and on two types of technologies widely used in integration projects: data integration, in the area of Enterprise Information Integration, and publish & subscribe, in the area of Enterprise Application Integration. Furthermore, the paper proposes and discusses a unifying model for information capacity and value that also considers quality constraints and run-time costs of the database architecture.

    Workflow partitioning in mobile information systems

    The increasing success of wireless technologies is sustaining the diffusion of mobile information systems, but the immaturity of the underlying technology and its peculiar characteristics are affecting the development of such systems. For example, the execution of business processes in such a context must cope with the variable and fluctuating bandwidth available to the different devices. This leads the designer to stress the independence of each actor, minimizing interactions and knowledge sharing, to increase the reliability of the whole system.

    ABSTAT-HD: a scalable tool for profiling very large knowledge graphs

    Processing large-scale and highly interconnected Knowledge Graphs (KGs) is becoming crucial for many applications, such as recommender systems and question answering. Profiling approaches have been proposed to summarize large KGs with the aim of producing a concise and meaningful representation so that they can be easily managed. However, constructing profiles and calculating statistics such as cardinality descriptors or inferences is resource-intensive. In this paper, we present ABSTAT-HD, a highly distributed profiling tool that supports users in profiling and understanding big and complex knowledge graphs. We demonstrate the impact of the new architecture of ABSTAT-HD through a set of experiments that show its scalability along three dimensions of the data to be processed: size, complexity, and workload. The experiments show that our profiling framework provides informative and concise profiles and can process and manage very large KGs.
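
    The kind of summary a KG profiler computes can be illustrated with a small Python sketch that counts type-level patterns (subject type, predicate, object type) and records, for each pattern, the largest number of distinct objects any single subject has, a simple cardinality descriptor. The function name `profile`, the fallback type `Thing`, the toy data, and the choice of statistics are assumptions for illustration; this is not ABSTAT-HD's distributed implementation, which the abstract does not detail.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]


def profile(triples: List[Triple], types: Dict[str, str]) -> Tuple[Counter, Dict[Pattern, int]]:
    """Count (subject type, predicate, object type) patterns and their max cardinality."""
    freq: Counter = Counter()
    # pattern -> subject -> set of distinct objects seen with that pattern
    objects: Dict[Pattern, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        # Entities without a known type fall back to a generic "Thing" type.
        pattern = (types.get(s, "Thing"), p, types.get(o, "Thing"))
        freq[pattern] += 1
        objects[pattern][s].add(o)
    max_card = {pat: max(len(objs) for objs in by_subj.values()) for pat, by_subj in objects.items()}
    return freq, max_card


types = {"alice": "Person", "bob": "Person", "carol": "Person", "acme": "Company", "rome": "City"}
triples = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "livesIn", "rome"),
]
freq, max_card = profile(triples, types)
print(freq[("Person", "knows", "Person")], max_card[("Person", "knows", "Person")])  # 2 2
```

    Both statistics are keyed aggregations (by pattern, and by pattern plus subject for the cardinality), which is what makes this style of profiling amenable to partitioning across workers.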

    Efficient acclimation of the chloroplast antioxidant defence of Arabidopsis thaliana leaves in response to a 10- or 100-fold light increment and the possible involvement of retrograde signals

    Chloroplasts are equipped with a nuclear-encoded antioxidant defence system whose components are usually expressed at high transcript and activity levels. To significantly challenge the chloroplast antioxidant system, Arabidopsis thaliana plants acclimated either to extremely low light, slightly above the light compensation point, or to normal growth chamber light were moved to high light, corresponding to a 100- and a 10-fold light jump, respectively, for 6 h and 24 h in order to observe the responses of the water–water cycle at the transcript, protein, enzyme activity, and metabolite levels. The plants coped efficiently with the high-light regime, and the photoinhibition was fully reversible. Levels of reactive oxygen species (ROS), glutathione, and ascorbate, as well as their redox states, revealed no particular oxidative stress in low-light-acclimated plants transferred to 100-fold excess light. Strong regulation of the water–water cycle enzymes at the transcript level was only partly reflected at the protein and activity levels. In general, low-light plants had higher protein contents of stromal (sAPX) and thylakoid ascorbate peroxidase (tAPX), dehydroascorbate reductase (DHAR), and CuZn superoxide dismutase (CuZnSOD) than normal-light-grown plants. Mutants defective in components relevant for retrograde signalling, namely stn7, ex1, and tpt1, and a mutant expressing E. coli catalase in the chloroplast showed unaltered transcriptional responses of water–water cycle enzymes. These findings, together with the response of marker transcripts, indicate that abscisic acid is not involved and that the plastoquinone redox state and reactive oxygen species do not play a major role in regulating the transcriptional response at t = 6 h, while other marker transcripts suggest a major role for reductive power, metabolites, and lipids as signals for the response of the water–water cycle.