519 research outputs found

    Reasoning about Record Matching Rules

    To accurately match records it is often necessary to utilize the semantics of the data. Functional dependencies (FDs) have proven useful in identifying tuples in a clean relation, based on the semantics of the data. For all the reasons that FDs and their inference are needed, it is also important to develop dependencies and their reasoning techniques for matching tuples from unreliable data sources. This paper investigates dependencies and their reasoning for record matching. (a) We introduce a class of matching dependencies (MDs) for specifying the semantics of data in unreliable relations, defined in terms of similarity metrics and a dynamic semantics. (b) We identify a special case of MDs, referred to as relative candidate keys (RCKs), to determine what attributes to compare and how to compare them when matching records across possibly different relations. (c) We propose a mechanism for inferring MDs, a departure from traditional implication analysis, such that when we cannot match records by comparing attributes that contain errors, we may still find matches by using other, more reliable attributes. (d) We provide an O(n²) time algorithm for inferring MDs, and an effective algorithm for deducing a set of RCKs from MDs. (e) We experimentally verify that the algorithms help matching tools efficiently identify keys at compile time for matching, blocking or windowing, and that the techniques effectively improve both the quality and efficiency of various record matching methods.
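As a rough illustration of what "what attributes to compare and how to compare them" could look like in practice, the sketch below encodes a key as a list of attribute pairs, each with its own comparison operator. Everything here is an assumption made for illustration: the attribute names, the `similar` operator built on Python's `difflib`, the 0.75 threshold, and the sample records are hypothetical and are not taken from the paper.

```python
from difflib import SequenceMatcher


def equal(a: str, b: str) -> bool:
    """Strict equality operator."""
    return a == b


def similar(a: str, b: str, threshold: float = 0.75) -> bool:
    """Hypothetical similarity operator based on difflib's matching ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


# Hypothetical key: (attribute in relation 1, attribute in relation 2, operator).
KEY = [
    ("last_name", "surname", equal),
    ("address", "post_addr", similar),
]


def matches(r1: dict, r2: dict, key=KEY) -> bool:
    """Two records match if every attribute pair in the key passes its operator."""
    return all(op(r1[a1], r2[a2]) for a1, a2, op in key)


card = {"first_name": "Mark", "last_name": "Smith", "address": "10 Oak Street"}
billing = {"given": "Marx", "surname": "Smith", "post_addr": "10 Oak St"}
other = {"given": "Mark", "surname": "Smyth", "post_addr": "10 Oak Street"}

print(matches(card, billing))  # first names differ, but the key ignores them
print(matches(card, other))    # surname fails the equality comparison
```

The point of the sketch is only that the first-name attribute, which contains an error, is never compared: the match is decided by the attributes the key names.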

    Pickering emulsions responsive to CO₂/N₂ and light dual stimuli at ambient temperature

    A dual stimulus-responsive n-octane-in-water Pickering emulsion with CO₂/N₂ and light triggers is prepared using negatively charged silica nanoparticles in combination with a trace amount of a dual switchable surfactant, 4-butyl-4-(4-N,N-dimethylbutoxyamine) azobenzene bicarbonate (AZO-B₄), as stabilizers. On the one hand, the emulsion can be switched rapidly between stable and unstable at ambient temperature via the N₂/CO₂ trigger; on the other hand, the droplet size of the emulsion can be changed by light irradiation/re-homogenization cycles without changing the particle/surfactant concentration. The dual responsiveness thus allows precise control of emulsion properties. Compared with emulsions stabilised by specially synthesized stimuli-responsive particles or by stimuli-responsive surfactants, the method reported here is much easier and requires a relatively low concentration of surfactant (≈1/10 cmc), which is important for potential applications.

    Synthesis of 4-thio-5-(2′′-thienyl)uridine and cytotoxicity activity against colon cancer cells in vitro

    A novel anti-tumor agent, 4-thio-5-(2′′-thienyl)uridine (6), was synthesized and its in vitro cytotoxic activity against mouse colon cancer cells (MC-38) and human colon cancer cells (HT-29) was evaluated by MTT assay. The results showed that the novel compound had antiproliferative activity toward MC-38 and HT-29 cells in a dose-dependent manner. Cell cycle analysis by flow cytometry indicated that compound 6 inhibited tumor cell proliferation by arresting HT-29 cells in the G2/M phase. In addition, cell death detected by propidium iodide staining showed that compound 6 efficiently induced cell apoptosis in a concentration-dependent manner. Moreover, the sensitivity of human fibroblast cells to compound 6 was far lower than that of tumor cells, suggesting a specific anti-tumor effect of 4-thio-5-(2′′-thienyl)uridine. Taken together, novel compound 6 effectively inhibits colon cancer cell proliferation, and hence would have potential value in clinical application as an antitumor agent.

    Graph Homomorphism Revisited for Graph Matching

    In a variety of emerging applications one needs to decide whether a graph G matches another Gₚ, i.e., whether G has a topological structure similar to that of Gₚ. The traditional notions of graph homomorphism and isomorphism often fall short of capturing the structural similarity in these applications. This paper studies revisions of these notions, providing a full treatment from complexity to algorithms. (1) We propose p-homomorphism (p-hom) and 1-1 p-hom, which extend graph homomorphism and subgraph isomorphism, respectively, by mapping edges from one graph to paths in another, and by measuring the similarity of nodes. (2) We introduce metrics to measure graph similarity, and several optimization problems for p-hom and 1-1 p-hom. (3) We show that the decision problems for p-hom and 1-1 p-hom are NP-complete even for DAGs, and that the optimization problems are approximation-hard. (4) Nevertheless, we provide approximation algorithms with provable guarantees on match quality. We experimentally verify the effectiveness of the revised notions and the efficiency of our algorithms in Web site matching, using real-life and synthetic data.
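The two ingredients named in the abstract, edges mapped to paths and a node similarity measure, can be made concrete with a small checker for a candidate mapping. This is a minimal sketch under stated assumptions, not the paper's algorithm: it only verifies a given mapping (it does not search for one), and the function names, the adjacency-dict representation, the `sim` function, and the threshold are all hypothetical.

```python
from collections import deque


def reachable_by_nonempty_path(graph, src, dst):
    """Breadth-first search: is there a path with at least one edge from src to dst?"""
    seen = set()
    queue = deque(graph.get(src, ()))
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, ()))
    return False


def is_p_hom_mapping(g1, g2, mapping, sim, threshold):
    """Check a candidate mapping from the nodes of g1 to the nodes of g2.

    Assumes both graphs are adjacency dicts in which every node is a key.
    """
    # Every node of g1 must be mapped to a sufficiently similar node of g2.
    for v in g1:
        if v not in mapping or sim(v, mapping[v]) < threshold:
            return False
    # Every edge of g1 must correspond to a nonempty path in g2.
    for u, successors in g1.items():
        for v in successors:
            if not reachable_by_nonempty_path(g2, mapping[u], mapping[v]):
                return False
    return True


g1 = {"A": ["B"], "B": []}
g2 = {"a": ["x"], "x": ["b"], "b": []}


def label_sim(u, v):
    # Hypothetical similarity: 1.0 if labels agree up to case, else 0.0.
    return 1.0 if u.lower() == v else 0.0


print(is_p_hom_mapping(g1, g2, {"A": "a", "B": "b"}, label_sim, 0.5))
```

Here the edge A to B has no single-edge counterpart in `g2`, so an ordinary homomorphism check would reject the mapping, while the path a, x, b satisfies the relaxed condition.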

    Towards Certain Fixes with Editing Rules and Master Data

    A variety of integrity constraints have been studied for data cleaning. While these constraints can detect the presence of errors, they fall short of guiding us to correct the errors. Indeed, data repairing based on these constraints may not find certain fixes that are absolutely correct, and worse, may introduce new errors when repairing the data. We propose a method for finding certain fixes, based on master data, a notion of certain regions, and a class of editing rules. A certain region is a set of attributes that are assured correct by the users. Given a certain region and master data, editing rules tell us what attributes to fix and how to update them. We show how the method can be used in data monitoring and enrichment. We develop techniques for reasoning about editing rules, to decide whether they lead to a unique fix and whether they are able to fix all the attributes in a tuple, relative to master data and a certain region. We also provide an algorithm to identify minimal certain regions, such that a certain fix is warranted by editing rules and master data as long as one of the regions is correct. We experimentally verify the effectiveness and scalability of the algorithm.
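One way to picture how a certain region, master data, and an editing rule interact is the sketch below. It is an illustrative simplification under stated assumptions rather than the paper's formalism: the rule shape (a list of attribute pairs to match on plus one attribute pair to update), the function name `apply_rule`, and the sample data are all hypothetical.

```python
def apply_rule(record, region, master, match_on, update):
    """Apply one hypothetical editing rule to a record.

    record:   dict of attribute -> value (possibly dirty)
    region:   set of record attributes assured correct
    master:   list of dicts holding the master data
    match_on: list of (record_attr, master_attr) pairs to compare
    update:   (record_attr, master_attr) pair saying what to overwrite
    """
    # Only fire when every attribute the rule relies on is assured correct.
    if not all(attr in region for attr, _ in match_on):
        return record, region
    # Collect the distinct values the master data proposes for the target attribute.
    candidates = {
        m[update[1]]
        for m in master
        if all(record[a] == m[b] for a, b in match_on)
    }
    # Skip when there is no proposal or the proposals disagree (no unique fix).
    if len(candidates) != 1:
        return record, region
    fixed = dict(record)
    fixed[update[0]] = candidates.pop()
    # The fixed attribute can now be treated as correct as well.
    return fixed, region | {update[0]}


master = [{"zip": "EH8 9AB", "city": "Edinburgh"}]
record = {"zip": "EH8 9AB", "city": "Edi"}

fixed, region = apply_rule(record, {"zip"}, master, [("zip", "zip")], ("city", "city"))
print(fixed, region)
```

If `zip` is not in the region, the rule does not fire and the record is returned unchanged, which mirrors the idea that a fix is only warranted when it rests on attributes assured to be correct.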