
    Maximum common subgraph isomorphism algorithms for the matching of chemical structures

    The maximum common subgraph (MCS) problem has become increasingly important in those aspects of chemoinformatics that involve the matching of 2D or 3D chemical structures. This paper provides a classification and a review of the many MCS algorithms, both exact and approximate, that have been described in the literature, and makes recommendations regarding their applicability to typical chemoinformatics tasks.
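
    As a concrete illustration of the kind of MCS computation such algorithms perform, here is a minimal sketch using RDKit's rdFMCS module; RDKit, the example molecules, and the parameter choices are assumptions for illustration, not the paper's own benchmark setup.

    # Minimal 2D MCS matching sketch (assumed setup, not the paper's).
    from rdkit import Chem
    from rdkit.Chem import rdFMCS

    # Two small drug-like molecules: aspirin and salicylic acid.
    mol1 = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")
    mol2 = Chem.MolFromSmiles("Oc1ccccc1C(=O)O")

    # FindMCS returns an MCSResult; ringMatchesRingOnly and timeout are
    # typical knobs for chemically meaningful, bounded-time matches.
    result = rdFMCS.FindMCS([mol1, mol2], ringMatchesRingOnly=True, timeout=10)

    print(result.numAtoms, result.numBonds)  # size of the common substructure
    print(result.smartsString)               # SMARTS pattern of the MCS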

    RASCAL: calculation of graph similarity using maximum common edge subgraphs

    A new graph similarity calculation procedure is introduced for comparing labeled graphs. Given a minimum similarity threshold, the procedure consists of an initial screening process to determine whether it is possible for the measure of similarity between the two graphs to exceed the minimum threshold, followed by a rigorous maximum common edge subgraph (MCES) detection algorithm to compute the exact degree and composition of similarity. The proposed MCES algorithm is based on a maximum clique formulation of the problem and is a significant improvement over other published algorithms. It presents new approaches to lower and upper bounding, as well as to vertex selection.
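
    A toy sketch of the maximum clique formulation underlying the MCES algorithm may help: edges of the two graphs become vertices of a compatibility ("modular product") graph, and a maximum clique there corresponds to an MCES. This sketch deliberately omits RASCAL's screening step, its bounding rules, and the Delta-Y subtlety of line graphs; networkx and the label scheme are illustrative assumptions.

    import networkx as nx
    from itertools import product

    def mces_size(g1, g2):
        # Vertices of the compatibility graph: label-compatible edge pairs.
        pairs = [
            (e1, e2)
            for e1, e2 in product(g1.edges, g2.edges)
            if g1.edges[e1].get("label") == g2.edges[e2].get("label")
        ]
        comp = nx.Graph()
        comp.add_nodes_from(range(len(pairs)))
        share = lambda a, b: bool(set(a) & set(b))
        for i in range(len(pairs)):
            for j in range(i + 1, len(pairs)):
                (e1, e2), (f1, f2) = pairs[i], pairs[j]
                # Compatible iff both sides agree on whether the edges touch.
                if e1 != f1 and e2 != f2 and share(e1, f1) == share(e2, f2):
                    comp.add_edge(i, j)
        # Size of a maximum clique = number of edges in an MCES.
        return max((len(c) for c in nx.find_cliques(comp)), default=0)

    g1 = nx.Graph([("a", "b"), ("b", "c"), ("c", "a")])  # triangle
    g2 = nx.Graph([("x", "y"), ("y", "z")])              # two-edge path
    print(mces_size(g1, g2))  # 2: the path embeds in the triangle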

    Measuring Accuracy of Triples in Knowledge Graphs

    An increasing number of large-scale knowledge graphs have been constructed in recent years. These graphs are often created by text-based extraction, which can be very noisy. So far, cleaning knowledge graphs has mostly been carried out by human experts and is thus very inefficient. It is therefore necessary to explore automatic methods for identifying and eliminating erroneous information. To achieve this, previous approaches rely primarily on internal information, i.e., the knowledge graph itself. In this paper, we introduce an automatic approach, Triples Accuracy Assessment (TAA), for validating RDF triples (source triples) in a knowledge graph by finding consensus among matched triples (target triples) from other knowledge graphs. TAA uses knowledge graph interlinks to find identical resources and applies different matching methods between the predicates of source triples and target triples. Based on the matched triples, TAA then calculates a confidence score indicating the correctness of a source triple. In addition, we present an evaluation of our approach using the FactBench dataset for fact validation. Our findings show promising results for distinguishing between correct and incorrect triples.
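
    The consensus scoring step can be pictured with a small sketch; the data structures, the exact object comparison, and the unweighted score below are simplifying assumptions, not TAA's actual matching methods or weighting.

    from typing import List, Tuple

    Triple = Tuple[str, str, str]  # (subject, predicate, object)

    def confidence(source: Triple, matched: List[Triple]) -> float:
        """Fraction of matched target triples whose object agrees with the source."""
        if not matched:
            return 0.0  # no external evidence either way
        agreeing = sum(1 for (_, _, obj) in matched if obj == source[2])
        return agreeing / len(matched)

    # Hypothetical source triple and matched triples from two other graphs.
    source = ("dbr:Berlin", "dbo:populationTotal", "3644826")
    targets = [
        ("wd:Q64", "wdt:P1082", "3644826"),
        ("yago:Berlin", "yago:hasPopulation", "3520031"),
    ]
    print(confidence(source, targets))  # 0.5 -> weak consensus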

    Effectiveness of graph-based and fingerprint-based similarity measures for virtual screening of 2D chemical structure databases

    This paper reports an evaluation of both graph-based and fingerprint-based measures of structural similarity, when used for virtual screening of sets of 2D molecules drawn from the MDDR and ID Alert databases. The graph-based measures employ a new maximum common edge subgraph isomorphism algorithm, called RASCAL, with several similarity coefficients described previously for quantifying the similarity between pairs of graphs. The effectiveness of these graph-based searches is compared with that resulting from similarity searches using BCI, Daylight and Unity 2D fingerprints. Our results suggest that graph-based approaches provide an effective complement to existing fingerprint-based approaches to virtual screening.
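
    Since BCI, Daylight and Unity fingerprints are proprietary, a hedged sketch with RDKit Morgan fingerprints can stand in to show what a fingerprint-based similarity search looks like; the query, the library, and the fingerprint settings are illustrative assumptions.

    from rdkit import Chem, DataStructs
    from rdkit.Chem import AllChem

    # Tiny made-up screening library of SMILES strings.
    library_smiles = ["CC(=O)Oc1ccccc1C(=O)O", "Oc1ccccc1C(=O)O", "c1ccccc1"]

    query = AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O"), radius=2, nBits=2048
    )

    # Rank library molecules by Tanimoto similarity to the query.
    for smi in library_smiles:
        fp = AllChem.GetMorganFingerprintAsBitVect(
            Chem.MolFromSmiles(smi), radius=2, nBits=2048
        )
        print(f"{smi}: Tanimoto = {DataStructs.TanimotoSimilarity(query, fp):.2f}")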

    On the Effect of Semantically Enriched Context Models on Software Modularization

    Many of the existing approaches for program comprehension rely on the linguistic information found in source code, such as identifier names and comments. Semantic clustering is one such technique for modularization of the system that relies on the informal semantics of the program, encoded in the vocabulary used in the source code. Treating the source code as a collection of tokens loses the semantic information embedded within the identifiers. We try to overcome this problem by introducing context models for source code identifiers to obtain a semantic kernel, which can be used both for deriving the topics that run through the system and for clustering them. In the first model, we abstract an identifier to its type representation and build on this notion of context to construct a contextual vector representation of the source code. The second notion of context is defined based on the flow of data between identifiers, representing a module as a dependency graph whose nodes correspond to identifiers and whose edges represent the data dependencies between pairs of identifiers. We have applied our approach to 10 medium-sized open source Java projects, and show that by introducing contexts for identifiers, the quality of the modularization of the software systems is improved. Both context models give results that are superior to the plain vector representation of documents. In some cases, the authoritativeness of decompositions is improved by 67%. Furthermore, a more detailed evaluation of our approach on JEdit, an open source editor, demonstrates that the topics inferred by performing topic analysis on the contextual representations are more meaningful than those inferred from the plain representation of the documents. The proposed approach of introducing context models for source code identifiers paves the way for building tools that support developers in program comprehension tasks such as application and domain concept location, software modularization and topic analysis.
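
    The second context model can be pictured as a small graph-clustering exercise; the toy dependencies below and the choice of greedy modularity communities are assumptions standing in for the paper's semantic kernel and clustering pipeline, which are considerably richer.

    import networkx as nx
    from networkx.algorithms import community

    # Each pair (a, b) means "data flows from identifier a to identifier b",
    # e.g. b is assigned from an expression that reads a.
    dependencies = [
        ("rawInput", "parser"), ("parser", "ast"), ("ast", "typeChecker"),
        ("config", "logger"), ("logger", "logFile"),
    ]

    g = nx.Graph(dependencies)
    # Graph clustering as a stand-in for the modularization step.
    modules = community.greedy_modularity_communities(g)
    for i, module in enumerate(modules):
        print(f"module {i}: {sorted(module)}")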

    Business Process Retrieval Based on Behavioral Semantics

    This paper develops a framework, called "BeMantics", for retrieving business processes according to search requirements based on structural, linguistic, and behavioral semantics properties. The relevance of the framework is evaluated by retrieving business processes from a repository and comparing the results against a set of relevant business processes manually issued by human judges. The BeMantics framework scored high precision (0.717) but low recall (0.558), which implies that while the framework avoided false positives, it is prone to false negatives. The highest precision value was scored on the linguistic criterion, showing that using semantic inference in the task comparison reduced the number of false positives by around 23.6%. Using semantic inference to compare the tasks of business processes can thus improve precision; however, if the ontologies come from narrow and specific domains, they limit the semantic expressiveness obtained with ontologies from more general domains. Regarding performance, it can be improved by using a filtering phase that indexes business processes according to their behavioral semantics properties.
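
    For reference, the precision and recall figures above come from judging a retrieval run against the manually collected relevant set, as in this short sketch; the process identifiers are made up.

    def precision_recall(retrieved: set, relevant: set) -> tuple:
        # True positives: retrieved processes the judges marked relevant.
        tp = len(retrieved & relevant)
        precision = tp / len(retrieved) if retrieved else 0.0
        recall = tp / len(relevant) if relevant else 0.0
        return precision, recall

    retrieved = {"bp1", "bp2", "bp3", "bp4"}
    relevant = {"bp1", "bp2", "bp5", "bp6", "bp7"}
    print(precision_recall(retrieved, relevant))  # (0.5, 0.4)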