7 research outputs found

    Searching by approximate personal-name matching

    We discuss the design, construction, and evaluation of a method for retrieving a person's information using the name as a search key, even when the name contains deformations. We present a similarity function, the DEA function, based on the probabilities of the edit operations according to the letters involved and their positions, and using a variable threshold. The efficacy of DEA is evaluated quantitatively, without human relevance judgments, and proves far superior to that of known methods. A very efficient approximate-search technique for the DEA function, based on a compacted trie structure, is also presented.
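    The core idea, an edit distance whose operation costs depend on the letters and positions involved, plus a threshold that scales with the name length, can be sketched as follows. The cost function below is a uniform placeholder, not the DEA probability model, and the threshold value is invented for illustration.

    ```python
    def weighted_edit_distance(a, b, cost):
        """Dynamic-programming edit distance with per-operation costs.

        `cost(op, i, x, y)` returns the cost of operation `op` ('ins',
        'del' or 'sub') at position i, involving letters x and y.
        """
        m, n = len(a), len(b)
        d = [[0.0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            d[i][0] = d[i - 1][0] + cost('del', i, a[i - 1], None)
        for j in range(1, n + 1):
            d[0][j] = d[0][j - 1] + cost('ins', j, None, b[j - 1])
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                sub = 0.0 if a[i - 1] == b[j - 1] else cost('sub', i, a[i - 1], b[j - 1])
                d[i][j] = min(d[i - 1][j] + cost('del', i, a[i - 1], None),
                              d[i][j - 1] + cost('ins', j, None, b[j - 1]),
                              d[i - 1][j - 1] + sub)
        return d[m][n]

    def uniform_cost(op, i, x, y):
        # Placeholder: DEA derives costs from letter/position probabilities.
        return 1.0

    def matches(query, candidate, threshold_per_char=0.35):
        # Variable threshold: longer names tolerate more edits.
        return weighted_edit_distance(query, candidate, uniform_cost) <= threshold_per_char * len(query)
    ```

    With uniform costs this reduces to plain Levenshtein distance; the interest of DEA lies entirely in replacing `uniform_cost` with probability-derived costs.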

    Creating, Modeling, and Visualizing Metabolic Networks

    Metabolic networks combine metabolism and regulation. These complex networks are difficult to understand and create because of the diverse types of information that need to be represented. This chapter describes a suite of interlinked tools for developing, displaying, and modeling metabolic networks. The metabolic network interactions database, MetNetDB, contains information on regulatory and metabolic interactions derived from a combination of web databases and input from biologists in their areas of expertise. PathBinderA mines the biological “literaturome” by searching for new interactions or supporting evidence for existing interactions in metabolic networks. Sentences from abstracts are ranked by the likelihood that an interaction is described, and this evidence is combined with that provided by other sentences. FCModeler, a publicly available software package, enables the biologist to visualize and model metabolic and regulatory network maps. FCModeler aids in the development and evaluation of hypotheses, and provides a modeling framework for assessing the large amounts of data captured by high-throughput gene expression experiments.

    Character comparison functions for APNM: the DEA distance

    A typical application of ASM (Approximate String Matching) is the matching of personal names, for example to search for people in the database of an information system. Over the years, several similarity functions have been proposed: phonetic codes, simple edit distance, n-gram distances, etc. In this report a function, DEA, is presented that has substantially better efficacy than existing ones and is mainly oriented to Spanish surnames. The DEA distance is an edit distance whose costs are based on the probabilities of the operations, characters, and positions. The distance threshold is defined as a function of the length of the string. The efficacy of DEA is evaluated objectively, without human relevance judgments.
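    The probability-to-cost step can be sketched briefly: frequent confusions should be cheap and rare ones expensive, which a negative log-probability achieves. The confusion probabilities below are invented for illustration and are not the DEA estimates.

    ```python
    import math

    def probability_to_cost(p, floor=1e-6):
        """Convert an estimated operation probability into an additive
        edit cost: frequent confusions (high p) get low cost."""
        return -math.log(max(p, floor))

    # Hypothetical substitution probabilities for Spanish surnames
    # (s/z, b/v, y/i are classic spelling confusions):
    CONFUSIONS = {('s', 'z'): 0.30, ('b', 'v'): 0.25, ('y', 'i'): 0.20}

    def substitution_cost(x, y, default_p=0.01):
        # Look the pair up in either order; fall back to a low default.
        p = CONFUSIONS.get((x, y)) or CONFUSIONS.get((y, x)) or default_p
        return probability_to_cost(p)
    ```

    Because costs are additive logs, summing them along an alignment corresponds to multiplying the probabilities of the individual edits.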

    Measuring the accuracy of page-reading systems

    Given a bitmapped image of a page from any document, a page-reading system identifies the characters on the page and stores them in a text file. This OCR-generated text is represented by a string and compared with the correct string to determine the accuracy of this process. The string editing problem is applied to find an optimal correspondence of these strings using an appropriate cost function. The ISRI annual test of page-reading systems utilizes the following performance measures, which are defined in terms of this correspondence and the string edit distance: character accuracy, throughput, accuracy by character class, marked character efficiency, word accuracy, non-stopword accuracy, and phrase accuracy. It is shown that the universe of cost functions is divided into equivalence classes, and the cost functions related to the longest common subsequence (LCS) are identified. The computation of an LCS can be made faster by a linear-time preprocessing step.
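    The character-accuracy measure reduces to a simple formula over the edit distance: if the correct text has n characters and the optimal correspondence requires e edits, accuracy is (n − e)/n. A minimal sketch with unit costs (the ISRI tools support richer cost functions):

    ```python
    def edit_distance(a, b):
        # Standard Levenshtein distance (unit costs), computed row by row.
        m, n = len(a), len(b)
        prev = list(range(n + 1))
        for i in range(1, m + 1):
            cur = [i] + [0] * n
            for j in range(1, n + 1):
                cur[j] = min(prev[j] + 1,          # deletion
                             cur[j - 1] + 1,       # insertion
                             prev[j - 1] + (a[i - 1] != b[j - 1]))  # substitution
            prev = cur
        return prev[n]

    def character_accuracy(correct, ocr):
        """Accuracy = (n - #errors) / n, where n is the length of the
        correct text and #errors the edit distance to the OCR output."""
        n = len(correct)
        return (n - edit_distance(correct, ocr)) / n
    ```

    One misrecognized character in an 11-character line ("hello w0rld" for "hello world") thus yields an accuracy of 10/11, about 90.9%.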

    Hungarian Artificial Intelligence Bibliography: a selection of publications from 1988–96 (and occasionally earlier)

    Contents: references to papers, books, textbooks, and dissertations published in refereed journals, conference proceedings, and edited volumes; it includes some 400 (keyword-indexed) papers by nearly 190 Hungarian authors and co-authors. Its appendix lists the papers of the thematic AI series TUDÁSTECHNOLÓGIA (Knowledge Technology), edited by Ágnes Jakab, published in the journal Új ALAPLAP. The materials were prepared for the exhibition accompanying the ECAI'96 conference organized in Budapest by NJSZT. The Bibliography and its accompanying Reprint Collection were exhibited at the NJSZT stand, while a dedicated terminal provided searching in the OMIKK database. The keyword indexing and data entry were supervised by Ottmár Kladiva (OMIKK).

    Information retrieval in document images

    A document image is an intelligible object that conveys information and is defined by its content. This thesis presents three models for locating information and retrieving images relevant to a user's query. The first model, for locating informational zones, is based on multi-scale analysis capturing the visual contrast of dark regions against the image background. Each extracted region is defined by its content and its statistical and geometric features. The automatic classification algorithm is improved by applying production rules derived from the shapes of the extracted objects. A first evaluation of the extraction of text, logos, and photographs on the images of the Media Team group at the University of Washington (UW-1) shows encouraging results. The second model is based on text obtained by Optical Character Recognition (OCR). Error-grams and production rules modeling the OCR's recognition errors are used to expand the query terms. The vector space model is then applied to represent the OCR text of the document images and the query for information retrieval (IR). Training on the Media Team images (UW-2) and tests on a thousand Web images validated this approach. The results show a clear improvement over standard methods such as the vector space model without query expansion and the 3-gram overlap method. For non-textual zones, a third vector model, based on varying the parameters of the multi-scale SKCS (Separable Kernel with Compact Support) operator and on a combination of classifiers with MKL (Multi-space Karhunen-Loève) principal-component subspace analysis, is applied to a training set of document images from the University of Washington and from Web pages.
    The experiments showed the superior interpretability and power of the index vectors derived from the classification and representing the non-textual zones of the image. Finally, a hybrid indexing system combining the textual and non-textual models was introduced to answer more complex queries about parts of a document image, such as a text block, an illustration, a logo, or a graph. The experiments demonstrated the power of querying by words or by example images and yielded encouraging results in retrieving relevant images, surpassing those obtained by traditional methods, as shown by a recall vs. precision evaluation conducted on queries over document images.
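    The query-expansion step of the second model can be sketched as follows: apply known OCR confusion patterns to each query term so that misrecognized spellings in the noisy text still match. The confusion table below lists common examples only; the thesis learns its error-grams and rules from training data.

    ```python
    # Hypothetical OCR confusion rules; keys are character patterns in
    # the clean term, values are what the OCR may have produced instead.
    OCR_CONFUSIONS = {'rn': ['m'], 'm': ['rn'], 'l': ['1', 'i'], 'O': ['0']}

    def expand_query_term(term, max_variants=20):
        """Generate spelling variants of a query term that plausibly
        appear in noisy OCR text, applying one confusion rule at a time."""
        variants = {term}
        for pattern, repls in OCR_CONFUSIONS.items():
            start = term.find(pattern)
            while start != -1:
                for r in repls:
                    variants.add(term[:start] + r + term[start + len(pattern):])
                start = term.find(pattern, start + 1)
        return sorted(variants)[:max_variants]
    ```

    Each expanded variant is then treated as an additional query term in the vector space model, so documents whose OCR text contains only the corrupted form are still retrieved.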

    Ontology alignment mechanisms for improving web-based searching

    Ontologies have been developed to offer a commonly agreed understanding of a domain, which is required for knowledge representation, knowledge exchange, and reuse across domains. An ontology organizes information into taxonomies of terms (i.e., concepts, attributes) and shows the relationships between them. It helps reduce conceptual confusion for users who need to share applications of different kinds, so it is widely used to capture and organize knowledge in a given domain. Although ontologies are considered to provide a solution to data heterogeneity, the available ontologies can themselves introduce heterogeneity problems. To deal with these problems, ontologies must be available for sharing or reuse; semantic heterogeneity and structural differences therefore need to be resolved among ontologies. This can be done, in some cases, by aligning or matching heterogeneous ontologies. Establishing the relationships between terms in different ontologies, a problem known as "ontology alignment", is thus needed to achieve semantic interoperability. The alignment of ontologies is concerned with identifying the semantic relationships (subsumption, equivalence, etc.) that hold between the constituent entities (classes, properties, etc.) of two ontologies. In this thesis, an ontology alignment technique has been developed to facilitate communication and build a bridge between ontologies. An efficient mechanism has been developed to align entities from ontologies in different description languages (e.g. OWL, RDF) or in the same language. The approach tries to use all the features of ontologies (concepts, attributes, relations, structure, etc.) to obtain efficient, high-quality results.
    For this purpose, several matching techniques have been used, such as string, structure, heuristic, and linguistic matching techniques with thesaurus support, as well as human intervention in certain cases, to obtain high-quality results. The main aim of the work is to introduce a method for finding semantic correspondences among heterogeneous ontologies, with the intention of supporting interoperability over given domains. The approach brings together techniques in modelling, string matching, computational linguistics, structure matching, and heuristic matching to provide a semi-automatic alignment framework and a prototype alignment system supporting the ontology alignment procedure, so as to improve semantic interoperability in heterogeneous systems. The technique integrates several important matching features to achieve high-quality results, which helps when searching for and exchanging information between ontologies. Moreover, an ontology alignment system illustrates the solution of the key issues related to heterogeneous ontologies, using combination-matching strategies to execute the ontology-matching task; it can therefore be used to discover matches between ontologies. This thesis also describes a prototype implementation of this approach in many real-world case studies extracted from various Web resources. The system is evaluated through the experiments provided by the Ontology Alignment Evaluation Initiative, and it achieved 93% accuracy for ontology matching. Finally, the system is compared with well-known tools.
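    The combination-matching strategy described above can be sketched as a weighted aggregation of per-matcher scores. Everything below is illustrative: the string matcher uses a generic edit-based ratio rather than the thesis's actual measure, and the structural and linguistic scores and weights are placeholders.

    ```python
    from difflib import SequenceMatcher

    def label_similarity(a, b):
        """Edit-based similarity between entity labels: one of several
        matchers (string, structural, linguistic) such a system combines."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def combined_score(scores, weights):
        """Weighted aggregation of individual matcher scores in [0, 1]."""
        total = sum(weights.values())
        return sum(scores[k] * weights[k] for k in scores) / total

    # Example: comparing the OWL property 'hasAuthor' with the RDF
    # label 'author'; structural and linguistic scores are placeholders.
    score = combined_score(
        {'string': label_similarity('hasAuthor', 'author'),
         'structure': 0.8,
         'linguistic': 0.9},
        {'string': 1.0, 'structure': 1.0, 'linguistic': 1.0})
    ```

    A pair of entities would typically be proposed as a correspondence when the combined score exceeds a tuned threshold, with human intervention reserved for borderline cases.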