
    Using an ontology to improve the web search experience

    The search terms that a user passes to a search engine are often ambiguous, referring to homonyms. The results in these cases are a mixture of links to documents that contain different meanings of the search terms. Current search engines provide suggested query completions in a dropdown list. However, such lists are not well organized, mixing completions for different meanings. In addition, the suggested search phrases are not discriminating enough. Moreover, current search engines often return an unexpected number of results. Zero hits are naturally undesirable, while too many hits are likely to be overwhelming and of low precision. This dissertation work aims at providing a better Web search experience for users by addressing the problems described above. To improve the search for homonyms, suggested completions are well organized and visually separated. In addition, this approach supports the use of negative terms to disambiguate the suggested completions in the list. The dissertation presents an algorithm to generate the suggested search completion terms using an ontology, and new ways of displaying homonymous search results. These algorithms have been implemented in the Ontology-Supported Web Search (OSWS) System for famous people. This dissertation presents a method for dynamically building the necessary ontology of famous people based on mining the suggested completions of a search engine. This is combined with data from DBpedia. To enhance the OSWS ontology, Facebook is used as a secondary data source. Information from people's public pages is mined, and Facebook attributes are cleaned up and mapped to the OSWS ontology. To control the size of the result sets returned by the search engines, this dissertation demonstrates a query rewriting method for generating alternative query strings and implements a model for predicting the number of search engine hits for each alternative query string, based on the English-language frequencies of the words in the search terms. Evaluation experiments of the hit count prediction model are presented for three major search engines. The dissertation also discusses and quantifies how far the Google, Yahoo! and Bing search engines diverge from monotonic behavior, considering negative and positive search terms separately.
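    As an illustration of the kind of frequency-based hit-count model the abstract describes, here is a minimal sketch assuming word-occurrence independence; the corpus size, frequency table, and back-off value are invented placeholders, not the dissertation's actual model or data.

```python
# Minimal sketch of a frequency-based hit-count predictor, assuming
# word-occurrence independence. All numbers below are illustrative.

WEB_CORPUS_SIZE = 30_000_000_000  # assumed number of indexed documents

# Hypothetical per-document occurrence probabilities for some words.
DOC_FREQUENCY = {
    "ontology": 0.0004,
    "search": 0.02,
    "famous": 0.003,
}

def predict_hits(query_terms, corpus_size=WEB_CORPUS_SIZE):
    """Estimate hits for an AND query as corpus_size times the product of
    per-document word probabilities (naive independence assumption)."""
    p = 1.0
    for term in query_terms:
        p *= DOC_FREQUENCY.get(term.lower(), 1e-6)  # back-off for unseen words
    return int(corpus_size * p)

# Rank alternative query strings by predicted result-set size.
alternatives = [["ontology", "search"], ["famous", "search"]]
for q in sorted(alternatives, key=predict_hits):
    print(q, predict_hits(q))
```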

    Enhancing information retrieval in folksonomies using ontology of place constructed from Gazetteer information

    Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies. Folksonomy (from folk and taxonomy) is an approach to user metadata creation where users describe information objects with a free-form list of keywords (‘tags’). Folksonomy has proved to be a useful information retrieval tool that supports the emergence of “collective intelligence” or “bottom-up” lightweight semantics. Since there are no guiding rules or restrictions on the users, folksonomy has some drawbacks, such as lack of hierarchy, synonym control, and semantic precision. This research aims at enhancing information retrieval in folksonomy, particularly that of location information, by establishing explicit relationships between place name tags. To accomplish this, an automated approach is developed. The approach starts by retrieving tags from Flickr. The tags are then filtered to identify those that represent place names. Next, a gazetteer service, a knowledge organization system for spatial information, is used to query for the place names. The results of the gazetteer search and the feature types are used to construct an ontology of place. The ontology of place is built from place name concepts, where each place has a “Part-Of” relationship with its direct parent. The ontology is then formalized in OWL (Web Ontology Language). A search tool prototype is developed that extracts a place name and its parent name from the ontology and uses them to search Flickr. The semantic richness added to Flickr search using our approach is tested and the results are evaluated.
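    To make the Part-Of formalization concrete, the following sketch builds such an ontology with the third-party rdflib library; the namespace, the partOf property name, and the gazetteer output are illustrative assumptions, not the thesis's actual vocabulary.

```python
# Minimal sketch: formalizing a "Part-Of" place ontology in OWL with rdflib.
# The namespace, property name, and gazetteer result are placeholders.
from rdflib import Graph, Namespace, RDF, OWL

PLACE = Namespace("http://example.org/place#")

# Assumed gazetteer lookup result: each place mapped to its direct parent.
gazetteer_result = {"Lisbon": "Portugal", "Porto": "Portugal", "Portugal": "Europe"}

g = Graph()
g.bind("place", PLACE)
g.add((PLACE.partOf, RDF.type, OWL.ObjectProperty))
g.add((PLACE.partOf, RDF.type, OWL.TransitiveProperty))  # part-of chains upward

for child, parent in gazetteer_result.items():
    for name in (child, parent):
        g.add((PLACE[name], RDF.type, OWL.Class))
    # Each place concept is linked to its direct parent via Part-Of.
    g.add((PLACE[child], PLACE.partOf, PLACE[parent]))

print(g.serialize(format="xml"))  # OWL serialized as RDF/XML
```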

    Extensibility of Enterprise Modelling Languages

    The thesis addresses a total of three research foci. The first focus deals with BPMN extensions to be developed and presents their methodological implications within the existing language standards. On the one hand, this comprises concrete language extensions such as BPMN4CP, a BPMN extension for the multi-perspective modelling of clinical pathways. On the other hand, this part also concerns consequences for the modelling method, in order to improve in parallel both the underlying language (i.e., the BPMN metamodel) and the method for extension development, and thus to address the identified shortcomings. The second focus addresses the investigation of language-independent questions of extensibility, which either arose during the work on the first part or were inductively derived from its results. This research focus concentrates in particular on a consolidation of existing terminologies, the description of generically applicable extension mechanisms, and the user-oriented analysis of a potential need for extensions. This part thus lays the groundwork for the development of a generic extension method. It also includes a fundamental examination of enterprise modelling languages in general, since only a holistic, consistent and integrated language definition can enable extensions and allow them to succeed in the first place. This concerns, for example, the specification of the intended semantics of a language.

    Semantic Keyword-based Search on Heterogeneous Information Systems

    In recent years, with the spread and use of the Internet, the volume of information available to users has grown exponentially. Moreover, access to that information has been boosted by the levels of connectivity we enjoy today thanks to new-generation mobile phones and wireless networks (e.g., 3G, Wi-Fi). However, with current access methods, this excess of information is as harmful as the lack of it, since users have no time to process it in its entirety. Furthermore, this information sits behind information systems of very heterogeneous natures (e.g., Web search engines, Linked Data sources, etc.), and users must know these systems in order to exploit their capabilities fully. This diversity becomes even more evident if we consider any information service as a potential information source for the user (e.g., location-based services, databases exported via Web Services, etc.). Given this level of heterogeneity, the integration of these systems must be performed externally, hiding their complexity from the user and providing mechanisms to express queries simply. In this regard, keyword-based interfaces have become popular thanks to their simplicity and their adoption by the most widely used Web search engines. However, that simplicity, which is their greatest virtue, is also their greatest flaw, since it creates ambiguity problems in queries: queries expressed as sets of keywords are inherently ambiguous, being a projection of the true question the user wants to ask. In this thesis, we tackle the problem of integrating heterogeneous information systems under a search guided by the semantics of the keywords, and we present QueryGen, a prototype of our solution. In this semantic search we advocate establishing the query the user had in mind when writing the keywords, expressed in a formal query language to avoid possible ambiguities. The integration of the underlying systems is carried out through the definition of their query languages and their execution models. In particular, our system:
    - Discovers the meaning of the keywords by consulting a dynamic set of ontologies, and disambiguates those keywords taking their context (the remaining keywords) into account, since each keyword influences the meaning of the rest of the input. During this process, meanings that are sufficiently similar are merged, and the system proposes the most likely ones given the user's input. The semantic information obtained in this process is integrated and used in later phases to obtain the correct interpretation of the keyword set.
    - The same set of keywords can represent several queries even when the individual meaning of each keyword is known. Therefore, once the meaning of each keyword is established, and in order to obtain the user's exact query, our system finds all the possible questions that can be built from the keywords. This translation from keywords to questions is performed using formal query languages to avoid possible ambiguities and express the query precisely. Our system avoids generating semantically incorrect or duplicate questions with the help of a reasoner based on Description Logics. In this process, our system is able to react to insufficient input (e.g., omitted words) by adding virtual terms, which internally represent words that the user had in mind but omitted when writing the query.
    - Finally, once the user has validated the query, our system accesses the registered information systems that can answer it and retrieves the answer according to the semantics of the query. To this end, our system implements a modular architecture that allows new systems to be added on the fly, provided that their specification is given (supported query languages, data models and formats, etc.).
    Beyond semantic search, working with heterogeneous information systems, in particular systems related to Mobile Computing, has extended the contributions of this thesis. In this respect, we have studied the semantics of location-based queries and, especially, the influence of the semantics of locations on their processing and interpretation. In particular, two ontological models are proposed to model and capture the semantic relationships of locations and extend the expressiveness of location-based queries. During the development of this thesis, situated between the fields of the Semantic Web and Mobile Computing, a new line of research on the modelling of volatile knowledge has been opened, and the possibility of using Description Logics reasoners on Android-based devices has been studied. Finally, our work on keyword-based semantic search has been extended to the field of conversational agents, making them capable of exploiting the various semantic data sources currently available under Linked Data principles.
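    As a rough illustration of the keyword-to-query step, the sketch below enumerates candidate conjunctive queries from per-keyword senses and filters incompatible combinations; the sense inventory and the toy compatibility test are invented stand-ins for QueryGen's dynamic ontology set and DL reasoner.

```python
# Minimal sketch of keyword-to-query enumeration: each keyword has several
# candidate senses, and every compatible combination yields one candidate
# formal query. All senses and the compatibility rule are illustrative.
from itertools import product

SENSES = {
    "java": [("Java", "ProgrammingLanguage"), ("Java", "Island")],
    "book": [("book", "Document"), ("book", "ReserveAction")],
}

def compatible(senses):
    """Toy semantic check: reject combinations mixing an action with a
    geographic sense (a stand-in for DL-based consistency checking)."""
    cats = {cat for _, cat in senses}
    return not ({"ReserveAction", "Island"} <= cats)

def candidate_queries(keywords):
    for combo in product(*(SENSES[k] for k in keywords)):
        if compatible(combo):
            # Render the combination as a conjunctive query over the senses.
            yield " AND ".join(f"{cat}({term})" for term, cat in combo)

for q in candidate_queries(["java", "book"]):
    print(q)
```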

    A semantic and agent-based approach to support information retrieval, interoperability and multi-lateral viewpoints for heterogeneous environmental databases

    PhD thesis. Data stored in individual autonomous databases often needs to be combined and interrelated. For example, in the Inland Water (IW) environment monitoring domain, the spatial and temporal variation of measurements of different water quality indicators stored in different databases is of interest. Data from multiple data sources is more complex to combine when there is a lack of metadata in a computational form and when the syntax and semantics of the stored data models are heterogeneous. The main types of information retrieval (IR) requirements are query transparency and data harmonisation for data interoperability, and support for multiple user views. A combined Semantic Web based and Agent based distributed system framework has been developed to support the above IR requirements. It has been implemented using the Jena ontology and JADE agent toolkits. The semantic part supports the interoperability of autonomous data sources by merging their intensional data, using a Global-As-View (GAV) approach, into a global semantic model, represented in DAML+OIL and in OWL. This is used to mediate between different local database views. The agent part provides the semantic services to import, align and parse semantic metadata instances, to support data mediation and to reason about data mappings during alignment. The framework has been applied to support information retrieval, interoperability and multi-lateral viewpoints for four European environmental agency databases. An extended GAV approach has been developed and applied to handle queries that can be reformulated over multiple user views of the stored data. This allows users to retrieve data in a conceptualisation that is better suited to them, rather than having to understand the entire detailed global view conceptualisation. User viewpoints are derived from the global ontology or from existing viewpoints of it. This has the advantage of reducing the number of potential conceptualisations and their associated mappings to a more computationally manageable level. Whereas an ad hoc framework based upon a conventional distributed programming language and a rule framework could be used to support user views and adaptation to user views, a more formal framework has the benefit that it can support reasoning about consistency, equivalence, containment and conflict resolution when traversing data models. A preliminary formulation of the formal model has been undertaken; it is based upon extending a Datalog-type algebra with hierarchical, attribute and instance value operators. These operators can be applied to support compositional mapping and consistency checking of data views. The multiple viewpoint system was implemented as a Java-based application consisting of two sub-systems, one for viewpoint adaptation and management, the other for query processing and query result adjustment.
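    The GAV unfolding idea can be illustrated in a few lines: a query over the global schema is answered by substituting each global relation with its source-level definition. The schema names and mappings below are invented, and Python is used purely for illustration (the actual system is Java-based, built on Jena and JADE).

```python
# Minimal sketch of Global-As-View (GAV) unfolding: each global-schema
# relation is defined as a query over a local source, so a global query is
# answered by replacing each global relation with its source definition.

# Global relation -> (source database, source query fragment); illustrative.
GAV_MAPPINGS = {
    "WaterQuality": ("agency_a", "SELECT site, no3 AS nitrate FROM samples"),
    "Sites":        ("agency_b", "SELECT id AS site, river FROM stations"),
}

def unfold(global_query_relations):
    """Rewrite a global query (a list of relation names) into per-source
    subqueries by unfolding the GAV view definitions."""
    plan = []
    for rel in global_query_relations:
        source, fragment = GAV_MAPPINGS[rel]
        plan.append((source, fragment))
    return plan

for source, sql in unfold(["WaterQuality", "Sites"]):
    print(f"-- send to {source}: {sql}")
```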

    A UMLS-based spell checker for natural language processing in vaccine safety

    BACKGROUND: The Institute of Medicine has identified patient safety as a key goal for health care in the United States. Detecting vaccine adverse events is an important public health activity that contributes to patient safety. Reports about adverse events following immunization (AEFI) from surveillance systems contain free-text components that can be analyzed using natural language processing. To extract Unified Medical Language System (UMLS) concepts from free text and classify AEFI reports based on the concepts they contain, we first needed to clean the text by expanding abbreviations and shortcuts and correcting spelling errors. Our objective in this paper was to create a UMLS-based spelling error correction tool as a first step in the natural language processing (NLP) pipeline for AEFI reports. METHODS: We developed spell checking algorithms using open source tools. We used de-identified AEFI surveillance reports to create free-text data sets for analysis. After expansion of abbreviated clinical terms and shortcuts, we performed spelling correction in four steps: (1) error detection, (2) word list generation, (3) word list disambiguation and (4) error correction. We then measured the performance of the resulting spell checker by comparing it to manual correction. RESULTS: We used 12,056 words to train the spell checker and tested its performance on 8,131 words. During testing, sensitivity, specificity, and positive predictive value (PPV) for the spell checker were 74% (95% CI: 74%–75%), 100% (95% CI: 100%–100%), and 47% (95% CI: 46%–48%), respectively. CONCLUSION: We created a prototype spell checker that can be used to process AEFI reports. We used the UMLS Specialist Lexicon as the primary source of dictionary terms and the WordNet lexicon as a secondary source, so that potentially misspelled words in the corpus were compared against a domain-specific dictionary. The prototype's sensitivity was comparable to currently available tools, but its specificity was much superior. The slow processing speed may be improved by trimming the tool down to its most useful component algorithms. Other investigators may find the methods we developed useful for cleaning text using lexicons specific to their area of interest.
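    A minimal sketch of the four-step correction pipeline is shown below, with a toy word-frequency dictionary standing in for the UMLS Specialist Lexicon and WordNet; the similarity cutoff and frequencies are illustrative.

```python
# Minimal sketch of the four-step pipeline: (1) error detection,
# (2) word list generation, (3) disambiguation, (4) error correction.
# The dictionary is a toy stand-in for the UMLS Specialist Lexicon.
import difflib

DICTIONARY = {"fever": 120, "vomiting": 80, "rash": 60, "injection": 90}

def correct_token(token):
    word = token.lower()
    # 1. Error detection: anything not in the lexicon is a candidate error.
    if word in DICTIONARY:
        return token
    # 2. Word list generation: similar dictionary words (edit-distance proxy).
    candidates = difflib.get_close_matches(word, DICTIONARY, n=5, cutoff=0.7)
    if not candidates:
        return token  # leave unmatchable tokens untouched
    # 3. Word list disambiguation: prefer the most frequent candidate.
    best = max(candidates, key=lambda w: DICTIONARY[w])
    # 4. Error correction: substitute the chosen word.
    return best

print([correct_token(t) for t in "pt developed fevr and rsh".split()])
```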

    Finding available services in TOSCA-compliant clouds

    The OASIS TOSCA specification aims at enhancing the portability of cloud applications by defining a language to describe and manage them across heterogeneous clouds. A service template is defined as an orchestration of typed nodes, which can be instantiated by matching other service templates. In this paper, we define and implement the notions of exact and plug-in matching between TOSCA service templates and node types. We then define two other types of matching (flexible and white-box), each of which permits ignoring larger sets of non-relevant syntactic differences when type-checking service templates with respect to node types. The paper also describes how a service template that plug-in, flexibly or white-box matches a node type can be suitably adapted so as to exactly match it.
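    To give a feel for the difference between exact and plug-in matching, here is a deliberately simplified sketch that models capabilities and requirements as plain name sets; the paper's actual definitions operate on full TOSCA service templates.

```python
# Simplified sketch of exact vs. plug-in matching between a TOSCA node type
# and a service template, modelled here as sets of capability and
# requirement names (a strong simplification of the real definitions).

def exact_match(template, node_type):
    """The template offers exactly the declared capabilities/requirements."""
    return (template["capabilities"] == node_type["capabilities"]
            and template["requirements"] == node_type["requirements"])

def plugin_match(template, node_type):
    """Plug-in matching: the template may offer more capabilities and
    demand fewer requirements than the node type declares."""
    return (template["capabilities"] >= node_type["capabilities"]
            and template["requirements"] <= node_type["requirements"])

node_type = {"capabilities": {"WebServer"}, "requirements": {"Host"}}
template  = {"capabilities": {"WebServer", "Monitoring"}, "requirements": set()}

print(exact_match(template, node_type))   # False: extra capability offered
print(plugin_match(template, node_type))  # True: superset/subset condition
```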

    Phoneme-based Video Indexing Using Phonetic Disparity Search

    This dissertation presents and evaluates an approach to the video indexing problem, investigating a categorization method that transcribes audio content through Automatic Speech Recognition (ASR) combined with Dynamic Contextualization (DC), Phonetic Disparity Search (PDS) and Metaphone indexation. The suggested approach applies genome pattern matching algorithms with computational summarization to build a database infrastructure that provides an indexed summary of the original audio content. PDS complements the contextual phoneme indexing approach by optimizing topic search performance and accuracy in large video content structures. A prototype was built to translate news broadcast video into text and phonemes automatically by using ASR utterance conversions. Each phonetic utterance extraction was then categorized, converted to Metaphones, and stored in a repository with contextual topical information attached and indexed for posterior search analysis. Following the original design strategy, a custom parallel interface was built to measure the capabilities of dissimilar phonetic queries and provide an interface for result analysis. The postulated solution provides evidence of superior topic matching when compared to traditional word and phoneme search methods. Experimental results demonstrate that PDS can be 3.7% better than the same phoneme query, while Metaphone search proved to be 154.6% better than the same phoneme search and 68.1% better than the equivalent word search.
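    A minimal sketch of Metaphone-based indexing is shown below, using the third-party jellyfish library for the encoding; the transcript snippets are invented, and the sketch omits PDS and the contextual topic information the dissertation describes.

```python
# Minimal sketch of Metaphone indexing of transcribed utterances, using the
# jellyfish library's Metaphone encoder. Transcripts are illustrative only.
from collections import defaultdict
import jellyfish

utterances = {
    0: "senate votes on the budget",
    1: "storm warnings along the coast",
}

# Index: Metaphone code -> set of utterance ids containing that code.
index = defaultdict(set)
for uid, text in utterances.items():
    for token in text.split():
        index[jellyfish.metaphone(token)].add(uid)

def phonetic_search(query):
    """Return utterance ids sharing a Metaphone code with any query word,
    so misrecognized spellings (e.g. 'senat') still match 'senate'."""
    hits = set()
    for token in query.split():
        hits |= index.get(jellyfish.metaphone(token), set())
    return sorted(hits)

print(phonetic_search("senat budget"))  # -> [0]
```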

    Framework for the semantic alignment of enterprise’s domain knowledge

    Nowadays, the consumption of goods and services on the Internet is increasing steadily. Small and Medium Enterprises (SMEs), mostly from traditional industry sectors, usually do business in weak and fragile market sectors, where customized products and services prevail. To survive and compete in today's markets they have to readjust their business strategies by creating new manufacturing processes and establishing new business networks through new technological approaches. In order to compete with big enterprises, these partnerships aim at sharing resources, knowledge and strategies to boost the sector's business consolidation through the creation of dynamic manufacturing networks. To meet this demand, the development of a centralized information system is proposed, which allows enterprises to select and create dynamic manufacturing networks capable of monitoring the entire manufacturing process, including the assembly, packaging and distribution phases. Even networking partners that come from the same area have multiple and heterogeneous representations of the same knowledge, each denoting its own view of the domain. Thus, different conceptual, semantic and, consequently, lexically diverse knowledge representations may occur in the network, causing non-transparent sharing of information and interoperability inconsistencies. A framework, supported by a tool, that flexibly enables the identification, classification and resolution of such semantic heterogeneities is therefore required. This tool will support the network in establishing semantic mappings, to facilitate the integration of the various enterprises' information systems.
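    As a hedged illustration of what such mapping support might look like at the lexical level, the sketch below aligns two invented partner vocabularies by normalized string similarity; the framework itself targets richer semantic heterogeneities than this.

```python
# Minimal sketch of lexical alignment between two partners' vocabularies,
# using string similarity as a stand-in for richer semantic mapping.
# Vocabularies and the similarity threshold are illustrative.
import difflib

partner_a = ["ProductOrder", "DeliveryDate", "CustomerRef"]
partner_b = ["product_order", "delivery_date", "client_reference"]

def normalize(term):
    return term.replace("_", "").lower()

def align(vocab_a, vocab_b, threshold=0.8):
    """Propose term mappings whose normalized similarity clears the
    threshold; unmatched terms are flagged for manual resolution."""
    mappings, unmatched = [], []
    for a in vocab_a:
        scored = [(difflib.SequenceMatcher(None, normalize(a),
                                           normalize(b)).ratio(), b)
                  for b in vocab_b]
        score, best = max(scored)
        (mappings if score >= threshold else unmatched).append(
            (a, best, round(score, 2)))
    return mappings, unmatched

mappings, unmatched = align(partner_a, partner_b)
print("auto-mapped:", mappings)
print("needs review:", unmatched)
```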