339 research outputs found

    Doctor of Philosophy

    The explosion of structured Web data (e.g., online databases, Wikipedia infoboxes) creates many opportunities for integrating and querying these data that go far beyond the simple search capabilities provided by search engines. Although much work has been devoted to data integration in the database community, the Web brings new challenges: Web scale (the large and growing volume of data) and the heterogeneity of Web data. Because there is so much data, scalable techniques are needed that require little or no manual intervention and are robust to noisy data. In this dissertation, we propose a new and effective approach for matching Web-form interfaces and for matching multilingual Wikipedia infoboxes. As a further step toward these problems, we propose a general prudent schema-matching framework that matches a large number of schemas effectively. Our comprehensive experiments on Web-form interfaces and Wikipedia infoboxes show that it can enable on-the-fly, automatic integration of large collections of structured Web data. Another problem we address in this dissertation is schema discovery. While existing integration approaches assume that the relevant data sources and their schemas have been identified in advance, schemas are not always available for structured Web data. Approaches exist that exploit information in Wikipedia to discover entity types and their associated schemas. However, because of the inconsistencies, sparseness, and noise that come with community contributions, these approaches are error-prone and require substantial human intervention. Given the schema heterogeneity in Wikipedia infoboxes, we developed a new approach that uses the structured information available in infoboxes to cluster similar infoboxes and infer the schemata for entity types. Our approach is unsupervised and resilient to the unpredictable skew in the entity class distribution. Our experiments, using over one hundred thousand infoboxes extracted from Wikipedia, indicate that our approach is effective and produces accurate schemata for Wikipedia entities.

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Deployed image classification pipelines typically depend on images captured in real-world environments, so the images may be affected by various sources of perturbation (e.g., sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks, and it has therefore attracted wide interest within the computer vision community. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. This operation transforms an input image into a space that is more robust to noise before it is processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free images comparable to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.

    Acta Polytechnica Hungarica 2016


    O impacto da inteligência artificial no negócio eletrónico (The impact of artificial intelligence on electronic business)

    Given the importance of Artificial Intelligence (AI) today, it is of great interest to see to what extent it is transforming Electronic Business (EB). To this end, a systematic review was designed to evaluate the impacts of the proliferation of these tools. The research aimed to identify scientific articles that, through searches of electronic data sources, could answer the research questions posed: a) what kinds of AI-based solutions have been used to improve EB; b) in which EB domains AI has been applied; c) what the success or failure rate of the project was. The articles also had to meet selection criteria: they had to be written in English, fall within the period 2015 to 2021, and be empirical studies supported by real data. After a final quality assessment, the relevant data were extracted into forms created in MS Excel. These data formed the basis of the quantitative and qualitative analysis that brought out the findings, which were subsequently discussed. The dissertation ends with conclusions and a discussion of future work.

    Using Data Mining for Facilitating User Contributions in the Social Semantic Web

    This thesis uses recommender systems to help users contribute to the Social Semantic Web. We first propose a framework that maps domain properties to recommendation technologies. Next, we develop novel recommendation algorithms for improving personalized tag recommendation and for recommending semantic relations. Finally, we introduce a framework for analyzing different types of potential attacks against social tagging systems and evaluate their impact on those systems.

    Arabic named entity recognition

    This doctoral thesis describes the research carried out to determine the best techniques for building an Arabic Named Entity Recognizer. Such a system would be able to identify and classify the named entities found in open-domain Arabic text. The Named Entity Recognition (NER) task helps other Natural Language Processing tasks (for example, Information Retrieval, Question Answering, and Machine Translation) achieve better results thanks to the enrichment it adds to the text. The literature contains various works that investigate the NER task for a specific language or from a language-independent perspective. However, very few works studying this task for Arabic have been published so far. Arabic has a special orthography and a complex morphology, and these aspects bring new challenges for research on the NER task. A complete investigation of NER for Arabic would not only provide the techniques needed to achieve high performance, but would also provide an error analysis and a discussion of the results that benefit the NER research community. The main goal of this thesis is to meet that need. To this end, we have: 1. Carried out a study of the different aspects of Arabic related to this task; 2. Analyzed the state of the art in NER; 3. Compared the results obtained by different machine learning techniques; 4. Developed a method based on combining different classifiers, where each classifier handles a single class of named entities and uses the feature set and machine learning technique best suited to that class. Our experiments were evaluated on nine test sets.
    Benajiba, Y. (2009). Arabic named entity recognition [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8318

    Flood mapping from radar remote sensing using automated image classification techniques
