178 research outputs found

    SWAT: A System for Detecting Salient Wikipedia Entities in Texts

    We study the problem of entity salience by proposing the design and implementation of SWAT, a system that identifies the salient Wikipedia entities occurring in an input document. SWAT consists of several modules that detect Wikipedia entities and classify them on the fly as salient or not, based on a large number of syntactic, semantic and latent features extracted via a supervised process trained over millions of examples drawn from the New York Times corpus. The validation is performed through a large experimental assessment, which shows that SWAT improves on known solutions over all publicly available datasets. We release SWAT via an API that we describe and comment on in the paper in order to ease its use in other software.

    Effect of heuristics on serendipity in path-based storytelling with linked data

    Path-based storytelling with Linked Data on the Web provides users the ability to discover concepts in an entertaining and educational way. Given a query context, many state-of-the-art pathfinding approaches aim at telling a story that coincides with the user's expectations by investigating paths over Linked Data on the Web. By taking serendipity in storytelling into account, we aim to improve and tailor existing approaches so that they better fit user expectations and users can discover interesting knowledge without feeling unsure or even lost in the story facts. To this end, we propose to optimize the link estimation between, and the selection of, facts in a story by increasing the consistency and relevancy of links between facts through additional domain delineation and refinement steps. To address multiple aspects of serendipity, we propose and investigate combinations of weights and heuristics in paths, which form the essential building blocks of each story. Our experimental findings with stories based on DBpedia indicate improvements when applying the optimized algorithm.

    Recommendation System for Issues Found in R&D

    This project arose from the need to extract knowledge from, and summarize, the large number of issues discovered during the software development phases for automotive modules. Part of that knowledge can be obtained from past issues together with their solutions. As technology advances, it has become much more feasible to collect all this information in different formats and process it. This information, which exists mainly as text, grows day by day. Having one person, or even a group of people, read large volumes of text to extract information and visualize important data at the pace at which that information grows is impractical, or nearly impossible to do efficiently. New AI and Big Data technologies make it possible to meet these objectives. In particular, AI-based Natural Language Processing techniques and both SQL and NoSQL databases facilitated the analysis and processing in our project. ITESO, A. C.

    Leveraging NLP and Social Network Analytic Techniques to Detect Censored Keywords: System Design and Experiments

    Internet regulation in the form of online censorship and Internet shutdowns has been increasing over recent years. This paper presents a natural language processing (NLP) application for performing cross-country probing that conceals the exact location of the originating request. A detailed discussion of the application aims to stimulate further investigation into new methods for measuring and quantifying Internet censorship practices around the world. In addition, results from two experiments involving search engine queries of banned keywords demonstrate that censorship practices vary across different search engines. These results suggest opportunities for developing circumvention technologies that enable open and free access to information.