
    A User-Centered Concept Mining System for Query and Document Understanding at Tencent

    Concepts embody the knowledge of the world and facilitate the cognitive processes of human beings. Mining concepts from web documents and constructing the corresponding taxonomy are core research problems in text understanding and support many downstream tasks such as query analysis, knowledge base construction, recommendation, and search. However, we argue that most prior studies extract formal and overly general concepts from Wikipedia or static web pages, which do not represent the user perspective. In this paper, we describe our experience of implementing and deploying ConcepT in Tencent QQ Browser. It discovers user-centered concepts at the right granularity conforming to user interests by mining a large number of user queries and interactive search click logs. The extracted concepts have the proper granularity, are consistent with user language styles, and are dynamically updated. We further present our techniques to tag documents with user-centered concepts and to construct a topic-concept-instance taxonomy, which has helped to improve search as well as news feed recommendation in Tencent QQ Browser. We performed extensive offline evaluation to demonstrate that our approach could extract concepts of higher quality than several existing methods. Our system has been deployed in Tencent QQ Browser. Results from online A/B testing involving a large number of real users suggest that the Impression Efficiency of feeds users increased by 6.01% after incorporating the user-centered concepts into the recommendation framework of Tencent QQ Browser. Comment: Accepted by KDD 201

    Toward Self-Organising Service Communities

    This paper discusses a framework in which catalog service communities are built, linked for interaction, and constantly monitored and adapted over time. A catalog service community (represented as a peer node in a peer-to-peer network) in our system can be viewed as a domain-specific data integration mediator representing the domain knowledge and the registry information. Query routing among communities is performed to identify a set of data sources that are relevant to answering a given query. The system monitors the interactions between the communities to discover patterns that may lead to restructuring of the network (e.g., irrelevant peers removed, new relationships created).

    On demand translation for querying incompletely aligned datasets

    More and more users aim to take advantage of the existing Linked Open Data environment by formulating a query over one dataset and then processing the same query over different datasets, one after another, in order to obtain a broader set of answers. However, the heterogeneity of the vocabularies used in the datasets, together with the scarcity of alignments among those datasets, makes that querying task difficult. Considering this scenario, we present in this paper a proposal that allows on-demand translation of queries formulated over an original dataset into queries expressed using the vocabulary of a targeted dataset. Our approach relieves users from having to know the vocabulary used in the targeted datasets and, moreover, considers situations where alignments do not exist or are not suitable for the formulated query. Therefore, in order to favour the possibility of getting answers, there is sometimes no guarantee of obtaining a semantically equivalent translation. The core component of our proposal is a query rewriting model that considers a set of transformation rules devised from a pragmatic point of view. The feasibility of our scheme has been validated with queries defined in well-known benchmarks and SPARQL endpoint logs, as the obtained results confirm.

    A survey of RDB to RDF translation approaches and tools

    ISRN I3S/RR 2013-04-FR, 24 pages. Relational databases scattered over the web are generally opaque to regular web crawling tools. To address this concern, many RDB-to-RDF approaches have been proposed in recent years. In this paper, we present a detailed review of seventeen RDB-to-RDF initiatives, considering end-to-end projects that delivered operational tools. The different tools are classified along three major axes: mapping description language, mapping implementation, and data retrieval method. We analyse the motivations, commonalities, and differences between existing approaches. The expressiveness of existing mapping languages is not always sufficient to produce semantically rich data and make it usable, interoperable, and linkable. We therefore briefly present various strategies investigated in the literature to produce additional knowledge. Finally, we show that R2RML, the W3C recommendation for describing RDB to RDF mappings, may not apply to all needs in the wide scope of RDB to RDF translation applications, leaving space for future extensions.

    A translator for SPARQL queries, formulated over incompletely aligned data sources, that provides an estimate of the translation quality

    147 pp. Today the Web hosts a growing number of linked datasets of diverse provenance, covering different domains and publicly accessible for free exploitation. This doctoral thesis focuses on query processing over this cloud of linked datasets, addressing the access difficulties that stem from their heterogeneity. The main contribution is a new proposal that allows a query posed over one linked dataset to be translated for another without the two being completely aligned and without the user having to know the technical characteristics of each data source. The proposal is realized as a translator that transforms a SPARQL query, properly expressed in terms of the vocabularies used in a source dataset, into another SPARQL query properly expressed for a target dataset that involves different vocabularies. The translation relies on existing alignments between terms in different datasets. When the translator cannot produce a semantically equivalent query because term alignments are scarce, the system produces a semantic approximation of the query to avoid returning an empty answer to the user. Translation across the different datasets is achieved by applying a varied set of transformation rules.
    The thesis defines five types of rules, according to the motivation for the transformation: equivalence, hierarchy, rules based on the answers to the query, rules based on the profile of the resources that appear in the query, and rules based on the features associated with the resources that appear in the query. Moreover, since the translator does not guarantee semantic preservation because of the heterogeneity of the vocabularies, obtaining an estimate of the quality of the produced translation becomes crucial. Another relevant contribution of the thesis is therefore the definition of how to inform the user about the quality of the translated query through two indicators: a similarity factor based on the translation process itself, and a quality indicator for the results, estimated with a predictive model. Finally, the thesis demonstrates feasibility by establishing an evaluation framework on which a prototype of the system has been validated.