10 research outputs found

    Measuring semantic distance for linked open data-enabled recommender systems

    The Linked Open Data (LOD) initiative has been quite successful in publishing and interlinking data on the Web. On top of this huge amount of interconnected data, measuring the relatedness between resources can serve various applications, such as LOD-enabled recommender systems. In this paper, we propose several distance measures, built on the basic concept of Linked Data Semantic Distance (LDSD), for calculating the semantic distance between Linked Data resources in a LOD-enabled recommender system. We evaluated the distance measures in the context of a recommender system that provides top-N recommendations, comparing against baseline methods such as LDSD. Results show that performance is significantly improved by our proposed distance measures, which incorporate normalizations based on both resource-specific and global appearances of paths in the graph.
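
    As a point of reference, the baseline LDSD idea can be sketched in a few lines of Python. This is a minimal, illustrative reading of the count-based intuition (the more direct and shared links between two resources, the smaller their distance), on a toy triple set invented for the example; it is not the paper's actual implementation or its proposed normalizations.

        # Toy RDF-style graph: (subject, predicate, object) triples.
        # Real LOD data would use IRIs instead of bare strings.
        TRIPLES = {
            ("TheBeatles", "genre", "Rock"),
            ("TheRollingStones", "genre", "Rock"),
            ("TheBeatles", "origin", "UK"),
            ("TheRollingStones", "origin", "UK"),
            ("TheBeatles", "influenced", "TheRollingStones"),
        }

        def c_direct(a, b):
            """Number of predicates linking resource a directly to resource b."""
            return sum(1 for s, p, o in TRIPLES if s == a and o == b)

        def c_shared(a, b):
            """Number of (predicate, object) pairs shared by a and b,
            i.e. indirect connections through a common neighbour."""
            out_a = {(p, o) for s, p, o in TRIPLES if s == a}
            out_b = {(p, o) for s, p, o in TRIPLES if s == b}
            return len(out_a & out_b)

        def ldsd(a, b):
            """Simplified Linked Data Semantic Distance: more connections
            between a and b mean a smaller distance."""
            return 1.0 / (1.0 + c_direct(a, b) + c_direct(b, a) + c_shared(a, b))

        print(ldsd("TheBeatles", "TheRollingStones"))  # 0.25: well connected
        print(ldsd("Rock", "UK"))                      # 1.0: no connections

    The measures proposed in the paper refine exactly this kind of raw count, normalizing each path's contribution by how often it appears around the two resources and in the graph as a whole.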

    Security Aspects in Web of Data Based on Trust Principles. A brief of Literature Review

    Within the scientific community there is a certain consensus on defining "Big Data" as a global set, formed through a complex integration that embraces several dimensions: research data, Open Data, Linked Data, Social Network Data, etc. These data are scattered across different sources, a mix that responds to diverse philosophies, a great diversity of structures, different naming conventions, etc. Managing them poses great technological and methodological challenges: the discovery and selection of data, their extraction and final processing, preservation, visualization, accessibility, and degree of structure, among other aspects, all of which reveal a huge field of study at the level of analysis and implementation in different knowledge domains. However, given the availability of these data and their possible opening: what problems does opening the data face? This paper presents a literature review of these security aspects.

    Graphical interface for interacting with evaluation results oriented to recommender systems

    It is well known that, in recent times, with the massive use of the Internet, the importance of data analysis keeps growing. Both researchers and companies know this, and they collect ever more information about users in order to build a virtual profile of them. Through these profiles, companies deliver precise suggestions to users, such as products from a shop or articles from a blog, by means of recommender systems. Building a good recommender system can give a company a major advantage, both in sales, by leading a user to buy another book similar to the one they have just read or to discover a video they might like, and in customer loyalty, by getting customers to return to the business. This Bachelor's thesis consists of developing an application for the evaluation of recommender systems. It supports both evaluating a single run, comparing different metrics and cutoffs, and comparing several runs, in a visual and attractive way. An application like this can help developers of recommender systems, since a good evaluation is crucial to knowing how good a system is. For this work, a study was carried out of evaluation methods for recommender systems, as well as of the choice of technologies to use. The application was developed in JavaScript, both on the frontend, with React.js, and on the backend, using the Node.js environment. Since the JavaScript ecosystem is not in high demand for this kind of project, everything had to be developed from scratch; to that end, building a good test suite was essential.
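
    Although the thesis's application itself is written in JavaScript (React.js and Node.js), the kind of evaluation it visualizes, one metric compared at several cutoffs for a given run, can be illustrated with a short, self-contained Python sketch. The metric choice (precision@k) and the data are our own illustrative assumptions, not taken from the thesis.

        def precision_at_k(recommended, relevant, k):
            """Fraction of the top-k recommended items that are relevant."""
            top_k = recommended[:k]
            hits = sum(1 for item in top_k if item in relevant)
            return hits / k

        # One hypothetical run: ranked recommendations for a user, plus the
        # ground-truth relevant items held out for evaluation.
        recommended = ["b12", "b07", "b33", "b01", "b90"]
        relevant = {"b07", "b01", "b44"}

        for k in (1, 3, 5):
            print(f"P@{k} = {precision_at_k(recommended, relevant, k):.2f}")
            # Prints P@1 = 0.00, P@3 = 0.33, P@5 = 0.40 for this toy run.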

    Evaluation and improvement of the quality of DBpedia for representing domain knowledge

    ABSTRACT In the current state of the Semantic Web, the quantity of available data and the multiplicity of its uses impose the continuous evaluation of the quality of this data across the various Linked Open Data (LOD) datasets. These datasets are based on the RDF syntax, i.e. triples such as <subject, relation, object>. As a consequence, the LOD cloud can be represented as a huge graph, where every triple links the two nodes "subject" and "object" by an edge "relation". In this representation, each dataset is a sub-graph. DBpedia, one of the major datasets, is colloquially considered the central hub of this cloud. Indeed, the ultimate purpose of DBpedia is to provide all the information present in Wikipedia, "translated" into RDF, and it therefore covers a wide range of domains, allowing a linkage with every other LOD dataset, including the most specialized. From this wide coverage arises one of the fundamental concepts of this project: the notion of "domain". Informally, a domain is a set of subjects with a common theme. For instance, the domain Mathematics contains several subjects such as algebra, function or addition. More formally, a domain is a sub-graph of DBpedia whose nodes represent domain-related concepts. Currently, the automatic extraction methods behind DBpedia are usually far less effective when the target subject is conceptual than when it is a named entity (such as a person, city or company). Hence our first hypothesis: the domain-related information available on DBpedia is often poor, since domains essentially consist of concepts. In the first part of this research project, we confirm this hypothesis by evaluating the quality of domain-related knowledge in DBpedia for 17 semi-randomly chosen domains. This evaluation is based on three numerical aspects of the "quality" of a domain: 1 – the number of inbound and outbound links for each concept, 2 – the number of links between two domain concepts compared to the number of links between the domain and the rest of DBpedia, 3 – the number of typed concepts (i.e. concepts representing the instance of a class: for example, Addition is an instance of the class Mathematical operation, so the concept Addition is typed if the relation <Addition, type, Mathematical operation> appears in DBpedia). We reach the conclusion that the domain-related, conceptual information present in DBpedia is indeed poor along all three axes. In the second half of this work, we propose two solutions to the quality problem highlighted in the first half. The first one proposes potential classes that could be added to DBpedia, addressing the third quality aspect: the number of typed concepts. The second one uses an Open Relation Extraction (ORE) system, which detects relations in text. By running this system on the abstract (i.e. the first paragraph of the Wikipedia page) of each concept and classifying the extracted relations by their semantic meaning, we can 1) propose novel relations between domain concepts, and 2) propose additional potential classes. These two methods are currently only a first step, but our preliminary results are very encouraging and indicate that they are highly relevant for helping correct the issues highlighted in the first part.
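
    The three quality axes used in this evaluation can be made concrete with a small sketch. The following Python fragment computes them over a toy domain sub-graph; the triples, the domain set and the bare "type" predicate are illustrative assumptions (DBpedia itself uses rdf:type), not the thesis's actual pipeline.

        # Toy DBpedia-like triples: (subject, predicate, object).
        TRIPLES = {
            ("Addition", "relatedTo", "Algebra"),
            ("Algebra", "uses", "Function"),
            ("Addition", "type", "Mathematical_operation"),
            ("Addition", "linksTo", "Arithmetic_article"),
            ("History_of_math", "mentions", "Algebra"),
        }
        DOMAIN = {"Addition", "Algebra", "Function"}

        # Axis 1: inbound + outbound links for each domain concept.
        degree = {c: sum(1 for s, p, o in TRIPLES if s == c or o == c)
                  for c in DOMAIN}

        # Axis 2: links internal to the domain vs. links crossing its boundary.
        internal = sum(1 for s, p, o in TRIPLES if s in DOMAIN and o in DOMAIN)
        boundary = sum(1 for s, p, o in TRIPLES if (s in DOMAIN) != (o in DOMAIN))

        # Axis 3: typed concepts, i.e. those with an explicit "type" relation.
        typed = {s for s, p, o in TRIPLES if p == "type" and s in DOMAIN}

        print(degree)                    # e.g. {'Addition': 3, 'Algebra': 3, 'Function': 1}
        print(internal, boundary)        # 2 internal links vs. 3 boundary links
        print(len(typed) / len(DOMAIN))  # fraction of typed concepts: 1/3

    A domain whose concepts show low degrees, few internal links relative to boundary links, and a low typed fraction would count as poorly represented along all three axes.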

    Exploiting Semantic Distance in Linked Open Data for Recommendation

    The use of Linked Open Data (LOD) has been explored in recommender systems in different ways, primarily through its graph representation. The graph structure of LOD is used to measure inter-resource relatedness via semantic distance in the graph, the intuition being that the more connected resources are to each other, the more related they are. One drawback of this approach is that it treats all inter-resource connections identically, rather than prioritizing links that may matter more in semantic relatedness calculations. Another drawback of current approaches is that they only consider resources connected either directly or through a single intermediate resource. In this document, we show that different types of inter-resource links hold different value for relatedness calculations, and we exploit this observation to introduce improved resource semantic relatedness measures that are more accurate than current state-of-the-art approaches. Moreover, we introduce an approach to propagate current semantic distance measures that not only expands their coverage but also increases their accuracy. To validate the effectiveness of our approaches, we conducted several experiments identifying the relatedness between musical artists in DBpedia. They demonstrate that approaches that prioritize link types produce more accurate recommendation results, and that propagating semantic distances beyond one hub resource improves both the accuracy and the coverage of LOD-based recommender systems.
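
    To illustrate the first observation, that different link types should carry different weight, here is a hedged Python sketch extending a count-based distance with per-predicate weights. The graph, predicate names and weight values are invented for the example; the dissertation's actual weighting scheme is not reproduced here.

        # Toy triples between musical artists (cf. the DBpedia experiments).
        TRIPLES = {
            ("ArtistA", "genre", "Rock"),
            ("ArtistB", "genre", "Rock"),
            ("ArtistA", "associatedAct", "ArtistB"),
            ("ArtistA", "birthPlace", "Liverpool"),
            ("ArtistB", "birthPlace", "Liverpool"),
        }

        # Illustrative weights: some link types signal relatedness more strongly.
        WEIGHTS = {"associatedAct": 3.0, "genre": 1.0, "birthPlace": 0.5}

        def weighted_connectivity(a, b):
            """Sum of link-type weights over direct links between a and b
            and over (predicate, object) pairs they share."""
            direct = sum(WEIGHTS.get(p, 1.0)
                         for s, p, o in TRIPLES if {s, o} == {a, b})
            shared_pairs = ({(p, o) for s, p, o in TRIPLES if s == a}
                            & {(p, o) for s, p, o in TRIPLES if s == b})
            return direct + sum(WEIGHTS.get(p, 1.0) for p, o in shared_pairs)

        def weighted_distance(a, b):
            """Weighted analogue of a count-based semantic distance."""
            return 1.0 / (1.0 + weighted_connectivity(a, b))

        # The strong associatedAct link dominates the weak shared birthPlace.
        print(weighted_distance("ArtistA", "ArtistB"))  # ~0.18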