
    Analysis on Partial Relationship in LOD

    Relationships play a key role in the Semantic Web: they connect the dots between entities (concepts or instances) in a way that conveys the real sense of those entities. Some relationships provide evidence for the existence of the subject and object of a triple; these can be defined as evidential relationships. Identifying evidential relationships yields solutions to existing inference problems and opens doors for new applications and research. Part_of relationships are identified as a special kind of evidential relationship, alongside membership, causality and others. Linked Open Data (LOD), as a global data space, provides a good platform for exploring these relationships and solving interesting inference problems. However, this is not trivial: LOD data sets do not have rich schemas, and existing work on schema mapping in LOD is limited to concepts rather than relationships. This project develops a novel approach to identify partial relationships, a superset of part_of relationships, from LOD instance data by analysing the data patterns in that instance data. Ultimately, the approach provides a way to enrich the shallow schemas in LOD, which in turn is helpful for schema matching in LOD. We apply the approach to the DBpedia data set in order to identify the partial relationships it contains.
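
    As a rough illustration of the kind of instance-data analysis described above, the sketch below counts how many instance pairs in DBpedia are connected by candidate part_of-style properties via the public SPARQL endpoint. It is only a sketch under stated assumptions: the endpoint URL is the public https://dbpedia.org/sparql service, and the two candidate properties are examples chosen here, not the project's actual candidate set.

```python
# Sketch: count how often candidate "partial"-style properties connect DBpedia
# instances. Assumes the public endpoint is reachable; the candidate property
# list is illustrative, not the project's actual selection.
import requests

ENDPOINT = "https://dbpedia.org/sparql"
CANDIDATE_PROPERTIES = [
    "http://dbpedia.org/ontology/isPartOf",       # a known part_of-style property
    "http://dbpedia.org/ontology/locatedInArea",  # assumed further candidate
]

def count_instance_pairs(prop_uri: str) -> int:
    """Return the number of subject/object pairs linked by the given property."""
    query = f"SELECT (COUNT(*) AS ?n) WHERE {{ ?s <{prop_uri}> ?o . }}"
    resp = requests.get(
        ENDPOINT,
        params={"query": query, "format": "application/sparql-results+json"},
        timeout=60,
    )
    resp.raise_for_status()
    return int(resp.json()["results"]["bindings"][0]["n"]["value"])

if __name__ == "__main__":
    for prop in CANDIDATE_PROPERTIES:
        print(prop, count_instance_pairs(prop))
```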

    Foundational Ontologies meet Ontology Matching: A Survey

    Ontology matching is a research area aimed at finding ways to make different ontologies interoperable. Solutions to the problem have been proposed from different disciplines, including databases, natural language processing, and machine learning. The role of foundational ontologies in ontology matching is an important one: it is multifaceted and leaves room for development. This paper presents an overview of the different tasks involved in ontology matching that consider foundational ontologies. We discuss the strengths and weaknesses of existing proposals and highlight the challenges to be addressed in the future.

    Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review

    Since the Simple Knowledge Organization System (SKOS) specification and its SKOS eXtension for Labels (SKOS-XL) became formal W3C recommendations in 2009, a significant number of conventional knowledge organization systems (KOS), including thesauri, classification schemes, name authorities, and lists of codes and terms produced before the arrival of the ontology wave, have made their way into the Semantic Web mainstream. This paper uses "LOD KOS" as an umbrella term to refer to all of the value vocabularies and lightweight ontologies within the Semantic Web framework. The paper provides an overview of what the LOD KOS movement has brought to various communities and users. These are not limited to the communities of value vocabulary constructors and providers, nor to the catalogers and indexers who have a long history of applying the vocabularies to their products; LOD dataset producers and LOD service providers, information architects and interface designers, and researchers in the sciences and humanities are also direct beneficiaries of LOD KOS. The paper examines a set of collected cases (experimental or in real applications) and aims to identify the usages of LOD KOS in order to share practices and ideas among communities and users. Through the viewpoints of a number of different user groups, the functions of LOD KOS are examined from multiple dimensions. This paper focuses on the LOD dataset producers, vocabulary producers, and researchers (as end-users of KOS). Comment: 31 pages, 12 figures, accepted paper in the International Journal on Digital Libraries.
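
    For readers unfamiliar with SKOS, the following minimal sketch (using rdflib; the vocabulary URI and labels are made up for illustration) shows what a single concept of such a value vocabulary looks like once published as Linked Data.

```python
# Minimal sketch of a SKOS concept as published by a LOD KOS value vocabulary.
# Assumes rdflib is installed; the example namespace and labels are hypothetical.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("http://example.org/vocab/")

g = Graph()
g.bind("skos", SKOS)
g.bind("ex", EX)

concept = EX["thesauri"]
g.add((concept, RDF.type, SKOS.Concept))
g.add((concept, SKOS.prefLabel, Literal("Thesauri", lang="en")))
g.add((concept, SKOS.altLabel, Literal("Thesaurus", lang="en")))
g.add((concept, SKOS.broader, EX["knowledgeOrganizationSystems"]))

print(g.serialize(format="turtle"))
```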

    Deliverable D4.2 User profile schema and profile capturing

    This deliverable presents the methods employed in LinkedTV to create, update and formalise a semantic user model that will be used for concept and content filtering. It focuses on the extraction of lightweight and dense implicit knowledge about user preferences. This process includes the semantic interpretation of information that stems from the user's interaction with the content, together with the estimation of the impact that preferred concepts have for each specific interaction, based on the type of transaction and the user's physical reaction to the content. User preferences are then updated based on their age, frequency of appearance and utility, while persistent associations between preferences are learnt. This information evolves into a semantic user model that is made available for predictive inference about relevant concepts and content.
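
    The deliverable's actual algorithms are not reproduced here; the snippet below is only an illustrative sketch, under assumed names and an assumed exponential-decay formula, of how a preference weight could combine age, frequency of appearance and utility as described above.

```python
# Illustrative sketch only: compute the weight of a preferred concept from its
# age, frequency of appearance and utility. The half-life, field names and
# combination formula are assumptions, not the LinkedTV specification.
import math
from dataclasses import dataclass

@dataclass
class Preference:
    concept: str      # e.g. a concept URI the user interacted with
    frequency: int    # how often the concept appeared in interactions
    utility: float    # estimated impact of those interactions (0..1)
    age_days: float   # time since the last supporting interaction

def preference_weight(p: Preference, half_life_days: float = 30.0) -> float:
    """Frequency and utility raise the weight; age decays it exponentially."""
    decay = math.exp(-math.log(2) * p.age_days / half_life_days)
    return p.frequency * p.utility * decay

if __name__ == "__main__":
    p = Preference("dbpedia:Jazz", frequency=5, utility=0.8, age_days=45)
    print(round(preference_weight(p), 3))
```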

    Focused categorization power of ontologies: General framework and study on simple existential concept expressions

    When reusing existing ontologies for publishing a dataset in RDF (or for developing a new ontology), preference may be given to those providing extensive subcategorization for important classes (denoted as focus classes). The subcategories may consist not only of named classes but also of compound class expressions. We define the notion of focused categorization power of a given ontology, with respect to a focus class and a concept expression language, as the (estimated) weighted count of the categories that can be built from the ontology's signature, conform to the language, and are subsumed by the focus class. For the sake of tractable initial experiments we then formulate a restricted concept expression language based on existential restrictions and heuristically map it to syntactic patterns over ontology axioms (so-called FCE patterns). The characteristics of the chosen concept expression language and the associated FCE patterns are investigated using three different empirical sources derived from ontology collections: first, the frequency of concept expression patterns in class definitions; second, the occurrence of FCE patterns in the TBox of ontologies; and last, for class expressions generated from the TBox of ontologies (through the FCE patterns), an assessment of their 'meaningfulness' by different groups of users, yielding a 'quality ordering' of the concept expression patterns. The complementary analyses are then compared and summarized. To allow for further experimentation, a web-based prototype was also implemented; it covers the whole process of ontology reuse, from keyword-based ontology search through FCP computation to the selection of ontologies and their enrichment with new concepts built from compound expressions.
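
    Read literally, the definition above can be condensed into one formula; the rendering below is a paraphrase of the abstract, with symbols chosen here rather than taken from the paper.

```latex
% Paraphrase of the definition of focused categorization power (FCP); symbols are
% chosen here for illustration: \Sigma_O is the signature of ontology O,
% \mathcal{L}(\Sigma_O) the concept expressions the language \mathcal{L} can build
% from that signature, F the focus class, and w(C) the (estimated) weight of C.
\[
  \mathrm{FCP}_{\mathcal{L}}(O, F) \;=\; \sum_{\substack{C \,\in\, \mathcal{L}(\Sigma_O) \\ C \,\sqsubseteq\, F}} w(C)
\]
```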

    A structural and quantitative analysis of the web of linked data and its components to perform data retrieval

    This research consists of a quantitative and structural analysis of the Web of Linked Data, with the aim of improving data retrieval across different sources. Statistical techniques are applied to obtain quantitative metrics of the Web of Linked Data; for the structural analysis we perform a Social Network Analysis (SNA). To get a picture of the Web of Linked Data on which to base the analysis, we rely on the Linking Open Data (LOD) cloud diagram, an online catalogue of datasets whose information has been published using Linked Data techniques. The datasets are published in the Resource Description Framework (RDF), which creates links between them so that the information can be reused. The goal of the quantitative and structural analysis of the Web of Linked Data is to improve data retrieval. For that purpose we take advantage of the Schema.org markup vocabulary and the Linked Open Vocabularies (LOV) project. Schema.org is a set of tags whose objective is to let webmasters mark up their own web pages with microdata; the microdata helps search engines and other web tools better understand the information those pages contain. LOV is a catalogue that registers the vocabularies used by the datasets of the Web of Linked Data; its objective is to provide easy access to those vocabularies. In this research we develop a study for retrieving data from the Web of Linked Data using the sources mentioned above together with ontology matching techniques. In our case, we first map Schema.org to LOV, and then LOV to the Web of Linked Data. An SNA of LOV has also been carried out. The objective of that analysis is to obtain a quantitative and qualitative picture of LOV; with this knowledge we can conclude, for example, which vocabularies are the most used and whether or not they are specialised in a particular field. These results can be used to filter datasets or to reuse information.
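
    As a rough illustration of the label-based side of the ontology matching step (mapping Schema.org terms to LOV-registered vocabularies), the sketch below compares term labels with a simple string-similarity measure. The term samples and the acceptance threshold are hypothetical; real matchers combine far richer evidence (synonyms, structure, instances).

```python
# Rough illustration of label-based matching between Schema.org terms and terms
# registered in LOV vocabularies. The term samples are hypothetical; a real
# matcher would also use synonyms, structural and instance-level evidence.
from difflib import SequenceMatcher

schema_org_terms = ["Person", "Organization", "PostalAddress", "birthDate"]
lov_terms = {
    "foaf:Person": "Person",
    "org:Organization": "Organization",
    "vcard:Address": "Address",
    "bio:birth": "birth",
}

def similarity(a: str, b: str) -> float:
    """Simple normalized string similarity between two labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

for s_term in schema_org_terms:
    best_uri, best_label = max(lov_terms.items(), key=lambda kv: similarity(s_term, kv[1]))
    score = similarity(s_term, best_label)
    if score >= 0.6:  # assumed acceptance threshold
        print(f"{s_term} -> {best_uri} (score={score:.2f})")
```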

    Methods for Matching of Linked Open Social Science Data

    In recent years, the concept of Linked Open Data (LOD) has gained popularity and acceptance across various communities and domains. Science policy makers and organizations claim that the potential of semantic technologies and of data exposed in this manner may support and enhance research processes and infrastructures providing research information and services. In this thesis, we investigate whether these expectations can be met in the domain of the social sciences. In particular, we analyse and develop methods for matching social-scientific data that is published as Linked Data, which we introduce as Linked Open Social Science Data. Based on expert interviews and a prototype application, we investigate the current consumption of LOD in the social sciences and its requirements. Following these insights, we first focus on the complete publication of Linked Open Social Science Data by extending and developing domain-specific ontologies for representing research communities, research data and thesauri. In the second part, methods for matching Linked Open Social Science Data are developed that address particular patterns and characteristics of the data typically used in social research. The results of this work contribute towards enabling a meaningful application of Linked Data in a scientific domain.

    Financial decision-making process based on unstructured data sources and domain ontologies

    Nowadays a great number of financial decisions arise from watching the information stream, selecting relevant data, analysing it and acting accordingly. With increasing global competition, the need for swift data analysis with high accuracy and quality becomes a must. For the majority of financial analysts, the main source of information is structured data, which can be easily processed and acted upon. However, there are vast amounts of knowledge that still cannot be easily digested by computers but are of great importance in our everyday life. For instance, (i) news describes events and changes to the state of the world, (ii) columnists' opinions provide arguments that shape our thoughts, and (iii) experts' conclusions influence people's decisions. The main objective of this thesis is to employ unstructured data in the financial decision-making process, with the support of ontologies as the main backbone for knowledge representation. The whole financial decision-making process is contextualised in the scope of the Spanish market, where the main sources of data are news and company disclosures published in the Spanish language. The main contribution of this thesis is the creation of a Decision Support System (DSS) that follows a novel approach to incorporate unstructured data and domain (financial) ontologies into the automated financial decision-making process. Our approach employs Natural Language Processing (NLP) as a means of extracting relevant information from unstructured sources. Moreover, semantics is applied thoroughly, not only in the process of information extraction but also in knowledge modelling and decision support.
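
    As a minimal sketch of the general idea (not the thesis's actual pipeline), the snippet below detects company mentions in an unstructured news headline with a naive lexicon lookup and records them as facts in a toy financial ontology using rdflib; the lexicon, URIs and property names are assumptions.

```python
# Minimal sketch of the general idea, not the thesis's pipeline: detect company
# mentions in a (Spanish) news headline via a lexicon lookup and record them as
# RDF facts in a toy financial ontology. Lexicon, URIs and property are assumed.
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDF

FIN = Namespace("http://example.org/finance#")

COMPANY_LEXICON = {  # surface form -> entity URI (hypothetical)
    "Telefónica": FIN.Telefonica,
    "Iberdrola": FIN.Iberdrola,
}

def extract_mentions(text: str, graph: Graph, news_uri: URIRef) -> None:
    """Very naive 'NLP': assert fin:mentions for every lexicon hit in the text."""
    graph.add((news_uri, RDF.type, FIN.NewsItem))
    for surface, entity in COMPANY_LEXICON.items():
        if surface in text:
            graph.add((news_uri, FIN.mentions, entity))

if __name__ == "__main__":
    g = Graph()
    g.bind("fin", FIN)
    headline = "Telefónica eleva su previsión de beneficios para el próximo año"
    extract_mentions(headline, g, FIN.news_001)
    print(g.serialize(format="turtle"))
```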

    Bioinformatics assisted breeding, from QTL to candidate genes

    Over the last decade, the amount of data generated by a single run of an NGS sequencer has come to outperform days of work done with Sanger sequencing. Metabolomics, proteomics and transcriptomics technologies have also evolved, producing more and more information at an ever faster rate. In addition, the number of databases available to biologists and breeders is increasing every year. The challenge for them thus becomes two-fold: to cope with the increased amount of data produced by these new technologies, and to cope with the distribution of the information across the Web. An example of a study with a lot of ~omics data is described in Chapter 2, where more than 600 peaks were measured using liquid chromatography mass-spectrometry (LC-MS) in peel and flesh of a segregating F1 apple population. In total, 669 mQTL were identified in this study. The number of mQTL identified is vast and almost overwhelming; extracting meaningful information from such an experiment requires appropriate data filtering and data visualization techniques. Visualizing the distribution of the mQTL on the genetic map led to the discovery of QTL hotspots on linkage groups 1, 8, 13 and 16. The mQTL hotspot on linkage group 16 was investigated further and mainly contained compounds involved in the phenylpropanoid pathway. The apple genome sequence and its annotation were used to gain insight into genes potentially regulating this QTL hotspot. This led to the identification of the structural gene leucoanthocyanidin reductase (LAR1) as well as seven genes encoding transcription factors as putative candidates regulating the phenylpropanoid pathway, and thus candidates for the biosynthesis of health-beneficial compounds. However, this study also indicated bottlenecks in the availability of biologist-friendly tools to visualize large-scale QTL mapping results and of smart ways to mine genes underlying QTL intervals. In this thesis, we provide bioinformatics solutions that allow regions of interest on the genome to be explored more efficiently. In Chapter 3, we describe MQ2, a tool to visualize the results of large-scale QTL mapping experiments. It allows biologists and breeders to use their favorite QTL mapping tool, such as MapQTL or R/qtl, and to visualize the distribution of these QTL along the genetic map used in the analysis. MQ2 provides the distribution of the QTL over the markers of the genetic map for a few hundred traits. MQ2 is accessible online via its web interface but can also be used locally via its command line interface. In Chapter 4, we describe Marker2sequence (M2S), a tool to filter genes of interest out of all the genes underlying a QTL. M2S returns the list of genes for a specific genome interval and provides a search function to filter out genes related to the provided keyword(s) by their annotation. Genome annotations often contain cross-references to resources such as the Gene Ontology (GO) or proteins of the UniProt database; via these annotations, additional information can be gathered about each gene. By integrating information from different resources and offering a way to mine the list of genes present in a QTL interval, M2S provides a way to reduce a list of hundreds of genes to possibly tens or fewer genes potentially related to the trait of interest. Using Semantic Web technologies, M2S integrates multiple resources and has the flexibility to extend this integration to more resources as they become available to these technologies.
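
    A minimal sketch of the filtering idea behind M2S is shown below: keep only the genes whose position falls inside the QTL interval and whose annotation matches the provided keyword. The gene records and field names are hypothetical, not M2S's actual data model.

```python
# Illustrative sketch of the Marker2sequence idea (not its actual code): keep
# only genes whose position falls inside a QTL interval and whose annotation
# mentions a keyword. Gene records and field names are hypothetical.
from dataclasses import dataclass

@dataclass
class Gene:
    name: str
    chromosome: str
    start: int
    end: int
    annotation: str  # e.g. GO terms / UniProt description merged into one string

def genes_in_interval(genes, chromosome, qtl_start, qtl_end, keyword):
    """Filter genes lying in [qtl_start, qtl_end] whose annotation mentions keyword."""
    keyword = keyword.lower()
    return [
        g for g in genes
        if g.chromosome == chromosome
        and g.start >= qtl_start and g.end <= qtl_end
        and keyword in g.annotation.lower()
    ]

if __name__ == "__main__":
    genes = [
        Gene("LAR1", "LG16", 1_200_000, 1_205_000, "leucoanthocyanidin reductase; phenylpropanoid pathway"),
        Gene("MYB7", "LG16", 1_400_000, 1_404_000, "MYB transcription factor"),
        Gene("ACTIN", "LG01", 50_000, 55_000, "cytoskeleton"),
    ]
    for g in genes_in_interval(genes, "LG16", 1_000_000, 2_000_000, "phenylpropanoid"):
        print(g.name)
```
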
    Besides the importance of efficient bioinformatics tools to analyze and visualize data, the work in Chapter 2 also revealed the importance of regulatory elements controlling key genes of pathways. The limitation of M2S is that it only considers genes within the interval. In genome annotations, transcription factors are not linked to the trait (keyword) or to the genes they control, and these relationships will therefore not be considered. By integrating information about the gene regulatory network of the organism into Marker2sequence, it should be able to include in its list of genes those genes that lie outside of the QTL interval but are regulated by elements present within it. In tomato, the genome annotation already lists a number of transcription factors; however, it does not provide any information about their targets. In Chapter 5, we describe how we combined transcriptomics information with six genotypes from an Introgression Line (IL) population to find genes that are differentially expressed while being in a genomic background similar to the reference genotype (i.e., outside of any introgression segments). These genes may be differentially expressed as a result of a regulatory element present in an introgression. The promoter regions of these genes were analyzed for DNA motifs, and putative transcription factor binding sites were found. The approaches taken in M2S (Chapter 4) focus on a specific region of the genome, namely the QTL interval. In Chapter 6, we generalized this approach to develop Annotex. Annotex provides a simple way to browse the cross-references existing between biological databases (ChEBI, Rhea, UniProt, GO) and genome annotations. The main concept of Annotex is that, from any type of data present in the databases, one can navigate the cross-references to retrieve the desired type of information. This thesis has resulted in the production of three tools that biologists and breeders can use to speed up their research and build new hypotheses on. The thesis also reveals the state of bioinformatics with regard to data integration, and the need to integrate into annotations (for example, genome annotations, protein annotations, and pathway annotations) more ontologies than just the Gene Ontology (GO) currently used. Multiple platforms are arising to build these new ontologies, but the process of integrating them into existing resources remains to be done. It also confirms the state of the data in plants, where multiple resources may contain overlapping information. Finally, this thesis shows what can be achieved when data is made interoperable, which should be an incentive for the community to work together and build interoperable, non-overlapping resources, creating a bioinformatics Web for plant research.
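
    The cross-reference navigation at the heart of Annotex can be illustrated with a small sketch: starting from one record, follow cross-references breadth-first until records of the requested type are found. The toy cross-reference graph and identifiers below are hypothetical.

```python
# Illustrative sketch of the Annotex idea (not its implementation): follow
# cross-references between records of different databases until records of the
# requested type are reached. The toy cross-reference graph is hypothetical.
from collections import deque

# (record type, identifier) -> list of cross-referenced records
XREFS = {
    ("gene", "LAR1"): [("uniprot", "P0XXXX")],        # hypothetical accession
    ("uniprot", "P0XXXX"): [("go", "GO:0009813"),      # illustrative GO term
                            ("rhea", "RHEA:12345")],   # illustrative reaction id
}

def navigate(start, target_type):
    """Breadth-first walk over cross-references, collecting records of target_type."""
    seen, hits = {start}, []
    queue = deque([start])
    while queue:
        record = queue.popleft()
        if record[0] == target_type:
            hits.append(record)
        for nxt in XREFS.get(record, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return hits

if __name__ == "__main__":
    print(navigate(("gene", "LAR1"), "go"))  # -> [('go', 'GO:0009813')]
```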