
    Spectral gene set enrichment (SGSE)

    Motivation: Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables, with performance strongly dependent on the clustering algorithm and number of clusters. Results: We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method, with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data. Availability: http://cran.r-project.org/web/packages/PCGSE/index.html Contact: [email protected] or [email protected]
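The p-value combination step described above (the weighted Z-method, also known as Stouffer's method) can be sketched as follows. The PC-level p-values and weights below are invented for illustration; in SGSE the weights would be the PC variances scaled by Tracy-Widom p-values.

```python
# Sketch of the weighted Z-method used to combine per-PC enrichment
# p-values into one overall p-value. Inputs here are hypothetical.
import math
from statistics import NormalDist

_nd = NormalDist()

def weighted_z_combine(p_values, weights):
    """Combine one-sided p-values via weighted Z (Stouffer's method)."""
    z = [_nd.inv_cdf(1.0 - p) for p in p_values]          # per-PC Z-scores
    num = sum(w * zi for w, zi in zip(weights, z))
    den = math.sqrt(sum(w * w for w in weights))
    return 1.0 - _nd.cdf(num / den)                        # combined p-value

# Hypothetical example: three PCs with decreasing weights,
# e.g. variance scaled by the Tracy-Widom significance of each PC.
pc_pvals = [0.01, 0.20, 0.60]
pc_weights = [0.5, 0.3, 0.2]
print(weighted_z_combine(pc_pvals, pc_weights))
```

Note how a strong signal on a high-weight PC dominates the combined p-value, which is exactly why scaling the weights by Tracy-Widom significance suppresses noise components.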

    Cataloguing tool for manipulation and enrichment of records from COBISS

    This thesis presents a cataloguing tool that enables the manipulation and enrichment of bibliographic records retrieved from COBISS. In the first step, the tool scrapes records from COBISS and organizes them according to a simplified FRBR conceptual model. In the next step, the system enables the cataloguer to check and enrich the COBISS data using data retrieved from different sources on the web. The thesis includes a presentation of the problem domain, an analysis of the data sources, and an analysis of technical solutions for text manipulation and data integration. The final contribution is a web application prototype that supports the cataloguer's workflow and enables them to check and integrate data in an intuitive and user-friendly way. The thesis also evaluates a possible approach to FRBRization using the tool, and discusses the possible added value of better search options for the user

    A framework for supporting knowledge representation – an ontological based approach

    Dissertation submitted for the degree of Master in Electrical and Computer Engineering. The World Wide Web has had a tremendous impact on society and business in just a few years by making information instantly available. During this transition from physical to electronic means of information transport, the content and encoding of information has remained natural language, identified only by its URL. Today, this is perhaps the most significant obstacle to streamlining business processes via the web. In order that processes may execute without human intervention, knowledge sources, such as documents, must become more machine-understandable and must contain other information besides their main contents and URLs. The Semantic Web is a vision of a future web of machine-understandable data, on which it will be possible for programs to easily determine what knowledge sources are about. This work introduces a conceptual framework and its implementation to support the classification and discovery of knowledge sources, guided by the above vision, where each source's information is structured and represented through a mathematical vector that semantically pinpoints the relevance of that knowledge source within the domain of interest of each user. The presented work also addresses the enrichment of such knowledge representations, using the statistical relevance of keywords based on the classical vector space model, and extending it with ontological support: concepts and semantic relations contained in a domain-specific ontology are used to enrich the knowledge sources' semantic vectors. Semantic vectors are compared against each other to obtain the similarity between them and to better support end users with knowledge source retrieval capabilities
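The vector comparison described above can be sketched in a few lines: documents become keyword-weight vectors, weights of terms matching ontology concepts are boosted, and vectors are compared by cosine similarity. The concept set, boost factor, and texts below are illustrative assumptions, not the dissertation's actual parameters.

```python
# Minimal sketch: ontology-boosted term vectors + cosine similarity.
# ONTOLOGY_CONCEPTS and the boost factor are hypothetical.
import math
from collections import Counter

ONTOLOGY_CONCEPTS = {"semantic", "web", "ontology"}   # assumed domain concepts

def semantic_vector(text, boost=2.0):
    counts = Counter(text.lower().split())
    return {t: c * (boost if t in ONTOLOGY_CONCEPTS else 1.0)
            for t, c in counts.items()}

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

a = semantic_vector("semantic web ontology matching")
b = semantic_vector("ontology based semantic retrieval")
print(cosine(a, b))   # similarity driven mostly by the boosted concepts
```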

    Interlinking educational data to web of data

    With the proliferation of educational data on the Web, publishing and interlinking eLearning resources have become an important issue nowadays. Educational resources are exposed under heterogeneous Intellectual Property Rights (IPRs) at different times and in different formats. Some resources are implicitly related to each other or to the interests and cultural and technical environment of learners. Linking educational resources to useful knowledge on the Web improves resource seeking. This becomes crucial for moving from current isolated eLearning repositories towards an open discovery space, including distributed resources irrespective of their geographic and system boundaries. Linking resources is also useful for enriching educational content, as it provides a richer context and other related information to both educators and learners. On the other hand, the emergence of so-called "Linked Data" brings new opportunities for interconnecting different kinds of resources on the Web of Data. Using the Linked Data approach, data providers can publish structured data and establish typed links between data from various sources. To this aim, many tools, approaches and frameworks have been built, first to expose the data in Linked Data formats and second to discover similarities between entities in the datasets. The research carried out for this PhD thesis assesses the possibilities of applying the Linked Open Data paradigm to the enrichment of educational resources. Generally speaking, we discuss the interlinking of educational objects and eLearning resources on the Web of Data, focusing on existing schemas and tools.
    The main goals of this thesis are thus to cover the following aspects:
    -- Exposing the educational (meta)data schemas, and particularly IEEE LOM, as Linked Data
    -- Evaluating currently available interlinking tools in the Linked Data context
    -- Analyzing datasets in the Linked Open Data cloud, to discover appropriate datasets for interlinking
    -- Discussing the benefits of interlinking educational (meta)data in practice
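The first goal above, exposing (meta)data such as IEEE LOM as Linked Data, amounts to mapping record fields onto RDF triples. The sketch below emits N-Triples; the namespace IRI and field-to-predicate mapping are hypothetical placeholders, not the mapping used in the thesis.

```python
# Illustrative sketch: serialize a flat metadata record as RDF N-Triples.
# The LOM_NS namespace below is an assumed placeholder.
LOM_NS = "http://example.org/lom#"

def lom_to_ntriples(resource_iri, lom_record):
    """Emit one triple per metadata field, with literal object values."""
    triples = []
    for field, value in lom_record.items():
        triples.append(f'<{resource_iri}> <{LOM_NS}{field}> "{value}" .')
    return "\n".join(triples)

record = {"title": "Intro to Linked Data", "language": "en"}
print(lom_to_ntriples("http://example.org/course/42", record))
```

In practice an RDF library would handle escaping, datatypes, and language tags; the point here is only the shape of the mapping.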

    Discovering Links for Metadata Enrichment on Computer Science Papers

    At the very beginning of compiling a bibliography, usually only basic information such as the title, authors and publication date of an item is known. In order to gather additional information about a specific item, one typically has to search the library catalog or use a web search engine. This look-up procedure implies a manual effort for every single item of a bibliography. In this technical report we present a proof of concept that utilizes Linked Data technology for the simple enrichment of sparse metadata sets. This is done by discovering owl:sameAs links between an initial set of computer science papers and resources from external data sources such as DBLP, ACM and the Semantic Web Conference Corpus. In this report, we demonstrate how the link discovery tool Silk is used to detect additional information and to enrich an initial set of records in the computer science domain. The pros and cons of Silk as a link discovery tool are summarized at the end.
    Comment: 22 pages, 4 figures, 7 listings, presented at SWIB1
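The core of the Silk workflow described above is comparing candidate record pairs with a string-similarity metric and emitting owl:sameAs links above a threshold. This standalone sketch imitates that idea with Python's difflib; the datasets, identifiers and the 0.9 threshold are illustrative, not Silk's actual configuration language.

```python
# Toy link discovery: pairwise title similarity -> owl:sameAs links.
# Record IRIs and the threshold are invented for illustration.
from difflib import SequenceMatcher

def discover_sameas(local, remote, threshold=0.9):
    links = []
    for l_iri, l_title in local.items():
        for r_iri, r_title in remote.items():
            score = SequenceMatcher(None, l_title.lower(),
                                    r_title.lower()).ratio()
            if score >= threshold:
                links.append((l_iri, "owl:sameAs", r_iri))
    return links

local = {"ex:p1": "Linked Data: The Story So Far"}
remote = {"dblp:x9": "Linked data - the story so far",
          "dblp:y3": "A completely different paper"}
print(discover_sameas(local, remote))
```

Silk generalizes this pattern with configurable comparators, blocking to avoid the quadratic pairwise loop, and aggregation of multiple property comparisons.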

    Meta-Analysis Approach identifies Candidate Genes and associated Molecular Networks for Type-2 Diabetes Mellitus

    Background Multiple functional genomics datasets for complex human diseases have been published and made available by researchers worldwide. The main goal of these studies is the detailed analysis of a particular aspect of the disease. Complementing them, meta-analysis approaches try to extract supersets of disease genes and interaction networks by integrating and combining these individual studies using statistical approaches. Results Here we report on a meta-analysis approach that integrates data of heterogeneous origin in the domain of type-2 diabetes mellitus (T2DM). Different data sources, such as DNA microarray data and complementary qualitative data covering several human and mouse tissues, are integrated and analyzed with a Bootstrap scoring approach in order to extract the disease relevance of genes. The purpose of the meta-analysis is two-fold: on the one hand it identifies a group of genes with overall disease relevance, indicating common, tissue-independent processes related to the disease; on the other hand it identifies genes showing specific alterations with respect to a single study. Using a random sampling approach we computed a core set of 213 T2DM genes across multiple tissues in human and mouse, including well-known genes such as Pdk4, Adipoq, Scd, Pik3r1 and Socs2 that reflect important hallmarks of T2DM, for example the strong relationship between obesity and insulin resistance, as well as a large fraction (128) of as yet barely characterized novel candidate genes. Furthermore, we explored functional information and identified cellular networks associated with this core set of genes, such as pathway information, protein-protein interactions and gene regulatory networks. Additionally, we set up a web interface to allow users to screen the T2DM relevance of any gene not yet associated with the disease. Conclusion In our paper we have identified a core set of 213 T2DM candidate genes by a meta-analysis of existing data sources.
    We have explored the relation of these genes to disease-relevant information and, using enrichment analysis, we have identified biological networks on different layers of cellular information, such as signaling and metabolic pathways, gene regulatory networks and protein-protein interactions. The web interface is accessible via http://t2dm-geneminer.molgen.mpg.de
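The Bootstrap scoring idea mentioned above can be sketched as resampling a gene's per-study evidence and counting how often the resampled mean supports relevance. The study scores and cutoff below are made-up numbers; the actual scoring in the paper is more involved.

```python
# Toy bootstrap relevance score: resample per-study scores with
# replacement and report the fraction of resamples whose mean exceeds
# a relevance cutoff. All numbers are hypothetical.
import random

def bootstrap_relevance(study_scores, cutoff=0.5, n_boot=2000, seed=0):
    rng = random.Random(seed)             # fixed seed for reproducibility
    k = len(study_scores)
    hits = 0
    for _ in range(n_boot):
        sample = [rng.choice(study_scores) for _ in range(k)]
        if sum(sample) / k > cutoff:
            hits += 1
    return hits / n_boot                  # support for disease relevance

# A gene with consistently high evidence across five studies:
print(bootstrap_relevance([0.9, 0.7, 0.8, 0.6, 0.85]))   # -> 1.0
```

A gene scoring highly in only one study would get a much lower value, which is how the approach separates tissue-independent relevance from study-specific alterations.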

    Linked Data approach for selection process automation in Systematic Reviews

    Background: a systematic review identifies, evaluates and synthesizes the available literature on a given topic using scientific and repeatable methodologies. The significant workload required and subjectivity bias can affect the results. Aim: to semi-automate the selection process in order to reduce the amount of manual work needed and the consequent subjectivity bias. Method: we extend and enrich the selection of primary studies using existing technologies in the fields of Linked Data and text mining. We formally define the selection process and develop a prototype that implements it. Finally, we conduct a case study that simulates the selection process of a systematic review published in the literature. Results: the process presented in this paper could reduce the workload by 20% with respect to a fully manual selection, with a recall of 100%. Conclusions: the extraction of knowledge from scientific studies through Linked Data and text mining techniques could be used in the selection phase of the systematic review process to reduce the workload and the subjectivity bias
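The two evaluation figures quoted above are simple to make precise: recall over the gold-standard included studies, and workload reduction as the fraction of papers the automatic filter removes from manual screening. The study identifiers and counts below are illustrative.

```python
# Sketch of the two metrics used to evaluate semi-automatic selection.
def recall(selected, relevant):
    """Fraction of truly relevant studies kept by the automatic filter."""
    return len(selected & relevant) / len(relevant)

def workload_reduction(total, kept_for_manual_review):
    """Fraction of papers that no longer need manual screening."""
    return 1 - kept_for_manual_review / total

relevant = {"s1", "s2", "s3"}                 # gold-standard inclusions
selected = {"s1", "s2", "s3", "s7", "s9"}     # passed the automatic filter
print(recall(selected, relevant))             # 1.0: no relevant study lost
print(workload_reduction(100, 80))            # 0.2: 20% fewer papers to read
```

A recall of 100% is the critical constraint for systematic reviews: the filter may only discard papers it is certain are irrelevant, since a missed study biases the synthesis.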

    Expanding sensor networks to automate knowledge acquisition

    The availability of accurate, low-cost sensors to scientists has resulted in widespread deployment in a variety of sporting and health environments. The sensor data output is often in a raw, proprietary or unstructured format. As a result, it is often difficult to query multiple sensors for complex properties or actions. In our research, we deploy a heterogeneous sensor network to detect various biological and physiological properties in athletes during training activities. The goal for exercise physiologists is to quickly identify key intervals in exercise, such as moments of stress or fatigue. This is not currently possible because of low-level sensors and a lack of query language support. Thus, our motivation is to expand the sensor network with a contextual layer that enriches raw sensor data, so that it can be exploited by a high-level query language. To achieve this, the domain expert specifies events in a traditional event-condition-action format to deliver the required contextual enrichment
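The event-condition-action layer described above can be sketched as a small rule engine: an expert registers rules, raw readings are matched against their conditions, and matches produce annotated (enriched) events. The sensor names, thresholds and annotations below are invented for illustration.

```python
# Toy ECA enrichment layer over a raw sensor stream.
# Rules pair an event type and condition with an enrichment action.
ECA_RULES = [
    # (event type, condition on the reading, annotation to attach)
    ("heart_rate", lambda v: v > 180, "stress_interval"),
    ("lactate",    lambda v: v > 4.0, "fatigue_onset"),
]

def enrich(readings):
    """Return contextual events for readings that trigger a rule."""
    enriched = []
    for sensor, value in readings:
        for event, condition, label in ECA_RULES:
            if sensor == event and condition(value):
                enriched.append({"sensor": sensor, "value": value,
                                 "annotation": label})
    return enriched

stream = [("heart_rate", 150), ("heart_rate", 185), ("lactate", 4.5)]
print(enrich(stream))
```

A high-level query language can then ask for "stress_interval" events directly instead of filtering raw heart-rate samples, which is the point of the contextual layer.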