
    Steps in Metagenomics: Let’s Avoid Garbage in and Garbage Out

    Is metagenomics a revolution or a new fad? Metagenomics is tightly associated with the availability of next-generation sequencing in all its implementations. The key feature of these new technologies, moving beyond the Sanger-based DNA sequencing approach, is the depth of nucleotide sequencing per sample.1 Knowing much more about a sample shifts the traditional paradigms of “What is the most abundant?” or “What is the most significant?” to “What is present and potentially significant that might influence the situation and outcome?” Consider the case of identifying proper biomarkers of disease state in the context of chronic disease prevention. Prevention has been deemed a viable option to avert human chronic diseases and to curb healthcare management costs.2 The actual implementation of any effective preventive measure has proven rather difficult: beyond the typically poor compliance of the general public, the difficulty of validating the effect of habit modification on long-term risk points to the need for new biomarkers of disease state. Scientists and the public are accepting the fact that humans are super-organisms, harboring both a human genome and a microbial genome, the latter being much larger in size and diversity, and key to the health of individuals.3,4 It is time to investigate the intricate relationship between humans and their associated microbiota and how this relationship modulates or affects both partners.5 These remarks can be extended to the animal and plant kingdoms, and holistically to the Earth’s biome. By its nature, the evolution and function of all the Earth’s biomes are influenced by a myriad of interactions between and among microbes (planktonic, in biofilms, or host associated) and the surrounding physical environment. The general definition of metagenomics is the cultivation-independent analysis of the genetic information of the collective genomes of the microbes within a given environment, based on its sampling. It focuses on the collection of genetic information through sequencing that can target DNA, RNA, or both. The subsequent analyses can focus on sequence conservation, phylogeny, phylogenomics, function, or the representation of genetic diversity, including yet-to-be-annotated genes. The diversity of hypotheses, questions, and goals to be accomplished is endless. The primary design is based on the nature of the material to be analyzed and its primary function.

    Enriched biodiversity data as a resource and service

    Background: Recent years have seen a surge in projects that produce large volumes of structured, machine-readable biodiversity data. To make these data amenable to processing by generic, open source “data enrichment” workflows, they are increasingly being represented in a variety of standards-compliant interchange formats. Here, we report on an initiative in which software developers and taxonomists came together to address the challenges and highlight the opportunities in the enrichment of such biodiversity data by engaging in intensive, collaborative software development: the Biodiversity Data Enrichment Hackathon. Results: The hackathon brought together 37 participants (including developers and taxonomists, i.e., scientific professionals who gather, identify, name and classify species) from 10 countries: Belgium, Bulgaria, Canada, Finland, Germany, Italy, the Netherlands, New Zealand, the UK, and the US. The participants brought expertise in processing structured data, text mining, development of ontologies, digital identification keys, geographic information systems, niche modeling, natural language processing, provenance annotation, semantic integration, taxonomic name resolution, web service interfaces, workflow tools and visualisation. Most use cases and exemplar data were provided by taxonomists. One goal of the meeting was to facilitate re-use and enhancement of biodiversity knowledge by a broad range of stakeholders, such as taxonomists, systematists, ecologists, niche modelers, informaticians and ontologists. The suggested use cases resulted in nine breakout groups addressing three main themes: i) mobilising heritage biodiversity knowledge; ii) formalising and linking concepts; and iii) addressing interoperability between service platforms. Another goal was to further foster a community of experts in biodiversity informatics and to build human links between research projects and institutions, in response to recent calls to further such integration in this research domain. Conclusions: Beyond deriving prototype solutions for each use case, areas of inadequacy were discussed and are being pursued further. It was striking how many possible applications for biodiversity data there were and how quickly solutions could be put together when the normal constraints to collaboration were broken down for a week. Conversely, mobilising biodiversity knowledge from its silos in heritage literature and natural history collections will continue to require formalisation of the concepts (and the links between them) that define the research domain, as well as increased interoperability between the software platforms that operate on these concepts.
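
    One expertise area listed above, taxonomic name resolution, is easy to illustrate. The sketch below is a minimal, hypothetical example that matches a scientific name against the GBIF backbone taxonomy via its public web service; GBIF is only one of several resolution services and is not necessarily the one used at the hackathon.

        import requests

        # Match a scientific name against the GBIF backbone taxonomy.
        resp = requests.get(
            "https://api.gbif.org/v1/species/match",
            params={"name": "Puma concolor"},
            timeout=30,
        )
        resp.raise_for_status()
        match = resp.json()
        print(match.get("scientificName"), match.get("rank"), match.get("usageKey"))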

    Anti-Pattern Specification and Correction Recommendations for Semantic Cloud Services

    Given the economic and technological advantages they offer, cloud services are increasingly being offered by several cloud providers. However, the lack of standardized descriptions of cloud services hinders their discovery; indeed, existing cloud providers describe similar services in different ways. In an effort to standardize cloud service descriptions, several works propose to use ontologies. Nevertheless, the adoption of any of the proposed ontologies calls for an evaluation showing its efficiency in cloud service discovery; since the existing proposals have not been evaluated, they may be less widely adopted and considered. Indeed, the evaluation of an ontology has a direct impact on its understandability and reusability. In this paper, we propose an evaluation approach to validate our proposed Cloud Service Ontology (CSO), to guarantee adequate cloud service discovery. To this end, this paper makes a three-fold contribution. First, we specify a set of patterns and anti-patterns with which to evaluate our CSO. Second, we define an anti-pattern detection algorithm based on SPARQL queries, which provides a set of correction recommendations to help ontologists revise their ontology. Finally, tests were conducted on: (i) the efficiency of the algorithm and (ii) its detection of anti-patterns reflecting design anomalies as well as taxonomic and domain errors within CSO.
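
    As an illustration of SPARQL-based anti-pattern detection, the following minimal sketch (not the paper's actual algorithm or anti-pattern catalogue) uses the Python rdflib library to flag one classic design anomaly: a class declared as a subclass of two mutually disjoint classes. The ontology file name is a hypothetical placeholder.

        from rdflib import Graph

        # Load the ontology under evaluation ("cso.ttl" is a hypothetical file name).
        g = Graph()
        g.parse("cso.ttl", format="turtle")

        # Flag classes that inherit from two classes declared mutually disjoint.
        QUERY = """
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX owl:  <http://www.w3.org/2002/07/owl#>
        SELECT ?cls ?a ?b WHERE {
            ?cls rdfs:subClassOf ?a, ?b .
            ?a owl:disjointWith ?b .
        }
        """
        for cls, a, b in g.query(QUERY):
            print(f"Anti-pattern: {cls} is a subclass of disjoint classes {a} and {b}")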

    Language resources and linked data: a practical perspective

    Recently, experts and practitioners in language resources have started recognizing the benefits of the linked data (LD) paradigm for the representation and exploitation of linguistic data on the Web. The adoption of LD principles is leading to an emerging ecosystem of multilingual open resources that conform to the Linguistic Linked Open Data Cloud, in which datasets of linguistic data are interconnected and represented following common vocabularies, which facilitates linguistic information discovery, integration and access. In order to contribute to this initiative, this paper summarizes several key aspects of the representation of linguistic information as linked data from a practical perspective. The main goal of this document is to provide the basic ideas and tools for migrating language resources (lexicons, corpora, etc.) to LD on the Web and to implement some useful NLP tasks with them (e.g., word sense disambiguation). This material was the basis of a tutorial given at the EKAW’14 conference, which is also reported in the paper.
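
    As a concrete taste of such a migration, the sketch below builds a single lexical entry as linked data with the Python rdflib library, using the W3C OntoLex-Lemon vocabulary (one common choice in the Linguistic Linked Open Data Cloud); the entry URIs and the DBpedia sense link are illustrative assumptions, not taken from the paper.

        from rdflib import Graph, Literal, Namespace, URIRef
        from rdflib.namespace import RDF

        ONTOLEX = Namespace("http://www.w3.org/ns/lemon/ontolex#")
        LEX = Namespace("http://example.org/lexicon/")  # hypothetical base URI

        g = Graph()
        g.bind("ontolex", ONTOLEX)

        entry, form = LEX["bank-n"], LEX["bank-n-form"]
        g.add((entry, RDF.type, ONTOLEX.LexicalEntry))
        g.add((entry, ONTOLEX.canonicalForm, form))
        g.add((form, RDF.type, ONTOLEX.Form))
        g.add((form, ONTOLEX.writtenRep, Literal("bank", lang="en")))
        # Link the entry to an ontology concept (illustrative target).
        g.add((entry, ONTOLEX.denotes, URIRef("http://dbpedia.org/resource/Bank")))

        print(g.serialize(format="turtle"))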

    From Linguistic Resources to Ontology-Aware Terminologies: Minding the Representation Gap

    Terminological resources have proven crucial in many applications, ranging from Computer-Aided Translation tools to authoring software and multilingual and cross-lingual information retrieval systems. Nonetheless, with the exception of a few felicitous examples, such as the IATE (Interactive Terminology for Europe) Termbank, many terminological resources are not available in standard formats, such as Term Base eXchange (TBX), thus preventing their sharing and reuse. Yet these terminologies could be enriched by associating them with the corresponding ontology-based information. The research described in the present contribution demonstrates the process and the methodologies adopted for the automatic conversion of such resources into TBX, together with their semantic enrichment based on the formalization of ontological information within terminologies. We present a proof of concept using the Italian Linguistic Resource for the Archaeological domain (developed according to the Thesauri and Guidelines of the Italian Central Institute for the Catalogue and Documentation). Further, we introduce the conversion tool developed to support the process of creating ontology-aware terminologies, improving the interoperability and sharing of existing language technologies and datasets.
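
    To make the target format tangible, here is a minimal sketch of a TBX-style term entry generated with Python's standard library; the Italian/English term pair and the entry identifier are hypothetical, and a real conversion would also emit the full TBX header and data categories.

        import xml.etree.ElementTree as ET

        # Assemble a minimal TBX-style term entry (hypothetical content).
        martif = ET.Element("martif", type="TBX")
        body = ET.SubElement(ET.SubElement(martif, "text"), "body")
        entry = ET.SubElement(body, "termEntry", id="c1")
        for lang, term in [("it", "anfora"), ("en", "amphora")]:
            lang_set = ET.SubElement(entry, "langSet", {"xml:lang": lang})
            tig = ET.SubElement(lang_set, "tig")  # term information group
            ET.SubElement(tig, "term").text = term

        print(ET.tostring(martif, encoding="unicode"))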

    The BioLighthouse: Reusable Software Design for Bioinformatics

    Advances in next-generation sequencing have accelerated the field of microbiology by making accessible a wealth of information about microbiomes. Unfortunately, microbiome experiments are among the least reproducible in terms of bioinformatics. Software tools are often poorly documented and under-maintained, and commonly have arcane dependencies that require significant time investment to configure correctly. Microbiome studies are multidisciplinary efforts, but communication and knowledge discrepancies make accessibility, reproducibility, and transparency of computational workflows difficult. The BioLighthouse uses Ansible roles, playbooks, and modules to automate the configuration and execution of bioinformatics workflows. The roles and playbooks act as virtual laboratory notebooks by documenting the provenance of a bioinformatics workflow. The BioLighthouse was tested for platform dependence and data-scale dependence with a microbial profiling pipeline consisting of Cutadapt, FLASH2, and DADA2. The pipeline was tested on 3 canola root and soil microbiome datasets with differing orders of magnitude of data: 1 sample, 10 samples, and 100 samples. Each dataset was processed by The BioLighthouse with 10 unique parameter sets, and outputs were compared across 8 computing environments for a total of 240 pipeline runs. Outputs after each step in the pipeline were tested for identity using the Linux diff command to ensure reproducible results. Testing of The BioLighthouse suggested no platform or data-scale dependence. To provide an easy way of maintaining environment reproducibility in user-space, Conda and the Bioconda channel were used to manage virtual environments and software dependencies when configuring bioinformatics tools. The BioLighthouse provides a framework for developers to make their tools accessible to the research community, for bioinformaticians to build bioinformatics workflows, and for the broader research community to consume these tools at a high level while knowing the tools will execute as intended.
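
    The identity testing described above (the paper used the Linux diff command) can equally be sketched as a checksum comparison; the file paths below are hypothetical placeholders for the same pipeline step's output produced in two computing environments.

        import hashlib
        from pathlib import Path

        def checksum(path: Path) -> str:
            """Return the SHA-256 digest of one pipeline output file."""
            return hashlib.sha256(path.read_bytes()).hexdigest()

        # Hypothetical outputs of the same FLASH2 step from two environments.
        a = checksum(Path("env_a/flash2/merged.fastq"))
        b = checksum(Path("env_b/flash2/merged.fastq"))
        print("identical" if a == b else "outputs differ")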

    Ontology-based Access Control in Open Scenarios: Applications to Social Networks and the Cloud

    Thanks to the advent of the Internet, it is now possible to easily share vast amounts of electronic information and computer resources (which include hardware, computer services, etc.) in open distributed environments. These environments serve as a common platform for heterogeneous users (e.g., corporations, individuals, etc.) by hosting customized user applications and systems, providing ubiquitous access to the shared resources and requiring less administrative effort; as a result, they enable users and companies to significantly increase their productivity. Unfortunately, sharing resources in open environments has significantly increased the privacy threats to users. Indeed, shared electronic data may be exploited by third parties, such as Data Brokers, which may aggregate, infer and redistribute (sensitive) personal features, thus potentially impairing the privacy of individuals. A way to palliate this problem consists in controlling the access of users to the potentially sensitive resources. Specifically, access control management regulates access to the shared resources according to the credentials of the users, the type of resource and the privacy preferences of the resource/data owners. The efficient management of access control is crucial in large and dynamic environments such as the ones described above. Moreover, in order to propose a feasible and scalable solution, we need to get rid of the manual management of rules/constraints (on which most available solutions rely), since it constitutes a serious burden for users and administrators. Finally, access control management should be intuitive for end users, who usually lack technical expertise and may find access control mechanisms difficult to understand and rigid to apply due to their complex configuration settings.
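
    A toy sketch of the general idea, assuming a hypothetical resource taxonomy and policy format (not the thesis's actual mechanism): access is granted when the owner's policy allows one of the user's credentials on the resource type or on one of its generalizations.

        # Toy resource taxonomy: each type maps to its parent (None = root).
        TAXONOMY = {
            "medical_record": "personal_data",
            "photo": "personal_data",
            "personal_data": "resource",
            "resource": None,
        }

        def generalizations(rtype):
            """Yield the resource type and all of its ancestors."""
            while rtype is not None:
                yield rtype
                rtype = TAXONOMY[rtype]

        def may_access(credentials, resource_type, owner_policy):
            """Grant access if the owner's policy covers some credential of the
            user for the resource type or for one of its generalizations."""
            return any(
                (cred, rtype) in owner_policy
                for cred in credentials
                for rtype in generalizations(resource_type)
            )

        policy = {("physician", "medical_record"), ("friend", "personal_data")}
        print(may_access({"physician"}, "medical_record", policy))  # True
        print(may_access({"friend"}, "photo", policy))              # True (inherited)
        print(may_access({"stranger"}, "photo", policy))            # False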

    Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review

    Since the Simple Knowledge Organization System (SKOS) specification and its SKOS eXtension for Labels (SKOS-XL) became formal W3C recommendations in 2009, a significant number of conventional knowledge organization systems (KOS), including thesauri, classification schemes, name authorities, and lists of codes and terms produced before the arrival of the ontology wave, have made their journeys to join the Semantic Web mainstream. This paper uses "LOD KOS" as an umbrella term to refer to all of the value vocabularies and lightweight ontologies within the Semantic Web framework. The paper provides an overview of what the LOD KOS movement has brought to various communities and users. These are not limited to the communities of value vocabulary constructors and providers, nor to the catalogers and indexers who have a long history of applying the vocabularies to their products. LOD dataset producers and LOD service providers, information architects and interface designers, and researchers in the sciences and humanities are also direct beneficiaries of LOD KOS. The paper examines a set of collected cases (experimental or in real applications) and aims to identify the usages of LOD KOS in order to share practices and ideas among communities and users. Through the viewpoints of a number of different user groups, the functions of LOD KOS are examined from multiple dimensions. This paper focuses on the LOD dataset producers, vocabulary producers, and researchers (as end-users of KOS).
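
    A minimal sketch of how an LOD KOS consumer might read such a vocabulary, using the Python rdflib library and its built-in SKOS namespace; the vocabulary file is a hypothetical placeholder for any published SKOS dump.

        from rdflib import Graph
        from rdflib.namespace import SKOS

        # "vocabulary.ttl" is a hypothetical placeholder for a SKOS dump.
        g = Graph()
        g.parse("vocabulary.ttl", format="turtle")

        # List every concept with its preferred label and broader concepts.
        for concept, label in g.subject_objects(SKOS.prefLabel):
            broader = [str(b) for b in g.objects(concept, SKOS.broader)]
            print(concept, str(label), broader)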

    Towards a Taxonomy of Cognitive RPA Components

    Robotic Process Automation (RPA) is a discipline that is increasingly growing hand in hand with Artificial Intelligence (AI) and Machine Learning, enabling so-called cognitive automation. In this context, the existing RPA platforms that include AI-based solutions classify their components, i.e., the constituent parts of a robot that perform a set of actions, in a way that seems to obey market or business decisions instead of common-sense rules. To be more precise, components that present similar functionality are identified with different names and grouped in different ways depending on the platform that provides them. Therefore, analyzing different cognitive RPA platforms to check their suitability for a specific need is typically a time-consuming and error-prone task. To overcome this problem and to provide users with support in the development of an RPA project, this paper proposes a method for the systematic construction of a taxonomy of cognitive RPA components. Moreover, the method is applied to components that solve selected real-world use cases from industry, obtaining promising results. Funding: Ministerio de Economía y Competitividad TIN2016-76956-C3-2-R; Junta de Andalucía CEI-12-TIC021; Centro para el Desarrollo Tecnológico Industrial P011-19/E0.
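
    One plausible way to represent the resulting taxonomy, sketched here with hypothetical labels (the paper does not prescribe a data structure): each node names a functionality and groups the differently named, platform-specific components that provide it.

        from dataclasses import dataclass, field

        @dataclass
        class TaxonomyNode:
            """A functionality in the taxonomy, grouping equivalent components."""
            name: str
            children: list["TaxonomyNode"] = field(default_factory=list)
            components: list[str] = field(default_factory=list)  # platform-specific names

        root = TaxonomyNode("cognitive component")
        ocr = TaxonomyNode(
            "text extraction from images",
            components=["Platform A: 'OCR Engine'", "Platform B: 'Read Document'"],
        )
        root.children.append(ocr)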