
    A Semantic Problem Solving Environment for Integrative Parasite Research: Identification of Intervention Targets for Trypanosoma cruzi

    Effective research in parasite biology requires analyzing experimental lab data in the context of constantly expanding public data resources. Integrating lab data with public resources is particularly difficult for biologists who may not possess the computational skills to acquire and process heterogeneous data stored at different locations. Therefore, we developed a semantic problem-solving environment (SPSE) that allows parasitologists to query their lab data, integrated with public resources, using ontologies. An ontology specifies a common vocabulary and formal relationships among terms; in this case, terms describing an organism and its associated experimental data and processes. SPSE supports capturing and querying provenance information, i.e., metadata on the experimental processes and data recorded for reproducibility, and includes a visual query-processing tool for formulating complex queries without learning query language syntax. We demonstrate the significance of SPSE in identifying gene knockout targets for T. cruzi. The overall goal of SPSE is to help researchers discover new or existing knowledge that is implicitly present in the data but not always easily detected. Results demonstrate improved usefulness of SPSE over existing lab systems and approaches, and support for complex query design that is otherwise difficult to achieve without knowledge of query language syntax.

    A Semantic Problem Solving Environment for Integrative Parasite Research: Identification of Intervention Targets for Trypanosoma cruzi

    Background: Research on the biology of parasites requires a sophisticated and integrated computational platform to query and analyze large volumes of data, representing both unpublished (internal) and public (external) data sources. Effective analysis of an integrated data resource using knowledge discovery tools would significantly aid biologists in conducting their research, for example, through identifying various intervention targets in parasites, and in deciding the future direction of ongoing as well as planned projects. A key challenge in achieving this objective is the heterogeneity between the internal lab data, usually stored as flat files, Excel spreadsheets, or custom-built databases, and the external databases. Reconciling the different forms of heterogeneity and effectively integrating data from disparate sources is a nontrivial task for biologists and requires a dedicated informatics infrastructure. Thus, we developed an integrated environment using Semantic Web technologies that provides biologists with the tools for managing and analyzing their data, without the need for acquiring in-depth computer science knowledge. Methodology/Principal Findings: We developed a semantic problem-solving environment (SPSE) that uses ontologies to integrate internal lab data with external resources in a Parasite Knowledge Base (PKB), which has the ability to query across these resources in a unified manner. The SPSE includes Web Ontology Language (OWL)-based ontologies, experimental data with its provenance information represented using the Resource Description Framework (RDF), and a visual querying tool, Cuebee, that features integrated use of Web services. We demonstrate the use and benefit of SPSE using example queries for identifying gene knockout targets of Trypanosoma cruzi for vaccine development. Answers to these queries involve looking up multiple sources of data, linking them together, and presenting the results. Conclusion/Significance: The SPSE facilitates parasitologists in leveraging the growing, but disparate, parasite data resources by offering an integrative platform that utilizes Semantic Web techniques, while keeping their workload increase minimal.
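
    As a rough illustration of the kind of integrated querying the SPSE supports, the sketch below loads RDF-encoded lab data and runs a SPARQL query over it with Python's rdflib. The file name, namespace, and property names are hypothetical placeholders, not the actual PKB or Parasite Experiment Ontology identifiers.

        from rdflib import Graph

        # Load RDF-encoded lab data (hypothetical file and vocabulary, for illustration only).
        g = Graph()
        g.parse("lab_data.ttl", format="turtle")

        # SPARQL sketch: genes measured in a microarray analysis, joined to pathway annotations.
        query = """
        PREFIX ex: <http://example.org/parasite#>
        SELECT ?gene ?pathway WHERE {
            ?analysis a ex:MicroarrayAnalysis ;
                      ex:measuresGene ?gene .
            ?gene ex:participatesIn ?pathway .
        }
        """
        for row in g.query(query):
            print(row.gene, row.pathway)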

    A Proposal for Classifying IoT Resources Exploring the Treatment of Uncertainty in Multicriteria Decision.

    The Internet of Things (IoT) is characterized by networked resources, most of which offer more than one service. The classification process for selecting the resource whose services best meet a client's request constitutes a research front with several current and internationally relevant challenges. Among these challenges we highlight high scalability, the dynamism with which resources enter and leave the environment, and the client's uncertainty when specifying preferences. Due to this high scalability of the IoT, a client request can return hundreds or even thousands of resources that satisfy the functional parameters of the request. The literature review carried out in this thesis identified the evaluation of non-functional parameters as a recurring technique in existing proposals for resource classification. In this scenario, the main objective of this thesis is to present EXEHDA-RR, a proposal for autonomously classifying discovered resources, whose design is based on: (i) a decision approach that weighs multiple criteria; and (ii) the treatment of uncertainty in defining the parameters considered. Another aspect addressed in this thesis is the minimization of the computational costs involved in classifying resources in dynamic environments such as the IoT, where resources may enter and/or leave the computing environment at any time. The evaluation of EXEHDA-RR was performed using usage scenarios, with resource specifications extracted from the QWS dataset, which describes real resources available on the Internet. The results obtained show that it is possible to offer clients the most suitable resources according to the specified preferences. The use of fuzzy logic, among other aspects, made it possible for clients to specify their preferences more comfortably, using linguistic terms such as low, medium, and high, minimizing uncertainty. The results achieved with the usage scenarios validate the proposal and point to directions for future work.
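
    As a loose illustration of combining multicriteria weighting with fuzzy linguistic preferences, the minimal Python sketch below ranks resources by a weighted score of normalized non-functional parameters. It is not the actual EXEHDA-RR algorithm; the triangular fuzzy weights, criteria names, and values are invented for the example.

        # (low, mid, high) triangular fuzzy numbers for linguistic preference terms.
        LINGUISTIC_WEIGHTS = {
            "low":    (0.0, 0.1, 0.3),
            "medium": (0.3, 0.5, 0.7),
            "high":   (0.7, 0.9, 1.0),
        }

        def defuzzify(tri):
            """Centroid of a triangular fuzzy number."""
            return sum(tri) / 3.0

        def rank_resources(resources, preferences):
            """resources: {name: {criterion: normalized value in [0, 1]}}
            preferences: {criterion: linguistic term, e.g. "low"/"medium"/"high"}"""
            weights = {c: defuzzify(LINGUISTIC_WEIGHTS[t]) for c, t in preferences.items()}
            total = sum(weights.values())
            scores = {
                name: sum(weights[c] * values.get(c, 0.0) for c in weights) / total
                for name, values in resources.items()
            }
            return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

        # Example usage with made-up resources and preferences.
        ranking = rank_resources(
            {"sensor-a": {"availability": 0.9, "response_time": 0.4},
             "sensor-b": {"availability": 0.6, "response_time": 0.8}},
            {"availability": "high", "response_time": "medium"},
        )
        print(ranking)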

    Screenshot of the Cuebee interface for query formulation.

    The row contains a triple (subject-predicate-object) that is required to formulate the query. The expressions over the arrows represent relationships (predicates) that link the subject and object. Query formulation is initiated by first selecting the server (PE All Datasets). If users know which particular datasets will be used for the query, they can select a dataset there, such as the microarray dataset, gene knockout dataset, etc. If users are not sure, they can select PE All Datasets, and Cuebee will try to find answers using all the datasets in the Parasite Knowledge Base (PKB). Users then begin to type in the search field, and Cuebee provides suggestions matching the first letters typed in a drop-down list. In this case "Microarray Analysis" is selected. Users can select a specific instance of Microarray Analysis if known; otherwise, they can select "any_Microarray_analysis", which lets Cuebee find answers using all the microarray data. Cuebee provides a definition for each concept (under Class Description) and more information about relationships (under Relations), as shown for the concept "gene" in this figure. Relationships with an asterisk in front are directly associated with the concept "gene", where "gene" acts as the subject of the triple. This information comes from the ontology, PEO in this case. Once the desired query is formulated, users can click Search and Cuebee will provide results under the Specific results or General results section. Users can also query the results of their first query using the Refine button. A video demo of querying with Cuebee is available at: http://wiki.knoesis.org/index.php/Manuscript_Details.

    New Web forms that store the data in an RDF subject-predicate-object (i.e., triple) format, providing the opportunity to relate the data to ontology concepts.

    Storing the data using these Web forms has no impact on the front-end user experience, but it offers extended querying functionality through the use of ontology concepts. Provenance information added through these Web forms is instantly available for querying.
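
    For illustration, the minimal rdflib sketch below shows what one such form submission might produce as subject-predicate-object triples with simple provenance. The namespace and property names are hypothetical, not the actual PEO vocabulary used by the forms.

        from rdflib import Graph, Literal, Namespace, RDF

        # Hypothetical namespace and properties, for illustration only.
        EX = Namespace("http://example.org/parasite#")

        def store_form_submission(graph, sample_id, stage, performed_by):
            """Record a form submission as triples, including simple provenance."""
            sample = EX[sample_id]
            graph.add((sample, RDF.type, EX.GeneKnockoutExperiment))
            graph.add((sample, EX.lifecycleStage, Literal(stage)))
            graph.add((sample, EX.performedBy, Literal(performed_by)))
            return sample

        g = Graph()
        store_form_submission(g, "sample_42", "epimastigote", "lab_member_1")
        print(g.serialize(format="turtle"))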

    Screenshot of the Cuebee interface after formulation of query 1, “List the genes that are downregulated in the epimastigote stage and exist in a single metabolic pathway.”

    Each row contains the triples required to formulate the query. Query formulation is initiated by first selecting the server (PE All Datasets). After selecting the dataset, users begin to type in the search field, and Cuebee provides suggestions matching the first letters typed in a drop-down list. In this case "Microarray Analysis" is selected, and the query was limited to microarray analysis data pertaining to only the "epimastigote" lifecycle stage of the parasite using Cuebee's filtering function. The triples are then extended as shown to achieve the desired query. The query uses Cuebee's "Group by" function to group all the epimastigote genes associated with a single metabolic pathway, and the "Refine" function to identify only those genes from the group that are downregulated, i.e., with a log2 ratio less than −1. Specific results show part of the results, which include gene information from the microarray lab data and pathway information from KEGG, where each pathway ID represents a specific pathway in KEGG.
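
    As a hedged sketch, query 1 might be expressed over the PKB roughly as the following SPARQL, held here in a Python string as it would be passed to a triple store. The prefixes and property names are invented placeholders; the real predicates come from PEO and the KEGG mappings.

        # Hypothetical SPARQL rendering of query 1 (illustrative only).
        QUERY_1 = """
        PREFIX ex: <http://example.org/parasite#>
        SELECT ?gene (SAMPLE(?pathway) AS ?singlePathway) WHERE {
            ?analysis a ex:MicroarrayAnalysis ;
                      ex:lifecycleStage "epimastigote" ;
                      ex:measuresGene ?gene .
            ?gene ex:log2Ratio ?ratio ;
                  ex:participatesIn ?pathway .
            FILTER (?ratio < -1)                  # downregulated genes
        }
        GROUP BY ?gene
        HAVING (COUNT(DISTINCT ?pathway) = 1)     # genes in exactly one metabolic pathway
        """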

    Domain-Specific Use Cases for Knowledge-Enabled Social Media Analysis

    Social media provides a virtual platform for users to share and discuss their daily life, activities, opinions, health, feelings, etc. Such personal accounts readily generate Big Data marked by velocity, volume, value, variety, and veracity challenges. This type of Big Data analytics already supports useful investigations, ranging from research into data mining and the development of public policy to actions targeting an individual, in a variety of domains such as branding and marketing, crime and law enforcement, crisis monitoring and management, as well as public and personalized health management. However, using social media to solve domain-specific problems is challenging due to the complexity of the domain, lack of context, the colloquial nature of language, and changing topic relevance in temporally dynamic domains. In this article, we discuss the need to go beyond data-driven machine learning and natural language processing, and to incorporate deep domain knowledge as well as knowledge of how experts and decision makers explore and perform contextual interpretation. Four use cases are used to demonstrate the role of domain knowledge in addressing each challenge.