8 research outputs found

    Semi‐automated workflows for acquiring specimen data from label images in herbarium collections

    Computational workflow environments are an active area of computer science and informatics research; they promise to automate biological information processing and thereby increase research efficiency and impact. In this project, semi‐automated data processing workflows were developed to test the efficiency of computerizing information contained in herbarium plant specimen labels. Our test sample consisted of Mexican and Central American plant specimens held in the University of Michigan Herbarium (MICH). The initial data acquisition process consisted of two parts: (1) the capture of digital images of specimen labels and of full‐specimen herbarium sheets, and (2) creation of a minimal field database, or "pre‐catalog", of records that contain only the information necessary to uniquely identify specimens. For entering "pre‐catalog" data, two methods were tested: key‐stroking the information (a) from the specimen labels directly, or (b) from digital images of specimen labels. In a second step, locality and latitude/longitude data fields were filled in if the values were present on the labels or images. If values were not available, geo‐coordinates were assigned based on further analysis of the descriptive locality information on the label. Time and effort for the various steps were measured and recorded. Our analysis demonstrates a clear efficiency benefit of articulating a biological specimen data acquisition workflow into discrete steps, each of which could then be individually optimized. First, we separated the step of capturing data from the specimen from most keystroke data entry tasks. We did this by capturing a digital image of the specimen for the first step, and by limiting initial key‐stroking of data to the creation of a minimal "pre‐catalog" database for the latter tasks. By doing this, specimen handling logistics were streamlined to minimize staff time and cost. 
Second, by then obtaining most of the specimen data from the label images, the more intellectually challenging task of label data interpretation could be moved electronically out of the herbarium to the location of more highly trained specialists for greater efficiency and accuracy. This project used experts in the plants’ country of origin, Mexico, to verify localities and geography and to derive geo‐coordinates. Third, with careful choice of data fields for the "pre‐catalog" database, specimen image files linked to the minimal tracking records could be sorted by collector and date of collection to minimize key‐stroking of redundant data in a continuous series of labels, resulting in improved data entry efficiency and data quality. Peer Reviewed. https://deepblue.lib.umich.edu/bitstream/2027.42/146956/1/tax596014.pd
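The third point above, sorting image-linked "pre-catalog" records by collector and collection date so that redundant label data can be keyed once per series, can be illustrated with a minimal Python sketch. The record fields, barcode values, and file names are hypothetical and are not taken from the MICH project; the abstract does not describe the database schema.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby


@dataclass(frozen=True)
class PreCatalogRecord:
    # All field names are illustrative assumptions, not the project's schema.
    barcode: str          # unique specimen identifier
    collector: str
    collected_on: date
    image_file: str       # label image linked to this tracking record


def batch_by_collector_and_date(records):
    """Sort records so labels from the same collector and date are adjacent,
    then group them so shared fields can be entered once per batch."""
    def key(record):
        return (record.collector, record.collected_on)

    ordered = sorted(records, key=key)  # stable sort keeps input order within a batch
    return [(k, list(group)) for k, group in groupby(ordered, key=key)]


records = [
    PreCatalogRecord("SPEC0003", "Smith", date(1990, 5, 2), "label3.jpg"),
    PreCatalogRecord("SPEC0001", "Lopez", date(1988, 7, 14), "label1.jpg"),
    PreCatalogRecord("SPEC0002", "Smith", date(1990, 5, 2), "label2.jpg"),
]
batches = batch_by_collector_and_date(records)
for (collector, collected_on), group in batches:
    print(collector, collected_on, [r.barcode for r in group])
```

Each batch shares a collector and date, so a data entry operator working from the images would only need to key the fields that differ between consecutive labels.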

    Behavioral analysis of scientific workflows with semantic information

    Recent developments in areas related to scientific computing show an increasing interest in scientific workflows because of their ability to solve complex challenges. Problems that were once too computationally heavy or time-consuming can now be solved more efficiently. Scientific workflows have been progressively improved through the introduction of new paradigms and technologies, with semantics among the most promising. This paper focuses on the addition of Semantic Web techniques to the scientific workflow area, which facilitates the integration of network-based solutions. It also describes a model checking technique for studying workflow behavior prior to execution. Using the Unary RDF annotated Petri net formalism (U-RDF-PN), scientific workflows can be improved by adding semantic annotations related to the task descriptions and workflow evolution. The technique can be applied using a complete environment for the model checking of this kind of workflow, which is also described in this work. Finally, the proposed methodology is exemplified by its application to two known scientific workflows: the First Provenance Challenge and the InterScan protein analysis workflow.

    Mechanisms of semantic annotation for scientific workflows

    Advisor: Claudia Maria Bauzer Medeiros. Dissertation (master's), Universidade Estadual de Campinas, Instituto de Computação. Abstract: The sharing of information, processes, and models of experiments among scientists from different organizations and areas of knowledge is increasing as these resources become available on the Web, and thus there is a need for workflow discovery mechanisms. Many of these experiment models are described as scientific workflows. However, there is no standard specification for describing them, which complicates the reuse of existing workflows and their components. This thesis contributes to solving this problem with the following results: an analysis of issues related to the sharing and cooperative design of scientific workflows on the Web; an analysis of semantic aspects and metadata related to these workflows; a Web-based workflow editor that uses WFMC standards; and a semantic annotation model for scientific workflows, which the editor incorporates. Together, these results create the basis for the discovery, reuse, and sharing of scientific workflows on the Web. The editor lets researchers build their workflows and annotations online and, consequently, test the annotation system with external data. Master's. Databases. Master in Computer Science.

    SimiFlow: uma arquitetura para agrupamento de Workflows por similaridade

    Scientists have been using Scientific Workflow Management Systems (SWfMS) to support scientific experiments. However, each SWfMS uses its own language for modeling a workflow that will later be executed, and scientists have no aid or guidance in arriving at the modeled workflow. Experiment lines, a new approach to dealing with these limitations, allow an abstract representation and systematic composition of experiments. Since many previously modeled scientific workflows already exist, scientists can use them to leverage the construction of new abstract representations. These earlier experiments can be useful for forming an abstract structure if they can be grouped by similarity criteria. This project proposes SimiFlow, an architecture for similarity-based comparison and clustering that builds experiment lines through a bottom-up approach.
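As a rough illustration of grouping existing workflows by similarity, the following Python sketch compares workflows by the Jaccard similarity of their activity sets and groups them greedily. This is an assumption made for illustration only: the abstract does not state which similarity criteria or clustering algorithm SimiFlow uses, and the workflow names, activity names, and threshold are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of activity names."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_by_similarity(workflows, threshold=0.5):
    """Greedy single-pass grouping: a workflow joins the first group whose
    representative (its first member) is at least `threshold` similar;
    otherwise it starts a new group."""
    groups = []
    for name, activities in workflows.items():
        for group in groups:
            representative = workflows[group[0]]
            if jaccard(activities, representative) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical workflows, each reduced to a list of activity names.
workflows = {
    "wf_a": ["fetch", "align", "build_tree"],
    "wf_b": ["fetch", "align", "plot"],
    "wf_c": ["download", "simulate"],
}
groups = group_by_similarity(workflows)
print(groups)
```

In a bottom-up approach of this kind, each resulting group would be a candidate from which a shared abstract structure could be derived. A real system would likely compare workflow structure as well as activity names.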

    "Model checking" paramétrico de "workflows" científicos

    Scientific computing has attracted growing interest in recent years in areas related to the life sciences. Scientific workflows are a special type of workflow used in large-scale, computationally complex scenarios such as climate models, biological structures, chemistry, surgery, or disaster simulation, and their execution consumes a large amount of time and resources. One of the main goals of scientific computing has been progressive improvement through the introduction of new paradigms and technologies in order to tackle increasingly complex challenges; one such paradigm is the addition of semantic aspects to workflows. Tools and techniques that make it possible to analyze a workflow's behavior before its execution are therefore of great interest. The aim of that analysis is to guarantee adequate and correct behavior and to verify the correct management and use of the resources involved. The analysis should allow the quality of the results to be predicted and should identify the parameters needed to obtain the expected results. From the user's point of view, incorporating semantic aspects lets scientists browse, query, integrate, and compose data sets and services much more efficiently. However, a review of the state of the art in semantics applied to models in scientific computing shows significant shortcomings in the maturity and application of this approach, as well as a lack of techniques and tools for applying it. It is therefore necessary to propose and develop new modeling and analysis techniques that can handle these semantic aspects. 
This Master's thesis addresses the analysis, design, and development of a model checking method and tool based on introducing semantic aspects and annotations both in the models and in the properties to be verified. As a result, the COMBAS tool (COmprobador de Modelos BAsado en Semántica, a semantics-based model checker) provides an integrated environment for verifying this kind of model and navigating the structures that result from the process. Scientific workflow models are described using a class of high-level Petri nets annotated with semantic information in RDF, the U-RDF-PN. Throughout this work, the techniques, methodologies, and models needed to extend the framework with parametric analysis were added. Parametric analysis is considerably more powerful and expressive: it uses parameters whose values are undetermined at the start of the process, so that the behavior of the workflow can be studied with respect to the possible values of those parameters. To restrict the parameter values along each execution path of the workflow, the workflow model uses guards expressed in propositional logic. This first requires studying which tools can handle such propositions, so the work analyzes Satisfiability Modulo Theories (SMT), the current state of the related standards, the flexibility of the available solvers, and the tools that support the semantics to be applied. Finally, the feasibility and usability of the proposed approach are demonstrated by applying it to the analysis of the EBI InterProScan workflow, verifying properties of interest to the scientist without having to implement, deploy, or run the workflow.

    Bioinformatics methodologies for detection and study of repetitive sequences in gene loci of chimeric transcripts

    Advisor: Michel Eduardo Beleza Yamagishi. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Biologia. Abstract: The large amount of biological data that has recently become available has shown that genomes are full of repetitive sequences, such as microsatellites and mobile genetic elements, which would be statistically highly improbable if genomes were generated from a random distribution of nucleotides. This observation motivated the classification of repetitive sequences and the construction of several bioinformatics tools, as well as storage mechanisms based on database management systems (DBMS) that make it possible to locate and store such sequences for later study. However, it was the biological confirmation of their importance, as in the RNA interference mechanism (reverse complement, or inverted repeat), that drew greater interest from the scientific community. There is now strong evidence associating repetitive sequences with interesting biological phenomena, such as RNA processing by cis-splicing and the formation of chimeric transcripts, which is frequent in lower organisms but very rare in higher organisms. 
Such transcripts can be generated by trans-splicing or, as conjectured in this work, by the retrotransposition of mobile genetic elements (such as transposons or retrotransposons). This work therefore proposes several bioinformatics methodologies, available on the Web, to detect new evidence of chimeric transcripts in the genomes of different organisms, in both draft and high-quality genome assemblies, and to study the repetitive sequences in the gene loci of the transcripts involved in forming a chimeric sequence. Using full-length cDNA libraries, the proposed tools identified new chimeric transcript candidates in the human and bovine genomes. They come from cells of normal tissues and do not follow canonical splice sites in the fusion region of the transcripts involved. Moreover, the detected sequences show a high concentration of reverse-complement repetitive sequence pairs in the gene loci of the two transcripts that form the chimeric transcript candidate. The tools can also be applied to other organisms and can guide new experimental work to try to confirm new chimeric transcripts in vivo, in both higher and lower organisms. Doctorate. Bioinformatics. Doctor in Genetics and Molecular Biology.
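To make the notion of reverse-complement repeat pairs between two gene loci concrete, here is a minimal Python sketch that indexes the k-mers of one sequence and looks up the reverse complement of each k-mer of the other. It is an illustrative exact-match approach under assumed inputs (plain A/C/G/T strings, a fixed k); it is not the thesis's tooling, whose algorithms the abstract does not describe.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def reverse_complement_pairs(locus_a, locus_b, k=8):
    """Return (pos_a, pos_b, kmer) for every k-mer in locus_a whose
    reverse complement occurs in locus_b (exact matches only)."""
    locus_a, locus_b = locus_a.upper(), locus_b.upper()
    # Index every k-mer of locus_b by its start positions.
    index = {}
    for j in range(len(locus_b) - k + 1):
        index.setdefault(locus_b[j:j + k], []).append(j)
    pairs = []
    for i in range(len(locus_a) - k + 1):
        kmer = locus_a[i:i + k]
        for j in index.get(reverse_complement(kmer), []):
            pairs.append((i, j, kmer))
    return pairs


# Toy sequences chosen so that the second is the reverse complement of the first.
pairs = reverse_complement_pairs("AAAACCCC", "GGGGTTTT", k=8)
print(pairs)
```

Counting such pairs per locus, relative to locus length, would give one simple measure of the kind of concentration the abstract refers to; real repeat detection would normally also tolerate mismatches.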

    WOODSS and the Web: Annotating and Reusing Scientific Workflows

    This paper discusses ongoing research on scientific workflows at the Institute of Computing, University of Campinas (IC - UNICAMP), Brazil. Our projects with bio-scientists have led us to develop a scientific workflow infrastructure named WOODSS. This framework has two main objectives: to help scientists specify and annotate their models and experiments, and to document collaborative efforts in scientific activities. In both contexts, workflows are annotated and stored in a database. This "annotated scientific workflow" database is treated as a repository of (sometimes incomplete) approaches to solving scientific problems. Thus, it serves two purposes: it allows comparison of distinct solutions to a problem, and of their designs; and it provides reusable and executable building blocks for constructing new scientific workflows to meet specific needs. Annotations, moreover, allow further insight into methodology, success rates, underlying hypotheses, and other issues in experimental activities. The research challenges we currently face include extending this framework to the Web, following Semantic Web standards; providing means of discovering workflow components on the Web for reuse; and taking advantage of planning in Artificial Intelligence to support composition mechanisms. This paper describes our efforts in these directions, tested in two domains: agro-environmental planning and bioinformatics. Volume 34, issue 3, pp. 18-23.