16 research outputs found

    "Model checking" paramétrico de "workflows" científicos

    Scientific computing has attracted growing interest in recent years in fields related to the life sciences. Scientific workflows are a special type of workflow used in large-scale, computationally complex scenarios such as climate models, biological structures, chemistry, surgery, or disaster simulation, and their execution consumes a large amount of time and resources. One of the main goals of scientific computing has been progressive improvement through the introduction of new paradigms and technologies to tackle increasingly complex challenges; one such paradigm is the addition of semantic aspects to workflows. Tools and techniques that make it possible to analyze the behavior of a workflow before its execution are of great interest. The goal of this analysis is to guarantee adequate and correct behavior and to verify that the resources involved are managed and used correctly. The analysis should allow the quality of the results to be predicted and should identify the parameters needed to obtain the expected results. From the user's point of view, incorporating semantic aspects lets scientists navigate, query, integrate, and compose data sets and services much more efficiently. However, a review of the state of the art in semantics applied to models in scientific computing reveals significant shortcomings in the maturity and application of this approach, as well as a lack of techniques and tools for applying it. It is therefore necessary to propose and develop new modeling and analysis techniques that can handle these semantic aspects.
This Master's thesis addresses the analysis, design, and development of a model checking method and tool based on introducing semantic aspects and annotations into both the models and the properties to be verified. As a result, the COMBAS tool (COmprobador de Modelos BAsado en Semántica, a semantics-based model checker) provides an integration environment for verifying this type of model and navigating the structures that result from the process. Scientific workflow models are described using a class of high-level Petri nets annotated with semantic information in RDF, the U-RDF-PN. Throughout this work, the techniques, methodologies, and models needed to extend the framework with parametric analysis have been added. Parametric analysis is a much more powerful and expressive form of analysis that uses parameters whose value is undetermined at the start of the process, so that the behavior of the workflow can be studied with respect to the possible values of those parameters. To restrict the parameter values along each execution path of the workflow, the workflow model uses guards expressed in propositional logic. This first requires studying which tools can handle such propositions, so the thesis analyzes Satisfiability Modulo Theories (SMT), the current state of the related standards, the flexibility of the available solvers, and the tools that support the semantics to be applied. Finally, the feasibility and usability of the proposed approach have been demonstrated by applying it to the analysis of the EBI InterProScan workflow, verifying properties of interest to the scientist without having to implement, deploy, or run the workflow.

    Spatiotemporal Characteristics of the Largest HIV-1 CRF02_AG Outbreak in Spain: Evidence for Onward Transmissions

    Background and Aim: The circulating recombinant form 02_AG (CRF02_AG) is the predominant clade among the human immunodeficiency virus type-1 (HIV-1) non-Bs with a prevalence of 5.97% (95% Confidence Interval-CI: 5.41–6.57%) across Spain. Our aim was to estimate the levels of regional clustering for CRF02_AG and the spatiotemporal characteristics of the largest CRF02_AG subepidemic in Spain. Methods: We studied 396 CRF02_AG sequences obtained from HIV-1 diagnosed patients during 2000–2014 from 10 autonomous communities of Spain. Phylogenetic analysis was performed on the 391 CRF02_AG sequences along with all globally sampled CRF02_AG sequences (N = 3,302) as references. Phylodynamic and phylogeographic analyses were performed on the largest CRF02_AG monophyletic cluster by a Bayesian method in BEAST v1.8.0 and by reconstructing ancestral states using the criterion of parsimony in Mesquite v3.4, respectively. Results: The HIV-1 CRF02_AG prevalence differed across the Spanish autonomous communities we sampled from (p < 0.001). Phylogenetic analysis revealed that 52.7% of the CRF02_AG sequences formed 56 monophyletic clusters, with a range of 2–79 sequences. The CRF02_AG regional dispersal differed across Spain (p = 0.003), as suggested by monophyletic clustering. For the largest monophyletic cluster (subepidemic) (N = 79), 49.4% of the clustered sequences originated from Madrid, while most sequences (51.9%) had been obtained from men having sex with men (MSM). Molecular clock analysis suggested that the origin (tMRCA) of the CRF02_AG subepidemic was in 2002 (median estimate; 95% Highest Posterior Density-HPD interval: 1999–2004). Additionally, we found significant clustering within the CRF02_AG subepidemic according to ethnic origin. Conclusion: CRF02_AG has been introduced as a result of multiple introductions in Spain, following regional dispersal in several cases. We showed that CRF02_AG transmissions were mostly due to regional dispersal in Spain.
The hot-spot for the largest CRF02_AG regional subepidemic in Spain was in Madrid and was associated with the MSM transmission risk group. The existence of subepidemics suggests that several spillovers occurred from Madrid to other areas. CRF02_AG sequences from Hispanics were clustered in a separate subclade, suggesting no linkage between the local and Hispanic subepidemics.

    Redo log process mining in real life: data challenges & opportunities

    Data extraction and preparation are the most time-consuming phases of any process mining project. Due to the variability in the sources of event data, it remains a highly manual process in most cases. Moreover, it is very difficult to obtain reliable event data in enterprise systems that are not process-aware. Some techniques, like redo log process mining, try to solve these issues by automating the process as much as possible and enabling event extraction in systems that are not process-aware. This paper presents the challenges faced by redo log and traditional process mining, comparing both approaches at theoretical and practical levels. Finally, we demonstrate that the data obtained with redo log process mining in a real-life environment is at least as valid as that extracted by the traditional approach.

    Connecting databases with process mining: a meta model and toolset

    Process mining techniques require event logs which, in many cases, are obtained from databases. Obtaining these event logs is not a trivial task and requires substantial domain knowledge. In addition, an extracted event log provides only a single view on the database. To change our view, e.g., to focus on another business process and generate another event log, it is necessary to go back to the source of data. This paper proposes a meta model to integrate both process and data perspectives, relating one to the other. It can be used to generate different views from the database at any moment in a highly flexible way. This approach decouples the data extraction from the application of analysis techniques, enabling the application of process mining in different contexts.

    Everything you always wanted to know about your process, but did not know how to ask

    The size of execution data available for process mining analysis grows several orders of magnitude every couple of years. Extracting and selecting the relevant data to be analyzed in each case represents an open challenge in the field. This paper presents a systematic literature review on different approaches to query process data and establish their provenance. In addition, a new query language is proposed, which overcomes the limitations identified during the review. The proposal is based on a combination of data and process perspectives. It provides simple constructs to intuitively formulate questions. An implementation of the language is provided, together with examples of queries to be applied on different aspects of the process analysis.

    Connecting databases with process mining: a meta model and toolset

    This paper raises questions about the development of cultural identity as it will transform and impact upon the process of regional integration in the Asia Pacific Rim, through a consideration of the post-national tendencies created by migrant populations, in this case the Chilean diasporic community. I am specifically interested in how nationalisms impact on regional integration projects and how a post-national reading of the region might be beneficial in developing strategies in regional integration

    Process mining on databases: unearthing historical data from redo logs

    Process mining techniques rely on the existence of event data. However, in many cases it is far from trivial to obtain such event data. Considerable effort may need to be spent on making IT systems record historic data at all. But even if such records are available, it may not be possible to derive an event log for the case notion one is interested in, i.e., correlating events to form process instances may be challenging. This paper proposes an approach that exploits a commonly available and versatile source of data, i.e., database redo logs. Such logs record the write operations performed in a general-purpose database for a range of objects, which constitute a collection of events. By using the relations between objects as specified in the associated data model, it is possible to turn such events into an event log for a wide range of case types. The resulting logs can be analyzed using existing process mining techniques. Keywords: Process mining; Database; Redo log; Historical data; Trace creation; Transitive relations; Data model
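As a rough illustration of the idea in this abstract, the sketch below groups redo log records into traces by following relations between objects up to a chosen case table. It is not the authors' implementation: the record layout, the table and column names (`orders`, `order_lines`, `shipments`), and the helper `build_event_log` are all hypothetical, and the records are assumed to be already parsed out of the database's redo log.

```python
from collections import defaultdict

# Hypothetical, already-parsed redo log records:
# (timestamp, operation, table, row values).
REDO_LOG = [
    (1, "INSERT", "orders", {"order_id": 10}),
    (2, "INSERT", "order_lines", {"line_id": 100, "order_id": 10}),
    (3, "INSERT", "orders", {"order_id": 11}),
    (4, "INSERT", "shipments", {"shipment_id": 7, "line_id": 100}),
    (5, "UPDATE", "orders", {"order_id": 11}),
    (6, "INSERT", "order_lines", {"line_id": 101, "order_id": 11}),
]

# Hypothetical data model: primary key per table, and for each child
# table the foreign key column that points to its parent table.
PRIMARY_KEYS = {
    "orders": "order_id",
    "order_lines": "line_id",
    "shipments": "shipment_id",
}
FOREIGN_KEYS = {
    "order_lines": ("order_id", "orders"),
    "shipments": ("line_id", "order_lines"),
}


def build_event_log(records, case_table="orders"):
    """Group redo log records into traces, one per row of `case_table`."""
    # parent_of[(table, pk)] = (parent_table, parent_pk), learned so far.
    parent_of = {}
    traces = defaultdict(list)
    for timestamp, operation, table, row in sorted(records, key=lambda r: r[0]):
        key = (table, row[PRIMARY_KEYS[table]])
        if table in FOREIGN_KEYS:
            fk_column, parent_table = FOREIGN_KEYS[table]
            if fk_column in row:
                parent_of[key] = (parent_table, row[fk_column])
        # Follow the relations transitively until the case table is reached.
        while key[0] != case_table and key in parent_of:
            key = parent_of[key]
        if key[0] == case_table:
            traces[key[1]].append((timestamp, f"{operation} {table}"))
    return dict(traces)


event_log = build_event_log(REDO_LOG)
for case_id, trace in event_log.items():
    print(case_id, [activity for _, activity in trace])
```

Choosing a different `case_table` (for example `order_lines`) regroups the same records under another case notion without re-reading the source, provided every event can reach that table through the declared relations.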