
    Kvasir-Capsule, a video capsule endoscopy dataset

    Artificial intelligence (AI) is predicted to have profound effects on the future of video capsule endoscopy (VCE) technology. The potential lies in improving anomaly detection while reducing manual labour. Existing work demonstrates the promising benefits of AI-based computer-assisted diagnosis systems for VCE, while also showing that there is great potential for further improvement. However, medical data are often sparse and unavailable to the research community, and qualified medical personnel rarely have time for the tedious labelling work. We present Kvasir-Capsule, a large VCE dataset collected from examinations at a Norwegian hospital. Kvasir-Capsule consists of 117 videos, which can be used to extract a total of 4,741,504 image frames. We have labelled and medically verified 47,238 frames with a bounding box around findings from 14 different classes. In addition to these labelled images, the dataset includes 4,694,266 unlabelled frames. The Kvasir-Capsule dataset can play a valuable role in developing better algorithms that realise the true potential of VCE technology.
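    The labelled subset lends itself to standard supervised pipelines. The sketch below shows one hypothetical way to index the labelled frames and their bounding boxes before training; the directory layout, file names, and CSV column names are assumptions for illustration, not the dataset's actual schema.

    import csv
    from collections import Counter
    from pathlib import Path

    # Hypothetical layout: one directory of extracted frames plus a CSV mapping
    # each labelled frame to its finding class and bounding box (paths and
    # column names are assumptions, not the dataset's real schema).
    FRAMES_DIR = Path("kvasir_capsule/frames")
    LABELS_CSV = Path("kvasir_capsule/labels.csv")

    def load_labels(csv_path):
        """Read (filename, class, bbox) records for the labelled subset."""
        records = []
        with open(csv_path, newline="") as fh:
            for row in csv.DictReader(fh):
                bbox = tuple(int(row[k]) for k in ("x_min", "y_min", "x_max", "y_max"))
                records.append((row["filename"], row["finding_class"], bbox))
        return records

    if __name__ == "__main__":
        labels = load_labels(LABELS_CSV)
        per_class = Counter(cls for _, cls, _ in labels)
        print(f"{len(labels)} labelled frames across {len(per_class)} classes")
        for cls, n in per_class.most_common():
            print(f"  {cls}: {n}")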

    Workflow models for heterogeneous distributed systems

    The role of data in modern scientific workflows is becoming more and more crucial. The unprecedented amount of data available in the digital era, combined with recent advancements in Machine Learning and High-Performance Computing (HPC), has let computers surpass human performance in a wide range of fields, such as Computer Vision, Natural Language Processing and Bioinformatics. However, a solid data management strategy is crucial for key aspects like performance optimisation, privacy preservation and security. Most modern programming paradigms for Big Data analysis adhere to the principle of data locality: moving computation closer to the data to remove transfer-related overheads and risks. Still, there are scenarios in which it is worthwhile, or even unavoidable, to transfer data between different steps of a complex workflow. The contribution of this dissertation is twofold. First, it defines a novel methodology for distributed modular applications, allowing topology-aware scheduling and data management while separating business logic, data dependencies, parallel patterns and execution environments. In addition, it introduces computational notebooks as a high-level and user-friendly interface to this new kind of workflow, aiming to flatten the learning curve and improve the adoption of the methodology. Each of these contributions is accompanied by a full-fledged, open-source implementation, which has been used for evaluation purposes and allows the interested reader to experience the related methodology first-hand. The validity of the proposed approaches has been demonstrated on five real scientific applications in the domains of Deep Learning, Bioinformatics and Molecular Dynamics Simulation, executing them on large-scale mixed cloud-HPC infrastructures.
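    To make the separation of concerns concrete, the toy sketch below models a workflow whose steps declare their data dependencies and a target execution location, so a topology-aware scheduler could see where data would have to move. The Step class, location names, and scheduling logic are hypothetical illustrations, not the dissertation's actual framework or API.

    from dataclasses import dataclass, field
    from typing import Callable, Dict, List

    # Illustrative only: a minimal, hypothetical model of a workflow in which
    # each step declares its data dependencies and a target execution location,
    # so a topology-aware scheduler could decide which data must be transferred.
    @dataclass
    class Step:
        name: str
        func: Callable[[Dict[str, object]], object]
        inputs: List[str] = field(default_factory=list)
        location: str = "local"          # e.g. "local", "cloud", "hpc-cluster"

    def run_workflow(steps: List[Step]) -> Dict[str, object]:
        """Execute steps in order, reporting transfers when locations differ."""
        data, last_location = {}, {}
        for step in steps:
            for dep in step.inputs:
                if last_location[dep] != step.location:
                    print(f"transfer {dep}: {last_location[dep]} -> {step.location}")
            data[step.name] = step.func({k: data[k] for k in step.inputs})
            last_location[step.name] = step.location
        return data

    if __name__ == "__main__":
        wf = [
            Step("preprocess", lambda d: list(range(5)), location="cloud"),
            Step("train", lambda d: sum(d["preprocess"]), ["preprocess"], location="hpc-cluster"),
        ]
        print(run_workflow(wf))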

    On the Molecular Origin of the Cooperative Coil-to-globule Transition of Poly(N-isopropylacrylamide) in Water

    By means of atomistic molecular dynamics simulations we investigate the behaviour of poly(N-isopropylacrylamide), PNIPAM, in water at temperatures below and above the lower critical solution temperature (LCST), including the undercooled regime. The transition between water-soluble and insoluble states at the LCST is described as a cooperative process involving an intramolecular coil-to-globule transition preceding the aggregation of chains and the polymer precipitation. In this work we investigate the molecular origin of such cooperativity and the evolution of the hydration pattern in the undercooled polymer solution. The solution behaviour of an atactic 30-mer at high dilution is studied in the temperature interval from 243 to 323 K, with a favourable comparison to available experimental data. In the PNIPAM water-soluble states we detect a correlation between polymer segmental dynamics and the diffusive motion of bound water, occurring with the same activation energy. Simulation results show that below the coil-to-globule transition temperature PNIPAM is surrounded by a network of hydrogen-bonded water molecules and that the cooperativity arises from the structuring of water clusters in proximity to hydrophobic groups. In contrast, the perturbation of the hydrogen bond pattern involving water and amide groups occurs above the transition temperature. Altogether these findings reveal that even above the LCST PNIPAM remains largely hydrated and that the coil-to-globule transition is associated with a significant rearrangement of the solvent in proximity to the surface of the polymer. The comparison between the hydrogen bonding of water in the surroundings of the PNIPAM isopropyl groups and in the bulk displays a decreased structuring of the solvent at the hydrophobic polymer-water interface across the transition temperature, as expected because of the topological extension of such an interface along the chain.
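    The statement that segmental dynamics and bound-water diffusion share the same activation energy amounts to comparing the slopes of Arrhenius fits. The snippet below sketches such a fit; the diffusion coefficients are placeholder values for illustration, not results from the study.

    import numpy as np

    # Sketch of an Arrhenius analysis used to compare activation energies,
    # e.g. of bound-water diffusion and polymer segmental relaxation.
    R = 8.314  # gas constant, J/(mol K)

    def activation_energy(temperatures_K, diffusion_coeffs):
        """Fit ln D = ln D0 - Ea/(R T) and return Ea in kJ/mol."""
        slope, _ = np.polyfit(1.0 / np.asarray(temperatures_K),
                              np.log(np.asarray(diffusion_coeffs)), 1)
        return -slope * R / 1000.0

    if __name__ == "__main__":
        T = [243, 263, 283, 303, 323]                            # K
        D_water = [0.5e-10, 2.0e-10, 6.0e-10, 1.5e-9, 3.0e-9]    # m^2/s, placeholder
        print(f"Ea ~ {activation_energy(T, D_water):.1f} kJ/mol")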

    Water-polymer coupling induces a dynamical transition in microgels

    The long-debated protein dynamical transition has recently been found also in non-biological macromolecules, such as poly-N-isopropylacrylamide (PNIPAM) microgels. Here, by using atomistic molecular dynamics simulations, we report a description of the molecular origin of the dynamical transition in these systems. We show that PNIPAM and water dynamics below the dynamical transition temperature Td are dominated by methyl group rotations and hydrogen bonding, respectively. By comparing with bulk water, we unambiguously identify PNIPAM-water hydrogen bonding as the main factor responsible for the occurrence of the transition. The observed phenomenology thus crucially depends on the water-macromolecule coupling and is relevant to a wide class of hydrated systems, independently of their biological function.
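    A dynamical transition of this kind is typically located by tracking the mean squared displacement (MSD) at a fixed lag time as a function of temperature and looking for a change of slope at Td. The sketch below computes an MSD from synthetic random-walk trajectories; it illustrates only the analysis and uses no data from the paper.

    import numpy as np

    # Minimal sketch of the MSD-versus-temperature analysis used to locate a
    # dynamical transition; the trajectories here are synthetic stand-ins.
    def msd_at_lag(positions, lag):
        """positions: (n_frames, n_atoms, 3) array; returns MSD at the given lag."""
        disp = positions[lag:] - positions[:-lag]
        return np.mean(np.sum(disp ** 2, axis=-1))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        for T in (150, 200, 250, 300):
            step = 0.01 * np.sqrt(T / 150)          # toy temperature dependence
            traj = np.cumsum(rng.normal(0, step, (1000, 50, 3)), axis=0)
            print(f"T = {T} K: MSD(lag=100) = {msd_at_lag(traj, 100):.3f} nm^2")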

    An extensive study on iterative solver resilience: characterization, detection and prediction

    Soft errors caused by transient bit flips have the potential to significantly impact an application's behavior. This has motivated the design of an array of techniques to detect, isolate, and correct soft errors using microarchitectural, architectural, compilation-based, or application-level techniques to minimize their impact on the executing application. The first step toward the design of good error detection/correction techniques involves an understanding of an application's vulnerability to soft errors. This work focuses on the effects of silent data corruption on iterative solvers and on efforts to mitigate those effects. In this thesis, we first present the first comprehensive characterization of the impact of soft errors on the convergence characteristics of six iterative methods using application-level fault injection. We analyze the impact of soft errors in terms of the type of error (single- vs. multi-bit), the distribution and location of the bits affected, the data structure and statement impacted, and the variation with time. We create a public-access database with more than 1.5 million fault injection results. We then analyze the performance of soft error detection mechanisms and present comparative results. Motivated by our observations, we evaluate a machine-learning-based detector that takes as features the runtime features observed by the individual detectors to arrive at their conclusions. Our evaluation demonstrates improved results over the individual detectors. We then propose a machine-learning-based method to predict a program's error behavior in order to make fault injection studies more efficient. We demonstrate this method by assessing the performance of soft error detectors. We show that our method maintains 84% accuracy on average with up to 53% less cost. We also show that, once a model is trained, further fault injection tests would cost 10% of the expected full fault injection runs.
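    As an illustration of application-level fault injection, the sketch below flips a single bit of one float64 entry of the iterate in a simple Jacobi solver and compares the final residual with a clean run. The solver, matrix, and injection point are minimal stand-ins, not the instrumented methods or detectors evaluated in the thesis.

    import numpy as np

    # Minimal sketch of application-level fault injection: flip one random bit
    # of a float64 entry in the iterate of a Jacobi solver and compare the
    # resulting residual with a clean run (silent data corruption).
    def flip_bit(value, bit):
        """Flip the given bit (0..63) of a float64 value."""
        as_int = np.float64(value).view(np.uint64)
        return (as_int ^ np.uint64(1 << bit)).view(np.float64)

    def jacobi(A, b, iters=200, inject_at=None, bit=52, index=0):
        x = np.zeros_like(b)
        D = np.diag(A)
        R = A - np.diagflat(D)
        for k in range(iters):
            if k == inject_at:
                x[index] = flip_bit(x[index], bit)   # inject the corruption
            x = (b - R @ x) / D
        return np.linalg.norm(A @ x - b)

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        A = np.diag(np.full(50, 4.0)) + rng.normal(0, 0.05, (50, 50))  # diagonally dominant
        b = rng.normal(size=50)
        print("clean residual:   ", jacobi(A, b))
        print("injected residual:", jacobi(A, b, inject_at=100, bit=62))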

    Disulfide Bond Engineering of an Endoglucanase from Penicillium verruculosum to Improve Its Thermostability

    Endoglucanases (EGLs) are important components of multienzyme cocktails used in the production of a wide variety of fine and bulk chemicals from lignocellulosic feedstocks. However, a low thermostability and the loss of catalytic performance of EGLs at industrially required temperatures limit their commercial applications. Structure-based disulfide bond (DSB) engineering was carried out in order to improve the thermostability of EGLII from Penicillium verruculosum. Based on in silico prediction, two improved enzyme variants, S127C-A165C (DSB2) and Y171C-L201C (DSB3), were obtained. Both engineered enzymes displayed a 15–21% increase in specific activity against carboxymethylcellulose and β-glucan compared to the wild-type EGLII (EGLII-wt). After incubation at 70 °C for 2 h, they retained 52–58% of their activity, while EGLII-wt retained only 38% of its activity. At 80 °C, the engineered forms retained 15–22% of their activity after 2 h, whereas EGLII-wt was completely inactivated after the same incubation time. Molecular dynamics simulations revealed that the introduced DSBs rigidified the global structure of the DSB2 and DSB3 variants, thus enhancing their thermostability. In conclusion, this work provides insight into DSB protein engineering as a potential rational design strategy that might be applicable for improving the stability of other enzymes for industrial applications.
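    In silico prediction of candidate disulfide bridges usually begins with a simple geometric screen of the structure. The sketch below flags residue pairs whose C-beta atoms fall within a distance window compatible with an S-S bridge; the coordinates, the 3.0–5.0 Å window, and the sequence-separation filter are illustrative assumptions, and real pipelines also evaluate bond geometry and strain energy.

    import numpy as np

    # Toy geometric screen for disulfide-bond candidates: flag residue pairs
    # whose C-beta atoms lie within an assumed distance window.
    def disulfide_candidates(cb_coords, min_d=3.0, max_d=5.0, min_seq_sep=3):
        """cb_coords: (n_residues, 3) array of C-beta positions in Angstrom."""
        pairs = []
        n = len(cb_coords)
        for i in range(n):
            for j in range(i + min_seq_sep, n):
                d = np.linalg.norm(cb_coords[i] - cb_coords[j])
                if min_d <= d <= max_d:
                    pairs.append((i, j, round(float(d), 2)))
        return pairs

    if __name__ == "__main__":
        rng = np.random.default_rng(2)
        coords = rng.uniform(0, 30, size=(60, 3))   # placeholder structure
        for i, j, d in disulfide_candidates(coords)[:5]:
            print(f"residue {i} - residue {j}: Cb-Cb = {d} A")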

    An abstract interpretation for SPMD divergence on reducible control flow graphs

    Vectorizing compilers employ divergence analysis to detect at which program point a specific variable is uniform, i.e. has the same value on all SPMD threads that execute this program point. They exploit uniformity to retain branching, countering branch divergence, and to defer computations to scalar processor units. Divergence is a hyper-property and is closely related to non-interference and binding time. Several divergence, binding-time, and non-interference analyses already exist, but they either sacrifice precision or place significant restrictions on the syntactic structure of the program in order to achieve soundness. In this paper, we present the first abstract interpretation for uniformity that is general enough to be applicable to reducible CFGs and, at the same time, more precise than other analyses that achieve at least the same generality. Our analysis comes with a correctness proof that is to a large part mechanized in Coq. Our experimental evaluation shows that the compile time and the precision of our analysis are on par with LLVM's default divergence analysis, which is only sound on more restricted CFGs. At the same time, our analysis is faster and achieves better precision than a state-of-the-art non-interference analysis that is sound and at least as general as our analysis.
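    At its core, a uniformity analysis is a fixpoint computation over a small lattice. The toy below propagates "uniform"/"varying" facts along data dependences until a fixpoint is reached; it deliberately ignores control-induced divergence and the reducible-CFG subtleties that the paper's abstract interpretation actually handles, so it illustrates the flavour of the analysis rather than the analysis itself.

    # Toy abstract interpretation over the two-point lattice {UNIFORM, VARYING},
    # with UNIFORM below VARYING, propagating only data dependences.
    UNIFORM, VARYING = "uniform", "varying"

    def join(a, b):
        return VARYING if VARYING in (a, b) else UNIFORM

    def analyze(defs, seeds):
        """defs: var -> list of operand vars; seeds: vars known to be varying
        (e.g. the thread id). Returns the least-fixpoint assignment."""
        state = {v: UNIFORM for v in defs}
        state.update({v: VARYING for v in seeds})
        changed = True
        while changed:
            changed = False
            for var, operands in defs.items():
                new = state[var]
                for op in operands:
                    new = join(new, state.get(op, UNIFORM))
                if new != state[var]:
                    state[var] = new
                    changed = True
        return state

    if __name__ == "__main__":
        # c = a + b; d = c * tid; e = d + a
        defs = {"a": [], "b": [], "c": ["a", "b"], "d": ["c", "tid"], "e": ["d", "a"]}
        print(analyze(defs, seeds={"tid"}))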