
    GPU-accelerated Chemical Similarity Assessment for Large Scale Databases

    The assessment of chemical similarity between molecules is a basic operation in chemoinformatics, a computational area concerned with the manipulation of chemical structural information. Comparing molecules is the basis for a wide range of applications such as searching chemical databases, training prediction models for virtual screening, or aggregating clusters of similar compounds. However, currently available multimillion-compound databases represent a challenge for conventional chemoinformatics algorithms, raising the need for faster similarity methods. In this paper, we extensively analyze the advantages of using many-core architectures for calculating commonly used chemical similarity coefficients such as Tanimoto, Dice or Cosine. Our aim is to provide a broad proof of concept regarding the usefulness of GPU architectures for chemoinformatics, a class of computing problems they have so far left largely unaddressed. We present a general GPU algorithm for all-to-all chemical comparisons that handles both binary fingerprints and floating-point descriptors as molecule representations. Subsequently, we adopt optimization techniques to minimize global memory accesses and further improve efficiency. We test the proposed algorithm on different experimental setups: a laptop with a low-end GPU and a desktop with a more powerful GPU. In the former case, we obtain a 4-to-6-fold speed-up over a single-core implementation for fingerprints and a 4-to-7-fold speed-up for descriptors. In the latter case, we obtain a 195-to-206-fold and a 100-to-328-fold speed-up, respectively.
    National Institutes of Health (U.S.) (grant GM079804); National Institutes of Health (U.S.) (grant GM086145)
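    As a concrete reference for the similarity coefficients mentioned above, the following minimal NumPy sketch (a CPU illustration, not the authors' GPU implementation; all names and the placeholder data are assumptions) computes all-to-all Tanimoto, Dice and Cosine similarities for molecules encoded as binary fingerprints.

        import numpy as np

        def all_to_all_similarities(fp):
            """fp: (n_molecules, n_bits) matrix of 0/1 binary fingerprints."""
            fp = fp.astype(np.float64)
            common = fp @ fp.T                 # c_ij: bits set in both i and j
            counts = fp.sum(axis=1)            # a_i: bits set in molecule i
            a, b = counts[:, None], counts[None, :]
            tanimoto = common / (a + b - common)
            dice = 2.0 * common / (a + b)
            cosine = common / np.sqrt(a * b)
            return tanimoto, dice, cosine

        # Example: five random 1024-bit fingerprints (placeholder data)
        rng = np.random.default_rng(0)
        fps = (rng.random((5, 1024)) < 0.1).astype(np.uint8)
        T, D, C = all_to_all_similarities(fps)

    The all-to-all structure reduces to a dense matrix product plus elementwise operations, which is one reason the problem maps so naturally onto many-core hardware.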

    Overcoming timescale and finite-size limitations to compute nucleation rates from small scale Well Tempered Metadynamics simulations

    Condensation of a liquid droplet from a supersaturated vapour phase is initiated by a prototypical nucleation event. As such, it is challenging to compute its rate from atomistic molecular dynamics simulations. In fact, at realistic supersaturation conditions condensation occurs on time scales that far exceed what can be reached with conventional molecular dynamics methods. Another known problem in this context is the distortion of the free energy profile associated with nucleation due to the small, finite size of typical simulation boxes. In this work, the time-scale problem is addressed with a recently developed enhanced sampling method while simultaneously correcting for finite-size effects. We demonstrate our approach by studying the condensation of argon and show that characteristic nucleation times on the order of hours can be reliably calculated, approaching realistic supersaturation conditions and thus bridging the gap between what standard molecular dynamics simulations can do and real physical systems.
    Comment: 9 pages, 7 figures; additional figures and data provided as supplementary information. Submitted to the Journal of Chemical Physics
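    The rate is typically recovered by checking that the rescaled first-passage times obey the exponential statistics of a Poisson process. A minimal sketch of that post-processing step, assuming a set of rescaled nucleation times is already available (synthetic placeholder data are used below), could look as follows.

        import numpy as np
        from scipy import stats

        # Placeholder for rescaled first-passage times (in seconds) obtained from
        # the enhanced-sampling runs via the accumulated acceleration factor.
        rng = np.random.default_rng(1)
        rescaled_times = rng.exponential(scale=3.6e3, size=50)

        # Maximum-likelihood estimate of the characteristic nucleation time tau
        tau_hat = rescaled_times.mean()
        rate_hat = 1.0 / tau_hat           # nucleation rate per simulation box

        # Consistency check: are the times compatible with a Poisson process,
        # i.e. exponentially distributed with mean tau_hat?
        ks_stat, p_value = stats.kstest(rescaled_times, "expon", args=(0.0, tau_hat))
        print(f"tau = {tau_hat:.3g} s, rate = {rate_hat:.3g} 1/s, KS p = {p_value:.2f}")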

    Improved Signal Characterization via Empirical Mode Decomposition to Enhance in-line Quality Monitoring

    The machine tool industry is facing the need to increase the sensorization of production systems to meet evolving market demands. This leads to increasing interest in in-process monitoring tools that allow fast detection of faults and unnatural process behaviours during the process itself. Nevertheless, the analysis of sensor signals implies several challenges. One major challenge is the complexity of signal patterns, which often exhibit a multiscale content, i.e., a superimposition of both stationary and non-stationary fluctuations at different time-frequency levels. Among time-frequency techniques, Empirical Mode Decomposition (EMD) is a powerful method to decompose any signal into its embedded oscillatory modes in a fully data-driven way, without any ex-ante basis selection. Because of this, it might be used effectively for automated monitoring and diagnosis of manufacturing processes. Unfortunately, it usually yields an over-decomposition, in which single oscillation modes can be split across more than one scale (an effect also known as “mode mixing”). The literature lacks effective strategies to automatically synthesize the decomposition into a minimal number of physically relevant and interpretable components. This paper proposes a novel approach to achieve a synthetic decomposition of complex signals through the EMD procedure. A new criterion is proposed to group together multiple components associated with a common time-frequency pattern, aimed at summarizing the information content into a minimal number of modes that may be easier to interpret. A real case study in waterjet cutting is presented to demonstrate the benefits and the critical issues of the proposed approach.
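    The paper's grouping criterion is not reproduced here, but the general idea of merging IMFs that share a time-frequency scale can be sketched as follows, assuming the intrinsic mode functions have already been computed by some EMD implementation; the zero-crossing frequency estimate and the merging ratio are simplifications introduced for illustration.

        import numpy as np

        def dominant_frequency(imf, fs):
            """Rough dominant frequency of an IMF from its zero-crossing rate."""
            signs = np.signbit(imf).astype(np.int8)
            zero_crossings = np.count_nonzero(np.diff(signs))
            return 0.5 * zero_crossings * fs / len(imf)

        def group_imfs(imfs, fs, ratio=2.0):
            """Merge neighbouring IMFs whose dominant frequencies differ by less
            than `ratio`. imfs: (n_modes, n_samples), ordered fine to coarse.
            Returns a list of summed modes, one per time-frequency band."""
            freqs = [dominant_frequency(imf, fs) for imf in imfs]
            groups, current = [], [imfs[0]]
            for prev_f, f, imf in zip(freqs[:-1], freqs[1:], imfs[1:]):
                if f > 0 and prev_f / f < ratio:   # same scale: likely mode mixing
                    current.append(imf)
                else:
                    groups.append(np.sum(current, axis=0))
                    current = [imf]
            groups.append(np.sum(current, axis=0))
            return groups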

    Crystal nucleation from solution: design and modelling of detection time experiments

    Crystal nucleation is the process responsible for the appearance of a thermodynamically stable phase from a metastable parent solution. Given its activated nature, nucleation is affected by stochasticity which, despite originating at the molecular level, also heavily affects the macroscopic behaviour of the system. Being far too small to be observed directly, nuclei are detected by indirect methods that correlate the formation of the new phase with a measurable change in a property of the system; hence a model linking nuclei formation and crystal detection is always needed. We have previously presented a model describing nucleation in macroscopic systems as a stochastic Poisson process. The model, despite its general character, can describe industrially relevant processes, e.g. batch cooling at different operating conditions. The different scales influenced by the stochastic nature of nucleation demand appropriate theoretical and experimental investigation, particularly for applying the model to industrial scale-up, optimisation, and control. Using statistical tools, we have looked into the issue of estimating stochastic processes by collecting a representative but limited number of data points produced from a homogeneous set. Moreover, using our model, we analysed the sensitivity of crystallising systems to initial and boundary conditions, with particular emphasis on the effect of supersaturation, temperature and detection conditions. Finally, in light of the stochastic nature of nucleation, we also applied statistical meta-analysis to assess the agreement between the fitted model, its parameters and the experiments, to gain further insight into the quality of the model. Experimentally, we first investigated the conditions needed to perform homogeneous and reproducible measurements, which are necessary to understand the fundamental physical features and ultimately to estimate reliable kinetic parameters. A second aspect we explored concerned the size of the crystallising systems. Since in macroscopic reactors various phenomena occur simultaneously (nucleation, growth, breakage, agglomeration), we chose to work with two main system sizes, 1-3 mL reactors (mesoscale) and 1-60 nL reactors (microscale, i.e. microscopic droplets), where at least some of these phenomena could be decoupled. In the mesoscale crystallisers, one can perform experiments in which temperature and transmissivity are measured online, hence monitoring the appearance and disappearance of crystals. Additionally, the influence of the fluid dynamics, typically turbulent in these reactors, was investigated. In the microfluidic chips, on the other hand, a very high throughput (thousands of replicas of the same reactor) can potentially be achieved and, thanks to their very small size, high supersaturations outside the usual experimental reach could be explored. Additionally, within the microscopic droplets the fluid motion is generally diffusive or laminar convective, hence hindering breakage and agglomeration. One can thus observe systems where nucleation and growth of single crystals (or of a few crystals) occur unperturbed. Nevertheless, some main challenges, which we have been addressing, must be tackled before reliable crystallisation experiments can be performed: the characterisation and reproducibility of the shape and size of the droplets, and their stability (i.e. the loss of mass due to evaporation and perspiration through the chip).
    In conclusion, we demonstrate that, even if the data are reproducible and reliable, robust probability estimations can be obtained only with a sufficiently large number of experiments, which require careful design to avoid regions of high sensitivity and data processing to reject non-homogeneous data. The different sizes investigated have allowed us to gain better insight into the fundamental phenomena occurring in a crystallising system from the first formation of nuclei to crystal detection, which is of utmost importance for understanding the design of experiments at an industrially relevant scale. Moreover, appropriate mathematical tools allowed us to assess the reliability of the fits obtained from independent measurements of the same system at different conditions.
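    As an illustration of a Poisson detection-time model of the kind discussed above, the following sketch fits a nucleation rate to a set of detection times; the data, sample volume and lumped growth/detection delay are placeholders, and the functional form is a simplified stand-in rather than the exact model used in this work.

        import numpy as np
        from scipy.optimize import curve_fit

        # Placeholder detection times (s) from nominally identical isothermal
        # experiments; in practice these come from transmissivity measurements.
        rng = np.random.default_rng(2)
        t_detect = np.sort(rng.exponential(scale=1.8e3, size=80))

        # Empirical probability that a reactor has crystallised by time t
        p_emp = np.arange(1, t_detect.size + 1) / t_detect.size

        # Poisson-process model: P(t) = 1 - exp(-J * V * (t - t_g)), with J the
        # nucleation rate, V the volume and t_g a lumped growth/detection delay.
        V = 2.0e-6  # 2 mL expressed in m^3

        def detection_model(t, J, t_g):
            return 1.0 - np.exp(-J * V * np.clip(t - t_g, 0.0, None))

        (J_fit, tg_fit), _ = curve_fit(detection_model, t_detect, p_emp, p0=[1e2, 10.0])
        print(f"J = {J_fit:.3g} nuclei m^-3 s^-1, delay = {tg_fit:.1f} s")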

    Design Performance Analysis of a Self-Organizing Map for Statistical Monitoring of Distribution-free Data Streams

    In industrial applications, the continuously growing development of multi-sensor approaches, together with the trend of creating data-rich environments, is straining the effectiveness of traditional Statistical Process Control (SPC) tools. Industrial data streams frequently violate the statistical assumptions on which SPC tools are based, presenting non-normal or even mixture distributions, strong autocorrelation and complex noise patterns. To tackle these challenges, novel nonparametric approaches are required. Machine learning techniques are suitable for dealing with violations of distributional assumptions and for coping with complex data patterns. Recent studies showed that those methods can be used in quality control problems by exploiting only in-control data for training (a learning paradigm also known as “one-class classification”). Recent studies have also proposed distribution-free multivariate SPC methods based on unsupervised statistical learning tools, pointing out the difficulty of defining suitable control regions for non-normal data. In this paper, a Self-Organizing Map (SOM) based monitoring approach is presented. The SOM is an automatic data-analysis method, widely applied in recent works to clustering and data exploration problems. A very interesting feature of this method is its capability of providing a computationally efficient way to estimate a data-adaptive control region, even in high-dimensional problems. Nevertheless, very few authors have adopted the SOM in an SPC monitoring strategy. The aim of this work is to exploit the SOM network architecture and to propose a network design approach that suits SPC needs. A comparison study is presented, in which the process monitoring performance is compared against literature benchmark methods. The comparison framework is based on both simulated data and real data from a roll grinding application.
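    To make the monitoring idea concrete, the sketch below (plain NumPy, illustrative only; the map size, decay schedules and control-limit quantile are assumptions, not the design rules proposed in the paper) trains a small SOM on in-control data and uses the quantization error, i.e. the distance to the nearest codebook vector, as a distribution-free monitoring statistic with a data-adaptive control limit.

        import numpy as np

        def train_som(data, grid=(6, 6), n_iter=5000, lr0=0.5, sigma0=2.0, seed=0):
            """Tiny online SOM trainer (illustrative, not optimised)."""
            rng = np.random.default_rng(seed)
            rows, cols = grid
            W = rng.normal(size=(rows * cols, data.shape[1]))
            coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
            for t in range(n_iter):
                x = data[rng.integers(len(data))]
                bmu = np.argmin(np.linalg.norm(W - x, axis=1))    # best matching unit
                frac = t / n_iter
                lr, sigma = lr0 * (1 - frac), sigma0 * (1 - frac) + 0.5
                h = np.exp(-np.sum((coords - coords[bmu]) ** 2, axis=1) / (2 * sigma**2))
                W += lr * h[:, None] * (x - W)                    # pull the neighbourhood
            return W

        def quantization_error(W, x):
            """Monitoring statistic: distance from x to the nearest codebook vector."""
            return np.min(np.linalg.norm(W - x, axis=1))

        # Phase I: fit the map and a data-adaptive control limit on in-control data
        rng = np.random.default_rng(1)
        in_control = rng.lognormal(size=(2000, 4))                # non-normal example
        W = train_som(in_control)
        qe = np.array([quantization_error(W, x) for x in in_control])
        UCL = np.quantile(qe, 0.9973)                             # ~3-sigma false-alarm rate

        # Phase II: flag new observations whose quantization error exceeds the limit
        new_obs = rng.lognormal(mean=0.8, size=(5, 4))            # shifted process
        alarms = [quantization_error(W, x) > UCL for x in new_obs]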

    'Real-world' atrial fibrillation management in Europe: observations from the 2-year follow-up of the EURObservational Research Programme-Atrial Fibrillation General Registry Pilot Phase

    Atrial fibrillation (AF) is commonly associated with a high risk of stroke, thromboembolism, and mortality. The 1-year follow-up of the EURObservational Research Programme-Atrial Fibrillation (EORP-AF) Pilot Registry demonstrated a high mortality but good outcomes with European Society of Cardiology guideline-adherent therapy. Whether these 'real-world' observations on patients managed by European cardiologists extend to 2 years remains uncertain

    Does democracy cause growth? A meta-analysis (of 2000 regressions)

    The relationship between democracy and economic growth has been widely debated in the social sciences, with contrasting results. We apply a meta-analytical framework surveying 188 studies (2047 models) covering 36 years of research in the field. We also compare the effect of democracy on growth with the effect of human capital on growth in a sub-sample of 111 studies (875 models). Our findings suggest that democracy has a positive and direct effect on economic growth beyond the reach of publication bias, albeit a weaker one (about one third of the effect of human capital). Further, the growth effect of democracy appears to be stronger in more recent papers not surveyed in Doucouliagos and Ulubaşoğlu (2008). Finally, we show that the heterogeneity in the reported results is mainly driven by spatial and temporal differences in the samples, indicating that the democracy-growth nexus is not homogeneous across world regions and decades.
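    For readers unfamiliar with the machinery, a generic random-effects aggregation of study-level effects with a simple funnel-asymmetry (Egger-type) check can be sketched as follows; the data are synthetic placeholders and the DerSimonian-Laird estimator is a standard textbook choice, not necessarily the exact specification used in the paper.

        import numpy as np

        # Placeholder per-model effect sizes (e.g. partial correlations of
        # democracy with growth) and their standard errors.
        rng = np.random.default_rng(3)
        se = rng.uniform(0.02, 0.15, size=200)
        effects = rng.normal(loc=0.05, scale=se) + rng.normal(0.0, 0.03, size=200)

        # DerSimonian-Laird random-effects estimate
        w = 1.0 / se**2
        fixed = np.sum(w * effects) / np.sum(w)
        Q = np.sum(w * (effects - fixed) ** 2)
        tau2 = max(0.0, (Q - (len(effects) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
        w_re = 1.0 / (se**2 + tau2)
        mu_re = np.sum(w_re * effects) / np.sum(w_re)
        se_re = np.sqrt(1.0 / np.sum(w_re))

        # Egger-type check: regress standardised effects on precision; an
        # intercept far from zero suggests funnel-plot asymmetry (publication bias).
        X = np.column_stack([np.ones_like(se), 1.0 / se])
        egger_intercept = np.linalg.lstsq(X, effects / se, rcond=None)[0][0]
        print(f"random-effects estimate = {mu_re:.3f} (SE {se_re:.3f}), "
              f"Egger intercept = {egger_intercept:.2f}")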