1,843 research outputs found

    Clustering files of chemical structures using the Szekely-Rizzo generalization of Ward's method

    Ward's method is extensively used for clustering chemical structures represented by 2D fingerprints. This paper compares Ward clusterings of 14 datasets (containing between 278 and 4332 molecules) with those obtained using the Székely–Rizzo clustering method, a generalization of Ward's method. The clusters resulting from these two methods were evaluated by the extent to which the various classifications were able to group active molecules together, using a novel criterion of clustering effectiveness. Analysis of a total of 1400 classifications (Ward and Székely–Rizzo clustering methods, 14 different datasets, 5 different fingerprints and 10 different distance coefficients) demonstrated the general superiority of the Székely–Rizzo method. The distance coefficient first described by Soergel performed extremely well in these experiments, and this was also the case when it was used in simulated virtual screening experiments.
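
As a rough illustration of the distance coefficient mentioned in the abstract above, the sketch below computes the Soergel distance between two fingerprints in its usual textbook form: the sum of absolute differences divided by the sum of element-wise maxima. For binary fingerprints this equals one minus the Tanimoto coefficient. The function name and the two example bit vectors are assumptions made for illustration; they are not taken from the paper.

```python
def soergel_distance(x, y):
    """Soergel distance between two non-negative vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    denom = sum(max(a, b) for a, b in zip(x, y))
    if denom == 0:
        # Assumed convention: two all-zero fingerprints count as identical.
        return 0.0
    return sum(abs(a - b) for a, b in zip(x, y)) / denom


# Hypothetical 8-bit fingerprints (illustrative only).
fp_a = [1, 1, 0, 1, 0, 0, 1, 0]
fp_b = [1, 0, 0, 1, 0, 1, 1, 0]

# 2 differing bits over 5 bits set in either fingerprint
print(soergel_distance(fp_a, fp_b))
```

A full pairwise matrix of such distances is the kind of input a hierarchical clustering routine would consume.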

    A lightweight, graph-theoretic model of class-based similarity to support object-oriented code reuse.

    The work presented in this thesis is principally concerned with the development of a method and set of tools designed to support the identification of class-based similarity in collections of object-oriented code. Attention is focused on enhancing the potential for software reuse in situations where a reuse process is either absent or informal, and the characteristics of the organisation are unsuitable, or resources unavailable, to promote and sustain a systematic approach to reuse. The approach builds on the definition of a formal, attributed, relational model that captures the inherent structure of class-based, object-oriented code. Based on code-level analysis, it relies solely on the structural characteristics of the code and the peculiarly object-oriented features of the class as an organising principle: classes, those entities comprising a class, and the intra- and inter-class relationships existing between them, are significant factors in defining a two-phase similarity measure as a basis for the comparison process. Established graph-theoretic techniques are adapted and applied via this model to the problem of determining similarity between classes. This thesis illustrates a successful transfer of techniques from the domains of molecular chemistry and computer vision. Both domains provide an existing template for the analysis and comparison of structures as graphs. The inspiration for representing classes as attributed relational graphs, and the application of graph-theoretic techniques and algorithms to their comparison, arose out of a well-founded intuition that a common basis in graph theory was sufficient to enable a reasonable transfer of these techniques to the problem of determining similarity in object-oriented code. The practical application of this work relates to the identification and indexing of instances of recurring, class-based, common structure present in established and evolving collections of object-oriented code.
A classification so generated additionally provides a framework for class-based matching over an existing code-base, both from the perspective of newly introduced classes, and search "templates" provided by those incomplete, iteratively constructed and refined classes associated with current and ongoing development. The tools and techniques developed here provide support for enabling and improving shared awareness of reuse opportunity, based on analysing structural similarity in past and ongoing development, tools and techniques that can in turn be seen as part of a process of domain analysis, capable of stimulating the evolution of a systematic reuse ethic.

    Stratigraphic interpretation of Well-Log data of the Athabasca Oil Sands of Alberta Canada through Pattern recognition and Artificial Intelligence

    Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies. Automatic stratigraphic interpretation of oil sand wells from well log datasets typically involves recognizing patterns in the well logs. This is done by classifying the well log response into relatively homogeneous subgroups based on electrofacies and lithofacies. Electrofacies-based classification involves identifying clusters in the well log response that reflect ‘similar’ minerals and lithofacies within the logged interval. The identification of lithofacies relies on core data analysis, which can be expensive and time-consuming, whereas electrofacies are straightforward and inexpensive to derive. The challenges of interpreting and correlating well log data have been increasing, especially when so many wellbores are involved that manual analysis is almost impossible. This thesis investigates the possibilities for automatic stratigraphic interpretation of an oil sand through statistical pattern recognition and rule-based (Artificial Intelligence) methods. The idea involves seeking high-density clusters in the multivariate space of the log data in order to define classes of similar log responses. A hierarchical clustering algorithm was applied to each of the wellbores; it clusters and classifies the wells into four classes that represent the lithologic information of the wells. These classes, known as electrofacies, are calibrated using a set of decision rules that identify four lithologies (Sand, Sand-shale, Shale-sand and Shale) in the gamma ray log data. These form the basis of correlation to generate a subsurface model.
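
The calibration step described in the abstract above, mapping each cluster to one of four lithologies from its gamma ray response, might look roughly like the following sketch. The cutoff values, the cluster names and the function name are all hypothetical placeholders; the thesis's actual decision rules are not given in the abstract.

```python
from bisect import bisect_right

# Hypothetical gamma-ray cutoffs in API units; NOT taken from the thesis.
CUTOFFS = [45.0, 75.0, 105.0]
LABELS = ["Sand", "Sand-shale", "Shale-sand", "Shale"]


def classify_lithology(mean_gamma_ray):
    """Assign a lithology label from a cluster's mean gamma-ray reading."""
    # bisect_right counts how many cutoffs are <= the reading,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(CUTOFFS, mean_gamma_ray)]


# Made-up mean gamma-ray values for four clusters from one wellbore.
cluster_means = {"cluster_0": 30.2, "cluster_1": 60.0,
                 "cluster_2": 98.5, "cluster_3": 130.0}
facies = {name: classify_lithology(gr) for name, gr in cluster_means.items()}
print(facies)
```

Keeping the cutoffs in a sorted list means the rule set can be recalibrated per field without touching the lookup logic.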

    Fluid injections in the subsurface: a multidisciplinary approach for better understanding their implications on induced seismicity and the environment.

    Fluid injections in the subsurface are common operations in underground industrial activities such as oil and gas exploitation, geothermal energy development, and carbon capture and storage (CCS). In recent years, they have become a focal point as new drilling technologies (e.g., hydraulic fracturing) enable the extraction of oil and gas from unconventional reservoirs, and the development of CCS injection techniques has become a key research topic in the context of the low-carbon energy transition. Fluid injections have also drawn the attention of the general public because of their main potential implications, such as induced seismicity (Rubinstein and Mahani, 2015) and environmental pollution (Burton et al., 2016, Pitchel et al., 2016). Considering the strong socioeconomic impact of fluid injection operations (National Research Council, 2013; Ellsworth, 2013; Grigoli et al., 2017), current research in this field needs to integrate multidisciplinary studies, involving knowledge of geology, seismology, source physics, hydrogeology, fluid geochemistry and rock geomechanics, in order to understand the phenomenon fully and to set up the most effective “best practice” protocols for monitoring areas where injection operations are performed. On this basis, this work applied a multidisciplinary approach integrating seismological methods, geochemical studies, and machine learning techniques.
Two key study areas characterized by high fluid-rock interaction and fluid injection in the subsurface were analyzed: i) the High Agri Valley (hereinafter HAV), hosting the largest onshore oil field in Western Europe, in which wastewater disposal operations have been carried out since 2006 at the Costa Molina 2 injection well and where both natural and induced seismicity clusters were recognized; ii) the Mefite d’Ansanto, the largest natural emission of CO2-rich gases with mantle-derived fluids (from a non-volcanic environment) ever measured on Earth (Caracausi et al., 2013; Caracausi and Paternoster, 2015; Chiodini et al., 2010). Regarding the HAV study area, we reconstructed the preliminary catalogue of seismicity through accurate absolute locations in a 3D velocity model (Serlenga and Stabile, 2019) of earthquakes detected by the local INSIEME seismic network managed by the CNR-IMAA. A total of 852 earthquakes, both local tectonic and induced, occurred in the HAV between September 2016 and March 2019. We tested the potential of the unsupervised machine-learning approach as an automated tool for faster exploratory analysis of the dataset, finding the density-based approach (the DBSCAN algorithm, Density-Based Spatial Clustering of Applications with Noise; Ester et al., 1996) particularly suitable for the fast identification of clusters in the catalogue resulting from both injection-induced events and local tectonic earthquake swarms. Moreover, we proposed a semi-automated workflow for earthquake detection and location with the aim of improving the current standard procedures, which are quite time-consuming and depend heavily on human operators.
The workflow, integrating manual, semi-automatic and automatic detection and location methods, enabled us to characterize a low-magnitude natural seismic sequence that occurred in August 2020 in the southwestern area of the HAV (the Castelsaraceno sequence) in a relatively short time compared with standard techniques, thus representing a starting point for improving the efficiency of seismic monitoring of both anthropogenic and natural seismicity in the HAV. Our multidisciplinary approach involved the geochemical study of the HAV groundwaters with the aim of: (1) determining the geochemical processes controlling the chemical composition; (2) defining a geochemical conceptual model of fluid origin (deep vs shallow) and mixing processes by means of isotopic data; (3) establishing a geochemical baseline for the long-term environmental monitoring of the area. A total of 39 water samples were collected from springs and wells located at the main hydro-structures bordering the valley to determine their chemical (major, minor and trace elements) and isotopic composition (e.g., δD, δ18O, δ13C-TDIC and noble gases). All investigated water samples have a meteoric origin, although some springs show longer and deeper flow than the others, and a bicarbonate alkaline-earth composition, thus suggesting carbonate hydrolysis as the main water-rock interaction process. Our results demonstrated that HAV groundwater is chemically suitable for drinking, showing no critical levels of the potentially toxic metals listed in the Italian and European legislation guidelines. Particular attention was paid to the thermal water of the Tramutola well, built by Agip S.p.A. for oil and gas exploration, where bubbling gases occur. The geochemical study highlighted a substantial difference between these CH4-dominated thermal fluids and the rest of the dataset.
Helium isotope ratios (3He/4He) indicate a prevalent radiogenic component with a contribution of mantle-derived helium (~20%), and the average δ13C-CO2 value is −4.6‰ VPDB, consistent with a mantle origin. The methane isotope composition indicates a likely microbial isotopic signature (δ13C-CH4 = −63.1‰, −62.4‰; δD-CH4 = −196‰, −212‰), probably due to biodegradation of thermogenic hydrocarbons. The methane output at the well, evaluated by means of anemometric measurement of the volume flow (m3/h), is ~156 t/y, which represents about 1.5% of total national anthropogenic sources related to the fossil fuel industry (Etiope et al., 2007). Our work highlighted that the Tramutola well may represent a key natural laboratory to better understand the complex coupling effects between mechanical and fluid-dynamic processes in earthquake generation. Moreover, the integration of seismic and geochemical data in this work allowed us to identify the most suitable locations for the future installation of multiparametric stations for the long-term monitoring of the area and the development of integrated research in the HAV. Regarding the Mefite d’Ansanto, we analyzed the background seismicity in the emission area recorded by a dense temporary seismic network deployed at the site between 30-10-2019 and 02-11-2019. First, we implemented and tested an automated detection algorithm based on non-parametric statistics of the recorded amplitudes at each station, collecting a total dataset of 8561 events. Then, both unsupervised (DBSCAN) and supervised (KNN, k-nearest neighbors classification; Fix & Hodges, 1951) machine learning techniques were applied, based on specific parameters (duration, RMS amplitude and arrival slope) of the detected events.
The DBSCAN algorithm allowed us to determine characteristic bivariate correlations among tremor parameters: a high linear correlation (r~0.6-0.7) between duration and RMS amplitude and a lower one (r~0.5-0.6) between amplitude and arrival slope (first-arrival parametrization). These relationships let us define training samples for the KNN algorithm, which allowed us to classify tremor signals at each station and to discriminate automatically between tremors and accidentally detected anthropogenic noise. The results allowed us to extract new information on seismic tremor at Mefite d’Ansanto, which had previously received little quantitative analysis, and on its discrimination, thus providing a starting workflow for monitoring the non-volcanic emission. Isotopic geochemistry (3He/4He, 4He/20Ne, δ13C-CO2) indicated a mixing of mantle (30%-40%) and crust-derived fluids. Locating the source of the emission-related tremor would represent a step forward in its characterization and in setting up more advanced automated detection and machine learning classification techniques that exploit the information provided by seismic tremor for improved automatic monitoring of non-volcanic CO2 gas emissions.
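
The supervised step described in the abstract above (k-nearest neighbors classification of detected events into tremor and anthropogenic noise) can be sketched in plain Python as follows. This is only a minimal illustration of the general KNN technique: the training events, the rescaling of features to the range 0 to 1, the choice of k = 3 and the function name are assumptions, not values from the thesis.

```python
from collections import Counter
from math import dist


def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training events.

    `train` is a list of (feature_tuple, label) pairs; distances are Euclidean.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up training events: (duration, RMS amplitude, arrival slope),
# each feature assumed to be rescaled to [0, 1] beforehand.
train = [
    ((0.80, 0.70, 0.30), "tremor"),
    ((0.75, 0.65, 0.35), "tremor"),
    ((0.90, 0.80, 0.25), "tremor"),
    ((0.10, 0.20, 0.90), "noise"),
    ((0.15, 0.10, 0.85), "noise"),
    ((0.20, 0.25, 0.95), "noise"),
]

print(knn_classify(train, (0.70, 0.60, 0.40)))
```

Rescaling matters here because Euclidean distance would otherwise be dominated by whichever feature has the largest numeric range.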

    Pattern mining approaches used in sensor-based biometric recognition: a review

    Sensing technologies have generated significant interest in the use of biometrics for the recognition and assessment of individuals. Pattern mining techniques have become a critical step in the progress of sensor-based biometric systems that are capable of perceiving, recognizing and computing sensor data; they extract high-level information for pattern recognition from low-level sensor readings in order to construct an artificial substitute for human recognition. The design of a successful sensor-based biometric recognition system needs to address the different issues involved in processing variable data, namely acquisition of biometric data from a sensor, data pre-processing, feature extraction, recognition and/or classification, clustering and validation. A significant number of approaches from image processing, pattern identification and machine learning have been used to process sensor data. This paper aims to deliver a state-of-the-art summary and present strategies for applying widely used pattern mining methods, in order to identify the challenges as well as future research directions of sensor-based biometric systems.

    Cluster analysis with deolistic graphs

    This work introduces a particular family of acyclic graphs referred to as Deolistic graphs. The formal definition of a Deolistic graph is presented together with a new clustering algorithm that uses logical sentences to reveal hidden structures in data. The sentences yielded by the algorithm can be generated by automatic means, considerably reducing the complexity of data analysis and time spent on the clustering process, not to mention the usefulness of its results. Furthermore, these sentences are rules that can be easily understood, interpreted and disclosed to all interested parties, thereby improving communication and reducing misunderstandings.