    CLADAG 2021 BOOK OF ABSTRACTS AND SHORT PAPERS

    The book collects the short papers presented at the 13th Scientific Meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society (SIS). The meeting has been organized by the Department of Statistics, Computer Science and Applications of the University of Florence, under the auspices of the Italian Statistical Society and the International Federation of Classification Societies (IFCS). CLADAG is a member of the IFCS, a federation of national, regional, and linguistically-based classification societies. It is a non-profit, non-political scientific organization, whose aims are to further classification research

    Fuzzy Systems

    This book presents some recent specialized works of theoretical study in the domain of fuzzy systems. Over eight sections and fifteen chapters, the volume addresses fuzzy systems concepts and promotes them in practical applications in the following thematic areas: fuzzy mathematics, decision making, clustering, adaptive neural fuzzy inference systems, control systems, process monitoring, green infrastructure, and medicine. The studies published in the book develop new theoretical concepts that improve the properties and performances of fuzzy systems. This book is a useful resource for specialists, engineers, professors, and students

    Evaluating the potential of carbonate sub-facies classification using NMR longitudinal over transverse relaxation time ratio

    While well log-based lithology classification has been used extensively in reservoir characterization, the classification of carbonate sub-facies remains challenging because the differences between sub-facies are subtle in conventional well logs. The nuclear magnetic resonance (NMR) log provides additional information on pore size and pore geometry, which improves the differentiation of carbonate sub-facies. Here we explore the feasibility of using the ratio between NMR longitudinal relaxation time and transverse relaxation time as a potential lithology indicator to determine carbonate sub-facies. We analyzed a series of logging data and corresponding core samples of Arbuckle Group carbonate containing mudstone, packstone, grainstone, incipient breccia, and breccia in northern Kansas for the characteristics of longitudinal relaxation times, transverse relaxation times, and longitudinal over transverse relaxation time ratios. The results show that mudstone, packstone, and grainstone exhibit high, intermediate, and low longitudinal over transverse relaxation time ratios, respectively, while incipient breccia and breccia have a wide range of longitudinal over transverse relaxation time ratios. Furthermore, we evaluated the potential of using longitudinal over transverse relaxation time ratios to classify carbonate sub-facies using multivariate analysis. When longitudinal over transverse relaxation time ratios were added to neutron porosity, total gamma-ray, and conductivity logs as inputs to automated facies classification, the prediction error decreased, especially for incipient breccia. In contrast, when photoelectric and computed gamma-ray logs are also available, adding longitudinal over transverse relaxation time ratios does not improve the accuracy of sub-facies classification. Our results suggest that the longitudinal over transverse relaxation time ratio is an independent lithology indicator. However, it cannot replace other logs, such as gamma-ray and photoelectric logs, in classifying carbonate sub-facies. Our study provided valuable evidence and credible elucidation of the importance and physicochemical mechanism of longitudinal over transverse relaxation time ratios, which is essential for deciphering NMR logging data in carbonate reservoirs.
    Cited as: Zhang, F., Zhang, C. Evaluating the potential of carbonate sub-facies classification using NMR longitudinal over transverse relaxation time ratio. Advances in Geo-Energy Research, 2021, 5(1): 87-103, doi: 10.46690/ager.2021.01.0
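
As a rough illustration of the feature described in this abstract, the ratio can be computed per depth sample and appended to the conventional log inputs before classification. This is only a sketch: the function names, the dictionary keys, and every numeric value below are hypothetical placeholders, not data or code from the study.

```python
def t1_t2_ratio(t1_ms, t2_ms):
    """Return the longitudinal-over-transverse relaxation time ratio."""
    if t2_ms <= 0:
        raise ValueError("T2 must be positive")
    return t1_ms / t2_ms


def add_ratio_feature(samples):
    """Build one feature row per depth sample and append the T1/T2 ratio.

    The three conventional inputs mirror the logs named in the abstract;
    the key names are assumptions made for this example.
    """
    rows = []
    for s in samples:
        row = [s["neutron_porosity"], s["gamma_ray"], s["conductivity"]]
        row.append(t1_t2_ratio(s["t1_ms"], s["t2_ms"]))
        rows.append(row)
    return rows


# Made-up log readings, for illustration only.
samples = [
    {"neutron_porosity": 0.08, "gamma_ray": 35.0, "conductivity": 12.0,
     "t1_ms": 300.0, "t2_ms": 100.0},
    {"neutron_porosity": 0.15, "gamma_ray": 20.0, "conductivity": 30.0,
     "t1_ms": 150.0, "t2_ms": 100.0},
]
feature_rows = add_ratio_feature(samples)
```

The resulting rows could then be passed to any multivariate classifier; which classifier the authors used is not stated in the abstract.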

    Improving k-nn search and subspace clustering based on local intrinsic dimensionality

    In several novel applications such as multimedia and recommender systems, data is often represented as object feature vectors in high-dimensional spaces. High-dimensional data is a challenge for state-of-the-art algorithms because of the so-called curse of dimensionality. As the dimensionality increases, the discriminative ability of similarity measures diminishes to the point where many data analysis algorithms that depend on them, such as similarity search and clustering, lose their effectiveness. One way to handle this challenge is by selecting the most important features, which is essential for providing compact object representations as well as improving the overall search and clustering performance. Having compact feature vectors can further reduce the storage space and the computational complexity of search and learning tasks. Support-Weighted Intrinsic Dimensionality (support-weighted ID) is a promising new feature selection criterion that estimates the contribution of each feature to the overall intrinsic dimensionality. Support-weighted ID identifies relevant features locally for each object, and penalizes those features that have locally lower discriminative power as well as higher density. In fact, support-weighted ID measures the ability of each feature to locally discriminate between objects in the dataset. Based on support-weighted ID, this dissertation introduces three main research contributions. First, this dissertation proposes NNWID-Descent, a similarity graph construction method that utilizes the support-weighted ID criterion to identify and retain relevant features locally for each object and enhance the overall graph quality. Second, with the aim of improving the accuracy and performance of cluster analysis, this dissertation introduces k-LIDoids, a subspace clustering algorithm that extends the utility of support-weighted ID within a clustering framework in order to gradually select the subset of informative and important features per cluster. k-LIDoids is able to construct clusters while also finding a low-dimensional subspace for each cluster. Finally, using the compact object and cluster representations from NNWID-Descent and k-LIDoids, this dissertation defines LID-Fingerprint, a new binary fingerprinting and multi-level indexing framework for high-dimensional data. LID-Fingerprint can be used to hide information as a defense against passive adversaries, as well as to provide efficient and secure similarity search and retrieval for data stored on the cloud. When compared to other state-of-the-art algorithms, the good practical performance provides evidence for the effectiveness of the proposed algorithms for data in high-dimensional spaces
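
For readers unfamiliar with local intrinsic dimensionality, the sketch below shows the standard maximum-likelihood (Hill-type) LID estimator computed from the distances to the k nearest neighbours of a query point. It is background only: it is not the support-weighted ID criterion, NNWID-Descent, or k-LIDoids, whose details the abstract does not give. The function name `lid_mle` and the toy points are assumptions made for this example.

```python
import math


def lid_mle(query, points, k):
    """Maximum-likelihood estimate of local intrinsic dimensionality.

    Uses the k smallest positive distances r_1 <= ... <= r_k from `query`:
        LID = -1 / mean(ln(r_i / r_k))
    """
    # Distances from the query to every point; zero distances are dropped
    # so that the query itself (or exact duplicates) is ignored.
    dists = sorted(d for d in (math.dist(query, p) for p in points) if d > 0)
    neighbours = dists[:k]
    if len(neighbours) < 2:
        raise ValueError("need at least two neighbours at positive distance")
    r_max = neighbours[-1]
    mean_log = sum(math.log(d / r_max) for d in neighbours) / len(neighbours)
    if mean_log == 0:
        raise ValueError("all neighbour distances are equal; estimate undefined")
    return -1.0 / mean_log


# Toy example: three neighbours at distances 1, 2 and 4 from the origin.
estimate = lid_mle((0.0, 0.0), [(1.0, 0.0), (0.0, 2.0), (4.0, 0.0)], k=3)
```

A feature-wise criterion such as support-weighted ID would additionally attribute this quantity to individual features; that step is not attempted here.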

    Machine learning based data pre-processing for the purpose of medical data mining and decision support

    Building an accurate and reliable prediction model for different application domains is one of the most significant challenges in knowledge discovery and data mining. Sometimes, improved data quality is itself the goal of the analysis, usually to improve processes in a production database and the design of decision support. As medicine moves forward there is a need for sophisticated decision support systems that make use of data mining to support more orthodox knowledge engineering and Health Informatics practice. However, real-life medical data rarely complies with the requirements of various data mining tools. It is often inconsistent, noisy, in an unsuitable format, and imbalanced with regard to the outcome class label, and it often contains redundant attributes and missing values. Many real-life data sets are incomplete, with missing values. In medical data mining the problem of missing values has become a challenging issue. In many clinical trials, the medical report pro-formas allow some attributes to be left blank, because they are inappropriate for some class of illness or because the person providing the information feels that it is not appropriate to record the values for some attributes. The research reported in this thesis has explored the use of machine learning techniques as missing value imputation methods. The thesis also proposed a new way of imputing missing values by supervised learning. A classifier was used to learn the data patterns from a complete data sub-set and the model was later used to predict the missing values for the full dataset. The proposed machine learning based missing value imputation was applied to the thesis data and the results were compared with traditional Mean/Mode imputation. Experimental results show that all the machine learning methods which we explored outperformed the statistical method (Mean/Mode). The class imbalance problem has been found to hinder the performance of learning systems. In fact, most medical datasets are found to be highly imbalanced in their class label. The solution to this problem is to reduce the gap between the minority class samples and the majority class samples. Over-sampling can be applied to increase the number of minority class samples to balance the data. The alternative to over-sampling is under-sampling, where the size of the majority class sample is reduced. The thesis proposed a cluster-based under-sampling technique to reduce the gap between the majority and minority samples. Different under-sampling and over-sampling techniques were explored as ways to balance the data. The experimental results show that for the thesis data the newly proposed modified cluster-based under-sampling technique performed better than other class balancing techniques. In further research it was found that the class imbalance problem not only affects the classification performance but also has an adverse effect on feature selection. The thesis proposed a new framework for feature selection for class-imbalanced datasets. The research found that, using the proposed framework, the classifier needs fewer attributes to show high accuracy, and more attributes are needed if the data is highly imbalanced. The research described in the thesis contains the following four novel main contributions:
    a) Improved data mining methodology for mining medical data
    b) Machine learning based missing value imputation method
    c) Cluster-based semi-supervised class balancing method
    d) Feature selection framework for class-imbalanced datasets
    The performance analysis and comparative study show that the use of the proposed missing value imputation method, class balancing method, and feature selection framework can provide an effective approach to data preparation for building medical decision support
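
The contrast between Mean/Mode imputation and classifier-based imputation described in this abstract can be sketched as follows. The abstract does not say which classifiers the thesis used, so a 1-nearest-neighbour rule stands in here purely as an assumption; the function names and the toy rows are likewise hypothetical, and the sketch assumes that only the target column contains missing values.

```python
import math
from collections import Counter


def mode_impute(rows, col):
    """Baseline: fill missing entries in `col` with the most common observed value."""
    observed = [r[col] for r in rows if r[col] is not None]
    fill = Counter(observed).most_common(1)[0][0]
    filled = []
    for r in rows:
        new = list(r)
        if new[col] is None:
            new[col] = fill
        filled.append(new)
    return filled


def knn_impute(rows, col):
    """Learn from the complete rows, then predict each missing value in `col`
    with a 1-nearest-neighbour classifier over the remaining numeric columns."""
    def features(r):
        return [v for i, v in enumerate(r) if i != col]

    complete = [r for r in rows if r[col] is not None]
    filled = []
    for r in rows:
        new = list(r)
        if new[col] is None:
            nearest = min(
                complete,
                key=lambda c: math.dist(features(c), features(r)),
            )
            new[col] = nearest[col]
        filled.append(new)
    return filled


# Toy data: two numeric attributes and one categorical attribute with a gap.
rows = [
    [1.0, 1.1, "low"],
    [0.9, 1.0, "low"],
    [1.2, 0.8, "low"],
    [5.0, 5.2, "high"],
    [5.1, 4.9, None],
]
by_mode = mode_impute(rows, 2)
by_knn = knn_impute(rows, 2)
```

On this toy input the mode baseline fills the gap with the majority value regardless of the other attributes, while the classifier-based variant uses the row's neighbours, which is the intuition behind the comparison reported in the abstract.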