1,032 research outputs found

    Possibilistic and fuzzy clustering methods for robust analysis of non-precise data

    Get PDF
    This work focuses on robust clustering of data affected by imprecision. The imprecision is managed in terms of fuzzy sets. The clustering process is based on the fuzzy and possibilistic approaches. In both approaches the observations are assigned to the clusters by means of membership degrees. In fuzzy clustering the membership degrees express the degrees of sharing of the observations to the clusters. In contrast, in possibilistic clustering the membership degrees are degrees of typicality. These two sources of information are complementary because the former helps to discover the best fuzzy partition of the observations while the latter reflects how well the observations are described by the centroids and, therefore, is helpful to identify outliers. First, a fully possibilistic k-means clustering procedure is suggested. Then, in order to exploit the benefits of both the approaches, a joint possibilistic and fuzzy clustering method for fuzzy data is proposed. A selection procedure for choosing the parameters of the new clustering method is introduced. The effectiveness of the proposal is investigated by means of simulated and real-life data

    A Heuristic Approach to Possibilistic Clustering for Fuzzy Data

    Get PDF
    The paper deals with the problem of the fuzzy data clustering. In other words, objects attributes can be represented by fuzzy numbers or fuzzy intervals. A direct algorithm of possibilistic clustering is the basis of an approach to the fuzzy data clustering. The paper provides the basic ideas of the method of clustering and a plan of the direct possibilistic clustering algorithm. Definitions of fuzzy intervals and fuzzy numbers are presented and distances for fuzzy numbers are considered. A concept of a vector of fuzzy numbers is introduced and the fuzzy data preprocessing methodology for constructing of a fuzzy tolerance matrix is described. A numerical example is given and results of application of the direct possibilistic clustering algorithm to a set of vectors of triangular fuzzy numbers are considered in the example. Some preliminary conclusions are stated

    SUBIC: A Supervised Bi-Clustering Approach for Precision Medicine

    Full text link
    Traditional medicine typically applies one-size-fits-all treatment for the entire patient population whereas precision medicine develops tailored treatment schemes for different patient subgroups. The fact that some factors may be more significant for a specific patient subgroup motivates clinicians and medical researchers to develop new approaches to subgroup detection and analysis, which is an effective strategy to personalize treatment. In this study, we propose a novel patient subgroup detection method, called Supervised Biclustring (SUBIC) using convex optimization and apply our approach to detect patient subgroups and prioritize risk factors for hypertension (HTN) in a vulnerable demographic subgroup (African-American). Our approach not only finds patient subgroups with guidance of a clinically relevant target variable but also identifies and prioritizes risk factors by pursuing sparsity of the input variables and encouraging similarity among the input variables and between the input and target variable

    Effects of training parameter concept and sample size in possibilistic c-means classifier for pigeon pea specific crop mapping

    Get PDF
    4openInternationalInternational coauthor/editorThis research work aims to study the effect of training parameter concept and sample size in the process of classification by using a fuzzy Possibilistic c-Means (PCM) approach for Pigeon Pea specific crop mapping. For specific class extraction, the “mean” of the training data is considered as a training parameter of the classification algorithm. In this study, we proposed an “Individual Sample as Mean” (ISM) approach where the individual training sample is accounted as a mean parameter for the fuzzy PCM classifier. In order to avoid the spectral overlap of target Pigeon pea crop with other crops in the study area, a temporal indices database was generated from Sentinel 2A/2B satellite images acquired during the 2019–2020 Pigeon Pea crop cycle. The spectral dimensionality of temporal data was reduced to extract the required bands to achieve maximum enhancement of the target crop class in the temporal data. Further, the training sample size was increased to study the heterogeneity within the class in the classified output. The proposed ISM approach delivered a higher mean membership difference (MMD) between the Pigeon Pea crop and the co-cultivated Cotton crop as compared to the conventional mean method. This indicated that a better separation was achieved between the target crop and the spectrally similar crop grown, that were cultivated in the same study area. When the sample size was gradually increased from 5 to 60, the MMD values within the Pigeon Pea test fields remained in the range 0.013–0.02, thereby implying that the proposed algorithm works better even with a small number of training samples. The heterogeneity was better handled using the proposed ISM approach since the variance obtained within Pigeon Pea field was only 0.008, as compared to that of 0.02 achieved using the conventional mean approachopenSivaraj, Priyadarsini; Kumar, Anil; Koti, Shiva Reddy; Naik, ParthSivaraj, P.; Kumar, A.; Koti, S.R.; Naik, P

    Building environmentally-aware classifiers on streaming data

    Get PDF
    The three biggest challenges currently faced in machine learning, in our estimation, are the staggering quantity of data we wish to analyze, the incredibly small proportion of these data that are labeled, and the apparent lack of interest in creating algorithms that continually learn during inference. An unsupervised streaming approach addresses all three of these challenges, storing only a finite amount of information to model an unbounded dataset and adapting to new structures as they arise. Specifically, we are motivated by automated target recognition (ATR) in synthetic aperture sonar (SAS) imagery, the problem of finding explosive hazards on the sea oor. It has been shown that the performance of ATR can be improved by, instead of using a single classifier for the entire ATR task, creating several specialized classifers and fusing their predictions [44]. The prevailing opinion seems be that one should have different classifiers for varying complexity of sea oor [74], but we hypothesize that fusing classifiers based on sea bottom type will yield higher accuracy and better lend itself to making explainable classification decisions. The first step of building such a system is developing a robust framework for online texture classification, the topic of this research. xi In this work, we improve upon StreamSoNG [85], an existing algorithm for streaming data analysis (SDA) that models each structure in the data with a neural gas [69] and detects new structures by clustering an outlier list with the possibilistic 1-means [62] (P1M) algorithm. We call the modified algorithm StreamSoNGv2, denoting that it is the second version, or verse, if you will, of StreamSoNG. Notable improvements include detection of arbitrarily-shaped clusters by using DBSCAN [37] instead of P1M, using growing neural gas [43] to model each structure with an adaptive number of prototypes, and an automated approach to estimate the n parameters. Furthermore, we propose a novel algorithm called single-pass possibilistic clustering (SPC) for solving the same task. SPC maintains a fixed number of structures to model the data stream. These structures can be updated and merged based only on their "footprints", that is, summary statistics that contain all of the information from the stream needed by the algorithm without directly maintaining the entire stream. SPC is built on a damped window framework, allowing the user to balance the weight between old and new points in the stream with a decay factor parameter. We evaluate the two algorithms under consideration against four state of the art SDA algorithms from the literature on several synthetic datasets and two texture datasets: one real (KTH-TIPS2b [68]) and xii one simulated. The simulated dataset, a significant research effort in itself, is of our own construction in Unreal Engine and contains on the order of 6,000 images at 720 x 720 resolution from six different texture types. Our hope is that the methodology developed here will be effective texture classifiers for use not only in underwater scene understanding, but also in improving performance of ATR algorithms by providing a context in which the potential target is embedded.Includes bibliographical references

    Bot recognition in a Web store: An approach based on unsupervised learning

    Get PDF
    Abstract Web traffic on e-business sites is increasingly dominated by artificial agents (Web bots) which pose a threat to the website security, privacy, and performance. To develop efficient bot detection methods and discover reliable e-customer behavioural patterns, the accurate separation of traffic generated by legitimate users and Web bots is necessary. This paper proposes a machine learning solution to the problem of bot and human session classification, with a specific application to e-commerce. The approach studied in this work explores the use of unsupervised learning (k-means and Graded Possibilistic c-Means), followed by supervised labelling of clusters, a generative learning strategy that decouples modelling the data from labelling them. Its efficiency is evaluated through experiments on real e-commerce data, in realistic conditions, and compared to that of supervised learning classifiers (a multi-layer perceptron neural network and a support vector machine). Results demonstrate that the classification based on unsupervised learning is very efficient, achieving a similar performance level as the fully supervised classification. This is an experimental indication that the bot recognition problem can be successfully dealt with using methods that are less sensitive to mislabelled data or missing labels. A very small fraction of sessions remain misclassified in both cases, so an in-depth analysis of misclassified samples was also performed. This analysis exposed the superiority of the proposed approach which was able to correctly recognize more bots, in fact, and identified more camouflaged agents, that had been erroneously labelled as humans

    Unsupervised tracking of time-evolving data streams and an application to short-term urban traffic flow forecasting

    Get PDF
    I am indebted to many people for their help and support I receive during my Ph.D. study and research at DIBRIS-University of Genoa. First and foremost, I would like to express my sincere thanks to my supervisors Prof.Dr. Masulli, and Prof.Dr. Rovetta for the invaluable guidance, frequent meetings, and discussions, and the encouragement and support on my way of research. I thanks all the members of the DIBRIS for their support and kindness during my 4 years Ph.D. I would like also to acknowledge the contribution of the projects Piattaforma per la mobili\ue0 Urbana con Gestione delle INformazioni da sorgenti eterogenee (PLUG-IN) and COST Action IC1406 High Performance Modelling and Simulation for Big Data Applications (cHiPSet). Last and most importantly, I wish to thanks my family: my wife Shaimaa who stays with me through the joys and pains; my daughter and son whom gives me happiness every-day; and my parents for their constant love and encouragement
    • …
    corecore