22 research outputs found

    Uncovering Group Level Insights with Accordant Clustering

    Clustering is a widely-used data mining tool, which aims to discover partitions of similar items in data. We introduce a new clustering paradigm, accordant clustering, which enables the discovery of (predefined) group-level insights. Unlike previous clustering paradigms that aim to understand relationships amongst individual members, the goal of accordant clustering is to uncover insights at the group level through the analysis of the groups' members. Group-level insight can often support a call to action that cannot be informed through previous clustering techniques. We propose the first accordant clustering algorithm and prove that it finds near-optimal solutions when the data possesses inherent cluster structure. The insights revealed by accordant clusterings enabled experts in the field of medicine to isolate successful treatments for a neurodegenerative disease, and those in finance to discover patterns of unnecessary spending. Comment: accepted to SDM 2017 (oral).

    Performance Evaluation of EM and K-Means Clustering Algorithms in Data Mining System

    In the emerging field of data mining there are different techniques, namely clustering, prediction, classification, and association. Clustering divides a particular data set into groups of associated items such that the groups have nothing in common with one another. Clustering algorithms have emerged as a powerful alternative meta-learning tool for accurately analyzing the massive volumes of data generated by modern applications. The main goal is to group data into clusters such that objects in the same cluster are related according to particular metrics. Classification, by contrast, is the organization of data sets into predefined sets using various mathematical models. This research compares the K-Means and Expectation-Maximization (EM) algorithms for clustering. Empirically, we focused on wide-ranging experiments comparing the best typical algorithm from each category on a large number of real and big data sets. The effectiveness of the Expectation-Maximization clustering algorithm is measured through a number of internal and external validity metrics, stability, runtime, and scalability tests.
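    The comparison described above can be sketched in a few lines. This is a minimal illustration, not the paper's experiment: synthetic blobs stand in for the real and big data sets, and only one external validity metric (adjusted Rand index) is computed for both K-Means and EM (Gaussian mixture) clustering.

```python
# Sketch: cluster the same data with K-Means and EM (GaussianMixture),
# then score each against ground truth with an external validity metric.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture


def compare_clusterers(n_samples=300, n_clusters=3, seed=42):
    """Return adjusted Rand index (ARI) for K-Means and EM on synthetic blobs."""
    X, y_true = make_blobs(n_samples=n_samples, centers=n_clusters,
                           random_state=seed)

    km_labels = KMeans(n_clusters=n_clusters, n_init=10,
                       random_state=seed).fit_predict(X)
    em_labels = GaussianMixture(n_components=n_clusters,
                                random_state=seed).fit_predict(X)

    # ARI = 1.0 means perfect agreement with the true partition.
    return {
        "kmeans": adjusted_rand_score(y_true, km_labels),
        "em": adjusted_rand_score(y_true, em_labels),
    }
```

    On well-separated blobs both algorithms recover the partition almost perfectly; the paper's internal validity, stability, and scalability measurements would require the original data sets.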

    WETLAND CHANGE DETECTION USING SENTINEL-2 IN THE PART OF LATVIA

    In this article, possible changes in wetland areas are analysed using a semi-supervised classification method based on statistical analysis. Sentinel-2 raw data from two different seasons are combined, and the data preparation is briefly described. The data are clustered with an unsupervised method. The article also describes a supervised method for estimating data credibility and classification quality when the reference data are of poor quality.

    Machine Learning for Detection of Cognitive Impairment

    The detection of cognitive problems, especially in the early stages, is critical, yet diagnosis is manual and depends on one or more specialist doctors, often occurring only as cognitive decline escalates into the early stage of dementia, e.g., Alzheimer's disease (AD). Because the early stages of AD are very similar to Mild Cognitive Impairment (MCI), it is essential to identify the possible factors associated with the disease. This research aims to demonstrate that automated models can differentiate and classify MCI and AD in the early stages. The present research used a combination of Machine Learning (ML) algorithms to identify AD, using gene expressions. The algorithms used to classify people with cognitive problems versus healthy people (controls) were: Linear Regression, Decision Trees (DT), Naïve Bayes (NB), and Deep Learning (DL). The results show that ML algorithms can identify AD in its early stages with 80% accuracy, using a Deep Learning algorithm.
    Fil: Diaz, Valeria. Universidad de Palermo. Facultad de Ingeniería; Argentina.
    Fil: Rodríguez, Guillermo Horacio. Universidad Nacional del Centro de la Provincia de Buenos Aires. Facultad de Ciencias Exactas. Instituto de Sistemas Tandil; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Instituto Superior de Ingeniería del Software. Universidad Nacional del Centro de la Provincia de Buenos Aires. Instituto Superior de Ingeniería del Software; Argentina.
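    A benchmark of the kind the abstract describes can be sketched as follows. Everything here is a stand-in: synthetic high-dimensional data replaces the gene-expression dataset, logistic regression replaces the linear model, and a small multilayer perceptron replaces the deep-learning model; the 80% figure from the paper is not reproduced.

```python
# Sketch: compare four classifier families by cross-validated accuracy,
# on synthetic data standing in for gene-expression features.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier


def benchmark_classifiers(seed=0):
    """Return mean 5-fold cross-validation accuracy per classifier family."""
    # Many features, few informative ones: loosely mimics expression data.
    X, y = make_classification(n_samples=400, n_features=50,
                               n_informative=10, random_state=seed)
    models = {
        "linear": LogisticRegression(max_iter=1000),
        "tree": DecisionTreeClassifier(random_state=seed),
        "naive_bayes": GaussianNB(),
        "mlp": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                             random_state=seed),
    }
    return {name: cross_val_score(m, X, y, cv=5).mean()
            for name, m in models.items()}
```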

    A Discriminative Locally-Adaptive Nearest Centroid Classifier for Phoneme Classification

    Phoneme classification is a key area of speech recognition. Phonemes are the basic modeling units in modern speech recognition, and they are the constructive units of words. Thus, being able to quickly and accurately classify phonemes that are input to a speech-recognition system is a basic and important step towards improving, and eventually perfecting, speech recognition as a whole. Many classification approaches currently exist that can be applied to the task of classifying phonemes. These techniques range from simple ones such as the nearest centroid classifier to complex ones such as support vector machines. Amongst the existing classifiers, the simpler ones tend to be quicker to train but lower in accuracy, whereas the more complex ones tend to be higher in accuracy but slower to train. Because phoneme classification involves very large datasets, it is desirable to have classifiers that are both quick to train and high in accuracy. The formulation of such classifiers is still an active research topic in phoneme classification. One paradigm attempts to increase the accuracy of the simpler classifiers with minimal sacrifice to their running times; the opposite paradigm attempts to increase the training speed of the more complex classifiers with minimal sacrifice to their accuracy. The objective of this research is to develop a new centroid-based classifier that builds upon the simpler nearest centroid classifier by incorporating a new discriminative locally-adaptive training procedure developed from recent advances in machine learning. This new classifier, referred to as the discriminative locally-adaptive nearest centroid (DLANC) classifier, achieves much higher accuracy than the nearest centroid classifier whilst having a relatively low computational complexity and being able to scale up to very large datasets.
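    The plain nearest-centroid baseline the abstract starts from can be shown directly. This sketch does not implement DLANC's discriminative locally-adaptive training; it only illustrates the simple, fast classifier being built upon, using scikit-learn's NearestCentroid on the digits dataset as a stand-in for phoneme data.

```python
# Sketch: the nearest-centroid baseline (each class is represented by the
# mean of its training samples; a test point gets the label of the
# nearest class centroid).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestCentroid


def nearest_centroid_accuracy(seed=0):
    """Fit a nearest-centroid classifier and return held-out accuracy."""
    X, y = load_digits(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=seed)
    clf = NearestCentroid().fit(X_tr, y_tr)
    return clf.score(X_te, y_te)
```

    Training reduces to computing one mean per class, which is why the classifier is so cheap; DLANC's contribution is to replace this single global centroid per class with discriminatively trained, locally adaptive ones.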

    The Application of Unsupervised Clustering Methods to Alzheimer’s Disease

    Clustering is a powerful machine learning tool for detecting structure in datasets. In the medical field, it has proven valuable for discovering patterns and structure in labeled and unlabeled datasets. Unlike supervised methods, clustering is an unsupervised method that works on datasets in which there is no outcome (target) variable and nothing is known about the relationships between the observations, that is, unlabeled data. In this paper, we focus on studying and reviewing clustering methods that have been applied to datasets of neurological diseases, especially Alzheimer's disease (AD). The aim is to provide insight into which clustering techniques are more suitable for partitioning AD patients based on their similarity. This is important because clustering algorithms can find patterns across patients that are difficult for medical practitioners to detect. We further discuss the implications of using clustering algorithms in the treatment of AD. We found that clustering analysis can point to several features that underlie the conversion from early-stage AD to advanced AD. Furthermore, future work can apply semi-supervised clustering algorithms to AD datasets, which will enhance the clusters by including additional information.

    Advanced models of supervised structural clustering

    The strength and power of structured prediction approaches in machine learning originates from a proper recognition and exploitation of the inherent structural dependencies within the complex objects that structural models are trained to output. Among the complex tasks that have benefited from structured prediction approaches, clustering is of special interest. Structured output models that represent clusters by latent graph structures have made the task of supervised clustering tractable. While in practice these models proved effective in solving the complex NLP task of coreference resolution, in this thesis we aim at exploring their capacity to be extended to other tasks and domains, as well as the methods for performing such adaptation and for improvement in general, which, as a result, go beyond clustering and are commonly applicable in structured prediction. Studying the extensibility of structural approaches for supervised clustering, we apply them to two different domains in two different ways. First, in the networking domain, we cluster network traffic by adapting the model to take into account the continuity of incoming data. Our experiments demonstrate that the structural clustering approach is not only effective in such a scenario but also, from a different perspective, provides a novel and potentially useful tool for detecting anomalies. The other part of our work is dedicated to assessing the amenability of the structural clustering model to joint learning with another structural model, for ranking. Our preliminary analysis in the context of answer-passage reranking in question answering reveals a potential benefit of incorporating auxiliary clustering structures. The intrinsic complexity of the clustering task and, respectively, of its evaluation scenarios gave us grounds for studying the possibility and the effect of optimizing task-specific complex measures in structured prediction algorithms.
It is common for structured prediction approaches to optimize surrogate loss functions, rather than the actual task-specific ones, in order to facilitate inference and preserve efficiency. In this thesis we first study when surrogate losses are sufficient and, second, make a step towards enabling direct optimization of complex structural loss functions. We propose to learn an approximation of a complex loss by a regressor from data. We formulate a general structural framework for learning with a learned loss, which, applied to a particular case of a clustering problem, coreference resolution, i) enables the optimization of a coreference metric that by itself has high computational complexity, and ii) delivers an improvement over the standard structural models optimizing simple surrogate objectives. We foresee this idea being helpful in many structured prediction applications, also as a means of adaptation to specific evaluation scenarios, and especially when a good loss approximation is found by a regressor from an induced feature space allowing good factorization over the underlying structure.