10 research outputs found

    Landscape of clustering algorithms

    HIERARCHICAL CLUSTERING ENSEMBLE WITH DIFFERENT LINKAGE METHODS

    Clustering ensembles have become a popular technique in recent years because they deliver high clustering performance. This study proposes a new approach called the Linkage-based Hierarchical Clustering Ensemble (BHKT). In the proposed approach, the ensemble members perform hierarchical clustering using different linkage methods and then reach a joint decision by majority voting. The linkage methods used in the study are single linkage, complete linkage, average linkage, centroid linkage, Ward's method, neighbor joining, and adjusted complete linkage. The study also examines hierarchical clustering ensembles of different sizes and compares them with one another. In the experiments, the hierarchical clustering ensembles were applied to 8 different data sets and produced better results than a single clustering algorithm.
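    A minimal sketch of the idea, assuming consensus is taken via a co-association matrix (a standard way to realize majority voting over clusterings; the paper's exact voting scheme may differ). It uses the five linkage methods SciPy provides; neighbor joining and adjusted complete linkage would need custom implementations. The function name linkage_ensemble and the choice of k are illustrative.

    ```python
    # Sketch of a linkage-based hierarchical clustering ensemble in the
    # spirit of BHKT, with co-association consensus standing in for
    # majority voting. Assumes the number of clusters k is known.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    def linkage_ensemble(X, k, methods=("single", "complete", "average",
                                        "centroid", "ward")):
        n = len(X)
        co = np.zeros((n, n))  # fraction of members that co-cluster each pair
        for m in methods:
            labels = fcluster(linkage(X, method=m), t=k, criterion="maxclust")
            co += (labels[:, None] == labels[None, :])
        co /= len(methods)
        # Consensus step: cluster the co-association matrix itself,
        # treating (1 - co) as a distance.
        d = squareform(1.0 - co, checks=False)
        return fcluster(linkage(d, method="average"), t=k, criterion="maxclust")

    X = np.random.rand(100, 2)
    print(linkage_ensemble(X, k=3))
    ```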

    Generative object detection and tracking in 3D range data

    Interpreting and Extending Classical Agglomerative Clustering Algorithms Using a Model-Based Approach

    We present two results which arise from a model-based approach to hierarchical agglomerative clustering. First, we show formally that the common heuristic agglomerative clustering algorithms -- Ward's method, single-link, complete-link, and a variant of group-average -- are each equivalent to a hierarchical model-based method. This interpretation gives a theoretical explanation of the empirical behavior of these algorithms, as well as a principled approach to resolving practical issues, such as number of clusters or the choice of method. Second, we show how a model-based viewpoint can suggest variations on these basic agglomerative algorithms
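    A sketch of the agglomerative template the result concerns: at each step, merge the pair of clusters that least increases the within-cluster sum of squares (Ward's criterion), which under the model-based reading corresponds to a greedy classification-likelihood step for fixed-variance spherical Gaussians. This naive O(n³) loop is for exposition only, not the paper's formulation.

    ```python
    # Agglomerative clustering with Ward's merge criterion, written out
    # explicitly so the model-based interpretation is visible: the merge
    # cost is the increase in within-cluster sum of squared errors.
    import numpy as np

    def sse(C):
        C = np.asarray(C)
        return ((C - C.mean(axis=0)) ** 2).sum()

    def ward_agglomerate(X, k):
        clusters = [[x] for x in X]  # start from singletons
        while len(clusters) > k:
            best = None
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    merged = clusters[i] + clusters[j]
                    # Increase in within-cluster SSE caused by this merge
                    delta = sse(merged) - sse(clusters[i]) - sse(clusters[j])
                    if best is None or delta < best[0]:
                        best = (delta, i, j)
            _, i, j = best
            clusters[i] = clusters[i] + clusters[j]
            del clusters[j]
        return clusters

    X = np.random.rand(30, 2)
    for c in ward_agglomerate(list(X), k=3):
        print(len(c))
    ```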

    A BUNDLED PAYMENT PROFILE FOR HEAD & NECK CANCER: DESCRIPTIVE STATISTICS, RISK ASSESSMENT, AND PRICING RECOMMENDATIONS FOR 1 YEAR TREATMENT BUNDLES USING A LARGE NATIONAL CLAIMS DATABASE

    Bundled payments have the opportunity to promote care standardization and coordination while incentivizing efficiency and value-based healthcare delivery. However, bundled payments have been scrutinized due to challenges with defining bundle lengths (also known as the episode-of-care period), limited inclusion/exclusion criteria, the absence of IT systems to support new payment models, and the lack of federal support. The objective of this study was to develop a profile for head and neck cancer, including descriptive statistics, risk assessment, and bundled payment pricing recommendations, using a large national claims database. The ability to assess pricing risks associated with head and neck cancer bundled payments across the US from a large claims database can provide evidence to either support or discredit the feasibility of bundled payment reform. The study produced total episode costs for head and neck cancer from the start of treatment and a transparent bundled payment methodology. The results are as follows: 1) head and neck cancer episodes cost $164,332 on average, with a standard deviation of $106,500 and median episode costs of $143,806; 2) bundled payments were developed using a complete-linkage hierarchical clustering analysis of 2 possible bundling approaches, with either 3 or 4 bundled payment groups; 3) a Monte Carlo simulation resulted in the recommendation that pricing negotiations start not at the 50th percentile of the bundled payment group cost, as suggested by previous studies, but at the 75th percentile; 4) the study aims were summarized and displayed in a visual framework to provide a practical ‘how-to guide’ for organizations looking to start modeling bundled payments. This analysis demonstrates that bundled payment grouping is feasible and viable, albeit dependent on an organization’s ability to control healthcare spending and negotiate bundled payment prices above costs. The results of this work demonstrate the use of statistical and financial models to support price models and sensitivity analyses. Healthcare leaders can use these models to better understand their expected costs/profits and leverage their negotiations; however, this research does not suggest that all bundled payment methodologies are profitable.
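    A hedged sketch of the bundling mechanics as the abstract describes them: complete-linkage hierarchical clustering of per-episode costs into 3 or 4 groups, with each group priced at its 75th percentile. The cost distribution below is synthetic; the study used claims data.

    ```python
    # Complete-linkage clustering of episode costs into bundle groups,
    # then pricing each group at the 75th percentile of its costs.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(0)
    costs = rng.lognormal(mean=11.8, sigma=0.6, size=500)  # synthetic episodes

    for n_groups in (3, 4):
        labels = fcluster(linkage(costs.reshape(-1, 1), method="complete"),
                          t=n_groups, criterion="maxclust")
        for g in range(1, n_groups + 1):
            group = costs[labels == g]
            print(f"{n_groups}-group model, bundle {g}: n={len(group)}, "
                  f"median=${np.median(group):,.0f}, "
                  f"75th pct=${np.percentile(group, 75):,.0f}")
    ```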

    Urban air pollution modelling with machine learning using fixed and mobile sensors

    Detailed air quality (AQ) information is crucial for sustainable urban management, and many regions in the world have built static AQ monitoring networks to provide AQ information. However, these networks can only monitor region-level AQ conditions or sparse point-based pollutant measurements; they cannot capture urban dynamics with high-resolution spatio-temporal variation across a region. Without pollution details, citizens cannot make fully informed decisions when choosing their everyday outdoor routes or activities, and policy-makers can only make macroscopic regulating decisions on controlling pollution-triggering factors and emission sources. Increasing research effort has been devoted to mobile and ubiquitous sampling campaigns, as they are deemed more economically and operationally feasible methods for collecting urban AQ data with high spatio-temporal resolution. This research proposes a machine-learning-based AQ inference (Deep AQ) framework from a data-driven perspective, consisting of data pre-processing, feature extraction and transformation, and pixelwise (grid-level) AQ inference. The Deep AQ framework is adaptable to integrate AQ measurements from fixed monitoring sites (temporally dense but spatially sparse) and mobile low-cost sensors (temporally sparse but spatially dense). While instantaneous pollutant concentration varies with the micro-environment, this research samples representative values in each grid-cell unit and achieves AQ inference at a 1 km × 1 km pixelwise scale. The research explores the predictive power of the Deep AQ framework using samples from only 40 fixed monitoring sites in Chengdu, China (4,900 km², 26 April - 12 June 2019) and collaborative sampling from 28 fixed monitoring sites and 15 low-cost sensors mounted on taxis in Beijing, China (3,025 km², 19 June - 16 July 2018). The proposed Deep AQ framework is capable of producing high-resolution (1 km × 1 km, hourly) pixelwise AQ inference from multi-source AQ samples (fixed or mobile) and urban features (land use, population, traffic, meteorological information, etc.). The research achieved high-resolution (1 km × 1 km, hourly) AQ inference with reasonable and satisfactory accuracy in both urban cases despite sparse coverage (Chengdu: less than 1% spatio-temporal coverage, SMAPE < 20%; Beijing: less than 5% spatio-temporal coverage, SMAPE < 15%). Detailed outcomes and main conclusions are provided in this thesis on the aspects of fixed and mobile sensing, spatio-temporal coverage and density, and the relative importance of urban features. Outcomes from this research help provide a scientific and detailed health-impact assessment framework for exposure analysis and inform policy-makers with data-driven evidence for sustainable urban management.
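    Two small pieces of the evaluation setup can be sketched from the abstract: mapping point samples onto a 1 km × 1 km grid, and the SMAPE metric used to report accuracy. The grid origin and the exact SMAPE variant used in the thesis are assumptions.

    ```python
    # Gridding point samples to 1 km cells, plus a common SMAPE definition.
    import numpy as np

    def to_grid(x_m, y_m, cell=1000.0):
        """Map metric coordinates to (row, col) indices of a 1 km grid."""
        return np.floor(y_m / cell).astype(int), np.floor(x_m / cell).astype(int)

    def smape(y_true, y_pred):
        """Symmetric mean absolute percentage error, in percent."""
        y_true = np.asarray(y_true, float)
        y_pred = np.asarray(y_pred, float)
        return 100.0 * np.mean(np.abs(y_pred - y_true) /
                               ((np.abs(y_true) + np.abs(y_pred)) / 2.0))

    rows, cols = to_grid(np.array([250.0, 1750.0]), np.array([900.0, 2100.0]))
    print(rows, cols)                       # -> [0 2] [0 1]
    print(smape([40, 55, 80], [36, 60, 75]))  # approx. 8.6
    ```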

    Advanced Probabilistic Models for Clustering and Projection

    Probabilistic modeling for data mining and machine learning problems is a fundamental research area. The general approach is to assume a generative model underlying the observed data and to estimate model parameters via likelihood maximization. It rests on probability theory as its mathematical foundation and draws on a wealth of methods from statistical learning, sampling theory, and Bayesian statistics. In this thesis we study several advanced probabilistic models for data clustering and feature projection, the two important unsupervised learning problems. The goal of clustering is to group similar data points together to uncover the data clusters. While numerous methods exist for various clustering tasks, one important question remains: how to automatically determine the number of clusters. The first part of the thesis answers this question from a mixture modeling perspective. A finite mixture model is first introduced for clustering, in which each mixture component is assumed to be an exponential family distribution for generality. The model is then extended to an infinite mixture model, and its strong connection to the Dirichlet process (DP), a non-parametric Bayesian framework, is uncovered. A variational Bayesian algorithm called VBDMA is derived from this new insight to learn the number of clusters automatically, and empirical studies on some 2D data sets and an image data set verify the effectiveness of this algorithm. In feature projection, we are interested in dimensionality reduction and aim to find a low-dimensional feature representation of the data. We first review the well-known principal component analysis (PCA) and its probabilistic interpretation (PPCA), and then generalize PPCA to a novel probabilistic model which is able to handle the non-linear projection known as kernel PCA. An expectation-maximization (EM) algorithm is derived for kernel PCA such that it is fast and applicable to large data sets. We then propose a novel supervised projection method called MORP, which can take the output information into account in a supervised learning context. Empirical studies on various data sets show much better results compared to unsupervised projection and other supervised projection methods. Finally we generalize MORP probabilistically to propose SPPCA for supervised projection, and we naturally extend the model to S2PPCA, a semi-supervised projection method. This allows us to incorporate both the label information and the unlabeled data into the projection process. In the third part of the thesis, we introduce a unified probabilistic model which can handle data clustering and feature projection jointly. The model can be viewed as a clustering model with projected features, or as a projection model with structured documents. A variational Bayesian learning algorithm is derived, and it turns out to iterate the clustering operations and projection operations until convergence. Superior performance is obtained for both clustering and projection.
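    The first theme, learning the number of clusters from an infinite (Dirichlet-process) mixture with variational Bayes, can be illustrated with scikit-learn's BayesianGaussianMixture as a stand-in for the VBDMA algorithm; this is not the author's implementation, and the thresholds below are illustrative.

    ```python
    # Variational Bayes on a Dirichlet-process mixture: fit with a generous
    # upper bound on components and let extra components collapse to
    # near-zero weight, yielding an effective number of clusters.
    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(loc, 0.3, size=(150, 2))
                   for loc in ((0, 0), (3, 3), (0, 4))])

    dpgmm = BayesianGaussianMixture(
        n_components=10,  # upper bound; unused components get tiny weights
        weight_concentration_prior_type="dirichlet_process",
        random_state=0,
    ).fit(X)

    active = (dpgmm.weights_ > 0.01).sum()
    print(f"effective number of clusters: {active}")  # typically 3 here
    ```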

    Improving Seasonal Factor Estimates for Adjustment of Annual Average Daily Traffic

    Traffic volume data are input to many transportation analyses, including planning, roadway design, pavement design, air quality, roadway maintenance, and funding allocation. Annual Average Daily Traffic (AADT) is one of the most often used measures of traffic volume. Acquiring actual AADT data requires collecting traffic counts continuously throughout a year, which is expensive and thus can only be done at a very limited number of locations. Typically, AADTs are estimated by applying seasonal factors (SFs) to short-term counts collected at portable traffic monitoring sites (PTMSs). Statewide in Florida, the Florida Department of Transportation (FDOT) operates about 300 permanent traffic monitoring sites (TTMSs) that collect traffic counts continuously. TTMSs are first manually classified into different groups (known as seasonal factor categories) based on engineering judgment and on similarities in traffic and roadway characteristics. A seasonal factor category is then assigned to each PTMS according to the site’s functional classification and geographical location. The SFs of the assigned category are then used to adjust traffic counts collected at PTMSs to estimate the final AADTs. This dissertation research aims to develop a more objective and data-driven method to improve the accuracy of the SFs used to adjust PTMS counts. A statewide investigation was first conducted to identify potential influential factors that contribute to seasonal fluctuations in traffic volumes in both urban and rural areas in Florida. The influential factors considered include roadway functional classification and demographic, socioeconomic, and land use variables. Based on these factors, a methodology was developed for assigning seasonal factors from one or more TTMSs to each PTMS. The assigned seasonal factors were validated with data from existing TTMSs. The results show that the errors of the estimated seasonal factors are, on average, about 4 percent, and nearly 95 percent of the estimated monthly SFs have errors of no more than 10 percent. It was concluded that the method could be applied to improve the accuracy of AADT estimation for both urban and rural areas in Florida.
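    A minimal sketch of the seasonal-factor adjustment the dissertation builds on, under the common definition SF = AADT / monthly ADT at a permanent site, with a short-term count scaled by the SF of its assigned category. The numbers are illustrative, not FDOT data, and AADT is approximated here as the mean of the monthly ADTs.

    ```python
    # Compute monthly seasonal factors at a permanent count site (TTMS)
    # and apply one to a short-term portable count (PTMS) to estimate AADT.
    import numpy as np

    monthly_adt = np.array([  # average daily traffic by month at a TTMS
        9200, 9400, 9900, 10100, 10400, 10800,
        11000, 10900, 10300, 10000, 9600, 9300], dtype=float)

    aadt = monthly_adt.mean()   # approximate AADT at the permanent site
    sf = aadt / monthly_adt     # monthly SFs (>1 in below-average months)

    # Adjust a 24-hour portable count taken in July (month index 6) at a
    # PTMS assigned to this seasonal factor category:
    short_count = 11500.0
    estimated_aadt = short_count * sf[6]
    print(f"SF(July)={sf[6]:.3f}, estimated AADT={estimated_aadt:,.0f}")
    ```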

    Colour morphological sieves for scale-space image processing
