
    Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

    Over the past five decades, k-means has become the clustering algorithm of choice in many application domains, primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity that is superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. It is therefore common practice to perform multiple runs of such methods and keep the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly. Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196
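    As a rough illustration of what a linear, deterministic, order-invariant initializer can look like, the Python sketch below follows the variance-partitioning idea usually attributed to Su and Dy (Var-Part). It starts with all points in one cell, repeatedly splits the cell with the largest sum of squared errors (SSE) along its highest-variance dimension at that dimension's mean, and returns the k cell centroids as initial centers. The function names (`var_part_init`, `centroid`, `sse`) are hypothetical. Tie-breaking and the handling of cells that cannot be split are assumptions, not details taken from the chapter.

```python
from typing import List, Sequence

Point = Sequence[float]


def centroid(cell: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    dims = len(cell[0])
    return [sum(p[j] for p in cell) / len(cell) for j in range(dims)]


def sse(cell: List[Point]) -> float:
    """Sum of squared Euclidean distances from each point to the cell centroid."""
    c = centroid(cell)
    return sum(sum((p[j] - c[j]) ** 2 for j in range(len(c))) for p in cell)


def var_part_init(data: List[Point], k: int) -> List[List[float]]:
    """Var-Part-style seeding (sketch; assumes non-empty data).

    Splits the cell with the largest SSE along its dimension of greatest
    variance, at the mean of that dimension, until k cells exist. Returns the
    cell centroids, or fewer than k if no cell can be split any further.
    """
    cells: List[List[Point]] = [list(data)]
    while len(cells) < k:
        # Pick the cell contributing most to the within-cluster SSE
        # (ties go to the earliest cell, whose position depends only on values).
        idx = max(range(len(cells)), key=lambda i: sse(cells[i]))
        cell = cells[idx]
        c = centroid(cell)
        # Per-dimension variance; split on the most spread-out dimension.
        variances = [sum((p[j] - c[j]) ** 2 for p in cell) / len(cell)
                     for j in range(len(c))]
        dim = max(range(len(c)), key=lambda j: variances[j])
        left = [p for p in cell if p[dim] <= c[dim]]
        right = [p for p in cell if p[dim] > c[dim]]
        if not left or not right:
            break  # every remaining cell holds identical points
        cells[idx:idx + 1] = [left, right]
    return [centroid(cell) for cell in cells]
```

    Because each split depends only on coordinate values, shuffling the input yields the same centers (up to floating-point rounding in the sums). Each pass over the cells costs time linear in the number of points, so for fixed k and dimensionality the whole procedure is linear in n, with no repeated randomized runs.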

    45S rDNA external transcribed spacer organization reveals new phylogenetic relationships in Avena genus

    Research Article. The genus Avena comprises four distinct genomes organized in diploid (AA or CC), tetraploid (AABB or AACC) and hexaploid (AACCDD) species, constituting an interesting model for phylogenetic analysis. The aim of this work was to characterize 45S rDNA intergenic spacer (IGS) variability in distinct species representative of Avena genome diversity: A. strigosa (AA), A. ventricosa (CvCv), A. eriantha (CpCp), A. barbata (AABB), A. murphyi (AACC), A. sativa (AACCDD) and A. sterilis (AACCDD). This was done through the assessment of the 5' external transcribed spacer (5'-ETS), an IGS region that is promising for phylogenetic studies but has been poorly studied in the genus Avena. In this work, IGS length polymorphisms were detected, mainly due to distinct 5'-ETS sequence types resulting from major differences in the number and organization of repeated motifs. Although species with the A genome revealed a 5'-ETS organization (A-organization) similar to the one previously described in A. sativa, a distinct organization was unraveled in C-genome diploid species (C-organization). Interestingly, this new organization shows a higher similarity to other Poaceae species than A-genome sequences do, supporting the hypothesis that the C genome is the ancestral Avena genome. Additionally, polyploid species with both genomes mainly retain the A-genome 5'-ETS organization, confirming the preferential elimination of C-genome sequences in Avena polyploid species. Moreover, phylogenetic analysis of 5'-ETS sequences consistently clustered the species studied according to ploidy and genomic constitution, supporting the use of ribosomal genes to highlight the evolutionary pathways of Avena species.

    Survival and long-term maintenance of tertiary trees in the Iberian Peninsula during the Pleistocene. First record of Aesculus L.

    The Italian and Balkan peninsulas have traditionally been highlighted as Pleistocene glacial refugia. The Iberian Peninsula, however, has been a focus of controversy between geobotanists and palaeobotanists, having been excluded from this category on several occasions. In the current paper, we synthesise geological, molecular, palaeobotanical and geobotanical data that show the importance of the Iberian Peninsula as a refugium area in the Western Mediterranean. The presence of Aesculus aff. hippocastanum L. at the Iberian site of Cal Guardiola (Tarrasa, Barcelona, NE Spain) at the Lower–Middle Pleistocene transition helps to consolidate the remarkable role of the Iberian Peninsula in the survival of Tertiary species during the Pleistocene. The palaeodistribution of the genus in Europe highlights a model of area abandonment for a species that was widely distributed in the Miocene and Pliocene, leading to a diminished and fragmentary presence on the southern Mediterranean peninsulas in the Pleistocene and Holocene. Aesculus fossils are not uncommon among Tertiary taxa. Many of these taxa appear in the Pliocene and undergo a radical impoverishment at the Lower–Middle Pleistocene transition. Nonetheless, some of these Tertiary taxa persisted throughout the Pleistocene and Holocene up to the present in the Iberian Peninsula. Locating these refuge areas on the Peninsula is not an easy task, although areas characterised by a sustained level of humidity must have played a predominant role.

    Glaciation Effects on the Phylogeographic Structure of Oligoryzomys longicaudatus (Rodentia: Sigmodontinae) in the Southern Andes

    The long-tailed pygmy rice rat Oligoryzomys longicaudatus (Sigmodontinae), the major reservoir of Hantavirus in Chile and Patagonian Argentina, is widely distributed in the Mediterranean, Temperate and Patagonian Forests of Chile, as well as in adjacent areas of southern Argentina. We used molecular data to evaluate the effects of the last glacial event on the phylogeographic structure of this species, examining whether historical Pleistocene events had affected its genetic variation and spatial distribution along its range. We sampled 223 individuals representing 47 localities across the species range and sequenced hypervariable domain I of the mtDNA control region. Aligned sequences were analyzed using haplotype network, Bayesian population structure and demographic analyses. Population structure analysis and the haplotype network inferred three genetic clusters along the distribution of O. longicaudatus that mostly agreed with the three major ecogeographic regions in Chile: Mediterranean, Temperate Forests and Patagonian Forests. Bayesian Skyline Plots showed constant population sizes through time in all three clusters, followed by an increase during and after the Last Glacial Maximum (LGM; between 26,000 and 13,000 years ago). Neutrality tests and the “g” parameter also suggest that populations of O. longicaudatus experienced demographic expansion across the species' entire range. Past climate shifts have influenced the population structure and lineage variation of O. longicaudatus. This species remained in refugial areas in the southern Temperate Forests (and adjacent areas in Patagonia) during the Pleistocene. From these refugia, O. longicaudatus expanded demographically into the Patagonian Forests and central Mediterranean Chile as the glaciers retreated.

    Surgical site infection after gastrointestinal surgery in high-income, middle-income, and low-income countries: a prospective, international, multicentre cohort study

    Background: Surgical site infection (SSI) is one of the most common infections associated with health care, but its importance as a global health priority is not fully understood. We quantified the burden of SSI after gastrointestinal surgery in countries in all parts of the world. Methods: This international, prospective, multicentre cohort study included consecutive patients undergoing elective or emergency gastrointestinal resection within 2-week time periods at any health-care facility in any country. Countries with participating centres were stratified into high-income, middle-income, and low-income groups according to the UN's Human Development Index (HDI). Data variables from the GlobalSurg 1 study and other studies that have been found to affect the likelihood of SSI were entered into risk adjustment models. The primary outcome measure was the 30-day SSI incidence (defined by US Centers for Disease Control and Prevention criteria for superficial and deep incisional SSI). Relationships with explanatory variables were examined using Bayesian multilevel logistic regression models. This trial is registered with ClinicalTrials.gov, number NCT02662231. Findings: Between Jan 4, 2016, and July 31, 2016, 13 265 records were submitted for analysis. 12 539 patients from 343 hospitals in 66 countries were included. 7339 (58·5%) patients were from high-HDI countries (193 hospitals in 30 countries), 3918 (31·2%) patients were from middle-HDI countries (82 hospitals in 18 countries), and 1282 (10·2%) patients were from low-HDI countries (68 hospitals in 18 countries). In total, 1538 (12·3%) patients had SSI within 30 days of surgery. The incidence of SSI varied between countries with high (691 [9·4%] of 7339 patients), middle (549 [14·0%] of 3918 patients), and low (298 [23·2%] of 1282 patients) HDI (p < 0·001). The highest SSI incidence in each HDI group was after dirty surgery (102 [17·8%] of 574 patients in high-HDI countries; 74 [31·4%] of 236 patients in middle-HDI countries; 72 [39·8%] of 181 patients in low-HDI countries). Following risk factor adjustment, patients in low-HDI countries were at greatest risk of SSI (adjusted odds ratio 1·60, 95% credible interval 1·05–2·37; p=0·030). 132 (21·6%) of 610 patients with an SSI and a microbiology culture result had an infection that was resistant to the prophylactic antibiotic used. Resistant infections were detected in 49 (16·6%) of 295 patients in high-HDI countries, in 37 (19·8%) of 187 patients in middle-HDI countries, and in 46 (35·9%) of 128 patients in low-HDI countries (p < 0·001). Interpretation: Countries with a low HDI carry a disproportionately greater burden of SSI than countries with a middle or high HDI and might have higher rates of antibiotic resistance. In view of WHO recommendations on SSI prevention that highlight the absence of high-quality interventional research, urgent, pragmatic, randomised trials based in low-income and middle-income countries (LMICs) are needed to assess measures aimed at reducing this preventable complication.

    Pooled analysis of WHO Surgical Safety Checklist use and mortality after emergency laparotomy

    Background The World Health Organization (WHO) Surgical Safety Checklist has fostered safe practice for 10 years, yet its place in emergency surgery has not been assessed on a global scale. The aim of this study was to evaluate reported checklist use in emergency settings and examine the relationship with perioperative mortality in patients who had emergency laparotomy. Methods In two multinational cohort studies, adults undergoing emergency laparotomy were compared with those having elective gastrointestinal surgery. Relationships between reported checklist use and mortality were determined using multivariable logistic regression and bootstrapped simulation. Results Of 12 296 patients included from 76 countries, 4843 underwent emergency laparotomy. After adjusting for patient and disease factors, checklist use before emergency laparotomy was more common in countries with a high Human Development Index (HDI) (2455 of 2741, 89.6 per cent) than in countries with a middle (753 of 1242, 60.6 per cent; odds ratio (OR) 0.17, 95 per cent c.i. 0.14 to 0.21, P < 0.001) or low (363 of 860, 42.2 per cent; OR 0.08, 0.07 to 0.10, P < 0.001) HDI. Checklist use was less common in elective surgery than in emergency laparotomy in high-HDI countries (risk difference -9.4 (95 per cent c.i. -11.9 to -6.9) per cent; P < 0.001), but the relationship was reversed in low-HDI countries (+12.1 (+7.0 to +17.3) per cent; P < 0.001). In multivariable models, checklist use was associated with a lower 30-day perioperative mortality (OR 0.60, 0.50 to 0.73; P < 0.001). The greatest absolute benefit was seen for emergency surgery in low- and middle-HDI countries. Conclusion Checklist use in emergency laparotomy was associated with a significantly lower perioperative mortality rate. Checklist use in low-HDI countries was half that in high-HDI countries.

    QTL mapping for brown rot (Monilinia fructigena) resistance in an intraspecific peach (Prunus persica L. Batsch) F1 progeny

    Brown rot (BR) caused by Monilinia spp. leads to significant post-harvest losses in stone fruit production, especially peach. Previous genetic analyses in peach progenies suggested that BR resistance segregates as a quantitative trait. To uncover genomic regions associated with this trait and identify molecular markers for marker-assisted selection (MAS) in peach, an F1 progeny from the cross "Contender" (C, resistant) × "Elegant Lady" (EL, susceptible) was chosen for quantitative trait loci (QTL) analysis. Over two phenotyping seasons, skin (SK) and flesh (FL) artificial infections were performed on fruits using a Monilinia fructigena isolate. For each treatment, infection frequency (if) and average rot diameter (rd) were scored. Significant seasonal and intertrait correlations were found. Maturity date (MD) was significantly correlated with disease impact. Sixty-three simple sequence repeat (SSR) markers plus 26 single-nucleotide polymorphism (SNP) markers were used to genotype the C × EL population and to construct a linkage map. The C × EL map included the eight Prunus linkage groups (LG), spanning 572.92 cM with an average interval distance of 6.9 cM and covering 78.73 % of the peach genome (V1.0). Multiple QTL mapping analysis including the MD trait as a covariate uncovered three genomic regions associated with BR resistance in the two phenotyping seasons: two containing QTLs for SK resistance traits, near M1a (LG C × EL-2, R² = 13.1-31.5 %) and EPPISF032 (LG C × EL-4, R² = 11-14 %), and one containing QTLs for FL resistance, near markers SNP_IGA_320761 and SNP_IGA_321601 (LG3, R² = 3.0-11.0 %). These results suggest that in the C × EL F1 progeny, skin resistance to fungal penetration and flesh resistance to rot spread are distinguishable mechanisms constituting the BR resistance trait, associated with different genomic regions. The QTLs discovered and their associated markers could assist the selection of new cultivars with enhanced resistance to Monilinia spp. in fruit.

    Global variation in anastomosis and end colostomy formation following left-sided colorectal resection

    Background End colostomy rates following colorectal resection vary across institutions in high-income settings, being influenced by patient, disease, surgeon and system factors. This study aimed to assess global variation in end colostomy rates after left-sided colorectal resection. Methods This study comprised an analysis of the GlobalSurg-1 and -2 international, prospective, observational cohort studies (2014, 2016), including consecutive adult patients undergoing elective or emergency left-sided colorectal resection within discrete 2-week windows. Countries were grouped into high-, middle- and low-income tertiles according to the United Nations Human Development Index (HDI). Factors associated with colostomy formation versus primary anastomosis were explored using a multilevel, multivariable logistic regression model. Results In total, 1635 patients undergoing left-sided colorectal resection in 242 hospitals in 57 countries were included: 113 (6·9 per cent) from low-HDI, 254 (15·5 per cent) from middle-HDI and 1268 (77·6 per cent) from high-HDI countries. There was a higher proportion of patients with perforated disease (57·5, 40·9 and 35·4 per cent; P < 0·001) and subsequent use of end colostomy (52·2, 24·8 and 18·9 per cent; P < 0·001) in low- compared with middle- and high-HDI settings. The association with colostomy use in low-HDI settings persisted (odds ratio (OR) 3·20, 95 per cent c.i. 1·35 to 7·57; P = 0·008) after risk adjustment for malignant disease (OR 2·34, 1·65 to 3·32; P < 0·001), emergency surgery (OR 4·08, 2·73 to 6·10; P < 0·001), time to operation of at least 48 h (OR 1·99, 1·28 to 3·09; P = 0·002) and disease perforation (OR 4·00, 2·81 to 5·69; P < 0·001). Conclusion There were global differences, based on income, in the proportion of patients receiving end stomas after left-sided colorectal resection, and these went beyond case mix alone.

    An Analysis of the Application of Simplified Silhouette to the Evaluation of k-means Clustering Validity

    Silhouette is one of the most popular and effective internal measures for evaluating clustering validity. Simplified Silhouette is a computationally simplified version of Silhouette. However, to date Simplified Silhouette has not been systematically analysed in the context of a specific clustering algorithm. This paper analyses the application of Simplified Silhouette to the evaluation of k-means clustering validity and compares it with the k-means Cost Function and the original Silhouette from both theoretical and empirical perspectives. The theoretical analysis shows that Simplified Silhouette has a mathematical relationship with both the k-means Cost Function and the original Silhouette, while empirically we show that its performance is comparable to that of the original Silhouette but that it is much faster to compute. Based on our analysis, we conclude that for a given dataset the k-means Cost Function is still the most valid and efficient measure for evaluating the validity of k-means clusterings with the same k value, but that Simplified Silhouette is more suitable than the original Silhouette for selecting the best result from k-means clusterings with different k values.
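    The difference between the two Silhouette variants can be sketched in Python. The sketch assumes Euclidean distance and the textbook definitions. In the original Silhouette, a(i) and b(i) are mean distances to the points of a point's own cluster and of the nearest other cluster, which requires O(n²) distance evaluations. Simplified Silhouette replaces them with distances to the cluster centroids, which is O(nk). The k-means Cost Function is taken here to be the sum of squared distances to the assigned centroids. The function names are hypothetical, and conventions such as s(i) = 0 for singleton clusters are assumptions, not taken from the paper.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def dist(p: Point, q: Point) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans_cost(data: List[Point], labels: List[int], centers: List[Point]) -> float:
    """Sum of squared distances from each point to its assigned center."""
    return sum(dist(p, centers[lab]) ** 2 for p, lab in zip(data, labels))


def simplified_silhouette(data: List[Point], labels: List[int],
                          centers: List[Point]) -> float:
    """Mean s(i) = (b - a) / max(a, b), with a = distance to the own center and
    b = distance to the nearest other center. O(n * k); needs >= 2 centers."""
    total = 0.0
    for p, lab in zip(data, labels):
        a = dist(p, centers[lab])
        b = min(dist(p, c) for j, c in enumerate(centers) if j != lab)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(data)


def silhouette(data: List[Point], labels: List[int]) -> float:
    """Original Silhouette: a and b are mean distances to the points of the own
    and nearest other cluster. O(n^2); needs >= 2 non-empty clusters."""
    k = max(labels) + 1
    total = 0.0
    for i, p in enumerate(data):
        sums, counts = [0.0] * k, [0] * k
        for j, q in enumerate(data):
            if j != i:
                sums[labels[j]] += dist(p, q)
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sums[own] / counts[own]
        b = min(sums[c] / counts[c] for c in range(k) if c != own and counts[c] > 0)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(data)
```

    With k fixed, `kmeans_cost` is exactly the quantity k-means minimizes, which is consistent with the recommendation above to use the Cost Function for same-k comparisons. Neither Silhouette variant decreases automatically as k grows, so either one can rank results across different k, and the centroid-based variant is the cheaper of the two.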