A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm

Author: M. Emre Celebi
Hassan A. Kingravi
Patricio A. Vela
Publication venue: Elsevier BV
Publication date: 10/09/2012

K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization methods have been proposed to address this problem. In this paper, we first present an overview of these methods with an emphasis on their computational efficiency. We then compare eight commonly used linear time complexity initialization methods on a large and diverse collection of data sets using various performance criteria. Finally, we analyze the experimental results using non-parametric statistical tests and provide recommendations for practitioners. We demonstrate that popular initialization methods often perform poorly and that there are in fact strong alternatives to these methods.

Comment: 17 pages, 1 figure, 7 tables
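The sensitivity to initial centers described in the abstract can be illustrated with a minimal sketch. It assumes NumPy, synthetic Gaussian blobs, and two well-known linear-time initializations chosen here only for illustration (uniform random selection and maximin); the abstract does not name the eight methods it compares, and all function names below are hypothetical.

```python
import numpy as np

def sse(X, centers):
    """Sum of squared distances from each point to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def lloyd(X, centers, n_iter=100):
    """Batch k-means (Lloyd iterations) from the given initial centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):          # an empty cluster keeps its old center
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, sse(X, centers)

def random_init(X, k, rng):
    """Pick k distinct data points uniformly at random."""
    return X[rng.choice(len(X), size=k, replace=False)]

def maximin_init(X, k, rng):
    """Pick a random first center, then repeatedly the point farthest from all chosen centers."""
    centers = [X[rng.integers(len(X))]]
    d2 = ((X - centers[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        centers.append(X[d2.argmax()])
        d2 = np.minimum(d2, ((X - centers[-1]) ** 2).sum(axis=1))
    return np.array(centers)

# Illustrative data: three synthetic blobs (an assumption, not the paper's data sets).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2))
               for c in ((0, 0), (5, 0), (0, 5))])
for name, init in (("random", random_init), ("maximin", maximin_init)):
    start = init(X, 3, rng)
    _, final_sse = lloyd(X, start)
    print(name, "initial SSE:", round(sse(X, start), 1), "final SSE:", round(final_sse, 1))
```

Because each Lloyd iteration can only lower the sum of squared errors, the final value depends on where the descent starts, which is why the choice of initialization matters.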

arXiv.org e-Print Archive

Crossref

Analysis of Binding Site Hot Spots on the Surface of Ras GTPase

Author: Buhrman Greg
Kearney Bradley M.
Kovrigina Elizaveta A.
Kovriguine Evgueni
Kozakov Dima
Mattos Carla
Napoleon Raeanne
O'Connor Casey
Vajda Sandor
Zerbe Brandon
Publication venue: e-Publications@Marquette
Publication date: 01/11/2011

We have recently discovered an allosteric switch in Ras, bringing an additional level of complexity to this GTPase whose mutants are involved in nearly 30% of cancers. Upon activation of the allosteric switch, there is a shift in helix 3/loop 7 associated with a disorder-to-order transition in the active site. Here, we use a combination of multiple solvent crystal structures and computational solvent mapping (FTMap) to determine binding site hot spots in the “off” and “on” allosteric states of the GTP-bound form of H-Ras. Thirteen sites are revealed, expanding possible target sites for ligand binding well beyond the active site. Comparison of FTMaps for the H and K isoforms reveals essentially identical hot spots. Furthermore, using NMR measurements of spin relaxation, we determined that K-Ras exhibits global conformational dynamics very similar to those we previously reported for H-Ras. We thus hypothesize that the global conformational rearrangement serves as a mechanism for allosteric coupling between the effector interface and remote hot spots in all Ras isoforms. At least with respect to the binding sites involving the G domain, H-Ras is an excellent model for K-Ras and probably N-Ras as well. Ras has so far been elusive as a target for drug design. The present work identifies various unexplored hot spots throughout the entire surface of Ras, extending the focus from the disordered active site to well-ordered locations that should be easier to target.

epublications@Marquette

PubMed Central

Disentangling different types of El Niño episodes by evolving climate network analysis

Author: Donges Jonathan F.
Donner Reik V.
Kurths Jürgen
Radebach Alexander
Runge Jakob
Publication venue: American Physical Society (APS)
Publication date: 21/10/2013

Complex network theory provides a powerful toolbox for studying the structure of statistical interrelationships between multiple time series in various scientific disciplines. In this work, we apply the recently proposed climate network approach for characterizing the evolving correlation structure of the Earth's climate system based on reanalysis data of surface air temperatures. We provide a detailed study on the temporal variability of several global climate network characteristics. Based on a simple conceptual view on red climate networks (i.e., networks with a comparably low number of edges), we give a thorough interpretation of our evolving climate network characteristics, which allows a functional discrimination between recently recognized different types of El Niño episodes. Our analysis provides deep insights into the Earth's climate system, particularly its global response to strong volcanic eruptions and large-scale impacts of different phases of the El Niño Southern Oscillation (ENSO).

Comment: 20 pages, 12 figures
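As a rough illustration of how a correlation-based network with a low number of edges can be built and tracked over time, the sketch below assumes NumPy, Pearson correlation as the similarity measure, a fixed edge density, and sliding time windows. These choices and all function names are assumptions for illustration; the abstract does not specify the construction used in the paper.

```python
import numpy as np

def correlation_network(series, edge_density):
    """Build an undirected network from time series.

    series: array of shape (n_nodes, n_times), one row per grid point.
    edge_density: fraction of all node pairs to connect (strongest |correlation| first).
    Returns a symmetric boolean adjacency matrix without self-loops.
    """
    corr = np.abs(np.corrcoef(series))
    n = corr.shape[0]
    rows, cols = np.triu_indices(n, k=1)          # each unordered pair once
    n_edges = int(round(edge_density * len(rows)))
    strongest = np.argsort(corr[rows, cols])[::-1][:n_edges]
    adj = np.zeros((n, n), dtype=bool)
    adj[rows[strongest], cols[strongest]] = True
    return adj | adj.T

def evolving_networks(series, window, step, edge_density):
    """One network per sliding time window (hypothetical helper)."""
    n_times = series.shape[1]
    return [correlation_network(series[:, start:start + window], edge_density)
            for start in range(0, n_times - window + 1, step)]

# Illustrative use on random data standing in for temperature anomalies.
rng = np.random.default_rng(0)
fake_series = rng.standard_normal((20, 200))
networks = evolving_networks(fake_series, window=100, step=50, edge_density=0.1)
degrees = [adj.sum(axis=1) for adj in networks]   # node degree per window
```

Any network characteristic computed per window (degree here) then becomes a time series of its own, which is the sense in which the network is "evolving".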

arXiv.org e-Print Archive

Aberdeen University Research

An electrostatic mechanism for Ca(2+)-mediated regulation of gap junction channels.

Author: Abagyan Ruben
Acharya Chayan
Baker Kent A
Bennett Brad C
Harris Andrew L
McIntire William E
Purdy Michael D
Stevens Raymond C
Yeager Mark
Zhang Qinghai
Publication venue: eScholarship, University of California
Publication date: 01/01/2016

Gap junction channels mediate intercellular signalling that is crucial in tissue development, homeostasis and pathologic states such as cardiac arrhythmias, cancer and trauma. To explore the mechanism by which Ca(2+) blocks intercellular communication during tissue injury, we determined the X-ray crystal structures of the human Cx26 gap junction channel with and without bound Ca(2+). The two structures were nearly identical, ruling out both a large-scale structural change and a local steric constriction of the pore. Ca(2+) coordination sites reside at the interfaces between adjacent subunits, near the entrance to the extracellular gap, where local, side chain conformational rearrangements enable Ca(2+) chelation. Computational analysis revealed that Ca(2+)-binding generates a positive electrostatic barrier that substantially inhibits permeation of cations such as K(+) into the pore. Our results provide structural evidence for a unique mechanism of channel regulation: ionic conduction block via an electrostatic barrier rather than steric occlusion of the channel pore.

PubMed Central

eScholarship - University of California

EEG sleep stages identification based on weighted undirected complex networks

Author: Abdulla Shahab
Diykh Mohammed
Li Yan
Publication venue: Elsevier BV
Publication date: 01/02/2020

Sleep scoring is important in sleep research because any errors in the scoring of a patient's sleep electroencephalography (EEG) recordings can cause serious problems such as incorrect diagnosis, medication errors, and misinterpretation of the recordings. The aim of this research is to develop a new automatic method for EEG sleep stage classification based on a statistical model and weighted brain networks. Methods: Each EEG segment is partitioned into a number of blocks using a sliding window technique. A set of statistical features is extracted from each block. As a result, a vector of features is obtained to represent each EEG segment. Then, the vector of features is mapped into a weighted undirected network. Different structural and spectral attributes of the networks are extracted and forwarded to a least squares support vector machine (LS-SVM) classifier. At the same time, the networks' attributes are also thoroughly investigated. It is found that the networks' characteristics vary with sleep stage. Each sleep stage is best represented using the key features of its network. Results: In this paper, the proposed method is evaluated using two datasets acquired from different channels of EEG (Pz-Oz and C3-A2) according to the R&K and AASM standards without pre-processing the original EEG data. The results obtained by the LS-SVM are compared with those of Naïve, k-nearest and multi-class SVM classifiers. The proposed method is also compared with other benchmark sleep stage classification methods. The comparison results demonstrate that the proposed method has an advantage in scoring sleep stages based on single-channel EEG signals. Conclusions: An average accuracy of 96.74% is obtained with the C3-A2 channel according to the AASM standard, and 96% with the Pz-Oz channel based on the R&K standard.
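The first stages of the pipeline in the abstract (sliding-window blocks, per-block statistics, and a weighted undirected network built from the feature vector) could be sketched as follows. The particular statistics, the window parameters, and the edge-weight rule (absolute difference between feature values) are assumptions made only for illustration; the abstract does not state them, and the function names are hypothetical.

```python
import numpy as np

def block_features(segment, block_len, step):
    """Slide a window over one EEG segment and collect simple statistics per block.

    The four statistics used here (mean, std, min, max) are placeholders for
    whatever feature set the method actually uses.
    """
    feats = []
    for start in range(0, len(segment) - block_len + 1, step):
        block = segment[start:start + block_len]
        feats.extend([block.mean(), block.std(), block.min(), block.max()])
    return np.array(feats)

def weighted_network(features):
    """Map a feature vector to a weighted undirected network.

    Each feature value is a node; the edge weight is the absolute difference
    between the two node values (an assumed rule). The result is a symmetric
    weight matrix with a zero diagonal.
    """
    weights = np.abs(features[:, None] - features[None, :])
    np.fill_diagonal(weights, 0.0)
    return weights

# Illustrative use on synthetic noise standing in for one EEG segment.
rng = np.random.default_rng(0)
segment = rng.standard_normal(3000)
features = block_features(segment, block_len=500, step=250)
network = weighted_network(features)
strength = network.sum(axis=1)   # one simple structural attribute per node
```

Attributes such as node strength would then be the inputs passed on to a classifier; the classifier itself is left out of this sketch.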

University of Southern Queensland ePrints

DENSITY OF AMORPHOUS CARBON BY USING DENSITY FUNCTIONAL THEORY

Author: Chinkanjanarot Sorayot
Publication venue: Digital Commons @ Michigan Tech
Publication date: 01/01/2014

Amorphous carbon has been investigated for a long time. Because its carbon atoms are randomly arranged, its density depends on the position of each atom. Knowing the density of amorphous carbon is important for modeling advanced carbon materials in the future. Two methods were used to create the initial structures of amorphous carbon. The first is the random placement method, which randomly locates 100 carbon atoms in a cubic lattice. The second is the liquid-quench method, which uses a reactive force field (ReaxFF) to rapidly cool a system of 100 carbon atoms from the melting temperature. Density functional theory (DFT) was used to refine the position of each carbon atom and the dimensions of the boundaries to minimize the ground-state energy of the structure. The average densities of the amorphous carbon structures created by the random placement method and the liquid-quench method are 2.59 and 2.44 g/cm^3, respectively. Both densities agree well with previous work. In addition, the final structure of amorphous carbon generated by the liquid-quench method has lower energy.
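For orientation, the relation between a 100-atom cell and the quoted densities can be checked with a short calculation. The sketch assumes a cubic simulation cell and uses the standard atomic weight of carbon; the cell dimensions are not given in the abstract, so the side lengths it prints are derived illustrations, not reported values.

```python
AVOGADRO = 6.02214076e23        # atoms per mole (exact SI value)
CARBON_MOLAR_MASS = 12.011      # g/mol, standard atomic weight of carbon
ANGSTROM3_TO_CM3 = 1e-24        # 1 cubic angstrom in cubic centimetres

def density_g_cm3(n_atoms, side_angstrom):
    """Mass density of n_atoms carbon atoms in a cubic cell of the given side."""
    mass_g = n_atoms * CARBON_MOLAR_MASS / AVOGADRO
    volume_cm3 = side_angstrom ** 3 * ANGSTROM3_TO_CM3
    return mass_g / volume_cm3

def side_for_density(n_atoms, density):
    """Side length (angstrom) of the cubic cell that gives the requested density."""
    mass_g = n_atoms * CARBON_MOLAR_MASS / AVOGADRO
    volume_angstrom3 = mass_g / density / ANGSTROM3_TO_CM3
    return volume_angstrom3 ** (1.0 / 3.0)

# Cell sides implied by the two average densities in the abstract,
# under the cubic-cell assumption stated above.
for rho in (2.59, 2.44):
    print(rho, "g/cm^3 ->", round(side_for_density(100, rho), 2), "angstrom")
```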

Michigan Technological University

QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment

Author: Camille Coti
Emmanuel Agullo
Jack Dongarra
Julien Langou
Thomas Herault
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 13/12/2009

Previous studies have reported that common dense linear algebra operations do not achieve speedup by using multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific applications, conventional supercomputers are still strongly predominant in high-performance computing and the use of grids for speeding up large-scale scientific problems is limited to applications exhibiting parallelism at a higher level. We have identified two performance bottlenecks in the distributed memory algorithms implemented in ScaLAPACK, a state-of-the-art dense linear algebra library. First, because ScaLAPACK assumes a homogeneous communication network, the implementations of ScaLAPACK algorithms lack locality in their communication pattern. Second, the number of messages sent in the ScaLAPACK algorithms is significantly greater than in other algorithms that trade flops for communication. In this paper, we present a new approach for computing a QR factorization -- one of the main dense linear algebra kernels -- of tall and skinny matrices in a grid computing environment that overcomes these two bottlenecks. Our contribution is to articulate a recently proposed algorithm (Communication Avoiding QR) with a topology-aware middleware (QCG-OMPI) in order to confine intensive communications (ScaLAPACK calls) within the different geographical sites. An experimental study conducted on the Grid'5000 platform shows that the resulting performance increases linearly with the number of geographical sites on large-scale problems (and is in particular consistently higher than ScaLAPACK's).

Comment: Accepted at IPDPS10 (IEEE International Parallel & Distributed Processing Symposium 2010 in Atlanta, GA, USA).
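The idea behind a communication-avoiding QR of a tall and skinny matrix can be sketched in a single process: factor each row block locally, then factor the stacked small R factors. The sketch below assumes NumPy and a flat one-level reduction, with row blocks standing in for geographical sites; it does not model MPI, the QCG-OMPI middleware, or the reduction tree of the actual implementation, and the function name is hypothetical.

```python
import numpy as np

def tsqr_r(A, n_blocks):
    """Return the R factor of a tall-skinny matrix A via a two-stage QR.

    Stage 1: each row block (a stand-in for one site) computes its own R.
    Stage 2: the small R factors are stacked and factored once more.
    Only the small R factors would need to cross site boundaries.
    """
    blocks = np.array_split(A, n_blocks, axis=0)
    local_rs = [np.linalg.qr(block, mode="r") for block in blocks]
    return np.linalg.qr(np.vstack(local_rs), mode="r")

# Illustrative use: 1000 x 8 matrix split across 4 hypothetical sites.
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 8))
R = tsqr_r(A, n_blocks=4)
R_direct = np.linalg.qr(A, mode="r")
# For a full-rank A the two R factors agree up to the sign of each row.
print(np.allclose(np.abs(R), np.abs(R_direct)))
```

The design point is that the second stage operates on a matrix of size (n_blocks * n) by n rather than on the full tall matrix, which is what keeps inter-site traffic small.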

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-Rennes 1