clues: An R Package for Nonparametric Clustering Based on Local Shrinking
Determining the optimal number of clusters appears to be a persistent and controversial issue in cluster analysis. Most existing R packages targeting clustering require the user to specify the number of clusters in advance. However, if this subjectively chosen number is far from optimal, clustering may produce seriously misleading results. In order to address this vexing problem, we develop the R package clues to automate and evaluate the selection of an optimal number of clusters, which is widely applicable in the field of clustering analysis. Package clues uses two main procedures, shrinking and partitioning, to estimate an optimal number of clusters by maximizing an index function, either the CH index or the Silhouette index, rather than relying on guessing a pre-specified number. Five agreement indices (Rand index, Hubert and Arabie's adjusted Rand index, Morey and Agresti's adjusted Rand index, Fowlkes and Mallows index and Jaccard index), which measure the degree of agreement between any two partitions, are also provided in clues. In addition to numerical evidence, clues also supplies a deeper insight into the partitioning process with trajectory plots.
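To make the notion of agreement between two partitions concrete, here is a minimal Python sketch of three of the indices named in the abstract above (Rand, Jaccard, and Fowlkes and Mallows), computed from pair counts. It is not the clues package's implementation or API; the function names (`pair_counts`, `rand_index`, `jaccard_index`, `fowlkes_mallows_index`) are hypothetical and chosen only for illustration, and the formulas are the standard textbook pair-counting definitions.

```python
from itertools import combinations
from math import sqrt


def pair_counts(labels_a, labels_b):
    """Count point pairs by whether each partition puts them together.

    Returns (n11, n10, n01, n00):
      n11: same cluster in both partitions
      n10: same cluster in A only
      n01: same cluster in B only
      n00: different clusters in both partitions
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same points")
    if len(labels_a) < 2:
        raise ValueError("need at least two points to form a pair")
    n11 = n10 = n01 = n00 = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            n11 += 1
        elif same_a:
            n10 += 1
        elif same_b:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def rand_index(labels_a, labels_b):
    """Fraction of pairs on which the two partitions agree."""
    n11, n10, n01, n00 = pair_counts(labels_a, labels_b)
    return (n11 + n00) / (n11 + n10 + n01 + n00)


def jaccard_index(labels_a, labels_b):
    """Agreement on co-clustered pairs, ignoring pairs split in both."""
    n11, n10, n01, _ = pair_counts(labels_a, labels_b)
    denom = n11 + n10 + n01
    # Assumption: two all-singleton partitions are treated as identical.
    return n11 / denom if denom else 1.0


def fowlkes_mallows_index(labels_a, labels_b):
    """Geometric mean of pairwise precision and recall."""
    n11, n10, n01, _ = pair_counts(labels_a, labels_b)
    denom = sqrt((n11 + n10) * (n11 + n01))
    # Assumption: undefined case (no co-clustered pairs) is reported as 0.
    return n11 / denom if denom else 0.0


if __name__ == "__main__":
    a = [0, 0, 1, 1]
    b = [0, 0, 1, 2]
    print(rand_index(a, b), jaccard_index(a, b), fowlkes_mallows_index(a, b))
```

All three indices depend only on which points share a cluster, so they are unchanged when cluster labels are renamed. The adjusted Rand variants mentioned in the abstract additionally correct for chance agreement and are not shown here.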
Quality of Radiomic Features in Glioblastoma Multiforme: Impact of Semi-Automated Tumor Segmentation Software.
Objective: The purpose of this study was to evaluate the reliability and quality of radiomic features in glioblastoma multiforme (GBM) derived from tumor volumes obtained with semi-automated tumor segmentation software. Materials and methods: MR images of 45 GBM patients (29 males, 16 females) were downloaded from The Cancer Imaging Archive, in which post-contrast T1-weighted imaging and fluid-attenuated inversion recovery MR sequences were used. Two raters independently segmented the tumors using two semi-automated segmentation tools (TumorPrism3D and 3D Slicer). Regions of interest corresponding to contrast-enhancing lesion, necrotic portions, and non-enhancing T2 high signal intensity component were segmented for each tumor. A total of 180 imaging features were extracted, and their quality was evaluated in terms of stability, normalized dynamic range (NDR), and redundancy, using intra-class correlation coefficients, cluster consensus, and Rand Statistic. Results: Our study results showed that most of the radiomic features in GBM were highly stable. Over 90% of 180 features showed good stability (intra-class correlation coefficient [ICC] ≥ 0.8), whereas only 7 features were of poor stability (ICC < 0.5). Most first order statistics and morphometric features showed moderate-to-high NDR (4 > NDR ≥ 1), while above 35% of the texture features showed poor NDR (< 1). Features were shown to cluster into only 5 groups, indicating that they were highly redundant. Conclusion: The use of semi-automated software tools provided sufficiently reliable tumor segmentation and feature stability, thus helping to overcome the inherent inter-rater and intra-rater variability of user intervention. However, certain aspects of feature quality, including NDR and redundancy, need to be assessed for determination of representative signature features before further development of radiomics.
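The stability cut-offs quoted in the abstract above (ICC ≥ 0.8 for good, ICC < 0.5 for poor) can be applied to a table of per-feature ICC values as in the following Python sketch. It assumes the ICC values have already been computed elsewhere; the function names, the "intermediate" label for the band between the two thresholds, and the feature names in the example are hypothetical and do not come from the study.

```python
from collections import Counter

# Thresholds as stated in the abstract.
GOOD_ICC = 0.8
POOR_ICC = 0.5


def stability_class(icc):
    """Map one intra-class correlation coefficient to a stability label."""
    if icc >= GOOD_ICC:
        return "good"
    if icc < POOR_ICC:
        return "poor"
    # Assumption: the abstract does not name the middle band.
    return "intermediate"


def summarize_stability(icc_by_feature):
    """Count how many features fall into each stability class."""
    return Counter(stability_class(v) for v in icc_by_feature.values())


if __name__ == "__main__":
    # Made-up ICC values for illustration only, not data from the study.
    example = {"feature_a": 0.93, "feature_b": 0.41, "feature_c": 0.67}
    print(summarize_stability(example))
```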
Solar and Stellar Photospheric Abundances
The determination of photospheric abundances in late-type stars from
spectroscopic observations is a well-established field, built on solid
theoretical foundations. Improving those foundations to refine the accuracy of
the inferred abundances has proven challenging, but progress has been made. In
parallel, developments in instrumentation, chiefly regarding multi-object
spectroscopy, have been spectacular, and a number of projects are collecting
large numbers of observations for stars across the Milky Way and nearby
galaxies, promising important advances in our understanding of galaxy formation
and evolution. After providing a brief description of the basic physics and
input data involved in the analysis of stellar spectra, a review is made of the
analysis steps, and the available tools to cope with large observational
efforts. The paper closes with a quick overview of relevant ongoing and planned
spectroscopic surveys, and highlights of recent research on photospheric
abundances.
Comment: Invited review to appear in Living Reviews in Solar Physics. 39
pages, 7 figures
RACS: Rapid Analysis of ChIP-Seq data for contig based genomes
Background: Chromatin immunoprecipitation coupled to next generation
sequencing (ChIP-Seq) is a widely used technique to investigate the function of
chromatin-related proteins in a genome-wide manner. ChIP-Seq generates large
quantities of data which can be difficult to process and analyse, particularly
for organisms with contig based genomes. Contig-based genomes often have poor
annotations for cis-elements, for example enhancers, that are important for
gene expression. Poorly annotated genomes make a comprehensive analysis of
ChIP-Seq data difficult and as such standardized analysis pipelines are
lacking. Methods: We report a computational pipeline that utilizes traditional
High-Performance Computing techniques and open source tools for processing and
analysing data obtained from ChIP-Seq. We applied our computational pipeline
"Rapid Analysis of ChIP-Seq data" (RACS) to ChIP-Seq data that was generated in
the model organism Tetrahymena thermophila, an example of an organism with a
genome that is available in contigs. Results: To test the performance and
efficiency of RACS, we performed control ChIP-Seq experiments allowing us to
rapidly eliminate false positives when analyzing our previously published data
set. Our pipeline segregates the found read accumulations between genic and
intergenic regions and is highly efficient for rapid downstream analyses.
Conclusions: Altogether, the computational pipeline presented in this report is
an efficient and highly reliable tool to analyze genome-wide ChIP-Seq data
generated in model organisms with contig-based genomes.
RACS is an open source computational pipeline available to download from:
https://bitbucket.org/mjponce/racs --or--
https://gitrepos.scinet.utoronto.ca/public/?a=summary&p=RACS
Comment: Submitted to BMC Bioinformatics. Computational pipeline available at
https://bitbucket.org/mjponce/rac
EMMIXcskew: an R Package for the Fitting of a Mixture of Canonical Fundamental Skew t-Distributions
This paper presents an R package EMMIXcskew for the fitting of the canonical
fundamental skew t-distribution (CFUST) and finite mixtures of this
distribution (FM-CFUST) via maximum likelihood (ML). The CFUST distribution
provides a flexible family of models to handle non-normal data, with parameters
for capturing skewness and heavy-tails in the data. It formally encompasses the
normal, t, and skew-normal distributions as special and/or limiting cases. A
few other versions of the skew t-distributions are also nested within the CFUST
distribution. In this paper, an Expectation-Maximization (EM) algorithm is
described for computing the ML estimates of the parameters of the FM-CFUST
model, and different strategies for initializing the algorithm are discussed
and illustrated. The methodology is implemented in the EMMIXcskew package, and
examples are presented using two real datasets. The EMMIXcskew package contains
functions to fit the FM-CFUST model, including procedures for generating
different initial values. Additional features include random sample generation
and contour visualization in 2D and 3D.