Random forest for gene selection and microarray data classification
A random forest method is used to perform both gene selection and classification of microarray data. In this
embedded method, selecting the smallest possible sets of genes with the lowest error rates is the key factor in achieving the highest
classification accuracy. Hence, an improved gene selection method using random forest is proposed to obtain the smallest
subset of genes as well as the biggest subset of genes prior to classification. The option for biggest-subset selection is provided to assist
researchers who intend to use the informative genes for further research. The enhanced random forest gene selection performed
better in selecting both the smallest and the biggest subsets of informative genes with the lowest out-of-bag error rates
through gene selection. Furthermore, classification performed on the selected subset of genes using random forest led to
lower prediction error rates than the existing method and other similar available methods.
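The out-of-bag error rate that this abstract uses as its selection criterion can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `bootstrap_split` and `oob_error` and the toy votes are hypothetical, and the trees themselves are left out. The sketch only shows how each tree's bootstrap sample leaves some samples "out of bag", and how a majority vote over those held-out predictions gives an error estimate without a separate test set.

```python
import random
from collections import Counter


def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample of size n_samples with replacement.

    Returns (in_bag, out_of_bag) lists of sample indices.
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return in_bag, out_of_bag


def oob_error(labels, tree_votes):
    """Out-of-bag error by majority vote.

    tree_votes[t] maps a sample index to the label predicted by tree t,
    and should contain only samples that were out of bag for that tree.
    Samples that were never out of bag are skipped.
    """
    wrong = 0
    scored = 0
    for i, truth in enumerate(labels):
        votes = Counter(v[i] for v in tree_votes if i in v)
        if not votes:
            continue
        predicted = votes.most_common(1)[0][0]
        scored += 1
        wrong += predicted != truth
    return wrong / scored if scored else float("nan")


if __name__ == "__main__":
    rng = random.Random(0)
    in_bag, oob = bootstrap_split(10, rng)
    print("out-of-bag samples for one tree:", oob)
    # Hypothetical votes from three trees on a three-sample toy problem.
    print(oob_error([0, 1, 1], [{0: 0, 1: 0}, {1: 1, 2: 1}, {0: 0, 1: 1}]))
```

In a gene-selection loop of the kind described, an estimate like this would be recomputed after each candidate subset of genes is formed, and the subsets compared on it.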
Physical Orbit for Lambda Virginis and a Test of Stellar Evolution Models
Lambda Virginis (LamVir) is a well-known double-lined spectroscopic Am binary
with the interesting property that both stars are very similar in abundance but
one is sharp-lined and the other is broad-lined. We present combined
interferometric and spectroscopic studies of LamVir. The small scale of the
LamVir orbit (~20 mas) is well resolved by the Infrared Optical Telescope Array
(IOTA), allowing us to determine its elements as well as the physical
properties of the components to high accuracy. The masses of the two stars are
determined to be 1.897 Msun and 1.721 Msun, with 0.7% and 1.5% errors
respectively, and the two stars are found to have the same temperature of 8280
+/- 200 K. The accurately determined properties of LamVir allow comparisons
between observations and current stellar evolution models, and reasonable
matches are found. The best-fit stellar model gives LamVir a subsolar
metallicity of Z=0.0097, and an age of 935 Myr. The orbital and physical
parameters of LamVir also allow us to study its tidal evolution time scales and
status. Although currently atomic diffusion is considered to be the most
plausible cause of the Am phenomenon, the issue is still being actively debated
in the literature. With the present study of the properties and evolutionary
status of LamVir, this system is an ideal candidate for further detailed
abundance analyses that might shed more light on the source of the chemical
anomalies in these A stars. Comment: 43 Pages, 13 figures. Accepted for publication in Ap
The photometric-amplitude and mass-ratio distributions of contact binary stars
The distribution of the light-variation amplitudes, A(a), in addition to
determining the number of undiscovered contact binary systems falling below
photometric detection thresholds and thus lost to statistics, can serve as a
tool in determination of the mass-ratio distribution, Q(q), which is very
important for understanding of the evolution of contact binaries. Calculations
of the expected A(a) show that it tends to converge to a mass-ratio dependent
constant value for a->0. Strong dependence of A(a) on Q(q) can be used to
determine the latter distribution, but the technique is limited by the presence
of unresolved visual companions and by blending in crowded areas of the sky.
The bright-star sample to 7.5 magnitude is too small for an application of the
technique, while the Baade's Window sample from the OGLE project may suffer
stronger blending; thus the present results are preliminary and illustrative
only. Estimates based on the Baade's Window data from the OGLE project, for
amplitudes a>0.3 mag, where the statistics appear to be complete, allowing
determination of Q(q) over 0.12<q<1, suggest a steep increase of Q(q) with
q->0. The mass-ratio distribution can be approximated by a power law, either
Q(q)~(1-q)^a1 with a1=6+/-2 or Q(q)~q^b1, with b1=-2+/-0.5, with a slight
preference for the former form. Both forms must be modified by the
theoretically expected cut-off caused by a tidal instability at about q_min =
0.07-0.1. A maximum in Q(q) is expected to be mapped into a local
maximum in A(a) around 0.2-0.25 mag. Comment: AASTeX5, 12 figures, 5 tables, accepted by AJ, Aug.200
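The preferred power-law form, Q(q)~(1-q)^a1 with a1=6 over 0.12<q<1, can be made concrete with a short Python sketch. The function names `q_pdf` and `sample_q` are hypothetical, the normalisation to unit area over the quoted range is an assumption made here for illustration, and the tidal-instability cut-off near q_min is not modelled because it lies below the range covered.

```python
def q_pdf(q, a1=6.0, q_lo=0.12, q_hi=1.0):
    """Q(q) proportional to (1 - q)**a1, normalised to unit area on [q_lo, q_hi]."""
    if not (q_lo <= q <= q_hi):
        return 0.0
    # Integral of (1 - q)**a1 between q_lo and q_hi.
    norm = ((1.0 - q_lo) ** (a1 + 1.0) - (1.0 - q_hi) ** (a1 + 1.0)) / (a1 + 1.0)
    return (1.0 - q) ** a1 / norm


def sample_q(u, a1=6.0, q_lo=0.12, q_hi=1.0):
    """Inverse-CDF transform: map a uniform number u in [0, 1] to a mass ratio q."""
    lo = (1.0 - q_lo) ** (a1 + 1.0)
    hi = (1.0 - q_hi) ** (a1 + 1.0)
    return 1.0 - (lo - u * (lo - hi)) ** (1.0 / (a1 + 1.0))


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    draws = [sample_q(rng.random()) for _ in range(10000)]
    # With a1 = 6 most draws fall close to the lower end of the range.
    print("median q of the draws:", sorted(draws)[len(draws) // 2])
```

Drawing mass ratios this way is one possible starting point for simulating an amplitude distribution A(a); the mapping from q to amplitude itself is not specified in the abstract and is not attempted here.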
A Consistency Test of Spectroscopic Gravities for Late-Type Stars
Chemical analyses of late-type stars are usually carried out following the
classical recipe: LTE line formation and homogeneous, plane-parallel,
flux-constant, and LTE model atmospheres. We review different results in the
literature that have suggested significant inconsistencies in the spectroscopic
analyses, pointing out the difficulties in deriving independent estimates of
the stellar fundamental parameters and, hence, in detecting systematic errors.
The trigonometric parallaxes measured by the HIPPARCOS mission provide
accurate appraisals of the stellar surface gravity for nearby stars, which are
used here to check the gravities obtained from the photospheric iron ionization
balance. We find an approximate agreement for stars in the metallicity range -1
<= [Fe/H] <= 0, but the comparison shows that the differences between the
spectroscopic and trigonometric gravities decrease towards lower metallicities
for more metal-deficient dwarfs (-2.5 <= [Fe/H] <= -1.0), which casts a shadow
upon the abundance analyses for extreme metal-poor stars that make use of the
ionization equilibrium to constrain the gravity. The comparison with the
strong-line gravities derived by Edvardsson (1988) and Fuhrmann (1998a)
confirms that this method provides systematically larger gravities than the
ionization balance. The strong-line gravities get closer to the physical ones
for the stars analyzed by Fuhrmann, but they are even further away than the
iron ionization gravities for the stars of lower gravities in Edvardsson's
sample. The confrontation of the deviations of the iron ionization gravities in
metal-poor stars reported here with departures from the excitation balance
found in the literature shows that they are likely to be induced by the same
physical mechanism(s). Comment: AAS LaTeX v4.0, 35 pages, 10 PostScript files; to appear in The
Astrophysical Journa
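The way a trigonometric parallax constrains surface gravity can be sketched in Python using the standard relation g ∝ M T_eff^4 / L. This is an illustrative sketch, not the procedure of the paper: the function names are hypothetical, the stellar mass and bolometric correction must be supplied from elsewhere, interstellar extinction is ignored, and the solar reference values below are commonly used assumptions that differ slightly between authors.

```python
import math

# Assumed solar reference values (not taken from the abstract).
LOGG_SUN = 4.44    # log10 of surface gravity in cgs units
TEFF_SUN = 5777.0  # effective temperature in K
MBOL_SUN = 4.74    # absolute bolometric magnitude


def absolute_bolometric_magnitude(v_mag, bc, parallax_mas):
    """Absolute bolometric magnitude from apparent V, bolometric correction
    and parallax in milliarcseconds (extinction neglected)."""
    parallax_arcsec = parallax_mas / 1000.0
    return v_mag + bc + 5.0 + 5.0 * math.log10(parallax_arcsec)


def trigonometric_logg(mass_msun, teff, v_mag, bc, parallax_mas):
    """log g from g ~ M * Teff**4 / L, with L fixed by the parallax."""
    m_bol = absolute_bolometric_magnitude(v_mag, bc, parallax_mas)
    return (LOGG_SUN
            + math.log10(mass_msun)
            + 4.0 * math.log10(teff / TEFF_SUN)
            + 0.4 * (m_bol - MBOL_SUN))


if __name__ == "__main__":
    # A hypothetical star at 10 pc (parallax 100 mas) with solar-like inputs.
    print(trigonometric_logg(1.0, 5777.0, 4.83, -0.09, 100.0))
```

A spectroscopic gravity from the iron ionization balance could then be compared with this value star by star, which is the kind of consistency check the abstract describes.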
Incorporating topological information for predicting robust cancer subnetwork markers in human protein-protein interaction network
BACKGROUND: Discovering robust markers for cancer prognosis based on gene expression data is an important yet challenging problem in translational bioinformatics. By integrating additional information in biological pathways or a protein-protein interaction (PPI) network, we can find better biomarkers that lead to more accurate and reproducible prognostic predictions. In fact, recent studies have shown that “modular markers” that integrate multiple genes with potential interactions can improve disease classification and also provide better understanding of the disease mechanisms. RESULTS: In this work, we propose a novel algorithm for finding robust and effective subnetwork markers that can accurately predict cancer prognosis. To simultaneously discover multiple synergistic subnetwork markers in a human PPI network, we build on our previous work that uses affinity propagation, an efficient clustering algorithm based on a message-passing scheme. Using affinity propagation, we identify potential subnetwork markers that consist of discriminative genes that display coherent expression patterns and whose protein products are closely located on the PPI network. Furthermore, we incorporate the topological information from the PPI network to evaluate the potential of a given set of proteins to be involved in a functional module. Primarily, we adopt widely made assumptions that densely connected subnetworks are likely to be potential functional modules and that proteins that are not directly connected but interact with similar sets of other proteins may share similar functionalities. CONCLUSIONS: Incorporating topological attributes based on these assumptions can enhance the prediction of potential subnetwork markers. We evaluate the performance of the proposed subnetwork marker identification method by performing classification experiments using multiple independent breast cancer gene expression datasets and PPI networks.
We show that our method leads to the discovery of robust subnetwork markers that can improve cancer classification. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1224-1) contains supplementary material, which is available to authorized users
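The two topological assumptions stated in this abstract (dense subnetworks suggest functional modules; proteins with similar interaction partners may share function) can be illustrated with two simple graph measures. The Python sketch below is an assumption-laden illustration, not the scoring used by the authors: edge density and Jaccard similarity of neighbour sets are stand-ins chosen here, and the protein names P1 to P4 are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (a, b) interaction pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def edge_density(nodes, adjacency):
    """Fraction of all possible pairs within `nodes` that interact directly."""
    nodes = list(nodes)
    if len(nodes) < 2:
        return 0.0
    possible = len(nodes) * (len(nodes) - 1) / 2
    present = sum(1 for a, b in combinations(nodes, 2)
                  if b in adjacency.get(a, set()))
    return present / possible


def neighbor_jaccard(a, b, adjacency):
    """Jaccard similarity of the interaction partners of a and b,
    excluding a and b themselves."""
    na = adjacency.get(a, set()) - {b}
    nb = adjacency.get(b, set()) - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


if __name__ == "__main__":
    # Toy network: P1 and P4 do not interact but share partners P2 and P3.
    ppi = build_adjacency([("P1", "P2"), ("P1", "P3"), ("P2", "P3"),
                           ("P4", "P2"), ("P4", "P3")])
    print(edge_density(["P1", "P2", "P3"], ppi))
    print(neighbor_jaccard("P1", "P4", ppi))
```

Measures of this kind could be combined with an expression-based similarity before clustering; how the paper weights topology against expression is not stated in the abstract.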
Validation of a novel numerical model to predict regionalized blood flow in the coronary arteries
Aims
Ischaemic heart disease results from insufficient coronary blood flow. Direct measurement of absolute flow (mL/min) is feasible, but has not entered routine clinical practice in most catheterization laboratories. Interventional cardiologists, therefore, rely on surrogate markers of flow. Recently, we described a computational fluid dynamics (CFD) method for predicting flow that differentiates inlet, side branch, and outlet flows during angiography. In the current study, we evaluate a new method that regionalizes flow along the length of the artery.
Methods and results
Three-dimensional coronary anatomy was reconstructed from angiograms from 20 patients with chronic coronary syndrome. All flows were computed using CFD by applying the pressure gradient to the reconstructed geometry. Side branch flow was modelled as a porous wall boundary. Side branch flow magnitude was based on morphometric scaling laws with two models: a homogeneous model with flow loss along the entire arterial length; and a regionalized model with flow proportional to local taper. Flow results were validated against invasive measurements of flow by continuous infusion thermodilution (Coroventis™, Abbott). Both methods quantified flow relative to the invasive measures: homogeneous (r 0.47, P 0.006; zero bias; 95% CI −168 to +168 mL/min); regionalized method (r 0.43, P 0.013; zero bias; 95% CI −175 to +175 mL/min).
Conclusion
During angiography and pressure wire assessment, coronary flow can now be regionalized and differentiated at the inlet, outlet, and side branches. The effect of epicardial disease on agreement suggests the model may be best targeted at cases with a stenosis close to side branches
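The agreement figures quoted in this abstract (a correlation coefficient, a bias, and an interval around it) are of the kind produced by a correlation plus Bland-Altman style comparison. The Python sketch below shows that calculation under stated assumptions: the abstract does not say how its interval was computed, so the conventional bias ± 1.96 standard deviations of the paired differences is assumed, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between paired measurements."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(predicted, measured):
    """Bias (mean difference) and assumed 95% limits of agreement."""
    diffs = [p - m for p, m in zip(predicted, measured)]
    n = len(diffs)
    bias = sum(diffs) / n
    # Sample standard deviation of the paired differences.
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Made-up numbers purely to exercise the functions; not study data.
    model_flow = [10.0, 20.0, 30.0]
    invasive_flow = [12.0, 18.0, 30.0]
    print(pearson_r(model_flow, invasive_flow))
    print(bland_altman(model_flow, invasive_flow))
```

A bias near zero with wide limits, as reported for both models, would indicate no systematic offset but substantial scatter between computed and measured flow in individual patients.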
Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery
Background: Although high-throughput microarray-based molecular diagnostic technologies show great promise in cancer diagnosis, they are still far from clinical application because of their low and unstable sensitivities and specificities in cancer molecular pattern recognition. In fact, high-dimensional and heterogeneous tumor profiles challenge current machine learning methodologies with their small number of samples and large or even huge number of variables (genes). This naturally calls for the use of effective feature selection in microarray data classification. Methods: We propose a novel feature selection method: multi-resolution independent component analysis (MICA) for large-scale gene expression data. This method overcomes the weak points of the widely used transform-based feature selection methods such as principal component analysis (PCA), independent component analysis (ICA), and nonnegative matrix factorization (NMF) by avoiding their global feature-selection mechanism. In addition to demonstrating the effectiveness of the multi-resolution independent component analysis in meaningful biomarker discovery, we present multi-resolution independent component analysis based support vector machines (MICA-SVM) and linear discriminant analysis (MICA-LDA) to attain high-performance classifications in low-dimensional spaces. Results: We have demonstrated the superiority and stability of our algorithms by performing comprehensive experimental comparisons with nine state-of-the-art algorithms on six high-dimensional heterogeneous profiles under cross validations. Our classification algorithms, especially MICA-SVM, not only accomplish clinical or near-clinical level sensitivities and specificities, but also show strong performance stability over their peers in classification. Software that implements the major algorithm and the data sets on which this paper focuses are freely available at https://sites.google.com/site/heyaumapbc2011/. Conclusions: This work suggests a new direction to accelerate microarray technologies into a clinical routine through building a high-performance classifier to attain clinical-level sensitivities and specificities by treating an input profile as a ‘profile-biomarker’. The multi-resolution data analysis, which suppresses redundant global features and extracts effective local features, also has a positive impact on large-scale ‘omics’ data mining.
The actin-myosin regulatory MRCK kinases: regulation, biological functions and associations with human cancer
The contractile actin-myosin cytoskeleton provides much of the force required for numerous cellular activities such as motility, adhesion, cytokinesis and changes in morphology. Key elements that respond to various signal pathways are the myosin II regulatory light chains (MLC), which participate in actin-myosin contraction by modulating the ATPase activity and consequent contractile force generation mediated by myosin heavy chain heads. Considerable effort has focussed on the role of MLC kinases, and yet the contributions of the myotonic dystrophy-related Cdc42-binding kinases (MRCK) proteins in MLC phosphorylation and cytoskeleton regulation have not been well characterized. In contrast to the closely related ROCK1 and ROCK2 kinases that are regulated by the RhoA and RhoC GTPases, there is relatively little information about the CDC42-regulated MRCKα, MRCKβ and MRCKγ members of the AGC (PKA, PKG and PKC) kinase family. As well as differences in upstream activation pathways, MRCK and ROCK kinases apparently differ in the way that they spatially regulate MLC phosphorylation, which ultimately affects their influence on the organization and dynamics of the actin-myosin cytoskeleton. In this review, we will summarize the MRCK protein structures, expression patterns, small molecule inhibitors, biological functions and associations with human diseases such as cancer
Modeling precision treatment of breast cancer
Background: First-generation molecular profiles for human breast cancers have enabled the identification of features that can predict therapeutic response; however, little is known about how the various data types can best be combined to yield optimal predictors. Collections of breast cancer cell lines mirror many aspects of breast cancer molecular pathobiology, and measurements of their omic and biological therapeutic responses are well-suited for development of strategies to identify the most predictive molecular feature sets. Results: We used least squares-support vector machines and random forest algorithms to identify molecular features associated with responses of a collection of 70 breast cancer cell lines to 90 experimental or approved therapeutic agents. The datasets analyzed included measurements of copy number aberrations, mutations, gene and isoform expression, promoter methylation and protein expression. Transcriptional subtype contributed strongly to response predictors for 25% of compounds, and adding other molecular data types improved prediction for 65%. No single molecular dataset consistently out-performed the others, suggesting that therapeutic response is mediated at multiple levels in the genome. Response predictors were developed and applied to TCGA data, and were found to be present in subsets of those patient samples. Conclusions: These results suggest that matching patients to treatments based on transcriptional subtype will improve response rates, and inclusion of additional features from other profiling data types may provide additional benefit. Further, we suggest a systems biology strategy for guiding clinical trials so that patient cohorts most likely to respond to new therapies may be more efficiently identified