
    Random forest for gene selection and microarray data classification

    A random forest method is used to perform both gene selection and classification of microarray data. In this embedded method, selecting the smallest possible sets of genes with the lowest error rates is the key factor in achieving the highest classification accuracy. Hence, an improved gene selection method using random forest is proposed to obtain the smallest subset of genes as well as the biggest subset of genes prior to classification. The option for biggest subset selection is provided to assist researchers who intend to use the informative genes for further research. The enhanced random forest gene selection performed better in selecting both the smallest and the biggest subsets of informative genes with the lowest out-of-bag error rates. Furthermore, classification performed on the selected subset of genes using random forest led to lower prediction error rates compared to the existing method and other similar available methods.
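
The out-of-bag error rate mentioned in this abstract rests on bootstrap sampling: each tree in a random forest is trained on a sample drawn with replacement, and the samples that were never drawn can serve as a built-in test set for that tree. The following is a minimal sketch of that split only, not the authors' gene selection procedure; the function names are hypothetical.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample; return (in_bag, out_of_bag) index lists."""
    # Sample n_samples indices with replacement (the "bag" for one tree).
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    # Every index that was never drawn is out-of-bag for this tree.
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


def oob_fraction(n_samples, seed=0):
    """Fraction of samples left out of a single bootstrap draw."""
    rng = random.Random(seed)
    _, out_of_bag = bootstrap_split(n_samples, rng)
    return len(out_of_bag) / n_samples
```

For large sample counts the out-of-bag fraction approaches 1/e, roughly 37%, which is why a forest can estimate its error without a separate hold-out set.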

    Physical Orbit for Lambda Virginis and a Test of Stellar Evolution Models

    Lambda Virginis (LamVir) is a well-known double-lined spectroscopic Am binary with the interesting property that both stars are very similar in abundance but one is sharp-lined and the other is broad-lined. We present combined interferometric and spectroscopic studies of LamVir. The small scale of the LamVir orbit (~20 mas) is well resolved by the Infrared Optical Telescope Array (IOTA), allowing us to determine its elements as well as the physical properties of the components to high accuracy. The masses of the two stars are determined to be 1.897 Msun and 1.721 Msun, with 0.7% and 1.5% errors respectively, and the two stars are found to have the same temperature of 8280 +/- 200 K. The accurately determined properties of LamVir allow comparisons between observations and current stellar evolution models, and reasonable matches are found. The best-fit stellar model gives LamVir a subsolar metallicity of Z=0.0097, and an age of 935 Myr. The orbital and physical parameters of LamVir also allow us to study its tidal evolution time scales and status. Although currently atomic diffusion is considered to be the most plausible cause of the Am phenomenon, the issue is still being actively debated in the literature. With the present study of the properties and evolutionary status of LamVir, this system is an ideal candidate for further detailed abundance analyses that might shed more light on the source of the chemical anomalies in these A stars. Comment: 43 Pages, 13 figures. Accepted for publication in Ap

    The photometric-amplitude and mass-ratio distributions of contact binary stars

    The distribution of the light-variation amplitudes, A(a), in addition to determining the number of undiscovered contact binary systems falling below photometric detection thresholds and thus lost to statistics, can serve as a tool in determining the mass-ratio distribution, Q(q), which is very important for understanding the evolution of contact binaries. Calculations of the expected A(a) show that it tends to converge to a mass-ratio dependent constant value for a->0. The strong dependence of A(a) on Q(q) can be used to determine the latter distribution, but the technique is limited by the presence of unresolved visual companions and by blending in crowded areas of the sky. The bright-star sample to 7.5 magnitude is too small for an application of the technique, while the Baade's Window sample from the OGLE project may suffer stronger blending; thus the present results are preliminary and illustrative only. Estimates based on the Baade's Window data from the OGLE project, for amplitudes a>0.3 mag, where the statistics appear to be complete, allowing determination of Q(q) over 0.12<q<1, suggest a steep increase of Q(q) with q->0. The mass-ratio distribution can be approximated by a power law, either Q(q)~(1-q)^a1 with a1=6+/-2 or Q(q)~q^b1 with b1=-2+/-0.5, with a slight preference for the former form. Both forms must be modified by the theoretically expected cut-off caused by a tidal instability at about q_min=0.07-0.1. The resulting maximum in Q(q) is expected to be mapped into a local maximum in A(a) around 0.2-0.25 mag. Comment: AASTeX5, 12 figures, 5 tables, accepted by AJ, Aug.200
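
To compare the two power-law forms quoted in this abstract, it can help to normalize each over the fitted range 0.12<q<1 and ask what fraction of systems falls below a given mass ratio. The sketch below uses the central exponents from the abstract (a1=6, b1=-2) and integrates each form analytically. It ignores the tidal cut-off near q_min, and the function names are hypothetical.

```python
def cdf_one_minus_q(q, a1=6.0, q_lo=0.12):
    """Fraction of systems with mass ratio below q for Q(q) ~ (1-q)^a1.

    The distribution is normalized on [q_lo, 1]; the integral of (1-q)^a1
    is -(1-q)^(a1+1) / (a1+1), so the normalized CDF has a closed form.
    """
    return 1.0 - ((1.0 - q) / (1.0 - q_lo)) ** (a1 + 1.0)


def cdf_power_of_q(q, b1=-2.0, q_lo=0.12):
    """Fraction of systems with mass ratio below q for Q(q) ~ q^b1.

    Valid for b1 != -1; the integral of q^b1 is q^(b1+1) / (b1+1).
    """
    k = b1 + 1.0
    return (q ** k - q_lo ** k) / (1.0 - q_lo ** k)


if __name__ == "__main__":
    for q in (0.2, 0.3, 0.5):
        print(q, cdf_one_minus_q(q), cdf_power_of_q(q))
```

Both forms concentrate most systems at small q, consistent with the steep rise toward q->0 described above.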

    A Consistency Test of Spectroscopic Gravities for Late-Type Stars

    Chemical analyses of late-type stars are usually carried out following the classical recipe: LTE line formation and homogeneous, plane-parallel, flux-constant, and LTE model atmospheres. We review different results in the literature that have suggested significant inconsistencies in the spectroscopic analyses, pointing out the difficulties in deriving independent estimates of the stellar fundamental parameters and hence, detecting systematic errors. The trigonometric parallaxes measured by the HIPPARCOS mission provide accurate appraisals of the stellar surface gravity for nearby stars, which are used here to check the gravities obtained from the photospheric iron ionization balance. We find an approximate agreement for stars in the metallicity range -1 <= [Fe/H] <= 0, but the comparison shows that the differences between the spectroscopic and trigonometric gravities decrease towards lower metallicities for more metal-deficient dwarfs (-2.5 <= [Fe/H] <= -1.0), which casts a shadow upon the abundance analyses for extreme metal-poor stars that make use of the ionization equilibrium to constrain the gravity. The comparison with the strong-line gravities derived by Edvardsson (1988) and Fuhrmann (1998a) confirms that this method provides systematically larger gravities than the ionization balance. The strong-line gravities get closer to the physical ones for the stars analyzed by Fuhrmann, but they are even further away than the iron ionization gravities for the stars of lower gravities in Edvardsson's sample. The confrontation of the deviations of the iron ionization gravities in metal-poor stars reported here with departures from the excitation balance found in the literature shows that they are likely to be induced by the same physical mechanism(s). Comment: AAS LaTeX v4.0, 35 pages, 10 PostScript files; to appear in The Astrophysical Journa
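
A parallax constrains surface gravity because it fixes the luminosity, and g scales as M T^4 / L. The sketch below applies that standard scaling relative to the Sun; the abstract does not state the exact formula or constants the authors used, so the solar reference values here are assumptions, and the bolometric correction and interstellar extinction are left out. The function names are hypothetical.

```python
import math

# Assumed solar reference values (not taken from the abstract).
LOGG_SUN = 4.44    # log10 of solar surface gravity, cgs units
TEFF_SUN = 5772.0  # K, IAU 2015 nominal effective temperature
MBOL_SUN = 4.74    # absolute bolometric magnitude, IAU 2015 zero point


def absolute_magnitude(apparent_mag, parallax_mas):
    """Absolute magnitude from apparent magnitude and parallax in mas.

    Uses M = m + 5 log10(parallax in arcsec) + 5, with no extinction term.
    """
    return apparent_mag + 5.0 * math.log10(parallax_mas) - 10.0


def trigonometric_logg(mass_msun, teff_k, mbol):
    """log g from mass, effective temperature and bolometric magnitude.

    Follows g ~ M * Teff^4 / L with log(L/Lsun) = -0.4 * (Mbol - Mbol_sun).
    """
    return (LOGG_SUN
            + math.log10(mass_msun)
            + 4.0 * math.log10(teff_k / TEFF_SUN)
            + 0.4 * (mbol - MBOL_SUN))
```

The value returned by `trigonometric_logg` can then be compared with a gravity derived from the iron ionization balance, which is the consistency test the abstract describes.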

    Incorporating topological information for predicting robust cancer subnetwork markers in human protein-protein interaction network

    BACKGROUND: Discovering robust markers for cancer prognosis based on gene expression data is an important yet challenging problem in translational bioinformatics. By integrating additional information in biological pathways or a protein-protein interaction (PPI) network, we can find better biomarkers that lead to more accurate and reproducible prognostic predictions. In fact, recent studies have shown that “modular markers” that integrate multiple genes with potential interactions can improve disease classification and also provide better understanding of the disease mechanisms. RESULTS: In this work, we propose a novel algorithm for finding robust and effective subnetwork markers that can accurately predict cancer prognosis. To simultaneously discover multiple synergistic subnetwork markers in a human PPI network, we build on our previous work that uses affinity propagation, an efficient clustering algorithm based on a message-passing scheme. Using affinity propagation, we identify potential subnetwork markers that consist of discriminative genes that display coherent expression patterns and whose protein products are closely located on the PPI network. Furthermore, we incorporate the topological information from the PPI network to evaluate the potential of a given set of proteins to be involved in a functional module. Primarily, we adopt the widely made assumptions that densely connected subnetworks are likely to be functional modules and that proteins that are not directly connected but interact with similar sets of other proteins may share similar functionalities. CONCLUSIONS: Incorporating topological attributes based on these assumptions can enhance the prediction of potential subnetwork markers. We evaluate the performance of the proposed subnetwork marker identification method by performing classification experiments using multiple independent breast cancer gene expression datasets and PPI networks. We show that our method leads to the discovery of robust subnetwork markers that can improve cancer classification. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1224-1) contains supplementary material, which is available to authorized users.
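
The two topological assumptions in this abstract can be made concrete with simple graph measures: edge density for "densely connected subnetworks", and overlap of neighbor sets for "proteins that interact with similar sets of other proteins". The sketch below uses density and Jaccard similarity as stand-ins. The abstract does not say which measures the authors actually use, and the toy network and function names are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map (node -> set of neighbors)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def subnetwork_density(adj, nodes):
    """Fraction of possible node pairs in `nodes` that are directly linked."""
    nodes = list(nodes)
    if len(nodes) < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(nodes, 2) if v in adj.get(u, set()))
    possible = len(nodes) * (len(nodes) - 1) // 2
    return linked / possible


def neighbor_similarity(adj, u, v):
    """Jaccard similarity of the neighbor sets of u and v, excluding each other."""
    nu = adj.get(u, set()) - {v}
    nv = adj.get(v, set()) - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


# Toy network: A, B, C form a triangle; D and E both bind A and B only.
toy_ppi = build_adjacency([
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("D", "A"), ("D", "B"), ("E", "A"), ("E", "B"),
])
```

In the toy network, D and E are not connected to each other but share all of their interaction partners, which is the situation the second assumption targets.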

    Validation of a novel numerical model to predict regionalized blood flow in the coronary arteries

    Aims: Ischaemic heart disease results from insufficient coronary blood flow. Direct measurement of absolute flow (mL/min) is feasible, but has not entered routine clinical practice in most catheterization laboratories. Interventional cardiologists, therefore, rely on surrogate markers of flow. Recently, we described a computational fluid dynamics (CFD) method for predicting flow that differentiates inlet, side branch, and outlet flows during angiography. In the current study, we evaluate a new method that regionalizes flow along the length of the artery. Methods and results: Three-dimensional coronary anatomy was reconstructed from angiograms from 20 patients with chronic coronary syndrome. All flows were computed using CFD by applying the pressure gradient to the reconstructed geometry. Side branch flow was modelled as a porous wall boundary. Side branch flow magnitude was based on morphometric scaling laws with two models: a homogeneous model with flow loss along the entire arterial length; and a regionalized model with flow proportional to local taper. Flow results were validated against invasive measurements of flow by continuous infusion thermodilution (Coroventis™, Abbott). Both methods quantified flow relative to the invasive measures: homogeneous (r = 0.47, P = 0.006; zero bias; 95% CI −168 to +168 mL/min); regionalized method (r = 0.43, P = 0.013; zero bias; 95% CI −175 to +175 mL/min). Conclusion: During angiography and pressure wire assessment, coronary flow can now be regionalized and differentiated at the inlet, outlet, and side branches. The effect of epicardial disease on agreement suggests the model may be best targeted at cases with a stenosis close to side branches.
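
The difference between the homogeneous and regionalized side-branch models can be illustrated by how a fixed total flow loss is allocated to arterial segments. The sketch below is only an illustration of those two allocation rules as worded in the abstract, assuming loss proportional to segment length in the first case and to the diameter drop across each segment in the second. It does not reproduce the CFD solution, the porous wall boundary, or the morphometric scaling laws, and all names and numbers are hypothetical.

```python
def homogeneous_loss(total_loss, lengths):
    """Spread the side-branch flow loss evenly along the vessel length."""
    total_length = sum(lengths)
    return [total_loss * length / total_length for length in lengths]


def regionalized_loss(total_loss, diameters):
    """Allocate flow loss in proportion to local taper.

    `diameters` holds the lumen diameter at each segment boundary, so it
    has one more entry than there are segments. Taper is taken here as
    the diameter drop across a segment; widening counts as zero taper.
    """
    tapers = [max(d0 - d1, 0.0) for d0, d1 in zip(diameters, diameters[1:])]
    total_taper = sum(tapers)
    if total_taper == 0.0:
        raise ValueError("vessel has no taper; regionalized model undefined")
    return [total_loss * taper / total_taper for taper in tapers]


if __name__ == "__main__":
    # Hypothetical vessel: 40 mL/min lost to side branches over 3 segments.
    print(homogeneous_loss(40.0, [10.0, 10.0, 20.0]))
    print(regionalized_loss(40.0, [3.0, 3.0, 2.5, 2.0]))
```

Under the regionalized rule a segment with no change in diameter loses no flow, whereas the homogeneous rule assigns it a share based on length alone.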

    Multi-resolution independent component analysis for high-performance tumor classification and biomarker discovery

    Background: Although high-throughput microarray-based molecular diagnostic technologies show great promise in cancer diagnosis, they are still far from clinical application due to their low and unstable sensitivities and specificities in cancer molecular pattern recognition. In fact, high-dimensional and heterogeneous tumor profiles challenge current machine learning methodologies because of their small number of samples and large or even huge number of variables (genes). This naturally calls for the use of effective feature selection in microarray data classification. Methods: We propose a novel feature selection method for large-scale gene expression data: multi-resolution independent component analysis (MICA). This method overcomes the weak points of the widely used transform-based feature selection methods such as principal component analysis (PCA), independent component analysis (ICA), and nonnegative matrix factorization (NMF) by avoiding their global feature-selection mechanism. In addition to demonstrating the effectiveness of multi-resolution independent component analysis in meaningful biomarker discovery, we present MICA-based support vector machines (MICA-SVM) and linear discriminant analysis (MICA-LDA) to attain high-performance classification in low-dimensional spaces. Results: We have demonstrated the superiority and stability of our algorithms by performing comprehensive experimental comparisons with nine state-of-the-art algorithms on six high-dimensional heterogeneous profiles under cross validation. Our classification algorithms, especially MICA-SVM, not only achieve clinical or near-clinical level sensitivities and specificities, but also show strong performance stability over their peers in classification. Software implementing the major algorithm, and the data sets on which this paper focuses, are freely available at https://sites.google.com/site/heyaumapbc2011/. Conclusions: This work suggests a new direction for moving microarray technologies into clinical routine by building a high-performance classifier that attains clinical-level sensitivities and specificities by treating an input profile as a ‘profile-biomarker’. Suppressing redundant global features and extracting effective local features through multi-resolution data analysis also has a positive impact on large-scale ‘omics’ data mining.

    The actin-myosin regulatory MRCK kinases: regulation, biological functions and associations with human cancer

    The contractile actin-myosin cytoskeleton provides much of the force required for numerous cellular activities such as motility, adhesion, cytokinesis and changes in morphology. Key elements that respond to various signal pathways are the myosin II regulatory light chains (MLC), which participate in actin-myosin contraction by modulating the ATPase activity and consequent contractile force generation mediated by myosin heavy chain heads. Considerable effort has focussed on the role of MLC kinases, and yet the contributions of the myotonic dystrophy-related Cdc42-binding kinases (MRCK) proteins in MLC phosphorylation and cytoskeleton regulation have not been well characterized. In contrast to the closely related ROCK1 and ROCK2 kinases that are regulated by the RhoA and RhoC GTPases, there is relatively little information about the CDC42-regulated MRCKα, MRCKβ and MRCKγ members of the AGC (PKA, PKG and PKC) kinase family. As well as differences in upstream activation pathways, MRCK and ROCK kinases apparently differ in the way that they spatially regulate MLC phosphorylation, which ultimately affects their influence on the organization and dynamics of the actin-myosin cytoskeleton. In this review, we will summarize the MRCK protein structures, expression patterns, small molecule inhibitors, biological functions and associations with human diseases such as cancer

    Modeling precision treatment of breast cancer

    Background: First-generation molecular profiles for human breast cancers have enabled the identification of features that can predict therapeutic response; however, little is known about how the various data types can best be combined to yield optimal predictors. Collections of breast cancer cell lines mirror many aspects of breast cancer molecular pathobiology, and measurements of their omic and biological therapeutic responses are well-suited for development of strategies to identify the most predictive molecular feature sets. Results: We used least squares-support vector machines and random forest algorithms to identify molecular features associated with responses of a collection of 70 breast cancer cell lines to 90 experimental or approved therapeutic agents. The datasets analyzed included measurements of copy number aberrations, mutations, gene and isoform expression, promoter methylation and protein expression. Transcriptional subtype contributed strongly to response predictors for 25% of compounds, and adding other molecular data types improved prediction for 65%. No single molecular dataset consistently out-performed the others, suggesting that therapeutic response is mediated at multiple levels in the genome. Response predictors were developed and applied to TCGA data, and were found to be present in subsets of those patient samples. Conclusions: These results suggest that matching patients to treatments based on transcriptional subtype will improve response rates, and inclusion of additional features from other profiling data types may provide additional benefit. Further, we suggest a systems biology strategy for guiding clinical trials so that patient cohorts most likely to respond to new therapies may be more efficiently identified