
    Towards ultrahigh dimensional feature selection for big data

    In this paper, we present a new adaptive feature scaling scheme for ultrahigh-dimensional feature selection on Big Data, and then reformulate it as a convex semi-infinite programming (SIP) problem. To address the SIP, we propose an efficient feature generating paradigm. Unlike traditional gradient-based approaches that conduct optimization on all input features, the proposed paradigm iteratively activates a group of features and solves a sequence of multiple kernel learning (MKL) subproblems. To further speed up the training, we propose to solve the MKL subproblems in their primal forms through a modified accelerated proximal gradient approach. Owing to this optimization scheme, some efficient cache techniques are also developed. The feature generating paradigm is guaranteed to converge globally under mild conditions, and can achieve lower feature selection bias. Moreover, the proposed method can tackle two challenging tasks in feature selection: 1) group-based feature selection with complex structures, and 2) nonlinear feature selection with explicit feature mappings. Comprehensive experiments on a wide range of synthetic and real-world data sets of tens of millions of data points with $O(10^{14})$ features demonstrate the competitive performance of the proposed method over state-of-the-art feature selection methods in terms of generalization performance and training efficiency. © 2014 Mingkui Tan, Ivor W. Tsang and Li Wang

    Principal Graph and Structure Learning Based on Reversed Graph Embedding

    © 2017 IEEE. Many scientific datasets are of high dimension, and the analysis usually requires retaining the most important structures of data. Principal curve is a widely used approach for this purpose. However, many existing methods work only for data with structures that are mathematically formulated by curves, which is quite restrictive for real applications. A few methods can overcome this problem, but they either require complicated human-made rules for a specific task and lack the flexibility to adapt to different tasks, or cannot obtain explicit structures of data. To address these issues, we develop a novel principal graph and structure learning framework that captures the local information of the underlying graph structure based on reversed graph embedding. As showcases, models that can learn a spanning tree or a weighted undirected ℓ1 graph are proposed, and a new learning algorithm is developed that simultaneously learns a set of principal points and a graph structure from data. The new algorithm is simple with guaranteed convergence. We then extend the proposed framework to deal with large-scale data. Experimental results on various synthetic and six real world datasets show that the proposed method compares favorably with baselines and can uncover the underlying structure correctly.

    Increased entropy of signal transduction in the cancer metastasis phenotype

    Studies into the statistical properties of biological networks have led to important biological insights, such as the presence of hubs and hierarchical modularity. There is also a growing interest in studying the statistical properties of networks in the context of cancer genomics. However, relatively little is known as to what network features differ between the cancer and normal cell physiologies, or between different cancer cell phenotypes. Based on the observation that frequent genomic alterations underlie a more aggressive cancer phenotype, we asked if such an effect could be detectable as an increase in the randomness of local gene expression patterns. Using a breast cancer gene expression data set and a model network of protein interactions we derive constrained weighted networks defined by a stochastic information flux matrix reflecting expression correlations between interacting proteins. Based on this stochastic matrix we propose and compute an entropy measure that quantifies the degree of randomness in the local pattern of information flux around single genes. By comparing the local entropies in the non-metastatic versus metastatic breast cancer networks, we here show that breast cancers that metastasize are characterised by a small yet significant increase in the degree of randomness of local expression patterns. We validate this result in three additional breast cancer expression data sets and demonstrate that local entropy better characterises the metastatic phenotype than other non-entropy based measures. We show that increases in entropy can be used to identify genes and signalling pathways implicated in breast cancer metastasis. Further exploration of such integrated cancer expression and protein interaction networks will therefore be a fruitful endeavour.
    Comment: 5 figures, 2 Supplementary Figures and Table
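To make the idea of a local entropy around a single gene concrete, here is a minimal sketch, not the authors' implementation. It assumes the information flux is given as a row-stochastic matrix `P` (row i holds the outgoing flux probabilities of gene i to its network neighbours) and it normalizes the Shannon entropy of each row by the logarithm of the number of nonzero neighbours. The function name `local_entropy`, the normalization, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np

def local_entropy(P):
    """Normalized Shannon entropy of each row of a row-stochastic matrix P.

    Assumption: P[i, j] is the flux probability from gene i to neighbour j,
    and each row sums to 1. The entropy is divided by log(k_i), where k_i is
    the number of nonzero entries in row i, so values lie in [0, 1].
    """
    P = np.asarray(P, dtype=float)
    k = np.count_nonzero(P, axis=1)  # neighbours with nonzero flux
    # p * log(p), using the convention 0 * log(0) = 0
    safe = np.where(P > 0, P, 1.0)
    plogp = np.where(P > 0, P * np.log(safe), 0.0)
    H = -plogp.sum(axis=1)
    out = np.zeros_like(H)
    mask = k > 1  # entropy is 0 when there is a single neighbour
    out[mask] = H[mask] / np.log(k[mask])
    return out

# Hypothetical 3-gene toy matrix, for illustration only
P = np.array([
    [0.0, 0.5, 0.5],  # uniform flux: maximal randomness
    [0.9, 0.0, 0.1],  # flux concentrated on one neighbour
    [1.0, 0.0, 0.0],  # single neighbour: no randomness
])
S = local_entropy(P)
```

In this toy case the first gene spreads its flux evenly and gets the maximal normalized entropy, while the second gene, whose flux is concentrated on one neighbour, gets a lower value. Comparing such per-gene values between two phenotype-specific networks is the kind of analysis the abstract describes.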

    A critical evaluation of network and pathway based classifiers for outcome prediction in breast cancer

    Recently, several classifiers that combine primary tumor data, like gene expression data, and secondary data sources, such as protein-protein interaction networks, have been proposed for predicting outcome in breast cancer. In these approaches, new composite features are typically constructed by aggregating the expression levels of several genes. The secondary data sources are employed to guide this aggregation. Although many studies claim that these approaches improve classification performance over single gene classifiers, the gain in performance is difficult to assess. This stems mainly from the fact that different breast cancer data sets and validation procedures are employed to assess the performance. Here we address these issues by employing a large cohort of six breast cancer data sets as benchmark set and by performing an unbiased evaluation of the classification accuracies of the different approaches. Contrary to previous claims, we find that composite feature classifiers do not outperform simple single gene classifiers. We investigate the effect of (1) the number of selected features; (2) the specific gene set from which features are selected; (3) the size of the training set and (4) the heterogeneity of the data set on the performance of composite feature and single gene classifiers. Strikingly, we find that randomization of secondary data sources, which destroys all biological information in these sources, does not result in a deterioration in performance of composite feature classifiers. Finally, we show that when a proper correction for gene set size is performed, the stability of single gene sets is similar to the stability of composite feature sets. Based on these results there is currently no reason to prefer prognostic classifiers based on composite features over single gene classifiers for predicting outcome in breast cancer
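As a rough illustration of what constructing a composite feature by aggregating the expression levels of several genes can mean, the sketch below averages z-scored expression over a gene set. This is one common aggregation choice and is an assumption here; the abstract does not specify which aggregation the evaluated classifiers use. The function name `composite_feature` and the toy data are hypothetical.

```python
import numpy as np

def composite_feature(expr, gene_idx):
    """Average the z-scored expression of the genes in gene_idx.

    expr: array of shape (n_samples, n_genes)
    gene_idx: column indices of the genes forming one gene set
    Returns one composite score per sample.
    """
    expr = np.asarray(expr, dtype=float)
    sub = expr[:, gene_idx]
    mu = sub.mean(axis=0)
    sd = sub.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant genes
    z = (sub - mu) / sd
    return z.mean(axis=1)

# Hypothetical expression matrix: 3 samples, 3 genes
expr = np.array([
    [1.0, 2.0, 10.0],
    [3.0, 4.0, 10.0],
    [5.0, 6.0, 10.0],
])
scores = composite_feature(expr, [0, 1])
```

A single-gene classifier would instead use individual columns of `expr` directly as features; the abstract's comparison is between these two kinds of inputs.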

    Biofilter aquaponic system for nutrients removal from fresh market wastewater

    Aquaponics is a wastewater treatment system that combines conventional aquaculture (raising aquatic organisms) with hydroponics (cultivating plants in water) in a symbiotic environment. Because it is a natural and environmentally friendly system, it is reported to remove nutrients more effectively than conventional methods. This chapter reviews the possible application of aquaponic systems to the treatment of fresh market wastewater, with the intention of highlighting the mechanism of phytoremediation that occurs in such systems. The literature shows that aquaponic systems are able to remove the nutrients nitrogen and phosphorus.

    Structure and mechanism of human DNA polymerase η

    The variant form of the human syndrome xeroderma pigmentosum (XPV) is caused by a deficiency in DNA polymerase eta (Pol eta), a DNA polymerase that enables replication through ultraviolet-induced pyrimidine dimers. Here we report high-resolution crystal structures of human Pol eta at four consecutive steps during DNA synthesis through cis-syn cyclobutane thymine dimers. Pol eta acts like a 'molecular splint' to stabilize damaged DNA in a normal B-form conformation. An enlarged active site accommodates the thymine dimer with excellent stereochemistry for two-metal ion catalysis. Two residues conserved among Pol eta orthologues form specific hydrogen bonds with the lesion and the incoming nucleotide to assist translesion synthesis. On the basis of the structures, eight Pol eta missense mutations causing XPV can be rationalized as undermining the molecular splint or perturbing the active-site alignment. The structures also provide an insight into the role of Pol eta in replicating through D loop and DNA fragile sites

    An effective theory for jet propagation in dense QCD matter: jet broadening and medium-induced bremsstrahlung

    Two effects, jet broadening and gluon bremsstrahlung induced by the propagation of a highly energetic quark in dense QCD matter, are reconsidered from an effective theory point of view. We modify the standard Soft Collinear Effective Theory (SCET) Lagrangian to include Glauber modes, which are needed to implement the interactions between the medium and the collinear fields. We derive the Feynman rules for this Lagrangian and show that it is invariant under soft and collinear gauge transformations. We find that the newly constructed theory SCET$_{\rm G}$ recovers exactly the general result for the transverse momentum broadening of jets. In the limit where the radiated gluons are significantly less energetic than the parent quark, we obtain a jet energy-loss kernel identical to the one discussed in the reaction operator approach to parton propagation in matter. In the framework of SCET$_{\rm G}$ we present results for the fully-differential bremsstrahlung spectrum for both the incoherent and the Landau-Pomeranchuk-Migdal suppressed regimes beyond the soft-gluon approximation. Gauge invariance of the physics results is demonstrated explicitly by performing the calculations in both the light-cone and covariant $R_{\xi}$ gauges. We also show how the process-dependent medium-induced radiative corrections factorize from the jet production cross section on the example of the quark jets considered here.
    Comment: 52 pages, 15 pdf figures, as published in JHE

    Effects of Redispersible Polymer Powder on Mechanical and Durability Properties of Preplaced Aggregate Concrete with Recycled Railway Ballast

    The rapid-hardening method employing the injection of calcium sulfoaluminate (CSA) cement mortar into voids between preplaced ballast aggregates has recently emerged as a promising approach for the renovation of existing ballasted railway tracks to concrete tracks. This method typically involves the use of a redispersible polymer powder to enhance the durability of the resulting recycled aggregate concrete. However, the effects of the amount of polymer on the mechanical and durability properties of recycled ballast aggregate concrete were not clearly understood. In addition, the effects of the cleanness condition of ballast aggregates were never examined. This study aimed at investigating these two aspects through compression and flexure tests, shrinkage tests, freezing-thawing resistance tests, and optical microscopy. The results revealed that an increase in the amount of polymer generally decreased the compressive strength at the curing age of 28 days. However, the use of a higher polymer ratio enhanced the modulus of rupture, freezing-thawing resistance, and shrinkage resistance, likely because it improved the microstructure of the interfacial transition zones between recycled ballast aggregates and injected mortar. In addition, a higher cleanness level of ballast aggregates generally improved the mechanical and durability qualities of concrete

    Prognostic gene network modules in breast cancer hold promise

    A substantial proportion of lymph node-negative patients who receive adjuvant chemotherapy do not derive any benefit from this aggressive and potentially toxic treatment. However, standard histopathological indices cannot reliably detect patients at low risk of relapse or distant metastasis. In the past few years several prognostic gene expression signatures have been developed and shown to potentially outperform histopathological factors in identifying low-risk patients in specific breast cancer subgroups with predictive values of around 90%, and therefore hold promise for clinical application. We envisage that further improvements and insights may come from integrative expression pathway analyses that dissect prognostic signatures into modules related to cancer hallmarks