Towards ultrahigh dimensional feature selection for big data
In this paper, we present a new adaptive feature scaling scheme for ultrahigh-dimensional feature selection on Big Data, and then reformulate it as a convex semi-infinite programming (SIP) problem. To address the SIP, we propose an efficient feature generating paradigm. Different from traditional gradient-based approaches that conduct optimization on all input features, the proposed paradigm iteratively activates a group of features, and solves a sequence of multiple kernel learning (MKL) subproblems. To further speed up the training, we propose to solve the MKL subproblems in their primal forms through a modified accelerated proximal gradient approach. Due to such optimization scheme, some efficient cache techniques are also developed. The feature generating paradigm is guaranteed to converge globally under mild conditions, and can achieve lower feature selection bias. Moreover, the proposed method can tackle two challenging tasks in feature selection: 1) group-based feature selection with complex structures, and 2) nonlinear feature selection with explicit feature mappings. Comprehensive experiments on a wide range of synthetic and real-world data sets of tens of million data points with O(10^14) features demonstrate the competitive performance of the proposed method over state-of-the-art feature selection methods in terms of generalization performance and training efficiency. © 2014 Mingkui Tan, Ivor W. Tsang and Li Wang
Principal Graph and Structure Learning Based on Reversed Graph Embedding
© 2017 IEEE. Many scientific datasets are of high dimension, and the analysis usually requires retaining the most important structures of data. Principal curve is a widely used approach for this purpose. However, many existing methods work only for data with structures that are mathematically formulated by curves, which is quite restrictive for real applications. A few methods can overcome the above problem, but they either require complicated human-made rules for a specific task with lack of adaption flexibility to different tasks, or cannot obtain explicit structures of data. To address these issues, we develop a novel principal graph and structure learning framework that captures the local information of the underlying graph structure based on reversed graph embedding. As showcases, models that can learn a spanning tree or a weighted undirected ℓ1 graph are proposed, and a new learning algorithm is developed that learns a set of principal points and a graph structure from data, simultaneously. The new algorithm is simple with guaranteed convergence. We then extend the proposed framework to deal with large-scale data. Experimental results on various synthetic and six real world datasets show that the proposed method compares favorably with baselines and can uncover the underlying structure correctly
Increased entropy of signal transduction in the cancer metastasis phenotype
Studies into the statistical properties of biological networks have led to
important biological insights, such as the presence of hubs and hierarchical
modularity. There is also a growing interest in studying the statistical
properties of networks in the context of cancer genomics. However, relatively
little is known as to what network features differ between the cancer and
normal cell physiologies, or between different cancer cell phenotypes. Based on
the observation that frequent genomic alterations underlie a more aggressive
cancer phenotype, we asked if such an effect could be detectable as an increase
in the randomness of local gene expression patterns. Using a breast cancer gene
expression data set and a model network of protein interactions we derive
constrained weighted networks defined by a stochastic information flux matrix
reflecting expression correlations between interacting proteins. Based on this
stochastic matrix we propose and compute an entropy measure that quantifies the
degree of randomness in the local pattern of information flux around single
genes. By comparing the local entropies in the non-metastatic versus metastatic
breast cancer networks, we here show that breast cancers that metastasize are
characterised by a small yet significant increase in the degree of randomness
of local expression patterns. We validate this result in three additional
breast cancer expression data sets and demonstrate that local entropy better
characterises the metastatic phenotype than other non-entropy based measures.
We show that increases in entropy can be used to identify genes and signalling
pathways implicated in breast cancer metastasis. Further exploration of such
integrated cancer expression and protein interaction networks will therefore be
a fruitful endeavour. Comment: 5 figures, 2 Supplementary Figures and Table
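As a rough illustration of the kind of local entropy measure this abstract describes, the sketch below row-normalizes non-negative edge weights around each gene into a stochastic vector and computes a Shannon entropy scaled by the log of the node degree. The function name `local_entropies`, the nested-dict input layout, and the degree normalization are assumptions made for illustration; the abstract does not specify how the weights are built from expression correlations or how the entropy is normalized.

```python
import math


def local_entropies(weights):
    """Normalized local Shannon entropy per node (illustrative sketch).

    weights: dict mapping node -> dict mapping neighbor -> non-negative
    edge weight (hypothetical layout, not taken from the paper).
    Returns a dict mapping node -> entropy in [0, 1].
    """
    entropies = {}
    for node, nbrs in weights.items():
        total = sum(nbrs.values())
        k = len(nbrs)
        # Nodes with fewer than two neighbors, or no flux, carry no randomness.
        if total <= 0 or k < 2:
            entropies[node] = 0.0
            continue
        # Row-normalize so the outgoing weights form a probability vector.
        probs = [w / total for w in nbrs.values() if w > 0]
        h = -sum(p * math.log(p) for p in probs)
        # Divide by log(k) so a uniform distribution over k neighbors gives 1.
        entropies[node] = max(0.0, h) / math.log(k)
    return entropies
```

Under this convention, a gene whose flux is spread evenly over its neighbors scores 1, and a gene whose flux goes to a single neighbor scores 0.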
A critical evaluation of network and pathway based classifiers for outcome prediction in breast cancer
Recently, several classifiers that combine primary tumor data, like gene
expression data, and secondary data sources, such as protein-protein
interaction networks, have been proposed for predicting outcome in breast
cancer. In these approaches, new composite features are typically constructed
by aggregating the expression levels of several genes. The secondary data
sources are employed to guide this aggregation. Although many studies claim
that these approaches improve classification performance over single gene
classifiers, the gain in performance is difficult to assess. This stems mainly
from the fact that different breast cancer data sets and validation procedures
are employed to assess the performance. Here we address these issues by
employing a large cohort of six breast cancer data sets as benchmark set and by
performing an unbiased evaluation of the classification accuracies of the
different approaches. Contrary to previous claims, we find that composite
feature classifiers do not outperform simple single gene classifiers. We
investigate the effect of (1) the number of selected features; (2) the specific
gene set from which features are selected; (3) the size of the training set and
(4) the heterogeneity of the data set on the performance of composite feature
and single gene classifiers. Strikingly, we find that randomization of
secondary data sources, which destroys all biological information in these
sources, does not result in a deterioration in performance of composite feature
classifiers. Finally, we show that when a proper correction for gene set size
is performed, the stability of single gene sets is similar to the stability of
composite feature sets. Based on these results there is currently no reason to
prefer prognostic classifiers based on composite features over single gene
classifiers for predicting outcome in breast cancer
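The composite features discussed in this abstract are built by aggregating the expression levels of several genes. A minimal sketch of one such aggregation, the per-sample mean over a gene set, is given below. The function name `composite_features`, the dict-based data layout, and the choice of the mean as aggregator are assumptions for illustration; the approaches evaluated in the paper may aggregate differently.

```python
def composite_features(expression, gene_sets):
    """Average expression per gene set (illustrative sketch).

    expression: dict mapping gene -> list of per-sample expression values.
    gene_sets: dict mapping set name -> list of gene names, e.g. derived
    from a pathway or interaction network (hypothetical layout).
    Returns a dict mapping set name -> list of per-sample mean values.
    """
    features = {}
    for name, genes in gene_sets.items():
        # Ignore genes that were not measured in this data set.
        present = [g for g in genes if g in expression]
        if not present:
            continue
        n_samples = len(expression[present[0]])
        features[name] = [
            sum(expression[g][s] for g in present) / len(present)
            for s in range(n_samples)
        ]
    return features
```

Passing randomly drawn gene sets of the same sizes as `gene_sets` gives one simple way to mimic the randomization control mentioned in the abstract.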
Violations of local stochastic independence exaggerate scalability in Mokken scaling analysis of the Chinese Mandarin SF-36
Biofilter aquaponic system for nutrients removal from fresh market wastewater
Aquaponics is a significant wastewater treatment system that combines conventional aquaculture (raising aquatic organisms) with hydroponics (cultivating plants in water) in a symbiotic environment. Because it is a natural and environmentally friendly system, it has a high capacity for removing nutrients compared to conventional methods. The current chapter aimed to review the possible application of aquaponic systems to treat fresh market wastewater, with the intention of highlighting the mechanism of phytoremediation that occurs in an aquaponic system. The literature revealed that aquaponic systems were able to remove nutrients, namely nitrogen and phosphorus
Structure and mechanism of human DNA polymerase η
The variant form of the human syndrome xeroderma pigmentosum (XPV) is caused by a deficiency in DNA polymerase eta (Pol eta), a DNA polymerase that enables replication through ultraviolet-induced pyrimidine dimers. Here we report high-resolution crystal structures of human Pol eta at four consecutive steps during DNA synthesis through cis-syn cyclobutane thymine dimers. Pol eta acts like a 'molecular splint' to stabilize damaged DNA in a normal B-form conformation. An enlarged active site accommodates the thymine dimer with excellent stereochemistry for two-metal ion catalysis. Two residues conserved among Pol eta orthologues form specific hydrogen bonds with the lesion and the incoming nucleotide to assist translesion synthesis. On the basis of the structures, eight Pol eta missense mutations causing XPV can be rationalized as undermining the molecular splint or perturbing the active-site alignment. The structures also provide an insight into the role of Pol eta in replicating through D loop and DNA fragile sites
An effective theory for jet propagation in dense QCD matter: jet broadening and medium-induced bremsstrahlung
Two effects, jet broadening and gluon bremsstrahlung induced by the
propagation of a highly energetic quark in dense QCD matter, are reconsidered
from an effective theory point of view. We modify the standard Soft Collinear
Effective Theory (SCET) Lagrangian to include Glauber modes, which are needed
to implement the interactions between the medium and the collinear fields. We
derive the Feynman rules for this Lagrangian and show that it is invariant
under soft and collinear gauge transformations. We find that the newly
constructed theory SCET_G recovers exactly the general result for the
transverse momentum broadening of jets. In the limit where the radiated gluons
are significantly less energetic than the parent quark, we obtain a jet
energy-loss kernel identical to the one discussed in the reaction operator
approach to parton propagation in matter. In the framework of SCET_G we
present results for the fully-differential bremsstrahlung spectrum for both the
incoherent and the Landau-Pomeranchuk-Migdal suppressed regimes beyond the
soft-gluon approximation. Gauge invariance of the physics results is
demonstrated explicitly by performing the calculations in both the light-cone
and covariant gauges. We also show how the process-dependent
medium-induced radiative corrections factorize from the jet production cross
section on the example of the quark jets considered here. Comment: 52 pages, 15 pdf figures, as published in JHE
Effects of Redispersible Polymer Powder on Mechanical and Durability Properties of Preplaced Aggregate Concrete with Recycled Railway Ballast
The rapid-hardening method employing the injection of calcium sulfoaluminate (CSA) cement mortar into voids between preplaced ballast aggregates has recently emerged as a promising approach for the renovation of existing ballasted railway tracks to concrete tracks. This method typically involves the use of a redispersible polymer powder to enhance the durability of the resulting recycled aggregate concrete. However, the effects of the amount of polymer on the mechanical and durability properties of recycled ballast aggregate concrete were not clearly understood. In addition, the effects of the cleanness condition of ballast aggregates were never examined. This study aimed at investigating these two aspects through compression and flexure tests, shrinkage tests, freezing-thawing resistance tests, and optical microscopy. The results revealed that an increase in the amount of polymer generally decreased the compressive strength at the curing age of 28 days. However, the use of a higher polymer ratio enhanced the modulus of rupture, freezing-thawing resistance, and shrinkage resistance, likely because it improved the microstructure of the interfacial transition zones between recycled ballast aggregates and injected mortar. In addition, a higher cleanness level of ballast aggregates generally improved the mechanical and durability qualities of concrete
Prognostic gene network modules in breast cancer hold promise
A substantial proportion of lymph node-negative patients who receive adjuvant chemotherapy do not derive any benefit from this aggressive and potentially toxic treatment. However, standard histopathological indices cannot reliably detect patients at low risk of relapse or distant metastasis. In the past few years several prognostic gene expression signatures have been developed and shown to potentially outperform histopathological factors in identifying low-risk patients in specific breast cancer subgroups with predictive values of around 90%, and therefore hold promise for clinical application. We envisage that further improvements and insights may come from integrative expression pathway analyses that dissect prognostic signatures into modules related to cancer hallmarks