119,038 research outputs found

    On selecting interacting features from high-dimensional data

    Get PDF
    For high-dimensional data, most feature-selection methods, such as SIS and the lasso, involve ranking and selecting features individually. These methods do not require many computational resources, but they ignore feature interactions. A simple recursive approach, which, without requiring many more computational resources, also allows identification of interactions, is investigated. This approach can lead to substantial improvements in the performance of classifiers, and can provide insight into the way in which features work together in a given population. It also enjoys attractive statistical properties

    The Naming Game in Social Networks: Community Formation and Consensus Engineering

    Full text link
    We study the dynamics of the Naming Game [Baronchelli et al., (2006) J. Stat. Mech.: Theory Exp. P06014] in empirical social networks. This stylized agent-based model captures essential features of agreement dynamics in a network of autonomous agents, corresponding to the development of shared classification schemes in a network of artificial agents or opinion spreading and social dynamics in social networks. Our study focuses on the impact that communities in the underlying social graphs have on the outcome of the agreement process. We find that networks with strong community structure hinder the system from reaching global agreement; the evolution of the Naming Game in these networks maintains clusters of coexisting opinions indefinitely. Further, we investigate agent-based network strategies to facilitate convergence to global consensus.Comment: The original publication is available at http://www.springerlink.com/content/70370l311m1u0ng3

    Cancer3D: understanding cancer mutations through protein structures.

    Get PDF
    The new era of cancer genomics is providing us with extensive knowledge of mutations and other alterations in cancer. The Cancer3D database at http://www.cancer3d.org gives an open and user-friendly way to analyze cancer missense mutations in the context of structures of proteins in which they are found. The database also helps users analyze the distribution patterns of the mutations as well as their relationship to changes in drug activity through two algorithms: e-Driver and e-Drug. These algorithms use knowledge of modular structure of genes and proteins to separately study each region. This approach allows users to find novel candidate driver regions or drug biomarkers that cannot be found when similar analyses are done on the whole-gene level. The Cancer3D database provides access to the results of such analyses based on data from The Cancer Genome Atlas (TCGA) and the Cancer Cell Line Encyclopedia (CCLE). In addition, it displays mutations from over 14,700 proteins mapped to more than 24,300 structures from PDB. This helps users visualize the distribution of mutations and identify novel three-dimensional patterns in their distribution

    An Adaptive Interacting Wang-Landau Algorithm for Automatic Density Exploration

    Full text link
    While statisticians are well-accustomed to performing exploratory analysis in the modeling stage of an analysis, the notion of conducting preliminary general-purpose exploratory analysis in the Monte Carlo stage (or more generally, the model-fitting stage) of an analysis is an area which we feel deserves much further attention. Towards this aim, this paper proposes a general-purpose algorithm for automatic density exploration. The proposed exploration algorithm combines and expands upon components from various adaptive Markov chain Monte Carlo methods, with the Wang-Landau algorithm at its heart. Additionally, the algorithm is run on interacting parallel chains -- a feature which both decreases computational cost as well as stabilizes the algorithm, improving its ability to explore the density. Performance is studied in several applications. Through a Bayesian variable selection example, the authors demonstrate the convergence gains obtained with interacting chains. The ability of the algorithm's adaptive proposal to induce mode-jumping is illustrated through a trimodal density and a Bayesian mixture modeling application. Lastly, through a 2D Ising model, the authors demonstrate the ability of the algorithm to overcome the high correlations encountered in spatial models.Comment: 33 pages, 20 figures (the supplementary materials are included as appendices

    Persistent Homology Guided Force-Directed Graph Layouts

    Full text link
    Graphs are commonly used to encode relationships among entities, yet their abstractness makes them difficult to analyze. Node-link diagrams are popular for drawing graphs, and force-directed layouts provide a flexible method for node arrangements that use local relationships in an attempt to reveal the global shape of the graph. However, clutter and overlap of unrelated structures can lead to confusing graph visualizations. This paper leverages the persistent homology features of an undirected graph as derived information for interactive manipulation of force-directed layouts. We first discuss how to efficiently extract 0-dimensional persistent homology features from both weighted and unweighted undirected graphs. We then introduce the interactive persistence barcode used to manipulate the force-directed graph layout. In particular, the user adds and removes contracting and repulsing forces generated by the persistent homology features, eventually selecting the set of persistent homology features that most improve the layout. Finally, we demonstrate the utility of our approach across a variety of synthetic and real datasets
    • …
    corecore