12 research outputs found

    Parametric t-Distributed Stochastic Exemplar-centered Embedding

    Parametric embedding methods such as parametric t-SNE (pt-SNE) have been widely adopted for data visualization and for out-of-sample data embedding without further computationally expensive optimization or approximation. However, pt-SNE is highly sensitive to the batch-size hyper-parameter because of conflicting optimization goals, and it often produces dramatically different embeddings for different user-defined perplexities. To address these issues, we present parametric t-distributed stochastic exemplar-centered embedding methods. Our strategy learns embedding parameters by comparing the given data only with precomputed exemplars, resulting in a cost function with linear computational and memory complexity, which is further reduced by noise contrastive samples. Moreover, we propose a shallow embedding network with high-order feature interactions for data visualization, which is much easier to tune than the deep neural network employed by pt-SNE yet achieves comparable performance. We empirically demonstrate, using several benchmark datasets, that our proposed methods significantly outperform pt-SNE in terms of robustness, visual effects, and quantitative evaluations.
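To make the "compare data only with exemplars" idea concrete, here is a minimal NumPy sketch of an exemplar-centered KL cost. It is an illustration under stated assumptions, not the authors' implementation: the fixed Gaussian bandwidth `sigma` stands in for the perplexity-based calibration, the global normalization of `Q` is an assumed choice, and the function names are hypothetical. The point it shows is that the cost touches an n-by-m matrix (n points, m exemplars) rather than an n-by-n one, so it grows linearly in n for a fixed number of exemplars.

```python
import numpy as np

def sq_dists(a, b):
    """Pairwise squared Euclidean distances, shape (len(a), len(b))."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)

def exemplar_kl_cost(X, E, Y, Z, sigma=1.0):
    """KL(P || Q) between point-to-exemplar affinities.

    X: (n, D) data, E: (m, D) precomputed exemplars,
    Y: (n, k) embedded data, Z: (m, k) embedded exemplars.
    Assumption: fixed bandwidth `sigma` instead of a perplexity search.
    """
    n = X.shape[0]
    # High-dimensional side: Gaussian affinity of each point to each exemplar.
    logits = -sq_dists(X, E) / (2.0 * sigma ** 2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)            # each row sums to 1
    P /= n                                       # whole matrix sums to 1
    # Low-dimensional side: Student-t kernel, normalized over all pairs
    # (an assumed normalization for this sketch).
    W = 1.0 / (1.0 + sq_dists(Y, Z))
    Q = W / W.sum()
    eps = 1e-12
    return float(np.sum(P * (np.log(P + eps) - np.log(Q + eps))))
```

In a parametric setting, `Y` and `Z` would come from passing `X` and `E` through the same embedding network, and this cost would be minimized with respect to the network weights.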

    A fast weak motif-finding algorithm based on community detection in graphs

    BACKGROUND: Identification of transcription factor binding sites (also called ‘motif discovery’) in DNA sequences is a basic step in understanding genetic regulation. Although many successful programs have been developed, the problem is far from being solved on account of diversity in gene expression/regulation and the low specificity of binding sites. State-of-the-art algorithms have their own constraints (e.g., high time or space complexity for finding long motifs, low precision in identification of weak motifs, or the OOPS constraint: one occurrence of the motif instance per sequence) which limit their scope of application. RESULTS: In this paper, we present a novel and fast algorithm we call TFBSGroup. It is based on community detection from a graph and is used to discover long and weak (l, d) motifs under the ZOMOPS constraint (zero, one or multiple occurrence(s) of the motif instance(s) per sequence), where l is the length of a motif and d is the maximum number of mutations between a motif instance and the motif itself. First, TFBSGroup transforms the (l, d) motif search in sequences into the discovery of dense subgraphs within a graph. It identifies these subgraphs using a fast community detection method for obtaining coarse-grained candidate motifs. Next, it greedily refines these candidate motifs towards the true motif within their own communities. Empirical studies on synthetic (l, d) samples have shown that TFBSGroup is very efficient (e.g., it can find true (18, 6) and (24, 8) motifs within 30 seconds). More importantly, the algorithm has succeeded in rapidly identifying motifs in a large data set of prokaryotic promoters generated from the Escherichia coli database RegulonDB. The algorithm has also accurately identified motifs in ChIP-seq data sets for 12 mouse transcription factors involved in ES cell pluripotency and self-renewal.
CONCLUSIONS: Our novel heuristic algorithm, TFBSGroup, is able to quickly identify nearly exact matches for long and weak (l, d) motifs in DNA sequences under the ZOMOPS constraint. It is also capable of finding motifs in real applications. The source code for TFBSGroup can be obtained from http://bioinformatics.bioengr.uic.edu/TFBSGroup/
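The (l, d) formulation and the graph view described in this abstract can be sketched in a few lines of Python. This is a rough illustration, not TFBSGroup itself: the abstract does not state how edges are defined, so the rule used here (connect two l-mers when their Hamming distance is at most 2d, which holds for any two instances of the same motif) is an assumption, and the function names are hypothetical. Community detection and greedy refinement are not shown.

```python
from itertools import combinations

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))

def lmers(seqs, l):
    """All (sequence index, start position, l-mer) triples in the input."""
    return [(i, p, s[p:p + l])
            for i, s in enumerate(seqs)
            for p in range(len(s) - l + 1)]

def build_graph(seqs, l, d):
    """Nodes are l-mers; an edge joins two l-mers within Hamming distance 2d.

    Assumed edge rule: two instances that each differ from a common motif
    in at most d positions differ from each other in at most 2d positions.
    No per-sequence restriction is applied, in the spirit of ZOMOPS.
    """
    nodes = lmers(seqs, l)
    edges = set()
    for u, v in combinations(range(len(nodes)), 2):
        if hamming(nodes[u][2], nodes[v][2]) <= 2 * d:
            edges.add((u, v))
    return nodes, edges
```

Dense groups of nodes in such a graph are natural candidates for motif instances; this all-pairs construction is quadratic in the number of l-mers and is meant only to show the idea on small inputs.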

    A novel small molecule chaperone of rod opsin and its potential therapy for retinal degeneration

    Rhodopsin homeostasis is tightly coupled to rod photoreceptor cell survival and vision. Mutations resulting in the misfolding of rhodopsin can lead to autosomal dominant retinitis pigmentosa (adRP), a progressive retinal degeneration that is currently untreatable. Using a cell-based high-throughput screen (HTS) to identify small molecules that can stabilize the P23H-opsin mutant, which causes most cases of adRP, we identified a novel pharmacological chaperone of rod photoreceptor opsin, YC-001. As a non-retinoid molecule, YC-001 demonstrates micromolar potency and efficacy greater than that of 9-cis-retinal, with lower cytotoxicity. YC-001 binds to bovine rod opsin with an EC50 similar to that of 9-cis-retinal. The chaperone activity of YC-001 is evidenced by its ability to rescue the transport of multiple rod opsin mutants in mammalian cells. YC-001 is also an inverse agonist that non-competitively antagonizes rod opsin signaling. Significantly, a single dose of YC-001 protects Abca4 -/- Rdh8 -/- mice from bright light-induced retinal degeneration, suggesting its broad therapeutic potential.