17 research outputs found

    Truncated Inference for Latent Variable Optimization Problems: Application to Robust Estimation and Learning

    Optimization problems with an auxiliary latent variable structure in addition to the main model parameters occur frequently in computer vision and machine learning. The additional latent variables make the underlying optimization task expensive, either in terms of memory (maintaining the latent variables) or in terms of runtime (repeated exact inference of latent variables). We aim to remove the need to maintain the latent variables and propose two formally justified methods that dynamically adapt the required accuracy of latent variable inference. These methods have applications in large-scale robust estimation and in learning energy-based models from labeled data. Comment: 16 pages.
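To make the latent-variable structure concrete, the following is a minimal sketch of classical iteratively reweighted least squares (IRLS) for a robust location estimate, where one weight per data point plays the role of a latent variable. It is not the truncated-inference method the abstract proposes: the Cauchy kernel, the function name `robust_mean_irls`, the `scale` parameter, and the toy data are all assumptions chosen for illustration. The weights are re-inferred exactly at every iteration, which is the repeated-inference cost the abstract refers to.

```python
def robust_mean_irls(xs, scale=1.0, iters=50):
    """Robust location estimate with a Cauchy kernel via IRLS (illustrative)."""
    mu = sum(xs) / len(xs)  # start from the ordinary (non-robust) mean
    for _ in range(iters):
        # Latent variables: one weight per point, inferred exactly from the
        # current residuals. Large residuals receive small weights.
        ws = [1.0 / (1.0 + ((x - mu) / scale) ** 2) for x in xs]
        # Parameter update: weighted least squares with the weights held fixed.
        mu = sum(w * x for w, x in zip(ws, xs)) / sum(ws)
    return mu


# Toy data (hypothetical): five inliers near 1.0 and one gross outlier.
data = [0.9, 1.0, 1.1, 1.05, 0.95, 25.0]
estimate = robust_mean_irls(data)
```

On this toy data the plain mean is pulled to 5.0 by the outlier, while the reweighted estimate stays close to the inlier cluster around 1.0.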

    Sparse non-negative matrix factorization for retrieving genomes across metagenomes

    The development of massively parallel sequencing technologies enables DNA sequencing at high throughput and low cost, fueling the rise of metagenomics, the study of complex microbial communities sequenced in their natural environment. A metagenomic dataset consists of billions of unordered small fragments of genomes (reads) originating from hundreds or thousands of different organisms. The de novo reconstruction of individual genomes from metagenomes is challenging in practice, both because of the complexity of the problem (sequence assembly is NP-hard) and because of the large data volumes. The clustering of sequences into biologically meaningful partitions (e.g. strains), known as binning, is a key step, and most computational tools perform read assembly as a pre-processing step. However, metagenome assembly (and even more so cross-assembly) is computationally intensive, requiring terabytes of memory; it is also error-prone (yielding artefacts such as chimeric contigs) and discards vast amounts of information in the form of unassembled reads (up to 50% for highly diverse metagenomes). Here we show how online learning methods for sparse non-negative matrix factorization can recover relative abundances of genomes across multiple metagenomes and support assembly-free read binning by using abundance covariation signals derived from the occurrence of unique k-mers (subsequences of size k) across samples. The combinatorial explosion of k-mers is controlled by indexing them using locality-sensitive hashing, and sparse coding and dictionary learning techniques are used to decompose the k-mer abundance covariation signal into genome-resolved components in latent space.
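The decomposition described above can be illustrated with a small batch sketch of sparse non-negative matrix factorization using multiplicative updates. This is an assumption-laden toy, not the authors' pipeline: it is not an online method, it uses no locality-sensitive hashing, and the matrix orientation (k-mers by samples), the L1 penalty on `h`, the helper names, and the data are all hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def sparse_nmf(v, rank, l1=0.01, iters=500, seed=0, eps=1e-12):
    """Factor v (k-mers x samples) as w @ h with non-negative entries.

    w: k-mers x components, h: components x samples. The L1 penalty on h
    encourages sparse abundance profiles.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # Multiplicative update for h; the l1 term in the denominator
        # implements the sparsity penalty.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + l1 + eps) for j in range(m)]
             for i in range(rank)]
        # Multiplicative update for w (no penalty).
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


def frobenius_error(v, w, h):
    """Squared Frobenius norm of v - w @ h."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for rv, rw in zip(v, wh) for a, b in zip(rv, rw))


# Toy data (hypothetical): two genomes with distinct abundance profiles across
# four samples, and six k-mers, each unique to one genome.
abundance = [[5.0, 1.0, 0.0, 2.0], [0.0, 3.0, 4.0, 1.0]]
membership = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
v = matmul(membership, abundance)
w, h = sparse_nmf(v, rank=2)
error = frobenius_error(v, w, h)
```

In this toy setting, k-mers whose rows of `w` load on the same component co-vary across samples, which is the kind of signal assembly-free binning relies on.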

    Majorization-minimization procedures and convergence of SQP methods for semi-algebraic and tame programs

    With a view to solving nonsmooth and nonconvex problems involving complex constraints (like standard NLP problems), we study general majorization-minimization procedures produced by families of strongly convex sub-problems. Using techniques from semi-algebraic geometry and variational analysis (in particular the Łojasiewicz inequality), we establish the convergence of sequences generated by this type of scheme to critical points. The broad applicability of this process is illustrated in the context of NLP, where critical points coincide with KKT points. When the data are semi-algebraic or real analytic, our method applies (for instance) to the study of various SQP methods: the moving balls method, Sl1QP, and ESQP. Under standard qualification conditions, this provides, to the best of our knowledge, the first general convergence results for general nonlinear programming problems. We emphasize that, unlike most works on this subject, no second-order or convexity assumptions whatsoever are made. Rates of convergence are shown to be of the same form as those commonly encountered with first-order methods.
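As a rough illustration of a majorization-minimization step built from strongly convex sub-problems, the sketch below minimizes a smooth nonconvex function in one dimension. It is a textbook unconstrained instance, not one of the SQP schemes analyzed in the paper; the test function, the curvature bound `L`, and all names are assumptions. At each iterate x_k the function is majorized by the quadratic g(y) = f(x_k) + f'(x_k)(y - x_k) + (L/2)(y - x_k)^2, which is valid because |f''| <= L everywhere, and the next iterate is the exact minimizer of g.

```python
import math


def f(x):
    """Nonconvex test function: f''(x) = 2 + 6 cos(2x) changes sign."""
    return x * x + 3.0 * math.sin(x) ** 2


def grad_f(x):
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


L = 8.0  # upper bound on |f''(x)|, so the quadratic model majorizes f


def mm_minimize(x0, iters=100):
    """Iterate x_{k+1} = argmin_y g(y | x_k) for the quadratic majorizer g."""
    x, values = x0, [f(x0)]
    for _ in range(iters):
        # Closed-form minimizer of the strongly convex sub-problem.
        x = x - grad_f(x) / L
        values.append(f(x))
    return x, values


x_final, values = mm_minimize(2.0)
```

Because each sub-problem majorizes f and touches it at the current iterate, the recorded objective values never increase, and the iterates approach the critical point x = 0 of this particular function.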

    Pleomorphic spindle cell sarcoma (PSCS) formerly known as malignant fibrous histiocytoma (MFH): a complex malignant soft-tissue tumor

    A presentation defining the nature, characteristics, causation, treatment and outcome of patients with lesions formerly known as malignant fibrous histiocytoma and now as pleomorphic spindle cell sarcoma is clearly a very difficult subject. Many authors do not believe that the tumor exists and instead describe such lesions as forms of fibrosarcoma, fibromyxoid lesions, dedifferentiated chondrosarcoma or even leiomyosarcoma. The reasons for this confusion are presumably related to the fact that the malignant pleomorphic spindle cell sarcoma does not seem to be a distinct type of lesion with specific histologic and genetic characteristics. Instead, the tumor has at least four separate histologic variations and no specific gene signature, and it does not seem to be either familial or ethnic in presentation. Because the tumor was traditionally the most frequently encountered malignant soft-tissue neoplasm, the world of orthopedic oncology is clearly distressed by the problems that these patients have and is joined by radiation oncologists and chemotherapists in seeking new solutions.

    Soft tissue sarcomas with complex genomic profiles.

    Soft tissue sarcomas (STS) with complex genomic profiles (50% of all STS) are predominantly composed of spindle cell/pleomorphic sarcomas, including leiomyosarcoma, myxofibrosarcoma, pleomorphic liposarcoma, pleomorphic rhabdomyosarcoma, malignant peripheral nerve sheath tumor, angiosarcoma, extraskeletal osteosarcoma, and spindle cell/pleomorphic unclassified sarcoma (previously called spindle cell/pleomorphic malignant fibrous histiocytoma). These neoplasms characteristically show gains and losses of numerous chromosomes or chromosome regions, as well as amplifications. Many of them share recurrent aberrations (e.g., gain of 5p13-p15) that seem to play a significant role in tumor progression and/or metastatic dissemination. In this paper, we review the cytogenetic, molecular genetic, and clinicopathologic characteristics of the most common STS displaying complex genomic profiles. Features of diagnostic or prognostic relevance are discussed where relevant.