
    Structure of conflict graphs in constraint alignment problems and algorithms

    We consider the constrained graph alignment problem, which has applications in biological network analysis. Given two input graphs $G_1=(V_1,E_1)$ and $G_2=(V_2,E_2)$, a pair of vertex mappings induces an edge conservation if the vertex pairs are adjacent in their respective graphs. The goal is to find a one-to-one mapping between the vertices of the input graphs that maximizes edge conservation. However, the allowed mappings are restricted, since each vertex of $V_1$ (resp. $V_2$) may be mapped to at most $m_1$ (resp. $m_2$) specified vertices of $V_2$ (resp. $V_1$). Most of the results in this paper deal with the case $m_2=1$, which has attracted the most attention in the related literature. We formulate the problem as a maximum independent set problem in a related conflict graph and investigate structural properties of this graph in terms of forbidden subgraphs. We are interested, in particular, in excluding certain wheels, fans, cliques or claws (all terms are defined in the paper), which corresponds to excluding certain cycles, paths, cliques or independent sets in the neighborhood of each vertex. We then investigate algorithmic consequences of some of these properties, which illustrates the potential of this approach and opens new directions for further work. In particular, this approach allows us to reinterpret a known polynomial case in terms of the conflict graph and to improve known approximation and fixed-parameter tractability results by efficiently solving the maximum independent set problem in conflict graphs. Some of our new approximation results involve approximation ratios that are functions of the optimal value, in particular its square root; results of this kind cannot be achieved for maximum independent set in general graphs.
    Comment: 22 pages, 6 figures
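    The abstract does not spell out how the conflict graph is built, so the sketch below rests on our own assumption: each potentially conserved edge (two allowed mappings that send an edge of $G_1$ onto an edge of $G_2$) becomes one conflict-graph vertex, and two such vertices conflict when their mappings together are not one-to-one. Under that construction an independent set corresponds to a set of simultaneously conservable edges, so its maximum size equals the best achievable edge conservation. All function names are hypothetical, and the exhaustive independent-set search only stands in for the structure-exploiting algorithms the paper studies.

```python
from itertools import combinations


def candidates(E1, E2, allowed):
    """Each potentially conserved edge becomes one candidate: a frozenset of two
    mappings (u, v), u in V1, v in V2, sending an edge of G1 onto an edge of G2
    using only allowed mappings (assumed construction)."""
    found = {}
    for a, b in E1:
        for x, y in E2:
            for p, q in ((x, y), (y, x)):  # both orientations of the G2 edge
                if (a, p) in allowed and (b, q) in allowed:
                    found[frozenset({(a, p), (b, q)})] = None  # ordered dedupe
    return list(found)


def compatible(mappings):
    """True if the mappings form a one-to-one partial map from V1 to V2."""
    image, preimage = {}, {}
    for u, v in mappings:
        if image.setdefault(u, v) != v or preimage.setdefault(v, u) != u:
            return False
    return True


def conflict_edges(cands):
    """Join two candidates when their union violates one-to-one-ness."""
    return [(i, j) for i, j in combinations(range(len(cands)), 2)
            if not compatible(cands[i] | cands[j])]


def max_independent_set(n, edges):
    """Exhaustive search, largest subsets first; only for toy instances."""
    adj = set(edges) | {(j, i) for i, j in edges}
    for k in range(n, -1, -1):
        for subset in combinations(range(n), k):
            if all((i, j) not in adj for i, j in combinations(subset, 2)):
                return list(subset)
    return []
```

    For example, with $G_1$ the path 1-2-3, $G_2$ the edges a-b, b-c, b-d, and allowed mappings {1→a, 1→d, 2→b, 3→c} (so every vertex of $V_2$ has one allowed partner, i.e. $m_2=1$), there are three candidates; the two that map vertex 1 differently conflict, and a maximum independent set of size 2 conserves two edges.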

    Hybrid modeling, HMM/NN architectures, and protein applications

    We describe a hybrid modeling approach in which the parameters of a model are calculated and modulated by another model, typically a neural network (NN), to avoid both overfitting and underfitting. We develop the approach for Hidden Markov Models (HMMs) by deriving a class of hybrid HMM/NN architectures. These architectures can be trained with unified algorithms that blend HMM dynamic programming with NN backpropagation. For complex data, mixtures of HMMs or modulated HMMs must be used. NNs can then be applied both to the parameters of each single HMM and to the switching or modulation of the models, as a function of input or context. Hybrid HMM/NN architectures provide a flexible NN parameterization for controlling model structure and complexity. At the same time, they can capture distributions that, in practice, are inaccessible to single HMMs. The HMM/NN hybrid approach is tested, in its simplest form, by constructing a model of the immunoglobulin protein family. A hybrid model is trained, and a multiple alignment derived, with fewer than a quarter of the parameters used by previous single HMMs.
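    One simple reading of "the parameters of a model are calculated by an NN" is sketched below, under our own assumptions: each HMM state has a small embedding vector, and a single shared layer followed by a softmax turns it into that state's emission distribution, so emission parameters are tied through common weights. Likelihoods are then computed with the ordinary HMM forward algorithm. The embeddings, weights and function names are hypothetical, and training (which the abstract says blends dynamic programming with backpropagation) is omitted.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def nn_emissions(state_emb, W):
    """Hypothetical one-layer 'NN': emission distribution of state s is
    softmax(W @ h_s). W (one row per symbol) is shared by all states."""
    return [softmax([sum(w * h for w, h in zip(row, emb)) for row in W])
            for emb in state_emb]


def forward(seq, init, trans, emit):
    """Standard HMM forward algorithm: returns P(seq) under the model."""
    n = len(init)
    alpha = [init[s] * emit[s][seq[0]] for s in range(n)]
    for sym in seq[1:]:
        alpha = [emit[t][sym] * sum(alpha[s] * trans[s][t] for s in range(n))
                 for t in range(n)]
    return sum(alpha)


# Toy model: 2 states, 3 symbols, 2-dimensional state embeddings.
STATE_EMB = [[1.0, 0.0], [0.0, 1.0]]
W = [[2.0, -1.0], [0.0, 0.0], [-1.0, 2.0]]
INIT = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.2, 0.8]]
EMIT = nn_emissions(STATE_EMB, W)
```

    As a sanity check, summing `forward` over all sequences of a fixed length should give 1, since the NN only reparameterizes a valid HMM. Changing the shape of `W` or the embedding size changes how many free parameters the emissions have, which is the kind of complexity control the abstract refers to.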

    Solving Maximum Clique Problem for Protein Structure Similarity

    A basic assumption of molecular biology is that proteins sharing close three-dimensional (3D) structures are likely to share a common function and, in most cases, derive from a common ancestor. Computing the similarity between two protein structures is therefore a crucial task and has been extensively investigated. The similarity of two proteins can be evaluated by finding an optimal one-to-one matching between their components, which is equivalent to identifying a maximum weighted clique in a specific "alignment graph". In this paper we present a new integer programming formulation for solving such clique problems. The model has been implemented using the ILOG CPLEX Callable Library. In addition, we designed a dedicated branch and bound algorithm for solving the maximum cardinality clique problem. Both approaches have been integrated into VAST (Vector Alignment Search Tool), a software tool for aligning protein 3D structures that is widely used at the NCBI (National Center for Biotechnology Information). The original VAST clique solver uses the well-known Bron and Kerbosch algorithm (BK). Our computational results on real-life protein alignment instances show that our branch and bound algorithm is up to 116 times faster than BK for the largest proteins.
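    The paper's dedicated branch and bound algorithm is not described in the abstract, so the sketch below is only a generic textbook branch and bound for the maximum cardinality clique problem, not the authors' method. It branches on including a candidate vertex and prunes with the simplest bound: the current clique plus all remaining candidates cannot beat the incumbent. Practical solvers typically use tighter bounds, such as greedy colouring. The names `max_clique` and `graph` are hypothetical.

```python
def max_clique(adj):
    """Branch and bound for a maximum-cardinality clique.
    adj maps each vertex to the set of its neighbours (undirected, no loops)."""
    best = []

    def expand(clique, cand):
        nonlocal best
        if not cand:  # clique cannot be extended: compare with incumbent
            if len(clique) > len(best):
                best = clique
            return
        while cand:
            # Bound: even adding every remaining candidate cannot beat best.
            if len(clique) + len(cand) <= len(best):
                return
            v = cand.pop()  # v is removed, so later branches here exclude it
            # Branch: include v; only its neighbours remain candidates.
            expand(clique + [v], cand & adj[v])

    expand([], set(adj))
    return best


def graph(edges):
    """Build an undirected adjacency dict from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj
```

    In an alignment graph, vertices would be candidate component matchings and edges would link mutually compatible ones, so a maximum clique is a largest consistent matching; the weighted variant mentioned in the abstract would compare total weights instead of sizes.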

    Protein alignment HW/SW optimizations

    Biosequence alignment has recently received strong support from both commodity and dedicated hardware platforms. The ever-growing requirements of this application motivate the search for improved implementations that reduce processing time and extend capabilities. We propose a novel hardware improvement to the classic Smith-Waterman (S-W) algorithm based on a twofold approach: i) an on-the-fly gap-open/gap-extension selection that reduces hardware implementation complexity; ii) a pre-selection filter that uses reduced amino-acid alphabets to screen out non-significant sequences and to shorten the S-W iterations on huge reference databases. We demonstrate the improvements with respect to a classic approach in terms of both algorithmic efficiency and hardware performance (FPGA and ASIC post-synthesis analysis).
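    For readers unfamiliar with the two ingredients, here is a small software sketch, not the authors' hardware design. `sw_affine` is the standard affine-gap Smith-Waterman recurrence (Gotoh style), where the `max` in `E` and `F` is the per-cell choice between opening and extending a gap; a gap of length k costs `gap_open + (k - 1) * gap_ext`. Simple match/mismatch scores replace a substitution matrix such as BLOSUM62. The four-group reduced alphabet, the threshold and all names are hypothetical, chosen only to show how a cheap pre-filter on reduced sequences could discard database entries before a full alignment.

```python
def sw_affine(a, b, match=2, mismatch=-1, gap_open=3, gap_ext=1):
    """Best local alignment score with affine gaps, using O(len(b)) memory."""
    NEG = float("-inf")
    m = len(b)
    H_prev, F_prev = [0] * (m + 1), [NEG] * (m + 1)
    best = 0
    for i in range(1, len(a) + 1):
        H_cur, F_cur = [0] * (m + 1), [NEG] * (m + 1)
        E = NEG  # horizontal gap score, reset at the start of each row
        for j in range(1, m + 1):
            E = max(H_cur[j - 1] - gap_open, E - gap_ext)          # open vs extend
            F_cur[j] = max(H_prev[j] - gap_open, F_prev[j] - gap_ext)
            diag = H_prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H_cur[j] = max(0, diag, E, F_cur[j])  # 0 makes the alignment local
            best = max(best, H_cur[j])
        H_prev, F_prev = H_cur, F_cur
    return best


# Hypothetical 4-letter grouping of the 20 amino acids, for illustration only.
GROUPS = {"h": "AVLIMFWC", "p": "STNQGPYH", "n": "DE", "b": "KR"}
REDUCE = {aa: g for g, members in GROUPS.items() for aa in members}


def reduce_seq(seq):
    """Map each residue to its group letter ('x' for unknown symbols)."""
    return "".join(REDUCE.get(aa, "x") for aa in seq)


def prefilter(query, db, threshold):
    """Keep database sequences whose reduced-alphabet score reaches threshold."""
    rq = reduce_seq(query)
    return [s for s in db if sw_affine(rq, reduce_seq(s)) >= threshold]
```

    Because distinct residues in the same group count as matches, reduced-alphabet scores are looser than full scores (for `"AAATTT"` vs `"AAACTTT"` the reduced score is 12, while the full score is 9), so such a filter trades some selectivity for speed; only the survivors would go on to a full S-W pass.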