67 research outputs found

    Towards a better solution to the shortest common supersequence problem: the deposition and reduction algorithm

    BACKGROUND: Finding a Shortest Common Supersequence (SCS) of a set of sequences is an important problem with applications in many areas, and a key problem in biological sequence analysis. The SCS problem is well known to be NP-complete, and many heuristic algorithms have been proposed. Some heuristics work well on a few long sequences (as in sequence comparison applications); others work well on many short sequences (as in oligo-array synthesis). Unfortunately, most do not work well on large SCS instances where there are many long sequences. RESULTS: In this paper, we present a Deposition and Reduction (DR) algorithm for solving large SCS instances of biological sequences. Our DR algorithm has two processes: a deposition process and a reduction process. The deposition process generates a small set of common supersequences; the reduction process shortens these common supersequences by removing some characters while preserving the common supersequence property. Our evaluation on simulated data and on real DNA and protein sequences shows that our algorithm consistently produces the best results compared to many well-known heuristic algorithms, especially on large instances. CONCLUSION: Our DR algorithm provides a partial answer to the open problem of designing an efficient heuristic algorithm for the SCS problem on many long sequences. Our algorithm has a bounded approximation ratio. The algorithm is efficient in both running time and space complexity, and our evaluation shows that it is practical even for SCS problems on many long sequences.
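    The reduction idea described in the abstract above (removing characters while keeping the common supersequence property) can be illustrated with a minimal sketch. This is not the paper's DR algorithm: the greedy one-character-at-a-time deletion, the function names, and the toy sequences are all assumptions made for illustration, and the deposition step is replaced by plain concatenation, which is always a valid (if long) common supersequence.

```python
def is_subsequence(s, t):
    """Return True if s can be obtained from t by deleting characters."""
    it = iter(t)
    # 'ch in it' advances the iterator, so matches must occur in order.
    return all(ch in it for ch in s)


def is_common_supersequence(candidate, sequences):
    """Return True if every input sequence is a subsequence of candidate."""
    return all(is_subsequence(s, candidate) for s in sequences)


def reduce_supersequence(candidate, sequences):
    """Greedily delete characters that are not needed (hypothetical helper).

    Assumption: a simple left-to-right pass, not the reduction procedure
    from the paper. A character is dropped only if the remainder is still
    a common supersequence of all input sequences.
    """
    result = list(candidate)
    i = 0
    while i < len(result):
        trial = result[:i] + result[i + 1:]
        if is_common_supersequence(trial, sequences):
            result = trial  # deletion is safe; stay at the same index
        else:
            i += 1  # character is required; move on
    return "".join(result)


# Toy data (made up for this example).
sequences = ["ACGT", "AGT", "CGTA"]
start = "".join(sequences)  # stand-in for a deposition result
reduced = reduce_supersequence(start, sequences)
print(len(start), len(reduced), reduced)
```

    The result is only locally minimal: no single remaining character can be removed, but a shorter common supersequence may still exist, which is consistent with the problem being NP-complete.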

    Oxidation of benzoin catalyzed by oxovanadium (IV) schiff base complexes

    BACKGROUND: The oxidative transformation of benzoin to benzil has been accomplished using a wide variety of reagents or catalysts and different reaction procedures. The conventional oxidizing agents yielded mainly benzaldehyde and/or benzoic acid and only a trace amount of benzil. The practical utility of these reagents is limited because they involve stoichiometric amounts of corrosive acids or toxic metallic reagents, which in turn produce undesirable waste materials, and require high reaction temperatures. In recent years, vanadium complexes have attracted much attention for their potential utility as catalysts for various types of reactions. RESULTS: Active and selective catalytic systems of new unsymmetrical oxovanadium(IV) Schiff base complexes for the oxidation of benzoin are reported. The Schiff base ligands are derived from 2-aminoethanol and 2-hydroxy-1-naphthaldehyde (H2L1) or 3-ethoxysalicylaldehyde (H2L3); and from 2-aminophenol and 3-ethoxysalicylaldehyde (H2L2) or 2-hydroxy-1-naphthaldehyde (H2L4). The unsymmetrical Schiff bases behave as tridentate dibasic ONO donor ligands. Reaction of these Schiff base ligands with oxovanadyl sulphate afforded the mononuclear oxovanadium(IV) complexes (VIVOLx.H2O), which are characterized by various physico-chemical techniques. The catalytic oxidation activities of these complexes for benzoin were evaluated using H2O2 as an oxidant. The best reaction conditions were identified by considering the effect of solvent, reaction time and temperature. Under the optimized reaction conditions, the VOL4 catalyst showed high conversion (>99%) with excellent selectivity to benzil (~100%) in a shorter reaction time compared to the other catalysts considered. CONCLUSION: Four tridentate ONO type Schiff base ligands were synthesized. Complexation of these ligands with vanadyl(IV) sulphate leads to the formation of new oxovanadium(IV) complexes of type VIVOL.H2O. Elemental analyses and spectral data of the free ligands and their oxovanadium(IV) complexes were found to be in good agreement with their structures, indicating high purity of all the compounds. The oxovanadium complexes were screened for the oxidation of benzoin to benzil using H2O2 as oxidant. The reaction time, solvent and temperature were optimized to obtain maximum yield. The catalytic activity results demonstrate that these catalytic systems are both highly active and selective for the oxidation of benzoin under mild reaction conditions.

    Clustered ChIP-Seq-defined transcription factor binding sites and histone modifications map distinct classes of regulatory elements

    BACKGROUND: Transcription factor binding to DNA requires both an appropriate binding element and suitably open chromatin, which together help to define regulatory elements within the genome. Current methods of identifying regulatory elements, such as promoters or enhancers, typically rely on sequence conservation, existing gene annotations or specific marks, such as histone modifications and p300 binding methods, each of which has its own biases. RESULTS: Herein we show that an approach based on clustering of transcription factor peaks from high-throughput sequencing coupled with chromatin immunoprecipitation (ChIP-Seq) can be used to evaluate markers for regulatory elements. We used 67 data sets for 54 unique transcription factors distributed over two cell lines to create regulatory element clusters. By integrating the clusters from our approach with histone modifications and data for open chromatin, we identified general methylation of lysine 4 on histone H3 (H3K4me) as the most specific marker for transcription factor clusters. Clusters mapping to annotated genes showed distinct patterns in cluster composition related to gene expression and histone modifications. Clusters mapping to intergenic regions fall into two groups either directly involved in transcription, including miRNAs and long noncoding RNAs, or facilitating transcription by long-range interactions. The latter clusters were specifically enriched with H3K4me1, but less with acetylation of lysine 27 on histone 3 or p300 binding. CONCLUSION: By integrating genomewide data of transcription factor binding and chromatin structure and using our data-driven approach, we pinpointed the chromatin marks that best explain transcription factor association with different regulatory elements. Our results also indicate that a modest selection of transcription factors may be sufficient to map most regulatory elements in the human genome.

    A combinatorial optimization approach for diverse motif finding applications

    BACKGROUND: Discovering approximately repeated patterns, or motifs, in biological sequences is an important and widely-studied problem in computational molecular biology. Most frequently, motif finding applications arise when identifying shared regulatory signals within DNA sequences or shared functional and structural elements within protein sequences. Due to the diversity of contexts in which motif finding is applied, several variations of the problem are commonly studied. RESULTS: We introduce a versatile combinatorial optimization framework for motif finding that couples graph pruning techniques with a novel integer linear programming formulation. Our approach is flexible and robust enough to model several variants of the motif finding problem, including those incorporating substitution matrices and phylogenetic distances. Additionally, we give an approach for determining statistical significance of uncovered motifs. In testing on numerous DNA and protein datasets, we demonstrate that our approach typically identifies statistically significant motifs corresponding to either known motifs or other motifs of high conservation. Moreover, in most cases, our approach finds provably optimal solutions to the underlying optimization problem. CONCLUSION: Our results demonstrate that a combined graph theoretic and mathematical programming approach can be the basis for effective and powerful techniques for diverse motif finding applications

    Transcriptional Enhancers in Protein-Coding Exons of Vertebrate Developmental Genes

    Many conserved noncoding sequences function as transcriptional enhancers that regulate gene expression. Here, we report that protein-coding DNA also frequently contains enhancers functioning at the transcriptional level. We tested the enhancer activity of 31 protein-coding exons, which we chose based on strong sequence conservation between zebrafish and human, and occurrence in developmental genes, using a Tol2 transposable GFP reporter assay in zebrafish. For each exon we measured GFP expression in hundreds of embryos in 10 anatomies via a novel system that implements the voice-recognition capabilities of a cellular phone. We find that 24/31 (77%) exons drive GFP expression compared to a minimal promoter control, and 14/24 are anatomy-specific (expression in four anatomies or less). GFP expression driven by these coding enhancers frequently overlaps the anatomies where the host gene is expressed (60%), suggesting self-regulation. Highly conserved coding sequences and highly conserved noncoding sequences do not significantly differ in enhancer activity (coding: 24/31 vs. noncoding: 105/147) or tissue-specificity (coding: 14/24 vs. noncoding: 50/105). Furthermore, coding and noncoding enhancers display similar levels of the enhancer-related histone modification H3K4me1 (coding: 9/24 vs noncoding: 34/81). Meanwhile, coding enhancers are over three times as likely to contain an H3K4me1 mark as other exons of the host gene. Our work suggests that developmental transcriptional enhancers do not discriminate between coding and noncoding DNA and reveals widespread dual functions in protein-coding DNA
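    The abstract above reports enhancer activity as counts (coding: 24/31 vs. noncoding: 105/147) and states that the two groups do not differ significantly, without naming the statistical test. As a rough illustration only, the sketch below recomputes the proportions and applies a two-sided Fisher's exact test; the choice of test and the helper name are assumptions, not the authors' stated method.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumption: this test is used here for illustration; the abstract
    does not say which test the authors applied.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x successes in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as observed.
    p = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p)


# Counts taken from the abstract: active vs. inactive elements.
coding_active, coding_total = 24, 31
noncoding_active, noncoding_total = 105, 147

coding_rate = coding_active / coding_total
noncoding_rate = noncoding_active / noncoding_total
p_value = fisher_exact_two_sided(
    coding_active,
    coding_total - coding_active,
    noncoding_active,
    noncoding_total - noncoding_active,
)
print(f"coding {coding_rate:.1%}, noncoding {noncoding_rate:.1%}, p = {p_value:.3f}")
```

    Under this assumed test, the p-value stays above the conventional 0.05 threshold, which agrees with the abstract's statement that the two groups do not significantly differ in enhancer activity.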

    Cross-species inference of long non-coding RNAs greatly expands the ruminant transcriptome

    Additional file 3. This file contains all supplementary tables relating to lncRNA identification via the conservation of synteny. Table S3. lncRNAs inferred in one species by the genomic alignment of a transcript assembled with the RNA-seq libraries from a related species. Table S12. Presence of intergenic lncRNAs both in sheep and cattle, in regions of conserved synteny. Table S13. Presence of intergenic lncRNAs both in sheep and goat, in regions of conserved synteny. Table S14. Presence of intergenic lncRNAs both in cattle and goat, in regions of conserved synteny. Table S15. Presence of intergenic lncRNAs both in sheep and humans, in regions of conserved synteny. Table S16. Presence of intergenic lncRNAs both in goat and humans, in regions of conserved synteny. Table S17. Presence of intergenic lncRNAs both in cattle and humans, in regions of conserved synteny. Table S18. High-confidence lncRNA pairs, those conserved across species both sequentially and positionally.

    Preparing for low surface brightness science with the Vera C. Rubin Observatory: characterisation of tidal features from mock images

    Tidal features in the outskirts of galaxies yield unique information about their past interactions and are a key prediction of the hierarchical structure formation paradigm. The Vera C. Rubin Observatory is poised to deliver deep observations for potentially millions of objects with visible tidal features, but the inference of galaxy interaction histories from such features is not straightforward. Utilising automated techniques and human visual classification in conjunction with realistic mock images produced using the NEWHORIZON cosmological simulation, we investigate the nature, frequency and visibility of tidal features and debris across a range of environments and stellar masses. In our simulated sample, around 80 per cent of the flux in the tidal features around Milky Way or greater mass galaxies is detected at the 10-year depth of the Legacy Survey of Space and Time (30-31 mag / sq. arcsec), falling to 60 per cent assuming a shallower final depth of 29.5 mag / sq. arcsec. The fraction of total flux found in tidal features increases towards higher masses, rising to 10 per cent for the most massive objects in our sample (M*~10^{11.5} Msun). When observed at sufficient depth, such objects frequently exhibit many distinct tidal features with complex shapes. The interpretation and characterisation of such features vary significantly with image depth and object orientation, introducing significant biases in their classification. Assuming the data reduction pipeline is properly optimised, we expect the Rubin Observatory to be capable of recovering much of the flux found in the outskirts of Milky Way mass galaxies, even at intermediate redshifts (z<0.2).

    Targeting ion channels for cancer treatment : current progress and future challenges


    Establishing an eLearning Platform in Clinical Neurosciences

    No full text
    HKU Li Ka Shing Faculty of Medicine Frontiers Series: ‘MOOCs in Postmodern Asia’ (27 Oct 2014); ‘Big Data and Precision Medicine’ (28 Oct 2014)