1,239 research outputs found

    PatMatch: a program for finding patterns in peptide and nucleotide sequences

    Here, we present PatMatch, an efficient, web-based pattern-matching program that enables searches for short nucleotide or peptide sequences such as cis-elements in nucleotide sequences or small domains and motifs in protein sequences. The program can be used to find matches to a user-specified sequence pattern that can be described using ambiguous sequence codes and a powerful and flexible pattern syntax based on regular expressions. A recent upgrade has improved performance and now supports both mismatches and wildcards in a single pattern. This enhancement has been achieved by replacing the previous searching algorithm, scan_for_matches [D'Souza et al. (1997), Trends in Genetics, 13, 497–498], with nondeterministic-reverse grep (NR-grep), a general pattern matching tool that allows for approximate string matching [Navarro (2001), Software Practice and Experience, 31, 1265–1312]. We have tailored NR-grep to be used for DNA and protein searches with PatMatch. The stand-alone version of the software can be adapted for use with any sequence dataset and is available for download at The Arabidopsis Information Resource (TAIR) at . The PatMatch server is available on the web at for searching Arabidopsis thaliana sequences
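The ambiguity-code matching described in this abstract can be illustrated with a minimal sketch. It assumes the standard IUPAC nucleotide codes and Python's built-in `re` module; the function names are hypothetical, and the sketch does not reproduce NR-grep or PatMatch's own syntax (in particular, it does not handle mismatches).

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Translate an IUPAC pattern (hypothetical helper) into a regex string."""
    return "".join(IUPAC[base] for base in pattern.upper())


def find_matches(pattern, sequence):
    """Return (start, matched_text) for every hit, including overlapping ones."""
    # A lookahead with a capturing group lets finditer report overlapping hits.
    regex = re.compile("(?=(" + iupac_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    # E-box-like motif CANNTG searched in a short illustrative sequence.
    for start, hit in find_matches("CANNTG", "AACAGCTGTTCACGTGA"):
        print(start, hit)
```

Exact matching with character classes is the simplest case; approximate matching with a mismatch budget needs a different algorithm, which is the role NR-grep plays in the abstract.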

    Personal genome editing algorithms to identify increased variant-induced off-target potential

    Clustered regularly interspaced short palindromic repeats (CRISPR) technologies allow for facile genomic modification in a site-specific manner. A key step in this process is the in-silico design of single guide RNAs (sgRNAs) to efficiently and specifically target a site of interest. To this end, it is necessary to enumerate all potential off-target sites within a given genome that could be inadvertently altered by nuclease-mediated cleavage. Off-target sites are quasi-complementary regions of the genome to which the specified sgRNA can bind even without a perfectly complementary nucleotide sequence. This problem, known as off-target site enumeration, became common after the discovery of CRISPR technology. Many in-silico solutions have been proposed in recent years, but currently available software for this task is limited in computational efficiency, variant support, genetic annotation, assessment of the functional impact of potential off-target effects at the population and individual level, and the availability of a user-friendly graphical interface usable by non-informaticians without any programming knowledge. This thesis addresses all these topics by proposing two software tools that directly address the off-target enumeration problem and perform all the related analyses. In detail, the thesis proposes CRISPRitz, a tool designed and developed to perform fast and exhaustive searches on reference and alternative genomes to enumerate all possible off-targets for a user-defined set of sgRNAs with specific thresholds of mismatches (non-complementary bps in RNA-DNA binding) and bulges (bubbles that alter the physical structure of RNA and DNA, limiting the binding activity).
The thesis also proposes CRISPRme, a tool developed starting from CRISPRitz, which answers the requests of professionals and technicians for a comprehensive and easy-to-use interface to perform off-target enumeration, analysis and assessment, with graphical reports, a graphical interface and the capability of performing real-time queries on the resulting data to extract desired targets, with a focus on individual and personalized genome analysis.
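To make the off-target enumeration problem concrete, here is a brute-force sketch that reports every genomic window within a mismatch threshold of a guide. It is an illustration only: the function names are hypothetical, it scans one strand, ignores PAM constraints, bulges and genetic variants, and is not the search strategy used by CRISPRitz or CRISPRme.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def enumerate_off_targets(guide, genome, max_mismatches):
    """Return (position, window, mismatches) for every window of the genome
    that differs from the guide in at most max_mismatches positions."""
    guide = guide.upper()
    genome = genome.upper()
    k = len(guide)
    hits = []
    # Slide a window of the guide's length across the sequence (0-based).
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        mismatches = hamming(guide, window)
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits


if __name__ == "__main__":
    # Toy guide and sequence, chosen only to show the output shape.
    for hit in enumerate_off_targets("ACGT", "ACGTTCGTACGA", 1):
        print(hit)
```

This scan costs time proportional to genome length times guide length per guide, which hints at why the abstract lists computational efficiency among the limits of existing tools.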

    SNP-RFLPing 2: an updated and integrated PCR-RFLP tool for SNP genotyping

    BACKGROUND: PCR-restriction fragment length polymorphism (RFLP) assay is a cost-effective method for SNP genotyping and mutation detection, but the manual mining for restriction enzyme sites is challenging and cumbersome. Three years after we constructed SNP-RFLPing, a freely accessible database and analysis tool for restriction enzyme mining of SNPs, significant improvements over the 2006 version have been made and incorporated into the latest version, SNP-RFLPing 2. RESULTS: The primary aim of SNP-RFLPing 2 is to provide comprehensive PCR-RFLP information with multiple functions for SNPs, such as SNP retrieval for multiple species, different polymorphism types (bi-allelic, tri-allelic, tetra-allelic or indels), gene-centric searching, HapMap tagSNPs, gene ontology-based searching, miRNAs, and SNP500Cancer. The RFLP restriction enzymes and the corresponding PCR primers for the natural and mutagenic types of each SNP are simultaneously analyzed. All the RFLP restriction enzyme prices are also provided to aid selection. Furthermore, the previously encountered updating problems for most SNP-related databases are resolved by an on-line retrieval system. CONCLUSIONS: The user interfaces for functional SNP analyses have been substantially improved and integrated. SNP-RFLPing 2 offers a new and user-friendly interface for RFLP genotyping that can be used in association studies and is freely available at http://bio.kuas.edu.tw/snp-rflping2.
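The core idea of restriction enzyme mining for a SNP can be sketched as follows: an enzyme is useful for RFLP genotyping when its recognition site is present with one allele and absent (or present a different number of times) with the other. The four recognition sequences below are the well-known sites for these enzymes; the function names, the small enzyme table and the example flanking sequences are assumptions for illustration and do not reflect the SNP-RFLPing 2 implementation.

```python
# Recognition sequences of four common restriction enzymes. All four sites
# are palindromic, so scanning a single strand is sufficient here.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "TaqI": "TCGA",
}


def count_sites(seq, site):
    """Count (possibly overlapping) occurrences of a recognition site."""
    return sum(seq.startswith(site, i) for i in range(len(seq) - len(site) + 1))


def discriminating_enzymes(left, alleles, right, enzymes=ENZYMES):
    """Return enzymes whose site count differs between the SNP alleles.

    left and right are the flanking sequences; alleles is an iterable of
    single-base alleles. The result maps enzyme name to per-allele counts.
    """
    result = {}
    for name, site in enzymes.items():
        counts = {a: count_sites(left + a + right, site) for a in alleles}
        if len(set(counts.values())) > 1:
            result[name] = counts
    return result


if __name__ == "__main__":
    # Hypothetical T/C SNP: the T allele completes an EcoRI site (GAATTC).
    print(discriminating_enzymes("ACGTGAAT", ("T", "C"), "CAGGTA"))
```

A real tool would also need non-palindromic sites, both strands, and mutagenic primer design, all of which are left out of this sketch.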

    Mutual Enrichment in Ranked Lists and the Statistical Assessment of Position Weight Matrix Motifs

    Statistics in ranked lists is important in analyzing molecular biology measurement data, such as ChIP-seq, which yields ranked lists of genomic sequences. State of the art methods study fixed motifs in ranked lists. More flexible models such as position weight matrix (PWM) motifs are not addressed in this context. To assess the enrichment of a PWM motif in a ranked list we use a PWM induced second ranking on the same set of elements. Possible orders of one ranked list relative to the other are modeled by permutations. Due to sample space complexity, it is difficult to characterize tail distributions in the group of permutations. In this paper we develop tight upper bounds on tail distributions of the size of the intersection of the top of two uniformly and independently drawn permutations and demonstrate advantages of this approach using our software implementation, mmHG-Finder, to study PWMs in several datasets. Comment: Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013).
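For fixed cutoffs, the size of the intersection between the top of two independent uniform permutations follows a hypergeometric distribution, and its tail can be computed directly. The sketch below covers only that fixed-cutoff case, with a hypothetical function name; it does not implement the paper's bounds for the statistic minimized over both cutoffs, nor anything from mmHG-Finder.

```python
from math import comb


def top_intersection_tail(n_total, n1, n2, k):
    """P(|top-n1 of list A ∩ top-n2 of list B| >= k) for two independent,
    uniformly random rankings of the same n_total elements.

    With fixed cutoffs n1 and n2 this is a hypergeometric tail.
    """
    upper = min(n1, n2)
    # Count the ways to choose the top-n2 set so that exactly i of its
    # members fall inside the top-n1 set, summed over i >= k.
    favourable = sum(
        comb(n1, i) * comb(n_total - n1, n2 - i) for i in range(k, upper + 1)
    )
    return favourable / comb(n_total, n2)


if __name__ == "__main__":
    # Toy numbers: 100 elements, top 10 of each list, at least 5 shared.
    print(top_intersection_tail(100, 10, 10, 5))
```

Minimizing this p-value over all cutoff pairs introduces a multiple-testing problem, which is what motivates the upper bounds described in the abstract.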

    SNP-RFLPing: restriction enzyme mining for SNPs in genomes

    BACKGROUND: Restriction fragment length polymorphism (RFLP) analysis is a common laboratory method for the genotyping of single nucleotide polymorphisms (SNPs). Here, we describe web-based software, named SNP-RFLPing, which provides the restriction enzymes for RFLP assays on a batch of SNPs and genes from the human, rat, and mouse genomes. RESULTS: Three user-friendly input types are accepted for the SNP-RFLPing assay: 1) NCBI dbSNP "rs" or "ss" IDs; 2) NCBI Entrez gene ID and HUGO gene name; 3) any format of SNP-in-sequence. These inputs are automatically converted to SNP-containing sequences and their complementary sequences for the selection of restriction enzymes. All SNPs with available RFLP restriction enzymes for each input gene are provided even if many SNPs exist. The SNP-RFLPing analysis provides the SNP contig position, heterozygosity, function, protein residue, and amino acid position for cSNPs, as well as commercial and non-commercial restriction enzymes. CONCLUSION: This web-based software solves the input format problems found in similar software and greatly simplifies the procedure for providing the RFLP enzyme. Mixed free-form input is convenient for users who perform the SNP-RFLPing assay. SNP-RFLPing offers a time-saving application for association studies in personalized medicine and is freely available at

    Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; and (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3.

    PubDNA Finder in a Nutshell. Searching the Life Sciences Literature with Sequences of Nucleic Acids

    Biomedical researchers and clinicians working with molecular technologies in routine clinical practice often need to review the available literature to gather information regarding specific nucleic acid sequences. This includes, for instance, finding articles related to a particular DNA sequence, or identifying empirically validated primer/probe sequences to evaluate the presence of different micro-organisms. Unfortunately, these difficult and time-consuming tasks often have to be performed manually by the researchers themselves, since no publicly available biomedical literature search engine, e.g. PubMed or PubMed Central (PMC), provides the required search functionality. In this article, we describe PubDNA Finder, a web service that enables users to perform advanced searches on PubMed Central-indexed full-text articles using nucleic acid sequences.

    SNPmasker: automatic masking of SNPs and repeats across eukaryotic genomes

    SNPmasker is a comprehensive web interface for masking large eukaryotic genomes. The program is designed to mask SNPs from a recent dbSNP database and to mask repeats with two alternative programs. In addition to SNP masking, we also offer population-specific substitution of SNP alleles in genomic sequence according to SNP frequencies in HapMap Phase II data. The input to SNPmasker can be defined in chromosomal coordinates or inserted as a sequence. The sequences masked by our web server are most useful as a preliminary step for different primer and probe design tasks. The service is available at and is free for all users.
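SNP masking itself is a simple transformation: each known variant position in a sequence is replaced by a masking character so that primer and probe design avoids it. The sketch below assumes 0-based positions relative to the input sequence and uses `N` as the mask; the function name and coordinate convention are hypothetical and are not taken from SNPmasker.

```python
def mask_snps(sequence, snp_positions, mask_char="N"):
    """Return a copy of sequence with every listed position masked.

    Positions are assumed to be 0-based offsets into the sequence.
    Positions outside the sequence are ignored rather than raising.
    """
    bases = list(sequence)
    for pos in snp_positions:
        if 0 <= pos < len(bases):
            bases[pos] = mask_char
    return "".join(bases)


if __name__ == "__main__":
    # Two hypothetical SNP positions in a short illustrative sequence.
    print(mask_snps("ACGTACGT", [1, 5]))
```

A production service would first translate chromosomal coordinates into offsets within the requested region, a step this sketch leaves to the caller.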