3,336 research outputs found

    Repairing Syntax Errors in LR Parsers

    This article reports on an error-repair algorithm for LR parsers. It locally inserts, deletes, or shifts symbols at the positions where errors are detected, thus modifying the right context in order to resume parsing on a valid piece of input. This method improves on others in that it does not require the user to provide additional information about the repair process, it does not require precalculation of auxiliary tables, and it can be easily integrated into existing LR parser generators. A Yacc-based implementation is presented along with some experimental results and comparisons with other well-known methods. (Supported by Comisión Interministerial de Ciencia y Tecnología grant TIC 2000-1106-C02-0.)
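    A minimal sketch of this local-repair idea (our illustration, not the article's actual algorithm): at the point of error detection, candidate single-symbol edits are enumerated, and a candidate is accepted if the parser can consume a few symbols of the right context. The toy parses_prefix check and the balanced-parentheses vocabulary below are assumptions made for the example.

    # Hypothetical sketch: 'parses_prefix' stands in for an LR parser's
    # ability to consume a symbol sequence without reporting an error.
    def parses_prefix(tokens):
        """Toy check: is 'tokens' a prefix of a balanced-parentheses string?"""
        depth = 0
        for t in tokens:
            if t == '(':
                depth += 1
            elif t == ')':
                depth -= 1
                if depth < 0:
                    return False
            else:
                return False
        return True

    VOCAB = ['(', ')']
    CHECK_AHEAD = 3  # symbols of right context a repair must let us consume

    def local_repair(tokens, err):
        """Try single-symbol deletion, insertion, and replacement at position
        'err'; return the first edit that lets parsing resume past the error."""
        candidates = [tokens[:err] + tokens[err + 1:]]                        # delete
        candidates += [tokens[:err] + [s] + tokens[err:] for s in VOCAB]      # insert
        candidates += [tokens[:err] + [s] + tokens[err + 1:] for s in VOCAB]  # replace
        for cand in candidates:
            if parses_prefix(cand[:err + CHECK_AHEAD]):
                return cand
        return None  # fall back to a coarser recovery strategy

    print(local_repair(list('(()))('), 4))  # deletes the stray ')'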

    Automatic error recovery for LR parsers in theory and practice

    This thesis argues the need for good syntax error handling schemes in language translation systems such as compilers, and for the automatic incorporation of such schemes into parser generators. Syntax errors are studied in a theoretical framework and practical methods for handling syntax errors are presented. The theoretical framework consists of a model for syntax errors based on the concept of a minimum prefix-defined error correction, a sentence obtainable from an erroneous string by performing edit operations at prefix-defined (parser-defined) errors. It is shown that for an arbitrary context-free language, it is undecidable whether a better than arbitrary choice of edit operations can be made at a prefix-defined error. For common programming languages, it is shown that minimum-distance errors and prefix-defined errors do not necessarily coincide, and that there exist infinitely many programs that differ in a single symbol only; sets of equivalent insertions are exhibited. Two methods for syntax error recovery are presented. The methods are language independent and suitable for automatic generation. The first method consists of two stages: local repair followed, if necessary, by phrase-level repair. The second method consists of a single stage in which a locally minimum-distance repair is computed. Both methods are developed for use in the practical LR parser generator yacc, requiring no additional specifications from the user. A scheme for the automatic generation of diagnostic messages in terms of the source input is presented. The performance of the methods in practice is evaluated using a formal method based on minimum-distance and prefix-defined error correction. The methods compare favourably with existing methods for error recovery.
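    The minimum-distance notion above is the standard edit distance over insertions, deletions, and replacements. As a point of reference (our sketch, not the thesis's repair algorithm), the dynamic program below computes it and illustrates the single-symbol-difference phenomenon mentioned in the abstract:

    # Standard edit distance (unit-cost insert/delete/replace) between two
    # symbol strings, as used by minimum-distance error correction.
    def edit_distance(a, b):
        m, n = len(a), len(b)
        d = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            d[i][0] = i                      # delete all of a[:i]
        for j in range(n + 1):
            d[0][j] = j                      # insert all of b[:j]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,          # delete a[i-1]
                              d[i][j - 1] + 1,          # insert b[j-1]
                              d[i - 1][j - 1] + cost)   # match or replace
        return d[m][n]

    # Two valid programs differing in a single symbol: distance 1.
    print(edit_distance("x = y + 1;", "x = y - 1;"))  # -> 1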

    Synchronization Strings: List Decoding for Insertions and Deletions

    We study codes that are list-decodable under insertions and deletions ("insdel codes"). Specifically, we consider the setting where, given a codeword x of length n over some finite alphabet Sigma of size q, delta * n codeword symbols may be adversarially deleted and gamma * n symbols may be adversarially inserted to yield a corrupted word w. A code is said to be list-decodable if there is an (efficient) algorithm that, given w, reports a small list of codewords that includes the original codeword x. Given delta and gamma, we study the rate R for which there exist a constant q and a list size L such that there exist codes of rate R correcting a delta-fraction of deletions and a gamma-fraction of insertions while reporting lists of size at most L. Using the concept of synchronization strings, introduced by the first two authors [Proc. STOC 2017], we show some surprising results. We show that for every 0 <= delta < 1 and every gamma, epsilon > 0 there exist codes of rate 1 - delta - epsilon over a constant-size alphabet (so q = O_{delta,gamma,epsilon}(1)) with sub-logarithmic list sizes. Furthermore, our codes are accompanied by efficient (polynomial-time) decoding algorithms. We stress that the fraction of insertions can be arbitrarily large (more than 100%), and the rate is independent of this parameter. We also prove several tight bounds on the parameters of list-decodable insdel codes. In particular, we show that the alphabet size of insdel codes needs to be exponentially large in epsilon^{-1}, where epsilon is the gap to capacity above. Our result even applies to settings where the unique-decoding capacity equals the list-decoding capacity, and in that case it shows that the alphabet size needs to be exponentially large in the gap to capacity. This is in sharp contrast to the Hamming error model, where an alphabet size polynomial in epsilon^{-1} suffices for unique decoding. This lower bound also shows that the exponential dependence of the alphabet size in previous works that constructed insdel codes is actually necessary! Our result sheds light on the remarkable asymmetry between the impact of insertions and deletions from the point of view of error correction: whereas deletions cost in the rate of the code, insertion costs are borne by the adversary and not the code! Our results also highlight the dominance of the model of insertions and deletions over the Hamming model: a Hamming error is equivalent to one insertion and one deletion (at the same location). Thus the effect of a delta-fraction of Hamming errors can be simulated by a delta-fraction of deletions and a delta-fraction of insertions - but insdel codes can deal with many more insertions without loss in rate (though at the price of a larger alphabet size).
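    To make the channel model concrete, the sketch below applies random (rather than adversarial) edits to a codeword: a delta-fraction of symbols is deleted and a gamma-fraction of arbitrary symbols is inserted; note that gamma may exceed 1. Function and variable names are our own, not the paper's.

    import random

    def insdel_channel(codeword, delta, gamma, alphabet, rng=random):
        """Delete delta*n symbols of 'codeword' and insert gamma*n symbols
        drawn from 'alphabet' at random positions (n = len(codeword))."""
        n = len(codeword)
        keep = sorted(set(range(n)) - set(rng.sample(range(n), int(delta * n))))
        word = [codeword[i] for i in keep]
        for _ in range(int(gamma * n)):
            word.insert(rng.randrange(len(word) + 1), rng.choice(alphabet))
        return word

    x = [0, 1, 2, 3] * 25                                  # length-100 "codeword"
    w = insdel_channel(x, delta=0.1, gamma=1.5, alphabet=[0, 1, 2, 3])
    print(len(w))                                          # 100 - 10 + 150 = 240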

    Reactivating Fetal Hemoglobin Expression in Human Adult Erythroblasts Through BCL11A Knockdown Using Targeted Endonucleases.

    We examined the efficiency, specificity, and mutational signatures of zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regularly interspaced short palindromic repeat (CRISPR)/Cas9 systems designed to target the gene encoding the transcriptional repressor BCL11A, in human K562 cells and human CD34+ progenitor cells. ZFNs and TALENs were delivered as in vitro transcribed mRNA through electroporation; CRISPR/Cas9 was co-delivered as Cas9 mRNA with either plasmid-encoded guide RNA (gRNA) (pU6.g1) or in vitro transcribed gRNA (gR.1). Analyses of efficacy revealed that, for these specific reagents and the delivery methods used, the ZFNs gave rise to more allelic disruption at the targeted locus than the TALENs and CRISPR/Cas9, which was associated with increased levels of fetal hemoglobin in erythroid cells produced in vitro from nuclease-treated CD34+ cells. Genome-wide analysis of nuclease specificity revealed that this ZFN was highly specific to the target site, while the TALENs and CRISPR/Cas9 reagents evaluated showed off-target cleavage activity. ZFN gene-edited CD34+ cells had the capacity to engraft in NOD-PrkdcSCID-IL2Rγnull mice while retaining multi-lineage potential, in contrast to TALEN gene-edited CD34+ cells. CRISPR engraftment levels mirrored the increased relative plasmid-mediated toxicity of pU6.g1/Cas9 in hematopoietic stem/progenitor cells (HSPCs), highlighting the value of further improving CRISPR/Cas9 delivery in primary human HSPCs.

    Automatic error correction in syntax-directed compilers


    Contributions to the Construction of Extensible Semantic Editors

    This dissertation addresses the need for easier construction and extension of language tools. Specifically, it considers the construction and extension of so-called semantic editors, that is, editors providing semantic services for code comprehension and manipulation. Editors like these are typically found in state-of-the-art development environments, where they have been developed by hand. The list of programming languages available today is extensive and, with the lively creation of new programming languages and the evolution of old ones, it keeps growing. Many of these languages would benefit from proper tool support. Unfortunately, the development of a semantic editor can be a time-consuming and error-prone endeavor, and too large an effort for most language communities. Given the complex nature of programming, and the huge benefits of good tool support, this lack of tools is problematic. This dissertation attempts to narrow the gap between generative solutions and how state-of-the-art editors are constructed today. A generative alternative for the construction of textual semantic editors is explored, with a focus on how to specify extensible semantic editor services. Specifically, the dissertation shows how semantic services can be specified using a semantic formalism called reference attribute grammars (RAGs), how these services can be made responsive enough for editing, and how they can be provided even when the text in an editor is erroneous. The results presented here have been found useful both in industry and in academia, suggesting that the explored approach may help reduce the effort of editor construction.
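    As a rough illustration of the RAG idea (our sketch in Python, not the dissertation's system or the tooling it builds on): attributes are functions over AST nodes, evaluated on demand and cached, and a reference attribute may return another node, such as the declaration a name use binds to. Demand-driven caching is what makes such services cheap to re-query during editing, and an unresolved name still yields a usable answer, which matters for erroneous text.

    from functools import cached_property

    class Node:
        def __init__(self, parent=None):
            self.parent = parent

    class Decl(Node):
        def __init__(self, name):
            super().__init__()
            self.name = name

    class Block(Node):
        def __init__(self, decls):
            super().__init__()
            self.decls = decls
            for d in decls:
                d.parent = self

    class Use(Node):
        def __init__(self, name, scope):
            super().__init__(scope)
            self.name = name

        @cached_property            # computed on first demand, then cached
        def decl(self):
            """Reference attribute: the Decl this use binds to, or None."""
            scope = self.parent
            while scope is not None:
                for d in getattr(scope, "decls", []):
                    if d.name == self.name:
                        return d
                scope = scope.parent
            return None             # unresolved name: still a usable answer

    block = Block([Decl("x")])
    print(Use("x", block).decl.name)   # -> x
    print(Use("y", block).decl)        # -> None (editor can flag the error)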

    A Lower Bound on the List-Decodability of Insdel Codes

    For codes equipped with metrics such as the Hamming metric, the symbol-pair metric, or the cover metric, the Johnson bound guarantees list-decodability: it provides a lower bound on the list-decoding radius of a code in terms of its relative minimum distance δ, list size L, and alphabet size q. For the study of list-decodability of codes under insertion and deletion errors (we call such codes insdel codes), it is natural to ask whether there is a Johnson-type bound as well; this is an open problem. It was first investigated by Wachter-Zeh, and the result was amended by Hayashi and Yasunaga, who derived a lower bound on the list-decodability of insdel codes. The main purpose of this paper is to move a step further towards solving the above open problem. In this work, we provide a new lower bound for the list-decodability of an insdel code. As a consequence, we show that, unlike the Johnson bound for codes under other metrics, which is tight, the bound on the list-decodability of insdel codes given by Hayashi and Yasunaga is not tight. Our main idea is to show that if an insdel code with a given Levenshtein distance d is not list-decodable with list size L, then its list-decoding radius is lower bounded by a quantity involving L and d. In other words, if the list-decoding radius is less than this lower bound, the code must be list-decodable with list size L. At the end of the paper we use this bound to provide insdel list-decodability bounds for various well-known codes, something that had not been extensively studied before.
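    For concreteness: the Levenshtein distance d here counts insertions and deletions only, and satisfies d(x, y) = |x| + |y| - 2 * LCS(x, y), where LCS is the length of a longest common subsequence. A minimal sketch in our own notation:

    def lcs(x, y):
        """Length of a longest common subsequence of x and y."""
        m, n = len(x), len(y)
        t = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m):
            for j in range(n):
                t[i + 1][j + 1] = (t[i][j] + 1 if x[i] == y[j]
                                   else max(t[i][j + 1], t[i + 1][j]))
        return t[m][n]

    def levenshtein_insdel(x, y):
        """Insertion/deletion distance between sequences x and y."""
        return len(x) + len(y) - 2 * lcs(x, y)

    print(levenshtein_insdel("10110", "1001"))  # -> 3 (an LCS is "101", length 3)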

    Development of a novel platform for high-throughput gene design and artificial gene synthesis to produce large libraries of recombinant venom peptides for drug discovery

    Doctoral thesis in Veterinary Sciences, specialty in Biological and Biomedical Sciences. Animal venoms are complex mixtures of biologically active molecules that, while presenting low immunogenicity, target a variety of membrane receptors with high selectivity and efficacy. It is believed that animal venoms comprise a natural library of more than 40 million different natural compounds that have been continuously fine-tuned during the evolutionary process to disturb cellular function. Within animal venoms, reticulated peptides are the most attractive class of molecules for drug discovery. However, the use of animal venoms to develop novel pharmacological compounds is still hampered by difficulties in obtaining these low-molecular-mass cysteine-rich polypeptides in sufficient amounts. Here, a high-throughput gene synthesis platform was developed to produce synthetic genes encoding venom peptides. The final goal of this project is the production of large libraries of recombinant venom peptides that can be screened for drug discovery. A robust and efficient polymerase chain reaction (PCR) methodology was refined to assemble overlapping oligonucleotides into small artificial genes (< 500 bp) with high fidelity. In addition, two bioinformatics tools were constructed to design multiple optimized genes (ATGenium) and overlapping oligonucleotides (NZYOligo designer), in order to allow automation of the high-throughput gene synthesis platform. The platform can assemble 96 synthetic genes encoding venom peptides simultaneously, with an error rate of 1.1 mutations per kb. To decrease the error rate associated with artificial gene synthesis, an error-removal step using phage T7 endonuclease I was designed and integrated into the gene synthesis methodology. T7 endonuclease I was shown to be highly effective at specifically recognizing and cleaving DNA mismatches, allowing a dramatic reduction of the error frequency in large synthetic genes, from 3.45 to 0.43 errors per kb. Combining the knowledge acquired in the initial stages of the work, a comprehensive study was performed to investigate the influence of gene design, the presence of fusion tags, the cellular localization of expression, and the usage of Tobacco Etch Virus (TEV) protease for tag removal on the recombinant expression of disulfide-rich venom peptides in Escherichia coli. Codon usage dramatically affected the levels of recombinant expression in E. coli. In addition, a significant pressure on the usage of the two cysteine codons suggests that both need to be present at equivalent levels in genes designed de novo to ensure high levels of expression. This study also revealed that DsbC was the best fusion tag for recombinant expression of disulfide-rich peptides, in particular when expression of the fusion peptide was directed to the bacterial periplasm. TEV protease was highly effective for tag removal, and its recognition site can tolerate all residues at its C-terminus with the exception of proline, confirming that no extra residues need to be incorporated at the N-terminus of recombinant venom peptides. This study revealed that E. coli is a convenient heterologous host for the expression of soluble and potentially functional venom peptides. This novel high-throughput gene synthesis platform was thus used to produce ~5,000 synthetic genes with a low error rate. This genetic library supported the production of the largest library of recombinant venom peptides constructed to date.
The library contains 2,736 animal venom peptides and is presently being screened for the discovery of novel drug leads for different diseases.
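    The gene-design step can be pictured with a small back-translation sketch (our illustration, not the ATGenium tool): each residue is assigned an E. coli-preferred codon, and the two cysteine codons are alternated so that both appear at comparable frequencies, reflecting the codon-usage pressure described above. The preferred-codon table is a tiny hypothetical excerpt.

    import itertools

    # Hypothetical excerpt of an E. coli preferred-codon table.
    PREFERRED = {"A": "GCG", "G": "GGC", "K": "AAA", "L": "CTG",
                 "M": "ATG", "S": "AGC", "T": "ACC", "V": "GTG"}
    CYS_CODONS = itertools.cycle(["TGC", "TGT"])  # balance the two Cys codons

    def back_translate(peptide):
        """Map a peptide to a DNA coding sequence, alternating Cys codons."""
        return "".join(next(CYS_CODONS) if aa == "C" else PREFERRED[aa]
                       for aa in peptide)

    print(back_translate("MKCACLC"))  # -> ATGAAATGCGCGTGTCTGTGC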