    Two-Dimensional Digitized Picture Arrays and Parikh Matrices

    The Parikh matrix mapping, or Parikh matrix, of a word was introduced in the literature to count the scattered subwords of the word. Several properties of Parikh matrices have been extensively investigated. A picture array is a two-dimensional connected digitized rectangular array consisting of a finite number of pixels, with each pixel in a cell having a label from a finite alphabet. Here we extend the notion of the Parikh matrix of a word to a picture array and associate with it two kinds of Parikh matrices, called the row Parikh matrix and the column Parikh matrix. Two picture arrays A and B are defined to be M-equivalent if their row Parikh matrices are the same and their column Parikh matrices are the same. This enables us to extend the notion of M-ambiguity to picture arrays. In the binary and ternary cases, conditions that ensure M-ambiguity are then obtained.
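
    The abstract presupposes the standard Parikh matrix of a (one-dimensional) word: over an ordered alphabet a1 < a2 < ... < ak, each letter corresponds to an upper triangular elementary matrix, the Parikh matrix of a word is the product of those matrices, and entry (i, j+1) counts occurrences of ai...aj as a scattered subword. A minimal Python sketch of this word-level construction only (the row/column matrices for picture arrays are not reproduced here; names are chosen for illustration):

        def parikh_matrix(word, alphabet):
            """(k+1)x(k+1) Parikh matrix of `word` over the ordered `alphabet`.

            Entry (i, j+1) counts occurrences of alphabet[i..j] as a
            scattered (non-contiguous) subword of `word`.
            """
            n = len(alphabet) + 1
            # Start from the identity matrix.
            m = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
            for ch in word:
                i = alphabet.index(ch)
                # Right-multiplying by the elementary matrix of letter a_i
                # adds column i of the accumulator into column i+1.
                for r in range(n):
                    m[r][i + 1] += m[r][i]
            return m

        # Example over a < b: entry (0, 2) counts the scattered subword "ab".
        m = parikh_matrix("abab", "ab")
        assert m[0][1] == 2  # occurrences of "a"
        assert m[1][2] == 2  # occurrences of "b"
        assert m[0][2] == 3  # occurrences of "ab": positions (1,2), (1,4), (3,4)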

    Subword balance, position indices and power sums

    In this paper, we investigate various ways of characterizing words, mainly over a binary alphabet, using information about the positions of occurrences of letters in words. We introduce two new measures associated with words, the position index and the sum of position indices. We establish some characterizations, connections with Parikh matrices, and connections with power sums. One particular emphasis concerns the effect of morphisms and iterated morphisms on words.
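
    The paper's exact definitions of "position index" and "sum of position indices" are not spelled out in this abstract. As a loosely hedged illustration of position-based measures of the kind alluded to, one plausible reading is a sum over the (1-based) positions at which a letter occurs, with p-th powers giving the power sums in the title. The definitions below are assumptions for illustration, not the paper's:

        def positions(word, letter):
            """1-based positions at which `letter` occurs in `word`."""
            return [i + 1 for i, ch in enumerate(word) if ch == letter]

        def position_sum(word, letter, p=1):
            """Sum of p-th powers of the positions of `letter` in `word`.

            Hypothetical reading: p = 1 gives a plain sum of positions,
            higher p gives power sums. The paper's definition may differ.
            """
            return sum(pos ** p for pos in positions(word, letter))

        # Over {a, b}, the positions of b in "abab" are 2 and 4.
        assert position_sum("abab", "b") == 6        # 2 + 4
        assert position_sum("abab", "b", p=2) == 20  # 2**2 + 4**2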

    On inequalities between subword histories


    Injectivity of the Parikh matrix mappings revisited

    We deal with the notion of M-unambiguity.

    Repetitive subwords

    The central notion of this thesis is repetitions in words. We study problems related to contiguous repetitions. More specifically, we consider repeating scattered subwords of non-primitive words, i.e. words which are complete repetitions of other words. We present inequalities concerning these occurrences, as well as a partial solution to an open problem posed by Salomaa et al. We characterize languages which are closed under the operation of duplication, that is, repeating any factor of a word. We also give new bounds on the number of occurrences of certain types of repetitions of words. We give a solution to an open problem posed by Calbrix and Nivat concerning regular languages consisting of non-primitive words. We also present some results regarding the duplication closure of languages, among which is a new proof for a problem of Bovet and Varricchio.
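
    For context on "non-primitive": a word is primitive if it is not a complete power u**k (k >= 2) of a shorter word u. A small Python sketch of the classical primitivity test, which rests on the standard fact that w is primitive exactly when w occurs in ww only at the two trivial positions:

        def is_primitive(word):
            """True iff `word` is not a power u**k (k >= 2) of a shorter word.

            Classical criterion: w is primitive iff w does not occur
            inside ww except at positions 0 and len(w).
            """
            if not word:
                return False  # the empty word is conventionally not primitive
            return (word + word).find(word, 1) == len(word)

        assert is_primitive("aba")       # not a power of a shorter word
        assert not is_primitive("abab")  # ("ab")**2, a complete repetition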

    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.

    Certain Distance-Based Topological Indices of Parikh Word Representable Graphs

    Parikh word representable graphs (PWRGs) were introduced to relate graph structures with words, which are finite sequences of symbols. On the other hand, in chemical graph theory, graphs have been associated with molecular structures, and several topological indices have been defined in terms of graph parameters and studied for different classes of graphs. In this paper, we derive expressions for computing certain topological indices of PWRGs of binary core words, thereby enriching the study of PWRGs.
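
    For a concrete sense of "distance-based topological index", the classic example is the Wiener index: the sum of shortest-path distances over all unordered pairs of vertices. A minimal Python sketch over an adjacency-list graph (the PWRG construction and the specific indices derived in the paper are not reproduced here):

        from collections import deque

        def wiener_index(adj):
            """Wiener index of a connected graph given as an adjacency
            list {vertex: [neighbours]}: the sum of shortest-path
            distances over all unordered vertex pairs."""
            total = 0
            for source in adj:
                # BFS distances from `source` (unweighted graph).
                dist = {source: 0}
                queue = deque([source])
                while queue:
                    u = queue.popleft()
                    for v in adj[u]:
                        if v not in dist:
                            dist[v] = dist[u] + 1
                            queue.append(v)
                total += sum(dist.values())
            return total // 2  # each unordered pair was counted twice

        # Path on 3 vertices: d(0,1) = 1, d(1,2) = 1, d(0,2) = 2 -> W = 4.
        assert wiener_index({0: [1], 1: [0, 2], 2: [1]}) == 4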

    Understanding and Enhancing the Use of Context for Machine Translation

    To understand and infer meaning in language, neural models have to learn complicated nuances. Discovering distinctive linguistic phenomena from data is not an easy task. For instance, lexical ambiguity is a fundamental feature of language which is challenging to learn. Even more prominently, inferring the meaning of rare and unseen lexical units is difficult with neural networks. Meaning is often determined from context. With context, languages allow meaning to be conveyed even when the specific words used are not known by the reader. To model this learning process, a system has to learn from a few instances in context and be able to generalize well to unseen cases. The learning process is hindered when training data is scarce for a task. Even with sufficient data, learning patterns for the long tail of the lexical distribution is challenging. In this thesis, we focus on understanding certain potentials of contexts in neural models and design augmentation models to benefit from them. We focus on machine translation as an important instance of the more general language understanding problem. To translate from a source language to a target language, a neural model has to understand the meaning of constituents in the provided context and generate constituents with the same meanings in the target language. This task accentuates the value of capturing nuances of language and the necessity of generalization from few observations. The main problem we study in this thesis is what neural machine translation models learn from data and how we can devise more focused contexts to enhance this learning. Looking more in-depth into the role of context and the impact of data on learning models is essential to advance the NLP field. Moreover, it helps highlight the vulnerabilities of current neural networks and provides insights into designing more robust models.
    Comment: PhD dissertation defended on November 10th, 202