
    Using Regular Expressions for Data Management in Stata

    Regular expressions make a number of data management operations involving string variables much easier. They do this by allowing the user to search for (and copy or replace) complex patterns of characters within a string. Examples of when regular expressions are useful include extracting zip codes from addresses, reformatting dates that were entered inconsistently, and removing excess spaces from string expressions. This presentation gives the user a basic introduction to regular expressions and the Stata functions related to them, along with examples of applications where regular expressions can streamline data management.
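
The three tasks named in the abstract above can be sketched in a few lines. This is a minimal illustration in Python's standard `re` module, not the Stata functions the presentation covers. The function names are hypothetical. It assumes US-style five-digit zip codes (optionally followed by a four-digit extension) and month-first dates.

```python
import re

# Assumption: US zip codes, five digits with an optional "-NNNN" extension.
ZIP_RE = re.compile(r"\b(\d{5})(?:-\d{4})?\b")

# Assumption: dates arrive month-first as M/D/YYYY or M-D-YYYY.
DATE_RE = re.compile(r"\s*(\d{1,2})[/-](\d{1,2})[/-](\d{4})\s*")


def extract_zip(address):
    """Return the last five-digit zip code found in the address, or None."""
    matches = ZIP_RE.findall(address)
    # Take the last match so a five-digit street number is not mistaken for the zip.
    return matches[-1] if matches else None


def normalize_date(text):
    """Rewrite a month-first date as YYYY-MM-DD; return None if it does not match."""
    m = DATE_RE.fullmatch(text)
    if m is None:
        return None
    month, day, year = m.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"


def squeeze_spaces(text):
    """Collapse runs of whitespace to one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(extract_zip("12345 Main St, Springfield, IL 62704-1234"))
    print(normalize_date("3/7/2014"))
    print(squeeze_spaces("  too    many   spaces  "))
```

Taking the last match in `extract_zip` is a design choice for addresses that begin with a five-digit street number. Data with a different layout would need a different rule.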

    Ovid, the Fasti and the stars

    According to Quintilian, poetry cannot be fully understood without a good knowledge of the stars. As one example he cites the fact that poets frequently indicate the time of year by the rising and setting of stars and constellations, a device familiar to us from Hesiod onwards. For Quintilian, who had the benefit of a stable civil calendar, there may have seemed little reason beyond a desire for poetic expression to specify the date in this manner: but before Caesar’s calendar reforms in 45 BC, the appearance and disappearance of certain stars just before sunrise and just after sunset provided a much more regular guide to the year than the erratic calendars of Greece and Rome, which were often out of step with the solar year. It is therefore not surprising to find the same method of specifying the date in prose authors too; and lists of these stellar phenomena, arranged in various calendar-like formats, are found in both texts and inscriptions. These lists, known as parapegmata, can be traced back to fifth-century Greece, but the tradition may be considerably older. Whatever our reaction to Quintilian’s claim, it is certainly the case that a good knowledge of the stars is important for a full understanding of Ovid’s calendar poem, the Fasti. To a large extent the poem presents itself as a poetic version of the Roman calendar: each book covers a different month, and as the year and the work progress, Ovid marks the dates of various religious festivals and historical events, as in the real fasti. However, unlike many of the extant fasti, Ovid combines this material with material from the parapegmatic tradition, giving dates for the rising and setting of various stars and constellations, and for the journey of the sun through the zodiac. The inclusion of the constellations – and of the aetiological tales explaining their presence in the sky – enables Ovid to introduce a variety of Greek myths into the Roman calendar, where they would otherwise have no place.

    Subpath Queries on Compressed Graphs: A Survey

    Text indexing is a classical algorithmic problem that has been studied for over four decades: given a text T, pre-process it off-line so that, later, we can quickly count and locate the occurrences of any string (the query pattern) in T in time proportional to the query’s length. The earliest optimal-time solution to the problem, the suffix tree, dates back to 1973 and requires up to two orders of magnitude more space than the plain text just to be stored. In the year 2000, two breakthrough works showed that efficient queries can be achieved without this space overhead: a fast index can be stored in space proportional to the text’s entropy. These contributions had an enormous impact in bioinformatics: today, virtually any DNA aligner employs compressed indexes. Recent work has considered more powerful compression schemes (dictionary compressors) and generalizations of the problem to labeled graphs: after all, texts can be viewed as labeled directed paths. In turn, since finite state automata can be considered a particular case of labeled graphs, these findings created a bridge between the fields of compressed indexing and regular language theory, ultimately making it possible to index regular languages and promising to shed new light on problems such as regular expression matching. This survey is a gentle introduction to the main landmarks of the fascinating journey that took us from suffix trees to today’s compressed indexes for labeled graphs and regular languages.
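
The indexing problem stated in the abstract above (pre-process T once, then count occurrences of any pattern) can be illustrated with a plain suffix array. This is a minimal sketch, not one of the compressed indexes the survey covers. It assumes a naive construction that is only suitable for short texts, and a query costs O(m log n) string comparisons rather than time proportional to the pattern length alone. The function names are hypothetical.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order.

    Naive construction: it sorts the suffixes directly, which is fine for a
    short demonstration text but far too slow for genome-scale input.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, suffix_array, pattern):
    """Count occurrences of pattern in text with two binary searches."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in [first, lo) begins with pattern.
    return lo - first


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)
    print(count_occurrences(text, sa, "ana"))
```

All suffixes that begin with the pattern are contiguous in the sorted order, so counting reduces to finding the two ends of that range.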

    Corpora and evaluation tools for multilingual named entity grammar development

    We present work on developing multilingual named entity grammars in a unification-based finite-state formalism (SProUT). Following an extended version of the MUC7 standard, we have developed Named Entity Recognition grammars for German, Chinese, Japanese, French, Spanish, English, and Czech. The grammars recognize person names, organizations, geographical locations, currency, and time and date expressions. Subgrammars and gazetteers are shared as much as possible across the grammars of the different languages. Multilingual corpora from the business domain are used for grammar development and evaluation. The annotation format (named entity and other linguistic information) is described. We present an evaluation tool which provides detailed statistics and diagnostics, allows for partial matching of annotations, and supports user-defined mappings between different annotation and grammar output formats.

    Fund Finder: A case study of database-to-ontology mapping

    The mapping between databases and ontologies is a basic problem when trying to "upgrade" deep web content to the semantic web. Our approach proposes the declarative definition of mappings as a way to achieve domain independence and reusability. A specific language, expressive enough to cover some real-world mapping situations such as lightly structured databases or ones not in first normal form, is defined for this purpose. Along with this mapping description language, the ODEMapster processor is in charge of carrying out the actual migration of instance data. We illustrate this by testing both the mapping definitions and the processor on a case study.