Using Regular Expressions for Data Management in Stata
Regular expressions make a number of data management operations involving string variables much easier. They do this by allowing the user to search for (and copy or replace) complex patterns of characters within a string. Examples of when regular expressions are useful include extracting zip codes from addresses, reformatting dates that were entered inconsistently, and removing excess spaces from string expressions. This presentation will give the user a basic introduction to regular expressions and to the Stata functions related to them, as well as examples of applications in which regular expressions can streamline data management.
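Stata has its own regular-expression functions (for example `regexm`, `regexr` and `regexs`), which are not reproduced here. As a language-neutral illustration of the three tasks mentioned above, the following Python sketch uses the standard `re` module. The helper names are hypothetical, the ZIP pattern assumes US five-digit codes (optionally ZIP+4), and the date pattern assumes month-first input; neither pattern validates that the values are real.

```python
import re
from typing import Optional

# Assumed US ZIP format: five digits, optionally followed by "-" and four digits
ZIP_RE = re.compile(r"\b(\d{5})(?:-\d{4})?\b")

# Assumed month-first dates such as 3/7/2021, 03-07-2021 or 3.7.2021
DATE_RE = re.compile(r"^(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})$")


def extract_zip(address: str) -> Optional[str]:
    """Return the last five-digit ZIP code in an address, or None.

    The last match is used because ZIP codes normally end an address,
    while a five-digit street number may appear at the start.
    """
    matches = ZIP_RE.findall(address)  # one capture group -> list of strings
    return matches[-1] if matches else None


def squeeze_spaces(text: str) -> str:
    """Collapse runs of whitespace to a single space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def normalize_date(raw: str) -> Optional[str]:
    """Rewrite a month-first date as YYYY-MM-DD; return None if it does not fit."""
    m = DATE_RE.match(raw.strip())
    if not m:
        return None
    month, day, year = m.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"


if __name__ == "__main__":
    print(extract_zip("12345 Main St, Springfield, IL 62704-1234"))
    print(squeeze_spaces("  John   Q.  Public "))
    print(normalize_date("3/7/2021"))
```

Inconsistently entered dates that do not fit the assumed pattern come back as `None`, so they can be flagged for manual review rather than silently misparsed.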
Ovid, the Fasti and the stars
According to Quintilian, poetry cannot be fully understood without a good knowledge of the stars. As one example he cites the fact that poets frequently indicate the time of year by the rising and setting of stars and constellations, a device familiar to us from Hesiod onwards.¹ For Quintilian, who had the benefit of a stable civil calendar, there may have seemed little reason beyond a desire for poetic expression to specify the date in this manner: but before Caesar's calendar reforms in 45 BC, the appearance and disappearance of certain stars just before sunrise and just after sunset provided a much more regular guide to the year than the erratic calendars of Greece and Rome, which were often out of step with the solar year.² It is therefore not surprising to find the same method of specifying the date in prose authors too;³ and lists of these stellar phenomena, arranged in various calendar-like formats, are found in both texts and inscriptions. These lists, known as parapegmata, can be traced back to fifth-century Greece, but the tradition may be considerably older.⁴
Whatever our reaction to Quintilian's claim, it is certainly the case that a good knowledge of the stars is important for a full understanding of Ovid's calendar poem, the Fasti. To a large extent the poem presents itself as a poetic version of the Roman calendar: each book covers a different month, and as the year and the work progress, Ovid marks the dates of various religious festivals and historical events, as in the real fasti. However, unlike many of the extant fasti, Ovid combines this material with material from the parapegmatic tradition, giving dates for the rising and setting of various stars and constellations, and for the journey of the sun through the zodiac. The inclusion of the constellations, and of the aetiological tales explaining their presence in the sky, enables Ovid to introduce a variety of Greek myths into the Roman calendar, where they would otherwise have no place.
Subpath Queries on Compressed Graphs: A Survey
Text indexing is a classical algorithmic problem that has been studied for over four decades: given a text T, preprocess it offline so that later we can quickly count and locate the occurrences of any string (the query pattern) in T in time proportional to the query's length. The earliest optimal-time solution to the problem, the suffix tree, dates back to 1973 and requires up to two orders of magnitude more space than the plain text just to be stored. In 2000, two breakthrough works showed that efficient queries can be achieved without this space overhead: a fast index can be stored in space proportional to the text's entropy. These contributions had an enormous impact on bioinformatics: today, virtually every DNA aligner employs compressed indexes. Recent work has considered more powerful compression schemes (dictionary compressors) and generalizations of the problem to labeled graphs: after all, texts can be viewed as labeled directed paths. In turn, since finite-state automata can be considered a particular case of labeled graphs, these findings created a bridge between the fields of compressed indexing and regular language theory, ultimately making it possible to index regular languages and promising to shed new light on problems such as regular expression matching. This survey is a gentle introduction to the main landmarks of the fascinating journey that took us from suffix trees to today's compressed indexes for labeled graphs and regular languages.
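As a rough illustration of how an index can count pattern occurrences without a suffix tree, the following Python sketch performs backward search over the Burrows-Wheeler transform (BWT), the mechanism behind FM-index-style compressed indexes. It is a toy under stated assumptions: the BWT is built by sorting rotations (quadratic time), the terminator `$` is assumed absent from the text, and `rank` scans the string rather than using the succinct rank structures a real index relies on, so it does not achieve the time or space bounds described above. It does show the key idea: the pattern is processed right to left, one character per step, each step narrowing an interval of sorted suffixes.

```python
from typing import Dict


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (quadratic; illustration only)."""
    if "$" in text:
        raise ValueError("text must not contain the terminator '$'")
    s = text + "$"  # '$' sorts before letters and digits in ASCII
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count occurrences of pattern in the original text using backward search."""
    # C[c] = number of characters in the text that are strictly smaller than c
    C: Dict[str, int] = {}
    total = 0
    for c in sorted(set(bwt_str)):
        C[c] = total
        total += bwt_str.count(c)

    def rank(c: str, i: int) -> int:
        # Occurrences of c in bwt_str[:i]; a real FM-index answers this in O(1)
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # half-open interval [lo, hi) of sorted suffixes
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + rank(c, lo)
        hi = C[c] + rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    b = bwt("banana")
    print(b)                            # annb$aa
    print(count_occurrences(b, "ana"))  # 2 (overlapping occurrences count)
```

Note that only the BWT string and the small table `C` are consulted during the search; the original text is never scanned.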
Corpora and evaluation tools for multilingual named entity grammar development
We present an effort to develop multilingual named entity grammars in a unification-based finite-state formalism (SProUT). Following an extended version of the MUC7 standard, we have developed Named Entity Recognition grammars for German, Chinese, Japanese, French, Spanish, English, and Czech. The grammars recognize person names, organizations, geographical locations, currency, and time and date expressions. Subgrammars and gazetteers are shared as much as possible across the grammars of the different languages. Multilingual corpora from the business domain are used for grammar development and evaluation. The annotation format (named entity and other linguistic information) is described. We present an evaluation tool that provides detailed statistics and diagnostics, allows for partial matching of annotations, and supports user-defined mappings between different annotation and grammar output formats.
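The abstract does not describe how the evaluation tool scores partial matches, so the following Python sketch is only a hypothetical illustration of the two features it names. Entities are assumed to be character-offset spans with a type label; `label_map` stands in for a user-defined mapping from grammar output labels to annotation labels; and, as an assumption of this sketch, a partial match (any overlap with the same label) receives full credit, whereas some scoring schemes award only partial credit.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Span(NamedTuple):
    start: int  # offset of the first character (inclusive)
    end: int    # offset after the last character (exclusive)
    label: str  # entity type, e.g. "PERSON"


def overlaps(a: Span, b: Span) -> bool:
    """True if the two character ranges share at least one character."""
    return a.start < b.end and b.start < a.end


def evaluate(gold: List[Span], predicted: List[Span], partial: bool = False,
             label_map: Optional[Dict[str, str]] = None) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted spans against gold spans."""
    label_map = label_map or {}
    # Translate grammar-output labels into the annotation scheme's labels
    predicted = [p._replace(label=label_map.get(p.label, p.label)) for p in predicted]

    def matches(g: Span, p: Span) -> bool:
        if g.label != p.label:
            return False
        if partial:
            return overlaps(g, p)  # any overlap counts as a hit
        return (g.start, g.end) == (p.start, p.end)  # exact boundaries required

    correct_pred = sum(any(matches(g, p) for g in gold) for p in predicted)
    found_gold = sum(any(matches(g, p) for p in predicted) for g in gold)
    precision = correct_pred / len(predicted) if predicted else 0.0
    recall = found_gold / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [Span(0, 10, "PERSON"), Span(20, 30, "LOCATION")]
    pred = [Span(0, 10, "PER"), Span(18, 25, "LOC")]
    mapping = {"PER": "PERSON", "LOC": "LOCATION"}
    print(evaluate(gold, pred, label_map=mapping))                # strict
    print(evaluate(gold, pred, partial=True, label_map=mapping))  # partial
```

In the usage example, the location span has the right type but slightly wrong boundaries, so it counts only under partial matching; without the label mapping, neither prediction matches at all.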
Fund Finder: A case study of database-to-ontology mapping
Mapping between databases and ontologies is a basic problem when trying to "upgrade" deep web content to the semantic web. Our approach proposes the declarative definition of mappings as a way to achieve domain independence and reusability. A specific language, expressive enough to cover some real-world mapping situations such as lightly structured databases or databases not in first normal form, is defined for this purpose. Alongside this mapping description language, the ODEMapster processor carries out the actual migration of instance data. We illustrate this by testing both the mapping definitions and the processor on a case study.
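The abstract does not reproduce the ODEMapster mapping language, so the following Python sketch is only a hypothetical illustration of the general idea of declarative mapping. The mapping for one table is plain data (`ConceptMapping`, a name invented here), and a single generic `migrate` function, also hypothetical, turns database rows into subject-property-value triples. The `example.org` URIs are placeholders, and the `multivalued` field stands in for one kind of non-first-normal-form column: several values packed into one cell with a separator.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, Any]


@dataclass
class ConceptMapping:
    """Hypothetical declarative mapping from one table to one ontology class."""
    ontology_class: str
    uri_template: str                # placeholder pattern filled from row columns
    attributes: Dict[str, str]       # column name -> ontology property URI
    multivalued: Dict[str, str] = field(default_factory=dict)  # column -> separator


def migrate(rows: List[Dict[str, Any]], mapping: ConceptMapping) -> List[Triple]:
    """Generic processor: apply a mapping to every row, emitting triples."""
    triples: List[Triple] = []
    for row in rows:
        subject = mapping.uri_template.format(**row)
        triples.append((subject, RDF_TYPE, mapping.ontology_class))
        for column, prop in mapping.attributes.items():
            value = row.get(column)
            if value is None or value == "":
                continue  # skip missing values
            sep = mapping.multivalued.get(column)
            # Split packed (non-1NF) cells into one triple per value
            values = [v.strip() for v in str(value).split(sep)] if sep else [value]
            for v in values:
                if v != "":
                    triples.append((subject, prop, v))
    return triples


if __name__ == "__main__":
    fund_mapping = ConceptMapping(
        ontology_class="http://example.org/onto#Fund",
        uri_template="http://example.org/fund/{id}",
        attributes={"name": "http://example.org/onto#name",
                    "sectors": "http://example.org/onto#sector"},
        multivalued={"sectors": ";"},
    )
    rows = [{"id": 1, "name": "Alpha Fund", "sectors": "energy; health"}]
    for t in migrate(rows, fund_mapping):
        print(t)
```

Because the mapping is data rather than code, applying the same `migrate` function to a different schema or domain only requires writing a new `ConceptMapping`, which is the kind of reusability the abstract argues for.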