Fitting Ranked English and Spanish Letter Frequency Distribution in U.S. and Mexican Presidential Speeches
Because the abscissa of a ranked letter frequency distribution spans a limited
range, multiple functions fit the observed distribution reasonably well. To
compare these functions critically, we apply statistical model
selection to ten functions, using the texts of U.S. and Mexican presidential
speeches from the last one to two centuries. Despite minor switches in the rank order of
certain letters over time in both datasets, letter
usage is generally stable. The best fitting function, judged by either
least-square-error or by AIC/BIC model selection, is the Cocho/Beta function.
We also use a novel method to discover clusters of letters by their
observed-over-expected frequency ratios. Comment: 7 figures
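The ranking step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `ranked_letter_frequencies` and `beta_rank_function` are hypothetical, and the two-exponent form f(r) = C (N + 1 - r)^b / r^a is an assumption about what "Cocho/Beta function" denotes, since the abstract does not state the formula.

```python
from collections import Counter


def ranked_letter_frequencies(text):
    """Return (rank, letter, relative frequency) tuples, most frequent first."""
    # Count alphabetic characters only; str.isalpha() also keeps accented
    # letters such as "ñ", which matters for Spanish text.
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    total = sum(counts.values())
    # Sort by descending count, breaking ties alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        (rank, letter, count / total)
        for rank, (letter, count) in enumerate(ranked, start=1)
    ]


def beta_rank_function(rank, n, c, a, b):
    """Assumed two-exponent rank function: c * (n + 1 - rank)**b / rank**a."""
    return c * (n + 1 - rank) ** b / rank ** a


if __name__ == "__main__":
    for rank, letter, freq in ranked_letter_frequencies("Four score and seven years ago"):
        print(rank, letter, round(freq, 3))
```

Fitting would then mean choosing `c`, `a`, and `b` so that `beta_rank_function` tracks the observed frequencies across ranks 1 to `n`. The optimizer and the AIC/BIC comparison are left out of this sketch.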
Set-related restrictions for semantic groupings
Semantic database models utilize several fundamental forms of groupings to increase their expressive power. In this paper we consider four of the most common of these constructs: basic set groupings, is-a related groupings, power set groupings, and Cartesian aggregation groupings. For each, we define a number of useful restrictions that control its structure and composition. This permits each grouping to capture more subtle distinctions of the concepts or situations in the application environment. The resulting set of restrictions forms a framework which increases the expressive power of semantic models and specifies various set-related integrity constraints.
Edsger Wybe Dijkstra (1930 -- 2002): A Portrait of a Genius
We discuss the scientific contributions of Edsger Wybe Dijkstra, his opinions
and his legacy. Comment: 10 pages. To appear in Formal Aspects of Computing
Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization
In Automatic Text Summarization, preprocessing is an important phase to
reduce the space of textual representation. Classically, stemming and
lemmatization have been widely used for normalizing words. However, even using
normalization on large texts, the curse of dimensionality can disturb the
performance of summarizers. This paper describes a new method for normalization
of words to further reduce the space of representation. We propose to reduce
each word to its initial letters, as a form of Ultra-stemming. The results show
that Ultra-stemming not only preserves the content of the summaries produced with this
representation, but often dramatically improves the performance of the
systems. Summaries on trilingual corpora were evaluated automatically with
Fresa. Results confirm an increase in performance, regardless of the summarizer
system used. Comment: 22 pages, 12 figures, 9 tables
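The normalization idea in this abstract, reducing each word to its initial letters, can be illustrated with a short sketch. The function name `ultra_stem`, the whitespace tokenization, the lowercasing, and the prefix length `n` are assumptions made for illustration; the abstract does not specify how many letters are kept or how the text is tokenized.

```python
def ultra_stem(text, n=1):
    """Truncate every whitespace-separated token to its first n characters."""
    # Lowercasing and str.split() tokenization are simplifying assumptions.
    return [word[:n].lower() for word in text.split()]


if __name__ == "__main__":
    # Distinct surface forms collapse onto the same short prefix, which
    # shrinks the vocabulary a summarizer has to represent.
    print(ultra_stem("Summarization systems summarize summaries", n=3))
```

With `n=3`, the three inflected forms of "summar-" share the prefix `sum`, so the four input tokens map to only two distinct symbols.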
A Computer-Based Method to Improve the Spelling of Children with Dyslexia
In this paper we present a method which aims to improve the spelling of
children with dyslexia through playful and targeted exercises. In contrast to
previous approaches, our method does not use correct words or positive examples
to follow, but presents the child with a misspelled word as an exercise to solve. We
created these training exercises on the basis of the linguistic knowledge
extracted from the errors found in texts written by children with dyslexia. To
test the effectiveness of this method in Spanish, we integrated the exercises
in a game for iPad, DysEggxia (Piruletras in Spanish), and carried out a
within-subject experiment. Over eight weeks, 48 children played either
DysEggxia or Word Search, which is another word game. We conducted tests and
questionnaires at the beginning of the study, after four weeks when the games
were switched, and at the end of the study. The children who played DysEggxia
for four weeks in a row made significantly fewer writing errors in the tests than
after playing Word Search for the same time. This provides evidence that
error-based exercises presented on a tablet help children with dyslexia improve
their spelling skills. Comment: 8 pages, ASSETS'14, October 20-22, 2014, Rochester, NY, USA
Information Outlook, May 2004
Volume 8, Issue 5 https://scholarworks.sjsu.edu/sla_io_2004/1004/thumbnail.jp
Assessing candidate preference through web browsing history
Predicting election outcomes is of considerable interest to candidates, political scientists, and the public at large. We propose the use of Web browsing history as a new indicator of candidate preference among the electorate, one that has the potential to overcome a number of the drawbacks of election polls. However, a number of challenges must be overcome to use Web browsing effectively for assessing candidate preference, including the lack of suitable ground truth data and the heterogeneity of user populations in time and space. We address these challenges, and show that the resulting methods can shed considerable light on the dynamics of voters’ candidate preferences in ways that are difficult to achieve using polls. Accepted manuscript
Improving a Strong Neural Parser with Conjunction-Specific Features
While dependency parsers reach very high overall accuracy, some dependency
relations are much harder than others. In particular, dependency parsers
perform poorly on coordination constructions (i.e., on correctly attaching the
"conj" relation). We extend a state-of-the-art dependency parser with
conjunction-specific features, focusing on the similarity between the conjuncts'
head words. Training the extended parser yields an improvement in "conj"
attachment as well as in overall dependency parsing accuracy on the Stanford
dependency conversion of the Penn Treebank.
The Cowl - Vol LNVII - n.4 - Oct 15, 1992
The Cowl - student newspaper of Providence College. Volume LNVII - Number 4 - October 15, 1992. 24 pages