Fitting Ranked English and Spanish Letter Frequency Distribution in U.S. and Mexican Presidential Speeches

Author: Altmann G.
Borodovsky M. Y.
Burnham K. P.
David P. A.
Friedman W.
Grzybek P.
Gusein-Zade S. M.
Heaps H. S.
Kelih E.
Mrayati M.
Pedro Miramontes
Venables W. N.
Vlad A.
Wentian Li
Zipf G. K.
Publication venue: 'Informa UK Limited'
Publication date: 15/03/2011

The limited range of the abscissa in ranked letter frequency distributions allows multiple functions to fit the observed distribution reasonably well. To compare candidate functions critically, we apply statistical model selection to ten functions, using the texts of U.S. and Mexican presidential speeches from the last one to two centuries. Despite minor switches in the rank order of certain letters over time in both datasets, letter usage is generally stable. The best-fitting function, judged either by least-square error or by AIC/BIC model selection, is the Cocho/Beta function. We also use a novel method to discover clusters of letters by their observed-over-expected frequency ratios. Comment: 7 figures
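As a rough illustration of what a ranked letter frequency distribution is, the sketch below counts letters in a text and orders them by decreasing relative frequency. It is not the authors' code: the function name `ranked_letter_frequencies` and the sample string are assumptions made for this example, and treating accented letters as distinct symbols is likewise an assumption.

```python
from collections import Counter


def ranked_letter_frequencies(text):
    """Return (rank, letter, relative frequency) tuples, most frequent first.

    Hypothetical helper for illustration only. Letters are lowercased,
    non-alphabetic characters are ignored, and ties are broken alphabetically.
    """
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    # Sort by descending count, then by letter so that ties are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        (rank, letter, count / total)
        for rank, (letter, count) in enumerate(ordered, start=1)
    ]


if __name__ == "__main__":
    sample = "Fellow citizens of the Senate and of the House"  # made-up sample text
    for rank, letter, freq in ranked_letter_frequencies(sample)[:5]:
        print(rank, letter, round(freq, 3))
```

The rank column is the abscissa mentioned in the abstract: with only a few dozen letters in an alphabet, it spans a short range, which is why several candidate functions can fit the curve similarly well.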

arXiv.org e-Print Archive

Crossref

Set-related restrictions for semantic groupings

Author: Bic Lubomir
Gilbert Jonathan
Rundensteiner Elke
Yin Meng-Lai
Publication venue: eScholarship, University of California
Publication date: 01/01/1989

Semantic database models use several fundamental forms of groupings to increase their expressive power. In this paper we consider four of the most common of these constructs: basic set groupings, is-a related groupings, power set groupings, and Cartesian aggregation groupings. For each, we define a number of useful restrictions that control its structure and composition. This permits each grouping to capture more subtle distinctions among the concepts or situations in the application environment. The resulting set of restrictions forms a framework that increases the expressive power of semantic models and specifies various set-related integrity constraints.

eScholarship - University of California

Edsger Wybe Dijkstra (1930 -- 2002): A Portrait of a Genius

Author: Apt Krzysztof R.
Publication date: 01/01/2002

We discuss the scientific contributions of Edsger Wybe Dijkstra, his opinions and his legacy. Comment: 10 pages. To appear in Formal Aspects of Computing

arXiv.org e-Print Archive

CWI's Institutional Repository

International Migration, Integration and Social Cohesion online publications

Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization

Author: Torres-Moreno Juan-Manuel
Publication date: 14/09/2012

In Automatic Text Summarization, preprocessing is an important phase that reduces the space of textual representation. Classically, stemming and lemmatization have been widely used for normalizing words. However, even when normalization is applied to large texts, the curse of dimensionality can degrade the performance of summarizers. This paper describes a new method for normalizing words to further reduce the space of representation. We propose to reduce each word to its initial letters, as a form of Ultra-stemming. The results show that Ultra-stemming not only preserves the content of summaries produced with this representation, but often dramatically improves the performance of the systems. Summaries on trilingual corpora were evaluated automatically with Fresa. Results confirm an increase in performance, regardless of the summarizer system used. Comment: 22 pages, 12 figures, 9 tables
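The core idea of reducing each word to its initial letters can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's implementation: the function name `ultra_stem`, the parameter `n`, whitespace tokenization, and lowercasing are all choices made for this example.

```python
def ultra_stem(text, n=1):
    """Truncate every whitespace-separated word to its first n letters.

    Hypothetical helper for illustration only; words are lowercased so that
    "Summaries" and "summarize" map to the same token.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [word[:n].lower() for word in text.split()]


if __name__ == "__main__":
    sentence = "Summaries summarize summarized documents"  # made-up sample
    print(ultra_stem(sentence, n=4))  # ['summ', 'summ', 'summ', 'docu']
    print(ultra_stem(sentence, n=1))  # ['s', 's', 's', 'd']
```

Smaller values of `n` collapse more distinct words into the same token, which shrinks the vocabulary (and hence the representation space) at the cost of merging unrelated words.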

arXiv.org e-Print Archive

CiteSeerX

A Computer-Based Method to Improve the Spelling of Children with Dyslexia

Author: Centre Creix
Clara Bayarri
Cookie Cloud
Luz Rello
Martin Pielot
Yolanda Otal
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

In this paper we present a method which aims to improve the spelling of children with dyslexia through playful and targeted exercises. In contrast to previous approaches, our method does not use correct words or positive examples to follow, but presents the child with a misspelled word as an exercise to solve. We created these training exercises on the basis of the linguistic knowledge extracted from the errors found in texts written by children with dyslexia. To test the effectiveness of this method in Spanish, we integrated the exercises in a game for iPad, DysEggxia (Piruletras in Spanish), and carried out a within-subject experiment. Over eight weeks, 48 children played either DysEggxia or Word Search, which is another word game. We conducted tests and questionnaires at the beginning of the study, after four weeks when the games were switched, and at the end of the study. The children who played DysEggxia for four weeks in a row made significantly fewer writing errors in the tests than after playing Word Search for the same time. This provides evidence that error-based exercises presented on a tablet help children with dyslexia improve their spelling skills. Comment: 8 pages, ASSETS'14, October 20-22, 2014, Rochester, NY, USA

arXiv.org e-Print Archive

CiteSeerX

Crossref

Information Outlook, May 2004

Author: Special Libraries Association
Publication venue: SJSU ScholarWorks
Publication date: 01/05/2004

Volume 8, Issue 5 https://scholarworks.sjsu.edu/sla_io_2004/1004/thumbnail.jp

SJSU ScholarWorks

Assessing candidate preference through web browsing history

Author: Avello Daniel Gayo
Conover Michael D
Imai Kosuke
Lohr S
O'Connor Brendan
Patrini Giorgio
Pennacchiotti Marco
Tumasjan Andranik
Yu Felix X.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

Predicting election outcomes is of considerable interest to candidates, political scientists, and the public at large. We propose the use of Web browsing history as a new indicator of candidate preference among the electorate, one that has the potential to overcome a number of the drawbacks of election polls. However, several challenges must be overcome to use Web browsing effectively for assessing candidate preference, including the lack of suitable ground truth data and the heterogeneity of user populations in time and space. We address these challenges, and show that the resulting methods can shed considerable light on the dynamics of voters' candidate preferences in ways that are difficult to achieve using polls. Accepted manuscript

Crossref

Boston University Institutional Repository (OpenBU)

Improving a Strong Neural Parser with Conjunction-Specific Features

Author: Ficler Jessica
Goldberg Yoav
Publication date: 01/01/2017

While dependency parsers reach very high overall accuracy, some dependency relations are much harder than others. In particular, dependency parsers perform poorly on coordination constructions (i.e., correctly attaching the "conj" relation). We extend a state-of-the-art dependency parser with conjunction-specific features, focusing on the similarity between the conjuncts' head words. Training the extended parser yields an improvement in "conj" attachment as well as in overall dependency parsing accuracy on the Stanford dependency conversion of the Penn TreeBank.

arXiv.org e-Print Archive

Crossref

The Cowl - Vol LNVII - n.4 - Oct 15, 1992

Publication venue: DigitalCommons@Providence
Publication date: 15/10/1992

The Cowl - student newspaper of Providence College. Volume LNVII - Number 4 - October 15, 1992. 24 pages.

DigitalCommons@Providence