Language Time Series Analysis
We use the Detrended Fluctuation Analysis (DFA) and the Grassberger-Procaccia
(GP) analysis methods to study language characteristics. Although we construct
our signals using only word lengths or word frequencies, thereby excluding a
huge amount of information from language, the GP analysis indicates that
linguistic signals may be considered the manifestation of a complex system of
high dimensionality, different from random signals or systems of low
dimensionality such as the Earth's climate. The DFA method is additionally able
to distinguish a natural language signal from a computer code signal. This last
result may be useful in the field of cryptography.
Comment: 21 pages, 5 figures, accepted in Physica
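As a rough illustration of the DFA step mentioned in this abstract, the following sketch turns a text into a word-length signal and estimates a first-order DFA scaling exponent. It is not the authors' implementation: the function names, the window sizes, and the random stand-in signal are all assumptions made for the example.

```python
import math
import random
import re


def word_lengths(text):
    """Turn a text into a signal of word lengths (one value per word)."""
    return [len(w) for w in re.findall(r"\w+", text)]


def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def dfa_exponent(series, window_sizes):
    """Estimate the DFA scaling exponent with linear detrending (DFA-1)."""
    mean = sum(series) / len(series)
    # Profile: cumulative sum of deviations from the mean.
    profile, total = [], 0.0
    for value in series:
        total += value - mean
        profile.append(total)

    log_n, log_f = [], []
    for n in window_sizes:
        n_windows = len(profile) // n
        xs = list(range(n))
        squared_error = 0.0
        for w in range(n_windows):
            segment = profile[w * n:(w + 1) * n]
            slope, intercept = linear_fit(xs, segment)
            # Residual of the profile around the local linear trend.
            squared_error += sum(
                (y - (slope * x + intercept)) ** 2 for x, y in zip(xs, segment)
            )
        fluctuation = math.sqrt(squared_error / (n_windows * n))
        log_n.append(math.log(n))
        log_f.append(math.log(fluctuation))

    # The exponent is the slope of log F(n) against log n.
    alpha, _ = linear_fit(log_n, log_f)
    return alpha


# Hypothetical stand-in signal: random "word lengths" instead of a real corpus.
random.seed(1)
stand_in = [random.randint(1, 10) for _ in range(2000)]
print(dfa_exponent(stand_in, [8, 16, 32, 64]))
```

For a real text, `word_lengths(text)` would replace the stand-in signal. An uncorrelated signal is expected to give an exponent near 0.5, so departures from that value are what would hint at long-range structure.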
Dialectometric analysis of language variation in Twitter
In the last few years, microblogging platforms such as Twitter have given
rise to a deluge of textual data that can be used for the analysis of informal
communication between millions of individuals. In this work, we propose an
information-theoretic approach to geographic language variation using a corpus
based on Twitter. We test our models with tens of concepts and their associated
keywords detected in Spanish tweets geolocated in Spain. We employ
dialectometric measures (cosine similarity and Jensen-Shannon divergence) to
quantify the linguistic distance on the lexical level between cells created in
a uniform grid over the map. This can be done for a single concept or in the
general case taking into account an average of the considered variants. The
latter permits an analysis of the dialects that naturally emerge from the data.
Interestingly, our results reveal the existence of two dialect macrovarieties.
The first group includes region-specific speech spoken in small towns and
rural areas, whereas the second cluster encompasses cities that tend to use a
more uniform variety. Since the results obtained with the two different metrics
qualitatively agree, our work suggests that social media corpora can be
efficiently used for dialectometric analyses.
Comment: 10 pages, 7 figures, 1 table. Accepted to VarDial 201
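The two dialectometric measures named in this abstract can be sketched as follows for a single concept. The variant counts and cell names are hypothetical, and the base-2 logarithm (which bounds the divergence between 0 and 1) is an assumption rather than something the abstract specifies.

```python
import math


def to_distribution(counts, vocabulary):
    """Normalize raw variant counts into relative frequencies."""
    total = sum(counts.get(word, 0) for word in vocabulary)
    return [counts.get(word, 0) / total for word in vocabulary]


def cosine_similarity(p, q):
    """Cosine of the angle between two frequency vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence in bits (base-2 logarithm)."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


# Hypothetical counts of lexical variants for one concept in two grid cells.
vocabulary = ["variant_a", "variant_b", "variant_c"]
cell_1 = {"variant_a": 40, "variant_b": 10}
cell_2 = {"variant_a": 5, "variant_b": 30, "variant_c": 15}

p = to_distribution(cell_1, vocabulary)
q = to_distribution(cell_2, vocabulary)
print(cosine_similarity(p, q))
print(jensen_shannon_divergence(p, q))
```

A linguistic distance over several concepts could then be taken as the average of the per-concept values, in line with the averaging over variants that the abstract describes.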
Language sample analysis for Spanish speakers
The purpose of this project was to develop a Spanish language sample analysis (LSA) scoring procedure for English-Spanish bilinguals that clinicians can use to develop language goals and monitor progress on those goals. A Spanish LSA procedure was created and tested on 20 typically developing and 16 language-impaired English-Spanish bilinguals. Each utterance of each language sample was analyzed for correct and attempted use of the 20 grammatical forms selected for the LSA procedure. Based on the results, a preliminary profile of impairment was established. It showed that Relative Clauses, Infinitive Clauses, Present Subjunctive, Third Person Plural Present and Preterit Indicative, Irregular Preterit Indicative, Indirect and Direct Object Clitics, Imperfect, and Plural Nouns were the most problematic forms for English-Spanish bilinguals with language impairment (LI). Clinical implications of these findings are discussed.
Communication Sciences and Disorder
An implementation of Apertium based Assamese morphological analyzer
Morphological analysis is an important branch of linguistics for any natural
language processing (NLP) technology. Morphology studies the structure and
formation of the words of a language. In current NLP research,
morphological analysis techniques are becoming increasingly popular. To
process any language, the morphology of its words should be analyzed first.
The Assamese language has a very complex morphological structure. In our work, we
used Apertium-based finite-state transducers to develop a morphological
analyzer for the Assamese language within a limited domain, and we obtained 72.7%
accurac
A literature survey of methods for analysis of subjective language
Subjective language is used to express attitudes and opinions towards things, ideas and people. While content- and topic-centred natural language processing is now part of everyday life, analysis of the subjective aspects of natural language has until recently been largely neglected by the research community. The explosive growth of personal blogs, consumer opinion sites and social network applications in recent years has, however, created increased interest in subjective language analysis. This paper provides an overview of recent research conducted in the area
Semantic industrial categorisation based on search engine index
Analysis of specialist language is one of the most pressing problems when trying to build an intelligent content analysis system. Identifying the scope of the language used and then understanding the relationships between the language entities is a key problem. A semantic relationship analysis of the search engine index was devised and evaluated. Using a search engine index provides access to the widest database of knowledge in any particular field (if not now, then surely in the future). Social network analysis of the keyword collection appears to generate a viable list of specialist terms and the relationships among them. This approach has been tested in the engineering and medical sectors
