
    Language Time Series Analysis

    We use Detrended Fluctuation Analysis (DFA) and Grassberger-Procaccia (GP) analysis to study language characteristics. Although we construct our signals using only word lengths or word frequencies, and thereby exclude a huge amount of linguistic information, GP analysis indicates that linguistic signals may be considered the manifestation of a complex system of high dimensionality, distinct from random signals and from low-dimensional systems such as the Earth's climate. The DFA method is additionally able to distinguish a natural language signal from a computer code signal. This last result may be useful in the field of cryptography. Comment: 21 pages, 5 figures, accepted in Physica
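
To illustrate the kind of signal and analysis this abstract describes, here is a minimal sketch of first-order DFA applied to a word-length series. It is not the authors' implementation: the function names (`word_lengths`, `dfa_fluctuation`, `dfa_exponent`), the whitespace tokenization, the linear detrending order, and the window sizes are all assumptions chosen for illustration.

```python
import math
import random


def word_lengths(text):
    """Turn a text into a numeric signal of word lengths (assumed whitespace tokenization)."""
    return [len(word) for word in text.split()]


def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dfa_fluctuation(signal, window):
    """Root-mean-square fluctuation F(n) of the integrated signal for one window size n."""
    mean = sum(signal) / len(signal)
    # Step 1: integrate the mean-subtracted signal to obtain the profile.
    profile, running = [], 0.0
    for value in signal:
        running += value - mean
        profile.append(running)
    # Step 2: split the profile into non-overlapping windows and remove a linear trend from each.
    n_windows = len(profile) // window
    xs = list(range(window))
    squared_error = 0.0
    for w in range(n_windows):
        segment = profile[w * window:(w + 1) * window]
        slope, intercept = linear_fit(xs, segment)
        squared_error += sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, segment))
    # Step 3: average the squared residuals over all windowed points.
    return math.sqrt(squared_error / (n_windows * window))


def dfa_exponent(signal, windows):
    """Scaling exponent: slope of log F(n) against log n."""
    log_n = [math.log(n) for n in windows]
    log_f = [math.log(dfa_fluctuation(signal, n)) for n in windows]
    slope, _ = linear_fit(log_n, log_f)
    return slope


if __name__ == "__main__":
    # Synthetic uncorrelated noise stands in for a real word-length signal here.
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
    print(dfa_exponent(noise, [8, 16, 32, 64, 128]))
```

For uncorrelated noise the exponent is expected to lie near 0.5; exponents that depart from that value indicate correlations in the signal, which is the kind of difference a DFA comparison between signals relies on.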

    Dialectometric analysis of language variation in Twitter

    In the last few years, microblogging platforms such as Twitter have given rise to a deluge of textual data that can be used to analyze informal communication between millions of individuals. In this work, we propose an information-theoretic approach to geographic language variation using a corpus based on Twitter. We test our models with tens of concepts and their associated keywords detected in Spanish tweets geolocated in Spain. We employ dialectometric measures (cosine similarity and Jensen-Shannon divergence) to quantify the lexical distance between cells of a uniform grid laid over the map. This can be done for a single concept or, in the general case, by averaging over the considered variants. The latter permits an analysis of the dialects that naturally emerge from the data. Interestingly, our results reveal the existence of two dialect macrovarieties. The first group comprises region-specific speech used in small towns and rural areas, whereas the second cluster encompasses cities that tend to use a more uniform variety. Since the results obtained with the two different metrics qualitatively agree, our work suggests that social media corpora can be efficiently used for dialectometric analyses. Comment: 10 pages, 7 figures, 1 table. Accepted to VarDial 201
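
The two dialectometric measures named in this abstract can be sketched as follows for a single concept. This is an illustrative assumption rather than the paper's pipeline: the keyword variants, the counts, and the helper names (`to_distribution`, `cosine_similarity`, `jensen_shannon_divergence`) are hypothetical, and the base-2 logarithm is a choice that bounds the divergence between 0 and 1.

```python
import math


def to_distribution(counts, vocabulary):
    """Normalize raw keyword counts for one grid cell into relative frequencies."""
    total = sum(counts.get(word, 0) for word in vocabulary)
    return [counts.get(word, 0) / total for word in vocabulary]


def cosine_similarity(p, q):
    """Cosine of the angle between two frequency vectors (1 means identical direction)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric divergence: average KL divergence of p and q from their mixture."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)


if __name__ == "__main__":
    # Hypothetical counts of three keyword variants of one concept in two grid cells.
    vocabulary = ["coche", "carro", "auto"]
    cell_a = {"coche": 90, "carro": 5, "auto": 5}
    cell_b = {"coche": 60, "carro": 30, "auto": 10}
    p = to_distribution(cell_a, vocabulary)
    q = to_distribution(cell_b, vocabulary)
    print(cosine_similarity(p, q), jensen_shannon_divergence(p, q))
```

Cosine similarity grows as cells become more alike while Jensen-Shannon divergence shrinks, so a distance comparison between the two would typically use one minus the cosine similarity.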

    An implementation of Apertium based Assamese morphological analyzer

    Morphological analysis is an important branch of linguistics for any natural language processing technology. Morphology studies the structure and formation of the words of a language. In current NLP research, morphological analysis techniques are becoming increasingly popular. To process any language, the morphology of its words must first be analyzed. The Assamese language has a very complex morphological structure. In our work, we used Apertium-based finite-state transducers to develop a morphological analyzer for Assamese over a limited domain, and we obtained 72.7% accuracy

    A literature survey of methods for analysis of subjective language

    Subjective language is used to express attitudes and opinions towards things, ideas and people. While content- and topic-centred natural language processing is now part of everyday life, the analysis of subjective aspects of natural language has until recently been largely neglected by the research community. The explosive growth of personal blogs, consumer opinion sites and social network applications in recent years has, however, created increased interest in subjective language analysis. This paper provides an overview of recent research conducted in the area

    Semantic industrial categorisation based on search engine index

    Analysis of specialist language is one of the most pressing problems in building an intelligent content analysis system. Identifying the scope of the language used and then understanding the relationships between the language entities is a key problem. A semantic relationship analysis of a search engine index was devised and evaluated. Using a search engine index gives access to the widest database of knowledge in any particular field (if not now, then surely in the future). Social network analysis of a keyword collection appears to generate a viable list of specialist terms and the relationships among them. This approach has been tested in the engineering and medical sectors