    ON MONITORING LANGUAGE CHANGE WITH THE SUPPORT OF CORPUS PROCESSING

    One of the fundamental characteristics of language is that it changes over time. One way to monitor this change is to observe corpora, which are structured documentation of a language. Recent developments in technology, especially in the field of Natural Language Processing, allow robust linguistic processing, which supports the description of diverse historical changes in corpora. The involvement of human linguists is unavoidable, since they determine the gold standard, but computational approaches provide considerable support in exploring corpora, especially historical ones. This paper proposes a model for corpus development in which corpora are annotated to support further computational operations such as lexicogrammatical pattern matching, automatic retrieval, and extraction. The corpus processing operations are performed with local-grammar-based corpus processing software on a contemporary Indonesian corpus. The paper concludes that data collection and data processing are equally crucial for monitoring language change, and neither can be set aside.

    Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data

    The lack of code-switching training data is one of the major concerns in the development of end-to-end code-switching automatic speech recognition (ASR) models. In this work, we propose a method to train an improved end-to-end code-switching ASR model using only monolingual data. Our method encourages the distributions of the output token embeddings of the monolingual languages to be similar, and hence helps the ASR model code-switch easily between languages. Specifically, we propose constraints based on Jensen-Shannon divergence and cosine distance. The former enforces similar distributions for the output embeddings of the monolingual languages, while the latter simply brings the centroids of the two distributions close to each other. Experimental results demonstrate the effectiveness of the proposed method, which yields up to 4.5% absolute mixed error rate improvement on a Mandarin-English code-switching ASR task. Comment: 5 pages, 3 figures, accepted to INTERSPEECH 201
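
The two constraints named in this abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation. It assumes that each language's output token embeddings are given as a list of vectors, that each vector is turned into a probability distribution with a softmax, and that the per-language distribution is the average of those softmaxed vectors. The function names (`embedding_constraints` and its helpers) and the toy embeddings are hypothetical and chosen only for illustration.

```python
import math


def softmax(vec):
    # Numerically stable softmax: turns one embedding into a distribution.
    m = max(vec)
    exps = [math.exp(v - m) for v in vec]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(rows):
    # Element-wise mean of a list of equal-length vectors (the centroid).
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def kl_divergence(p, q):
    # KL(p || q) in nats; terms with p_i == 0 contribute nothing.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    # Jensen-Shannon divergence: symmetric, bounded above by ln(2).
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def cosine_distance(u, v):
    # 1 - cosine similarity; 0 when the vectors point the same way.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def embedding_constraints(emb_a, emb_b):
    # ASSUMPTION: per-language distribution = mean of softmaxed embeddings.
    dist_a = mean_vector([softmax(row) for row in emb_a])
    dist_b = mean_vector([softmax(row) for row in emb_b])
    js = js_divergence(dist_a, dist_b)
    # Centroid constraint: cosine distance between the raw embedding means.
    cos = cosine_distance(mean_vector(emb_a), mean_vector(emb_b))
    return js, cos


# Toy, made-up embeddings for two languages (3 dimensions, 2 tokens each).
mandarin = [[0.9, 0.1, -0.3], [1.1, 0.0, -0.2]]
english = [[-0.4, 0.8, 0.2], [-0.5, 1.0, 0.1]]
js, cos = embedding_constraints(mandarin, english)
print(f"JS divergence: {js:.4f}, cosine distance: {cos:.4f}")
```

In a training setup, terms like these would typically be added to the ASR loss with some weighting so that minimizing them pulls the two embedding sets together; the weighting and the exact way distributions are formed are design choices this sketch does not take from the paper.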

    Representativeness as a Forgotten Lesson for Multilingual and Code-switched Data Collection and Preparation

    Multilingualism is widespread around the world, and code-switching (CSW) is a common practice among different language pairs/tuples across locations and regions. However, there is still not much progress in building successful CSW systems, despite recent advances in Massive Multilingual Language Models (MMLMs). We investigate the reasons behind this setback through a critical study of the existing CSW data sets (68) across language pairs in terms of the collection and preparation (e.g. transcription and annotation) stages. This in-depth analysis reveals that (a) most CSW data involves English, ignoring other language pairs/tuples, and (b) there are flaws in representativeness in the data collection and preparation stages because location-based, socio-demographic, and register variation in CSW is ignored. In addition, a lack of clarity about the data selection and filtering stages obscures the representativeness of CSW data sets. We conclude by providing a short checklist to improve representativeness in forthcoming studies involving CSW data collection and preparation. Comment: Accepted for EMNLP'23 Findings (to appear on EMNLP'23 Proceedings)

    PRESERVING AND PROTECTING JAVANESE LANGUAGES BY APPLYING CODE SWITCHING AND CODE MIXING IN TEACHING ENGLISH IN CLASSROOM (SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE ASSIGNMENT OF PRAGMATICS)

    Javanese languages are among the world's Indigenous languages. They are part of a cultural heritage that Javanese people should preserve and protect in this era of globalization. Javanese languages are regarded as a cornerstone of culture and its ultimate expression: by using them, Javanese people can share culture and transmit it to future generations as an expression of identity. In reality, however, many Indigenous languages around the world are nearly extinct, and Javanese languages have almost disappeared in some areas. There is clearly no special attention from Javanese people, and especially from the government, to preserving them. Not only the government but also parents and elders should take part. Javanese languages should be passed on from one generation to the next. Yet it is not unusual for the parental generation to speak Javanese without passing it on to their children. As a result, in a growing number of cases, Javanese languages are used only by elders. The loss of some Javanese languages can be attributed to factors such as irresistible social, political, and economic pressures. Cooperation among language planning, language policy, language rights, and language education is therefore needed to prevent this phenomenon. These serve as vehicles for promoting and perpetuating the vitality, versatility, and stability of Javanese languages. Better language planning and language policy are urgently needed in Indonesia to protect the Indonesian language and Indigenous languages, especially Javanese languages, and this effort should be supported by attention to language rights. Moreover, focusing on language in education for children and young people is the best way to start preserving Javanese languages. 
Including Javanese children and youth in this discussion on language and education is fitting. Education in classrooms and schools also has the potential to save and revive Javanese languages that are on the brink of extinction. The non-recognition and prohibition of Javanese languages in education and the workplace has affected the lives of many Javanese people from childhood to adulthood, in the formation of their identity and the development of their communities. Education in classrooms and schools, which was used as an instrument of assimilation for some languages in Indonesia, especially in Central Java, has had an impact on Javanese languages. Therefore, a better step is to offer code switching and code mixing in English teaching, not only to Javanese people but to all students living on the island of Java, as a means of combating prejudice and discrimination and promoting inclusive and respectful societies. To make this a reality, however, cooperation and commitment are needed from the government, Javanese people, parents, elders, teachers, and lecturers in Indonesia, especially in Central Java. The government should make an explicit decision to preserve Javanese languages from extinction through teaching activities in classrooms and schools as the basic formal activity. Keywords: code mixing, code switching, indigenous languages, Javanese