151 research outputs found

    Automatic Construction of Cross-lingual Networks of Concepts from the Hong Kong SAR Police Department

    The tragic event of September 11 has prompted rapidly growing attention to national security and criminal analysis. In the national security world, very large volumes of data and information are generated and gathered. Much of this data and information, written in different languages and stored in different locations, may be seemingly unconnected. Cross-lingual semantic interoperability is therefore a major challenge in generating an overview of this disparate data and information so that it can be analysed and searched. Traditional information retrieval (IR) approaches normally require a document to share some keywords with the query. In reality, users may use keywords that differ from those used in the documents. There are then two different term spaces, one for the users and another for the documents. The problem can be viewed as the creation of a thesaurus: relationships between the two term spaces would allow the system to match queries with relevant documents even though they contain different terms. Apart from this, terrorists and criminals may communicate through letters, e-mails and faxes in languages other than English, and translation ambiguity significantly exacerbates the retrieval problem. To facilitate cross-lingual information retrieval, a corpus-based approach uses term co-occurrence statistics in parallel or comparable corpora to construct a statistical translation model that crosses the language boundary. However, collecting parallel corpora between European and Oriental languages is not an easy task because of the unique linguistic and grammatical structures of Oriental languages. In this paper, a text-based approach to aligning English/Chinese Hong Kong Police press release documents from the Web is first presented. The article then reports an algorithmic approach to generating a robust knowledge base through statistical correlation analysis of the semantics (knowledge) embedded in the bilingual press release corpus. The research output is a thesaurus-like semantic network knowledge base, which can aid semantics-based cross-lingual information management and retrieval.
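
As a rough illustration of how term co-occurrence statistics over aligned documents can link two term spaces, the sketch below scores English/Chinese term pairs with the Dice coefficient. This is not the method of the paper, whose correlation analysis is not specified in the abstract. The function name `dice_associations`, the choice of Dice as the measure, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def dice_associations(aligned_pairs):
    """Score English/Chinese term pairs by co-occurrence in aligned documents.

    aligned_pairs: list of (english_terms, chinese_terms) tuples, one per
    aligned document pair. Returns {(en, zh): Dice coefficient}.
    """
    en_df = Counter()   # pairs whose English side contains the term
    zh_df = Counter()   # pairs whose Chinese side contains the term
    joint = Counter()   # pairs in which both terms occur
    for en_terms, zh_terms in aligned_pairs:
        en_set, zh_set = set(en_terms), set(zh_terms)
        en_df.update(en_set)
        zh_df.update(zh_set)
        joint.update(product(en_set, zh_set))
    return {
        (en, zh): 2 * n / (en_df[en] + zh_df[zh])
        for (en, zh), n in joint.items()
    }


# Toy, made-up corpus of three aligned press-release pairs (an assumption).
pairs = [
    (["police", "robbery"], ["警方", "劫案"]),
    (["police", "traffic"], ["警方", "交通"]),
    (["robbery", "arrest"], ["劫案", "拘捕"]),
]
scores = dice_associations(pairs)
```

Term pairs that always appear together in aligned documents score 1.0, and pairs that never co-occur are absent from the result. A thesaurus-like network could then keep only the links above some threshold, which would have to be tuned on real data.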

    Multilingual practices in a disavowed community: The case of new Italian migrants in London

    This thesis aims to investigate the linguistic repertoires of new Italian migrants in London and the multilingual practices in which they engage. Italian mass emigration has re-started after the 2008 economic crisis. This new migration continues a long tradition: Italians migrated en masse after the country’s unification and after the Second World War. In the UK, they mainly emigrated after the Second World War to industrial towns, such as Bedford. In contrast, London has become the favourite destination of the post-2008 crisis wave. In the last decade, scholars focused on the social differences between past and new migrants, while the last linguistic study on the Italian community in London was carried out in the 1990s and thus does not cover the new wave. The research presented here is an attempt to fill this gap. Recorded data collected through ethnographic observations of social gatherings organised by new migrants are presented to show how they engage with translanguaging. Interview data are also used to further explore and better understand participants’ multilingual practices and their ideologies about them. One recurring aspect emerges from both data sources: participants disavow their national community. They often negotiate the traditional understanding of ethnic and national community by challenging or denying their belonging to the Italian community in London. Nevertheless, informants acknowledge the existence of an in-group style, used by them and by other new migrants, characterised by the possibility of translanguaging. Translanguaging is adopted to negotiate and perform new identities, and to identify the other, who cannot be included in translanguaging practices. Participants demonstrate their membership in (or disaffiliation from) the group through agreement (or disagreement) with the group style. This seems a challenge to the a priori labelling system based on ethnicity and migratory status, which may be seen as an analytical issue for the study of new transnational and mobile migrant communities.

    Interoperabilidade semântica: uma análise das perspectivas teóricas dos estudos desenvolvidos na área de Ciência da Informação

    This study analyzes the theoretical perspectives of studies on semantic interoperability in Information Science in order to identify how closely they align with ontological and/or epistemological approaches. It is a bibliographic and exploratory study that uses content analysis and bibliometric analysis. From the analysis of 54 articles on semantic interoperability indexed in the Web of Science and classified in the Information Science Library Science category, it was found that research on semantic interoperability in the LIS area is mostly applied and is often limited to describing the development of processes and products without presenting the fundamentals behind them. Most of the research has little or no theoretical foundation on language or on what is meant by objectivity.

    Digital literacy practices of Saudi Female university students

    This study examines the way young Saudi women use language and other communicative resources in their digitally mediated interactions. It is motivated by the debate in Saudi Arabia on the impact of digital media on the way people use language, especially Arabic, the way they manage their social relationships, and the way they enact their cultural identities. The study was conducted at a women’s university in the eastern part of Saudi Arabia. A hundred and three participants were asked to complete a questionnaire on their online language use. Forty-seven of those participants were asked to keep a detailed literacy log of their digital practices over a period of four days and to submit samples of their interactions for closer analysis. The theoretical framework used to analyze the data combines concepts from new literacy studies (Barton & Hamilton, 1998; Gee & Hayes, 2010; Street, 2003), multimodal discourse analysis (Kress & Van Leeuwen, 2006; Jewitt, Bezemer, & O'Halloran, 2016), and mediated discourse analysis (Jones & Norris, 2005; Scollon, 2001). The framework sees people’s language use in terms of social practices and explores how those practices are affected by the different media people use to communicate, and how mediated communication is linked to broader issues of culture and identity. The analysis reveals that the participants’ digital practices are multimodal and multilingual, and that the choices they make about the codes and modes they use take place in the context of a complex nexus of practice, involving the interaction among (i) the affordances and constraints of the different technologies they use, (ii) the demands of their social relationships, and (iii) their individual experiences and socialization into different ways of communicating. By appropriating different codes and modes in different ways in social media, young Saudi women are able to strategically situate themselves in different cultural ‘worlds’, maintaining traditional identities and cultural practices while at the same time enacting new kinds of identities. The study contributes to the debate on the effect of digital media on language use by adopting a sociocultural approach which links language use to social practices, social relationships and social identities.

    Automatic construction of English/Chinese parallel corpus.

    Li Kar Wing. Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 88-96). Abstracts in English and Chinese.

    Contents:
    ABSTRACT
    ACKNOWLEDGEMENTS
    LIST OF TABLES
    LIST OF FIGURES
    Chapter 1. INTRODUCTION
        1.1 Application of corpus-based techniques
            1.1.1 Machine Translation (MT)
                1.1.1.1 Linguistic
                1.1.1.2 Statistical
                1.1.1.3 Lexicon construction
            1.1.2 Cross-lingual Information Retrieval (CLIR)
                1.1.2.1 Controlled vocabulary
                1.1.2.2 Free text
                1.1.2.3 Application corpus-based approach in CLIR
        1.2 Overview of linguistic resources
        1.3 Written language corpora
            1.3.1 Types of corpora
            1.3.2 Limitation of comparable corpora
        1.4 Outline of the dissertation
    Chapter 2. LITERATURE REVIEW
        2.1 Research in automatic corpus construction
        2.2 Research in translation alignment
            2.2.1 Sentence alignment
            2.2.2 Word alignment
        2.3 Research in alignment of sequences
    Chapter 3. ALIGNMENT AT WORD LEVEL AND CHARACTER LEVEL
        3.1 Title alignment
            3.1.1 Lexical features
            3.1.2 Grammatical features
            3.1.3 The English/Chinese alignment model
        3.2 Alignment at word level and character level
            3.2.1 Alignment at word level
            3.2.2 Alignment at character level: Longest matching
            3.2.3 Longest common subsequence (LCS)
            3.2.4 Applying LCS in the English/Chinese alignment model
        3.3 Reduce overlapping ambiguity
            3.3.1 Edit distance
            3.3.2 Overlapping in the algorithm model
    Chapter 4. ALIGNMENT AT TITLE LEVEL
        4.1 Review of score functions
        4.2 The Score function
            4.2.1 (C matches E) and (E matches C)
            4.2.2 Length similarity
    Chapter 5. EXPERIMENTAL RESULTS
        5.1 Hong Kong government press release articles
        5.2 Hang Seng Bank economic monthly reports
        5.3 Hang Seng Bank press release articles
        5.4 Hang Seng Bank speech articles
        5.5 Quality of the collections and future work
    Chapter 6. CONCLUSION
    Bibliography
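
The thesis outline names the longest common subsequence (LCS) as a character-level alignment technique. For readers unfamiliar with it, the following is a textbook dynamic-programming sketch of LCS length, not the thesis's own English/Chinese alignment model. The function names and the normalisation in `lcs_ratio` are assumptions chosen for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    # prev[j] holds the LCS length of the processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_ratio(a, b):
    """LCS length normalised by the longer input; 0.0 when both are empty."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 0.0
```

Because Python strings are sequences of characters, the same function works on Chinese text: `lcs_length("香港警務處", "香港特區警務處")` compares the two strings character by character. Keeping only two rows of the table makes memory use proportional to the length of `b` rather than to the product of both lengths.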

    Translating Eastern European Identities into the American National Narrative

    The purpose of this study is two-fold: to examine the absence from current cultural studies on immigration and ethnicity of the Eastern European American as a conceptual entity, and to propose and implement a new methodology of reading immigrant autobiographical narratives that seeks to make transparent the cultural and linguistic processes of translation through which immigrants negotiate their identities in America. Part I provides the methodology and contextual framework I employ in the re-examinations of Mary Antin's The Promised Land (1912) and Eva Hoffman's Lost in Translation (1989). The historical contextualization focuses on two periods that determined conceptual shifts: the two decades of anti-immigration sentiment that led to the Immigration Acts of 1921 and 1924, and the decades following World War II, when post-Holocaust consciousness opened the door to the institutionalization of a Jewish identity that both encompassed and effaced the Eastern European one at the same time that Cold War politics hindered the development of an Eastern European immigrant space of articulation. A brief analysis of Flannery O'Connor's story "The Displaced Person" (1954) underscores the dominant culture's difficulty in conceptualizing Eastern European difference and its place in the American national narrative. After arguing for the need to differentiate between immigrant and ethnic narratives, I introduce the concept of "palimpsestic translation" and develop a critical paradigm that weds translation theory to the genre of immigrant autobiography and to narratives of immigration at large. Parts II and III contribute to the reconceptualization and partial reconstitution of the Eastern European immigrant American space through a close re-examination of Antin's and Hoffman's immigrant narratives as "palimpsestic translations." The two analyses address issues of historicity, literary and historical visibility, and translatability, as they pertain to and illuminate each text. The conclusion briefly assesses the status of Eastern European American studies and outlines the contribution of my proposed reading paradigm to the resuscitation of a critical and theoretical interest in Eastern European American identities. Finally, I situate my study within the larger call for a reconsideration of the relationship between Translation Studies, American and Cultural Studies, and Ethnic Studies.

    Discourses of Tension in a Rainbow Nation: Transcultural Identity Formations among Hakka Mauritians

    Identity formation happens at a crossroads of that which people believe they are and are not. Acknowledgment, reification, or subversion of identity frictions form powerful communicative patterns that I call ‘discourses of tension’. I argue in this dissertation that discourses of tension are foundational to the formation of transcultural identities—positionalities that emerge between or beyond perceived cultural boundaries—because they enable people to identify and express cultural complexities and expectations. Based on ten months of ethnographic fieldwork and research in other relevant sites, this argument is supported by my analysis of how Hakka Chinese Mauritians express agency and identity within the affordances and constraints presented by historical relations, ideologies, policies, and sociopolitical developments in postcolonial Mauritius. This small Indian Ocean island state is lauded for its peaceful multicultural society while imposing restrictive ethnic classification into four groups (Hindu, Muslim, Chinese, and ‘General Population’) onto its citizens. Mauritian identity formation is anchored in raciolinguistic ideologies which view language and race as naturally linked. These ideologies produce expectations of people’s language use and identity expression, which often conflict with social realities in Mauritius. Within this field of tension, Hakka Mauritians often find themselves having to reassert their identities as ‘authentically’ Mauritian, Chinese, or Hakka. This is further complicated by the recent ‘rise’ of China, which promotes Mandarin language education (instead of Hakka) and affects local perceptions of what it means to be ‘Chinese’. 
I present three key contexts in which discourses of tension become salient for Hakka Mauritian expression: (i) Mauritian discourses of nation-building and ethnolinguistic community formation; (ii) shifts from Hakka to Mandarin in Chinese Mauritian heritage language classrooms; and (iii) ideologies of ‘Chineseness’ in the semiotic landscape of Mauritian Chinatown. My research shows that Hakka Mauritians occupy constant ‘in-between’ spaces and engage in discourses of tension to (re-)examine their identities. My dissertation thus contributes to anthropology an account of individual agency in expressing fluidity and complexity in transcultural identities against the backdrop of discursive tensions.

    Online Identities and Linguistic Practices: A case of Arab Study Abroad Students in the UK on Twitter

    This research investigates the online linguistic practices of five Arab study abroad students in the UK who are Twitter users. These students deploy rich and diverse linguistic repertoires, which include Standard Arabic (Fus’ha), Classical Arabic, colloquial Arabic (Ammyah), as well as different English repertoires and digital affordances (emoji). The study explores and demonstrates how these individuals use their diverse linguistic repertoires to communicate ideas and construct online identities. In addition, it investigates participants’ attitudes towards different online linguistic practices. Lastly, this study explores the impact of mobility, understood geographically as moving to study in the UK, and socially as becoming sojourners, on these practices, thus expanding our understanding of how these two aspects of contemporary life interact. Online ethnography is used as the methodology in this research. This includes observing participants’ Twitter accounts for nine months and conducting interviews with them to seek interpretations of, and comments on, their online practices. Thus, the study makes a methodological contribution to researching online practices of Arab sojourners in the UK. Previous studies (e.g. Al Alaslaa, 2018; Albirini, 2016; Al-Jarf, 2010; Eldin, 2014; Kosoff, 2014) have relied heavily on text analysis, making assumptions about individuals’ intentions when they analyse their repertoire use. To address this limitation, this study interviews the participants to allow them to comment on how and why they use their linguistic repertoires in order to delve into their language ideologies and aspects of online identity construction. The findings show that the participants predominantly used two categories of Arabic: Standard Arabic A (Fus’ha) and Colloquial Arabic (CA) in addition to the use of English and emoji. All these resources are deployed by the participants to construct different macro- and micro-level identities (Bucholtz & Hall, 2005). Another main finding is that most participants relied on CA more than any other variety, despite the common language ideologies that continue to (re)produce and reinforce the status of Standard Arabic among Arabic speakers (e.g. Albirini, 2016; Bassiouney, 2015; Hoigilt, 2018). It was also found that the role of English in this study is not as dominant as has been reported in previous studies on Arab internet users (e.g. Al-Saleem, 2011; Eldin, 2014; Kosoff, 2014; Strong & Hareb, 2012). Finally, the analysis reveals that mobility does not seem to have a significant impact on the participants’ online linguistic practices. This study contributes to the literature on digital communication, language attitudes, and identity, and to our wider understanding of these areas. More importantly, it adds to recent debates in sociolinguistics regarding concepts such as ‘multilingualism’, ‘languaging’, ‘codeswitching’ and ‘translanguaging’. Moreover, the current study has some potential practical implications. Thousands of Arab students come to study in the UK annually. Knowing how these students communicate on social media will inform university educators about their ideologies and attitudes to the languages they speak. The findings also help to change some of the common perceptions among Arab individuals about linguistic practices of Arab sojourners in the UK.

    Foundation, Implementation and Evaluation of the MorphoSaurus System: Subword Indexing, Lexical Learning and Word Sense Disambiguation for Medical Cross-Language Information Retrieval

    In everyday medical practice, which involves a great deal of documentation and search work, most textually encoded information is by now available electronically. The development of powerful methods for efficient retrieval is therefore of primary importance. Judged from the perspective of medical sublanguage, common text retrieval systems lack morphological functionality (inflection, derivation and composition), lexical-semantic functionality, and the ability to analyse large document collections across languages. This doctoral thesis covers the theoretical foundations of the MorphoSaurus system (an acronym for morpheme thesaurus). Its methodological core is a thesaurus organised around morphemes of medical expert and lay language, whose entries are linked across languages by semantic relations. Building on this, a procedure is presented that segments (complex) words into morphemes, which are then replaced by language-independent, concept-class-like symbols. The resulting representation is the basis for cross-language, morpheme-oriented text retrieval. In addition to the core technology, a method for the automatic acquisition of lexicon entries is presented, by which existing morpheme lexicons are extended to further languages. Taking cross-language phenomena into account then leads to a novel method for resolving semantic ambiguities. The performance of morpheme-oriented text retrieval is tested empirically in extensive, standardised evaluations and compared with common approaches.
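
The idea of segmenting words into morphemes and replacing them with language-independent symbols can be sketched as follows. This is a hypothetical toy example, not the MorphoSaurus implementation: the greedy longest-match strategy, the `segment` function, the lexicon entries and the identifiers such as `#stomach` are all assumptions made for illustration.

```python
def segment(word, lexicon):
    """Greedy longest-match segmentation of word into lexicon subwords.

    Returns the list of concept identifiers, or None if some part of the
    word is not covered by the lexicon.
    """
    word = word.lower()
    ids = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shorter ones.
        for j in range(len(word), i, -1):
            concept = lexicon.get(word[i:j])
            if concept is not None:
                ids.append(concept)
                i = j
                break
        else:
            return None  # no lexicon entry starts at position i
    return ids


# Hypothetical toy lexicon: English and German subwords mapped to made-up,
# language-independent identifiers.
LEXICON = {
    "gastro": "#stomach", "enter": "#intestine", "itis": "#inflammation",
    "magen": "#stomach", "darm": "#intestine", "entzündung": "#inflammation",
}

english = segment("gastroenteritis", LEXICON)
german = segment("Magendarmentzündung", LEXICON)
```

With this toy lexicon both words reduce to the same sequence of identifiers, so a query in one language could match a document in the other at the identifier level. Greedy matching is the simplest choice and can fail on words where a shorter first match would have succeeded; a real system would need backtracking or a more careful segmentation strategy.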