10 research outputs found

    On the Mono- and Cross-Language Detection of Text Re-Use and Plagiarism

    Barrón Cedeño, LA. (2012). On the Mono- and Cross-Language Detection of Text Re-Use and Plagiarism [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/16012

    Gestion et localisation de ressources sur un assistant numérique personnel

    This thesis addresses the problems of managing and locating services on a personal digital assistant (PDA). Service location lets applications discover the network resources that offer the services they have specified. It thereby makes possible the dynamic configuration of applications so that they use the resources of the network to which the PDA is connected. Service location protocols exist to accomplish this task, but they are not available for PDAs. The research reported in this thesis led to the implementation of a service management and location system for a PDA. This system supports the specification, location and use of network services from a PDA. It also supports the dynamic configuration of the applications that use those services. The system's architecture is distributed. Resources are accessible through CORBA's distributed components. This approach makes clients independent of the implementation of the services offered by the resources. It also makes it possible to standardize the interfaces used to access resources

    Insurrectionism in South Africa : The Pan-Africanist congress and the Poqo movement, 1959-1965

    The thesis discusses the history of a black South African political organisation, the Pan-Africanist Congress, during the brief period of its effective influence inside South Africa: from 1959 to the mid-1960s. The PAC is identified as a populist movement, that is, a movement of people who in one way or another were attempting to resist the impulses of an industrialising society. Its ideology therefore tended to stress communal as opposed to class-bound social identities. Beginning as a small dissident group within the dominant African political organisation, the African National Congress, the PAC was born after a decade of mass-based campaigning had distanced the ANC from its earlier nationalist position. The PAC acquired a following in only a few places, normally where its rival, the ANC, was weak and badly organised. It only approached the dimensions of a mass movement in the Western Cape, where its militant, racially assertive rhetoric attracted migrant workers who were affected by a twin set of pressures: the efforts by the authorities to exclude them from urban society and the restructuring of their home communities in the Transkei. After the PAC's banning in March 1960 these people began to play a crucial role in transforming the organisation from a cluster of conspiratorial nuclei drawn mainly from the middle class into the popular movement Poqo, in the process injecting it with their own material and ideological preoccupations. In 1963 the PAC's exile leadership attempted to mobilise this following in a nation-wide insurrection, but most of their preparations were known to the police, who anticipated their plans with thousands of arrests. PAC-inspired violence was therefore localised and confined mainly to the Transkei and the Western Cape. Two chapters examine the local social tensions which underlay PAC/Poqo violence in Paarl and the Tembu districts of the Southern Transkei. 
By way of contrast the development of the movement amongst a non-migrant constituency is examined in a chapter on the PAC's progress in East London and Pretoria. The thesis concludes with an examination of the PAC in exile: here divorced from its popular base and from the political environment which gave rise to its ideological concerns the movement lost its vigour and integrity: a classic instance of the tragedy of exile politics

    Technologies for Reusing Text from the Web

    Texts from the web can be reused individually or in large quantities. The former is called text reuse and the latter language reuse. We first present a comprehensive overview of the different ways in which text and language are reused today, and how exactly information retrieval technologies can be applied in this respect. The remainder of the thesis then deals with specific retrieval tasks. In general, our contributions consist of models and algorithms, their evaluation, and, for that purpose, large-scale corpus construction. The thesis divides into two parts. The first part introduces technologies for text reuse detection, and our contributions are as follows: (1) A unified view of projecting-based and embedding-based fingerprinting for near-duplicate detection and the first evaluation of fingerprint algorithms on Wikipedia revision histories as a new, large-scale corpus of near-duplicates. (2) A new retrieval model for the quantification of cross-language text similarity, which gets by without parallel corpora. We have evaluated the model in comparison to other models on many different pairs of languages. (3) An evaluation framework for text reuse and particularly plagiarism detectors, which consists of tailored detection performance measures and a large-scale corpus of automatically generated and manually written plagiarism cases. The latter have been obtained via crowdsourcing. This framework has been successfully applied to evaluate many different state-of-the-art plagiarism detection approaches within three international evaluation competitions. The second part introduces technologies that solve three retrieval tasks based on language reuse, and our contributions are as follows: (4) A new model for the comparison of textual and non-textual web items across media, which exploits web comments as a source of information about the topic of an item. 
In this connection, we identify web comments as a largely neglected information source and introduce the rationale of comment retrieval. (5) Two new algorithms for query segmentation, which exploit web n-grams and Wikipedia as a means of discerning the user intent of a keyword query. Moreover, we crowdsource a new corpus for the evaluation of query segmentation which surpasses existing corpora by two orders of magnitude. (6) A new writing assistance tool called Netspeak, which is a search engine for commonly used language. Netspeak indexes the web in the form of web n-grams as a source of writing examples and implements a wildcard query processor on top of it.
    Texts from the web can be reused individually or in large quantities. The former is called text reuse and the latter language reuse. We first give a detailed overview of the ways in which text and language are reused today and of how information retrieval technologies can be applied in this context. The remainder of the thesis then deals with specific retrieval tasks. Our contributions consist of models and algorithms, their empirical evaluation, and the construction of large corpora for that purpose. The dissertation is divided into two parts. In the first part we present technologies for detecting text reuse and make the following contributions: (1) An overview of projection-based and embedding-based fingerprinting methods for detecting near-identical texts, together with the first evaluation of a range of such methods on the revision histories of Wikipedia. (2) A new model for the cross-language comparison of texts by content. The model is based on a multilingual corpus consisting of pairs of topically related texts, such as Wikipedia. We compare the model with conventional models in several languages. (3) An evaluation framework for plagiarism detection algorithms. The framework consists of measures that quantify the detection quality of an algorithm and a large corpus of plagiarism cases. The plagiarism cases were generated automatically and written manually with the help of crowdsourcing. In addition, we organized two workshops in which our evaluation framework was successfully used to evaluate current plagiarism detection algorithms. In the second part we present technologies based on language reuse for three different retrieval tasks and make the following contributions: (4) A new model for the cross-media comparison of web objects by content. The model is based on analysing the comments available for an object. In this connection, we identify web comments as an information source so far neglected in research and present the foundations of comment retrieval. (5) Two new algorithms for segmenting web search queries. The algorithms use web n-grams and Wikipedia to determine the searcher's intent in a query. In addition, we used crowdsourcing to create a new evaluation corpus that is two orders of magnitude larger than previous corpora. (6) A novel search engine called Netspeak, which enables searching for commonly used language. Netspeak indexes the web as a source of commonly used language in the form of n-grams and implements a wildcard search on top of it

    Schooling at the edge of the world: An ethnographic study of educational ambivalence within coastal habitus in northern Mozambique

    Coastal fishing communities in northern Mozambique have a distinctive history, politics and livelihoods that make them physically and socially peripheral. This is evident in the lack of access to and ownership of natural resources, social opportunities such as education, information and means of decision making, and in the influence of the cultural-hereditary characteristics of coastal society. The thesis examines learning in Lunga, the key institutions, their roles and importance. Drawing on Mamdani’s concept of the bifurcated state, it outlines the historical background of the formal education system, which is a necessary frame for understanding many specific problems of education in contemporary Mozambique. In this setting, the thesis reflects on formal, traditional and Islamic education and their different forms of valorisation in the past and present. The study examines the coastal habitus: the problems of life on the periphery, and the social, political and physical distance. From there, it probes deeper into the relations between competing institutions promoting certain distinctive aspects of coastal life, describing the production of the local and the global (national). The main focus of the thesis is the characteristic, ambivalent and strained relationship between schooling and the coastal habitus, which manifests the tension between local and global spaces. This thesis discusses these questions and related educational practices as culturally mediated responses to collective uncertainty and marginalization. It describes the community's struggle over the relative value of schooling versus the village-based knowledge and skill acquisition necessary for community members to live within their structural constraints. Furthermore, it points towards questions of political power, suggesting that coastal society's ambivalence about the utility of schooling may be seen as one of the dilemmas of citizenship in contemporary Mozambique. 
It demonstrates that ambivalent meanings attached to schooling are shaped by their cultural history and their attempts to maintain their livelihoods in the context of political marginality

    Africa and the Two Chinas

    Norms and their implications for the making of China's foreign aid policy since 1949 : Case studies of Southeast Asia, Africa and Latin America.

    This thesis will apply the constructivist theory of International Relations (IR) to the study of Chinese foreign policy, beginning with an examination of the IR theories, realism, liberalism and constructivism, and how each theory explains Chinese foreign policy and its aid behaviour. It will focus on norms and their implications for the making of China's foreign aid policy. Four norms, Asianism, internationalism, sovereignty, and developmentalism are discussed and related to their specific roles in China's policy making. Asianism involves the construction of an Asian identity within Asia, internationalism involves the development of international responsibility, sovereignty entails non-interference in other countries' affairs, and developmentalism involves the transmission of the Beijing Consensus. The analysis continues by linking China's identity to each norm in an historical overview of Chinese foreign policy since 1949. The overview demonstrates how China's identity has become transformed at critical stages throughout the history of the PRC, from victim to neutral actor, to its present great power state, and how these changes in identity have influenced China's subsequent behaviour. By examining three cases, Southeast Asia, Africa and Latin America, this thesis seeks to explain China's foreign policy within each region and highlights how China's policies have been guided by its identity and the mutually constituted norms during its periods of regional activity. The Southeast Asia study is focussed on all four norms, whilst the African and Latin American studies address internationalism, sovereignty and developmentalism. Particular attention is placed upon China's changing identity and its impact on China's future foreign policy and application of foreign aid

    Caractérisation et interprétation de l'enregistrement métamorphique afin de définir les processus d'enfoncement et d'exhumation dans les orogènes (exemple du Massif de Bohême)

    A range of petrological and structural studies carried out in the Bohemian Massif has contributed substantially to establishing models of burial and exhumation mechanisms in the external and internal orogenic domains. The eastern margin of the Bohemian Massif is formed by the Brunia microcontinent, which was underthrust below the westerly orogenic root of the Moldanubian-Lugian domain. The underthrusting produced in the Brunia basement an approximately 50-kilometre-wide zone of deformation and metamorphism, called the Moravo-Silesian zone. The Lugian-Moldanubian internal domain forms the core of the orogen and is characterized by the presence of high-grade rocks, such as high-pressure felsic granulites, eclogites, garnetiferous and spinel peridotites and migmatites, together with low-grade rocks and magmatic bodies, preserving a complicated structural and metamorphic history of the deep orogenic root. The presence of high-pressure and low-temperature rocks in the Saxothuringian domain to the west is interpreted as a remnant of a subduction zone. At the south-eastern margin of the Bohemian Massif, two large tectonic windows of the Moravian zone emerge through a migmatitic nappe of the Moldanubian domain. We have shown that Barrovian prograde metamorphism in these tectonic windows, ranging from the chlorite to the kyanite zone, is related to continental underthrusting below the orogenic root, and the retrograde P−T evolution to nappe stacking and inversion of metamorphic isograds. The complex pattern of isograds that obliquely crosscut tectonic boundaries in the present section is interpreted as a result of late folding of crustal sheets. At the northeastern margin of the Bohemian Massif lies the Silesian domain, characterized by Barrovian and Buchan-type metamorphism. In the kyanite zone, we described eclogite lenses in the metapelite matrix and demonstrated their separate metamorphic histories, indicating mixing of rocks at the tip of the underthrust crustal wedge. 
Further east, based on chloritoid-staurolite equilibria, we distinguished a high geothermal gradient during prograde metamorphism that is probably associated with inherited heat from a Devonian intracontinental rift, which can also explain the development of Buchan-type metamorphism. In the north-eastern part of the contact between the Lugian domain and the Silesian zone lies the Staré Město belt, composed of granodiorite, layered amphibolite, metagabbro and serpentinite. A combined structural and geochronological study revealed a structural unconformity between subhorizontal fabrics in the granulites and amphibolites dated at 500 Ma and fabrics of the granodiorite sheet that are parallel to steep foliations reworking the gabbros, dated at 340 Ma on zircon. This allowed relics of subhorizontal Ordovician fabrics to be distinguished for the first time in the whole Variscan European belt and these lithologies to be interpreted as a Cambro-Ordovician intracontinental rift. The Variscan tectono-metamorphic event is manifested by syn-convergent intrusion of the Carboniferous granodiorite sill and by HT-MP compressional deformation of the gabbros that produces foliations parallel with the structure of the granodiorite. Based on these criteria we interpreted the Staré Město belt as a preserved example of an intracontinental Cambro-Ordovician rift that has been reactivated and exhumed during the Variscan orogeny. Studies in the Moldanubian-Lugian orogenic root domain have shown that HP conditions are connected with an early shallow-dipping fabric, and that HP rocks are exhumed during vertical crustal-scale folding that leads to extrusion of HP rocks along vertical channels. This produces a pattern of large-scale “synforms” commonly cored by HP granulites and large-scale “antiforms” dominated by metasediments. 
This model of crustal-scale folding and vertical extrusion is proposed as a possible major exhumation mechanism of HP rocks in hot orogens, and a potential role of gravity in this process is discussed. Studies of HP granulites from the Moldanubian orogenic root pointed to a problem in interpreting equilibrium metamorphic assemblages in high-grade rocks and consequently to a problem in determining the peak metamorphic conditions. The classic combination of high-grossular garnet and high-temperature ternary feldspar led to estimates of metamorphic conditions of >1000 °C and 18−28 kbar. Microstructural analysis has shown that mafic granulites contain several generations of garnet, and that ternary feldspar and high-grossular garnet belong to two distinct episodes and cannot be combined to infer metamorphic conditions. A detailed study of a MORB-type eclogite shows unusually hot prograde conditions that are interpreted as a result of Devonian thermal rejuvenation of a continental back-arc domain that was responsible for softening of the future orogenic root domain. The steep foliations in the Lugian-Moldanubian domains are reworked to varying degrees by shallow-dipping fabrics. At the eastern margin of the Bohemian Massif, kilometre-scale crustal boudins of HP granulites and eclogites enclosed in migmatites are present. The HP lenses experienced first burial to 20 kbar and then exhumation to 10–7 kbar, while the mid-crustal rocks recorded an increase of pressure to 10 kbar. The whole system was re-equilibrated at 7 kbar, where the high-pressure rocks cooled while the mid-crustal rocks were heated. We interpreted this metamorphic pattern as a result of lower-crustal rocks being vertically extruded into the middle crust and then travelling as fragments in a subhorizontal hot migmatitic channel above a continental margin. 
This is interpreted as a continental channel flow domain that developed in the orogenic root above the underthrust Brunia basement, and which is eroded along its total width of 120 km and length of 200 km. A detailed petrological, microstructural and geochemical study of migmatites in the well-developed shallow-dipping fabrics of the Moldanubian orogenic root allowed discussion of the evolution of the “channel flow” domain and of melt migration in the crust. Four migmatite types collected in the steep and shallow-dipping structures show decreasing P−T conditions, indicating exhumation and cooling. Progressive changes in whole-rock composition are interpreted in terms of open-system behavior, caused by crustal-scale melt infiltration operating at grain boundaries. Mineral-equilibria modelling has shown that such “metasomatism” by circulating melt may cause the observed whole-rock composition changes, and it was suggested as a new mechanism of melt transport in the crust. In order to discuss the effects of melting on large-scale rheology during orogenic root deformation, pseudosection modeling of melt quantities was done to evaluate deformation mechanisms in various high-grade orthogneisses. Simultaneous metamorphic and structural studies of lower- and middle-crustal rocks permitted correlation of retrograde and very rarely preserved prograde metamorphic fabrics of the middle and lower orogenic crust, which opened discussion of coupled and uncoupled mechanisms of burial and exhumation in the two crustal levels, and also with respect to the upper and lower plate