
    The European Language Resources and Technologies Forum: Shaping the Future of the Multilingual Digital Europe

    Proceedings of the 1st FLaReNet Forum on the European Language Resources and Technologies, held in Vienna at the Austrian Academy of Sciences on 12-13 February 2009

    Final FLaReNet deliverable: Language Resources for the Future - The Future of Language Resources

    Language Technologies (LT), together with their backbone, Language Resources (LR), provide essential support for meeting the challenges of multilingualism and the ICT of the future. The main task of language technologies is to bridge language barriers and to help create a new environment in which information flows smoothly across frontiers and languages, regardless of the country and language of origin. To achieve this goal, all players involved need to act as a community able to join forces on a set of shared priorities. Until now, however, the field of Language Resources and Technology has suffered from an excess of individuality and fragmentation, with a lack of coherence about the priorities for the field and the direction in which to move, not to mention a common timeframe. The FLaReNet project thus faced an active field that needed a coherence which can only come from sharing common priorities and endeavours. FLaReNet has contributed to creating this coherence by gathering a wide community of experts and involving them in the definition of an exhaustive set of recommendations.

    Low-Resource Unsupervised NMT: Diagnosing the Problem and Providing a Linguistically Motivated Solution

    Unsupervised Machine Translation has been advancing our ability to translate without parallel data, but state-of-the-art methods assume an abundance of monolingual data. This paper investigates the scenario where monolingual data is limited as well, finding that current unsupervised methods suffer in performance under this stricter setting. We find that the performance loss originates from the poor quality of the pretrained monolingual embeddings, and we propose using linguistic information in the embedding training scheme. To support this, we look at two linguistic features that may help improve alignment quality: dependency information and sub-word information. Using dependency-based embeddings results in a complementary word representation which offers a boost in performance of around 1.5 BLEU points compared to standard WORD2VEC when monolingual data is limited to 1 million sentences per language. We also find that the inclusion of sub-word information is crucial to improving the quality of the embedding.
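    The abstract does not spell out how dependency and sub-word information enter embedding training. As a rough illustration only, the sketch below assumes the scheme popularised by dependency-based word2vec (a word's contexts are its syntactic head and its modifiers, tagged with the relation label, instead of a linear window) and fastText-style character n-grams as sub-word units. The token tuple format and the function names `dependency_contexts` and `char_ngrams` are hypothetical, not the paper's code.

```python
from typing import List, Tuple

# Hypothetical parsed-token format: (index, word, head_index, dependency_label).
# Indices start at 1; head_index 0 marks the root of the sentence.
Token = Tuple[int, str, int, str]


def dependency_contexts(sentence: List[Token]) -> List[Tuple[str, str]]:
    """Return (word, context) training pairs built from dependency arcs.

    Each head sees its modifier as 'modifier/label', and each modifier sees
    its head through the inverse relation 'head/label-1'.
    """
    words = {idx: word for idx, word, _, _ in sentence}
    pairs = []
    for _, word, head, label in sentence:
        if head == 0:
            continue  # the root has no head, so it contributes no arc
        head_word = words[head]
        pairs.append((head_word, f"{word}/{label}"))   # head -> modifier context
        pairs.append((word, f"{head_word}/{label}-1"))  # modifier -> inverse head context
    return pairs


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> List[str]:
    """Sub-word units: character n-grams of '<word>' plus the padded word itself."""
    padded = f"<{word}>"  # boundary markers distinguish prefixes and suffixes
    grams = [padded[i:i + n]
             for n in range(n_min, n_max + 1)
             for i in range(len(padded) - n + 1)]
    if padded not in grams:
        grams.append(padded)  # keep the whole token as its own unit
    return grams
```

    For the parse of "scientist discovers star", `dependency_contexts` pairs "discovers" with "scientist/nsubj" and "star/dobj", while "scientist" receives the context "discovers/nsubj-1"; `char_ngrams("cat", 3, 3)` yields "<ca", "cat", "at>" and "<cat>".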

    Insights from the Inventory of Smart Grid Projects in Europe: 2012 Update

    By the end of 2010 the Joint Research Centre, the European Commission’s in-house science service, launched the first comprehensive inventory of smart grid projects in Europe. The final catalogue was published in July 2011 and included 219 smart grid and smart metering projects from the EU-28 member states, Switzerland and Norway. The participation of the project coordinators and the reception of the report by the smart grid community were extremely positive. Owing to this success, the European Commission decided that the project inventory would be carried out on a regular basis, so as to keep the picture of smart grid developments in Europe up to date and to keep track of lessons learnt, challenges and opportunities. To this end, a new online questionnaire was launched in March 2012, and information on projects was collected up to September 2012. At the same time, an extensive search for project information was conducted on the internet and through cooperation links with other European research organizations. The resulting final database is the most up-to-date and comprehensive inventory of smart grid and smart metering projects in Europe, including a total of 281 smart grid projects and 90 smart metering pilot projects and rollouts from the same 30 countries that were included in the 2011 inventory database. The projects surveyed were classified into three categories (R&D, demonstration or pre-deployment, and deployment), and for the first time a distinction was made between smart grid and smart metering projects. The following offers an insight into the 2012 report.
    JRC.F.3-Energy securit

    Debating ‘Religious Violence’ in Lebanon: A Comparative Perspective on the Mobilisation of Religious and Secular Militias during the Lebanese Civil War (1975-1990)

    In a world where collective violence seems increasingly mapped in relation to religions and religious actors rather than secular forces, understanding the potential of religion to promote conflict has become a formidable and important goal. On the one hand, there are those who argue that ‘religious violence’ is not really religious and is at most a perversion of religious teachings. On the other side of the spectrum, an increasing number of commentaries conclude with urgent warnings against religion’s propensity for violence. Rather than taking sides in a debate characterised by sweeping generalisations, this dissertation aims to unravel how, when and at what levels religion can play a role in the social and political mobilisation towards violence, while comparing these mechanisms to non-religious equivalents. A Social Movement Theory (SMT) framework is adopted to analyse the mobilisation processes in four diversely oriented militia movements active in the Lebanese civil war (1975-1990): the Kataeb, the Amal movement, the Progressive Socialist Party and the Lebanese Communist Party. The thesis makes empirical and theoretical contributions on three analytical levels. At the macro-level, the thesis demonstrates how religion co-determined the character of the socio-political context, economic relations, foreign influence, and security issues, against which militia movements emerged as competing forces. The adaptation of critical realism aids in conceptualising the interdependence between these different factors as well as between the analytical levels. At the meso-level, it shows how the cooperation and incorporation of religious resources involved significant re-imaginations of prevailing hierarchies and structures, an observation that should change the manner in which we theorise about religion as a resource for mobilisation.
Analysing the speech of militia leaders using the psychometric measure of integrative complexity (IC), the thesis further demonstrates that no significant differences exist between the relative complexity of religious and non-religious idea structures. IC’s focus on cognitive structures adds an innovative edge to SMT. At the micro-level, augmenting SMT with insights from the field of social psychology, the thesis shows how religion played a role in social identification and a mediating role in existential anxiety. At the same time, the dissertation cautions that the role of religion is in most instances similar to that of its non-religious counterparts. The research thereby complicates generalising theories of ‘religious violence’, presenting the social mobilisation towards violence as contingent on a complex mix of religious and non-religious ideas, societal structures, available resources, leadership attitudes, social identifications and personal affections.

    Empirical machine translation and its evaluation

    In this thesis we have exploited current Natural Language Processing technology for Empirical Machine Translation and its Evaluation. On the one hand, we have studied the problem of automatic MT evaluation. We have analyzed the main deficiencies of current evaluation methods, which arise, in our opinion, from the shallow quality principles upon which they are based. Instead of relying on the lexical dimension alone, we suggest a novel path towards heterogeneous evaluations. Our approach is based on the design of a rich set of automatic metrics devoted to capturing a wide variety of translation quality aspects at different linguistic levels (lexical, syntactic and semantic). These linguistic metrics have been evaluated in different scenarios. The most notable finding is that metrics based on deeper linguistic information (syntactic/semantic) produce more reliable system rankings than metrics that limit their scope to the lexical dimension, especially when the systems under evaluation are different in nature. However, at the sentence level, some of these metrics suffer a significant decrease in performance, which is mainly attributable to parsing errors. In order to improve sentence-level evaluation, apart from backing off to lexical similarity in the absence of parsing, we have also studied the possibility of combining the scores conferred by metrics at different linguistic levels into a single measure of quality. Two valid non-parametric strategies for metric combination have been presented. 
These offer the important advantage of not having to adjust the relative contribution of each metric to the overall score. As a complementary issue, we show how to use the heterogeneous set of metrics to obtain automatic and detailed linguistic error analysis reports. On the other hand, we have studied the problem of lexical selection in Statistical Machine Translation. For that purpose, we have constructed a Spanish-to-English baseline phrase-based Statistical Machine Translation system and iterated across its development cycle, analyzing how to improve its performance through the incorporation of linguistic knowledge. First, we have extended the system by combining shallow-syntactic translation models based on linguistic data views, obtaining a significant improvement. This system is further enhanced using dedicated discriminative phrase translation models. These models allow for a better representation of the translation context in which phrases occur, effectively yielding improved lexical choice. However, based on the proposed heterogeneous evaluation methods and the manual evaluations conducted, we have found that improvements in lexical selection do not necessarily imply an improved overall syntactic or semantic structure. The incorporation of dedicated predictions into the statistical framework therefore requires further study. As a side question, we have studied one of the main criticisms against empirical MT systems, i.e., their strong domain dependence, and how its negative effects may be mitigated by properly combining external knowledge sources when porting a system to a new domain. 
We have successfully ported an English-to-Spanish phrase-based Statistical Machine Translation system trained on the political domain to the domain of dictionary definitions. The two parts of this thesis are tightly connected, since the hands-on development of an actual MT system has allowed us to experience first-hand the role of the evaluation methodology in the development cycle of MT systems.
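    The abstract does not name its two non-parametric combination strategies. One simple combination that needs no per-metric weights is a uniform average of the individual scores; the sketch below assumes every metric score is already normalised to [0, 1] and also shows the back-off to lexical similarity when a linguistic analysis is missing. The function `uniform_combination` and the metric names are hypothetical illustrations, not the thesis's implementation.

```python
from statistics import mean
from typing import Dict, Optional


def uniform_combination(scores: Dict[str, Optional[float]], lexical_key: str) -> float:
    """Combine per-metric scores with equal weights, so nothing needs tuning.

    Assumption: every score is already normalised to [0, 1]. A linguistic
    metric whose analysis failed (e.g. no parse) is passed as None and
    backs off to the lexical similarity score.
    """
    lexical = scores[lexical_key]
    if lexical is None:
        raise ValueError("the lexical back-off score must be available")
    # Replace missing linguistic scores with the lexical score, then average.
    filled = [lexical if s is None else s for s in scores.values()]
    return mean(filled)
```

    For example, `uniform_combination({"lexical": 0.5, "syntactic": None, "semantic": 0.8}, "lexical")` fills the missing syntactic score with 0.5 and averages 0.5, 0.5 and 0.8 to about 0.6. Because every metric contributes equally, adding or removing a metric changes the combination without any re-weighting step.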

    Mediating EU liberalisation and negotiating flexibility: a coalitional approach to wage bargaining change

    How do we explain divergent trajectories of change in wage bargaining institutions? The advancement of European economic integration, leading to market liberalisation and increased competition, was expected to bring about the breakdown of centralised bargaining arrangements. This expectation was even stronger given the internationalisation of new management practices, which pushed European firms to enhance their competitiveness by increasing flexibility. Despite strong theoretical expectations of a generalised breakdown of wage bargaining, one finds divergent trajectories of change across European countries and sectors. The task of this thesis is to explain the puzzle of varied responses in otherwise similar sectors. The banking and telecommunications sectors in Italy and Greece display a diversity of paths of institutional change: breakdown of bargaining, reform of bargaining, successful centralisation, and failed centralisation. The direction of these paths of institutional change may be explained in large part by two factors ignored by earlier literature: ‘employer associability’ and ‘labour-state coalitions’. On the one hand, it is argued that employers’ associations that possess legal competence and take into account the collective interests of both large and smaller firms may reform the wage bargaining institution, getting the ‘best of both worlds’ for their members. Additionally, a ‘labour-state coalition’ may moderate the destabilising pressures on wage bargaining, as long as trade unions are able to speak with a ‘single voice’. The government will be motivated not only by electoral concerns; it will also support centralised bargaining to gain ‘room for manoeuvre’ for tactical policy trade-offs that advance its agenda. Overall, the thesis refines earlier propositions, suggesting a more nuanced causal mechanism to explain institutional change. 
The argument speaks to wider debates in comparative political economy and comparative employment systems; it fleshes out empirically the role of the state in Mediterranean capitalism and highlights factors that moderate pressures towards convergence to the Liberal Market model.