41 research outputs found
Institute for the Protection and Security of the Citizen Activity Report 2002.
Abstract not available
JRC.G-Institute for the Protection and the Security of the Citizen (Ispra
Design of a Controlled Language for Critical Infrastructures Protection
We describe a project for the construction of a controlled language for critical infrastructures protection (CIP). The project originates from the need to coordinate and categorize communications on CIP at the European level. These communications can take the form of official documents, incident reports, informal communications and plain e-mail. To achieve our goal, we explore the application of traditional library science tools for the construction of controlled languages. Our starting point is an analogous effort carried out in the 1960s in the field of nuclear science, known as the Euratom Thesaurus.
JRC.G.6-Security technology assessmen
Foundation, Implementation and Evaluation of the MorphoSaurus System: Subword Indexing, Lexical Learning and Word Sense Disambiguation for Medical Cross-Language Information Retrieval
In everyday medical practice, which involves a great deal of documentation and search work, most textually encoded information is now available electronically. The development of powerful methods for efficient retrieval is therefore of primary importance.
Judged from the perspective of medical sublanguage, common text retrieval systems lack morphological functionality (inflection, derivation and compounding), lexical-semantic functionality, and the ability to analyse large document collections across languages.
This doctoral thesis covers the theoretical foundations of the MorphoSaurus system (an acronym for morpheme thesaurus). Its methodological core is a thesaurus organised around morphemes of medical expert and lay language, whose entries are linked across languages by semantic relations. Building on this, a procedure is presented that segments (complex) words into morphemes, which are then replaced by language-independent, concept-class-like symbols. The resulting representation is the basis for cross-language, morpheme-oriented text retrieval.
In addition to the core technology, a method for the automatic acquisition of lexicon entries is presented, by which existing morpheme lexicons are extended to further languages. Taking cross-language phenomena into account then leads to a novel method for resolving semantic ambiguities.
The performance of morpheme-oriented text retrieval is tested empirically in extensive, standardised evaluations and compared with common approaches.
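The segmentation step described in this abstract (splitting complex words into morphemes and replacing them with language-independent symbols) can be illustrated with a minimal sketch. It assumes a greedy longest-match strategy and a toy lexicon; the lexicon entries, the `#STOMACH`/`#INFLAMM` identifiers and the function names `segment` and `index_text` are all hypothetical and are not taken from the MorphoSaurus system itself.

```python
# Toy subword lexicon. Every entry and identifier is invented for
# illustration; none of it comes from the actual MorphoSaurus thesaurus.
# Entries mapped to None are affixes that carry no concept and are dropped.
# German umlauts are transliterated (ue) to keep the toy lexicon ASCII-only.
LEXICON = {
    "en": {
        "gastr": "#STOMACH",
        "stomach": "#STOMACH",
        "itis": "#INFLAMM",
        "inflammation": "#INFLAMM",
    },
    "de": {
        "magen": "#STOMACH",
        "gastr": "#STOMACH",
        "entzuend": "#INFLAMM",
        "itis": "#INFLAMM",
        "ung": None,
    },
}


def segment(word, lexicon):
    """Greedy longest-match segmentation of one word into concept symbols."""
    word = word.lower()
    max_len = max(len(entry) for entry in lexicon)
    symbols = []
    i = 0
    while i < len(word):
        # Try the longest possible subword starting at position i first.
        for length in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + length]
            if piece in lexicon:
                if lexicon[piece] is not None:
                    symbols.append(lexicon[piece])
                i += length
                break
        else:
            i += 1  # no subword matches here; skip one character
    return symbols


def index_text(text, lang):
    """Map a whitespace-separated text to its sequence of concept symbols."""
    return [s for word in text.split() for s in segment(word, LEXICON[lang])]


print(index_text("gastritis", "en"))
print(index_text("Magenentzuendung", "de"))
print(index_text("stomach inflammation", "en"))
```

In this sketch all three inputs reduce to the same symbol sequence, which is what allows a query in one language to match documents in another. A real system would need a far larger lexicon and a more careful segmentation strategy than greedy matching.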
Proceedings of LOAIT '07 : II Workshop on Legal Ontologies and Artificial Intelligence Techniques
Proceedings of the 2nd Workshop on Legal Ontologies and Artificial Intelligence Techniques, June 4th, 2007, Stanford Universit
Liage de données RDF : évaluation d'approches interlingues
The Semantic Web extends the Web by publishing structured and interlinked data using RDF. An RDF data set is a graph in which resources are nodes labelled in natural languages. One of the key challenges of linked data is discovering links across RDF data sets: given two data sets, equivalent resources should be identified and linked by owl:sameAs links. This problem is particularly difficult when resources are described in different natural languages.
This thesis investigates the effectiveness of linguistic resources for interlinking RDF data sets. For this purpose, we introduce a general framework in which each RDF resource is represented as a virtual document containing the textual information of its neighbouring nodes. The context of a resource consists of the labels of the neighbouring nodes. Once virtual documents are created, they are projected into the same space so that they can be compared. This can be achieved by using machine translation or multilingual lexical resources. Once documents are in the same space, similarity measures are applied to find identical resources. Similarity between elements of this space is taken as similarity between RDF resources.
We experimentally evaluate different cross-lingual methods for linking RDF data within the proposed framework. In particular, two strategies are explored: applying machine translation, and using references to multilingual resources. Overall, the evaluation shows the effectiveness of cross-lingual string-based approaches for linking RDF resources expressed in different languages. The methods have been evaluated on resources in English, Chinese, French and German. The best performance (over 0.90 F-measure) was obtained by the machine translation approach. This shows that the similarity-based method can be successfully applied to RDF resources independently of their type (named entities or thesaurus concepts). The best experimental results involving just a pair of languages demonstrated the usefulness of such techniques for interlinking RDF resources cross-lingually.
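The framework summarised in this abstract (virtual documents built from neighbouring labels, projection into a common space, then similarity matching) can be sketched as follows. This is only an illustration under stated assumptions: the two toy data sets, the `FR_TO_EN` dictionary standing in for a machine translation step, the bag-of-words cosine measure, the 0.5 threshold and the function names are all hypothetical and are not the thesis's actual resources or implementation.

```python
import math
from collections import Counter

# Toy data sets, invented for illustration:
# resource -> (own label, labels of neighbouring nodes).
EN = {
    "en:Paris": ("Paris", ["capital", "France", "city"]),
    "en:Berlin": ("Berlin", ["capital", "Germany", "city"]),
}
FR = {
    "fr:Paris": ("Paris", ["capitale", "France", "ville"]),
    "fr:Berlin": ("Berlin", ["capitale", "Allemagne", "ville"]),
}

# Hypothetical word-level dictionary standing in for machine translation.
FR_TO_EN = {"capitale": "capital", "ville": "city", "allemagne": "germany"}


def virtual_document(entry, translate=None):
    """Bag of lower-cased tokens from a resource's label and its neighbours."""
    label, neighbours = entry
    tokens = [t.lower() for text in [label, *neighbours] for t in text.split()]
    if translate:
        # Project into the target language; unknown tokens are kept as-is.
        tokens = [translate.get(t, t) for t in tokens]
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def link(source, target, translate, threshold=0.5):
    """Propose an owl:sameAs link to the most similar target resource."""
    target_docs = {res: virtual_document(entry) for res, entry in target.items()}
    links = []
    for res, entry in source.items():
        doc = virtual_document(entry, translate)
        best = max(target_docs, key=lambda r: cosine(doc, target_docs[r]))
        if cosine(doc, target_docs[best]) >= threshold:
            links.append((res, "owl:sameAs", best))
    return links


for triple in link(FR, EN, FR_TO_EN):
    print(triple)
```

With these toy inputs the French and English descriptions of each city become identical token bags after translation, so each French resource is linked to its English counterpart. Real labels are noisier, which is why the choice of translation resource and similarity measure matters in practice.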
Concordancing Software in Practice: An investigation of searches and translation problems across EU official languages
2011/2012
The present work reports on an empirical study investigating translation problems across multiple language pairs. In particular, the analysis aims to develop a methodological approach for studying concordance search logs as manifestations of translation problems and, in a wider perspective, of information needs. As search logs are a relatively unexplored data type within translation process research, a controlled environment was needed to carry out this exploratory analysis without incurring additional problems caused by an excessive number of variables. The logs were collected at the European Commission and contain a large volume of searches from English into 20 EU languages that staff translators working for the EU translation services submitted to an internally available multilingual concordancer. The study attempts to (i) identify differences in the searches (i.e. problems) based on the language pairs; and (ii) group problems into types. Furthermore, the interactions between concordance users and the tool itself have been examined to provide a translation-oriented perspective on the domain of Human-Computer Interaction.
The study draws on the literature on translation problems, Information Retrieval and Web search log analysis, starting from the assumption that, from the perspective of concordance searching, translation problems are best interpreted as information needs for which the concordancer is chosen as a form of external support. The structure of a concordance search is examined in all its parts and broken down into two main components: the 'Search Strategy' component and the 'Problem Unit' component. The former was analyzed using a mainly quantitative approach, whereas the latter was addressed from a more qualitative perspective. The analysis of the Problem Unit takes into account the length of the search strings as well as their content and linguistic form, each addressed with a different methodological approach. Based on the understanding of concordance searches as manifestations of translation problems, a user-centered classification of translation-oriented information needs is developed to account for as many "problem" scenarios as possible.
The initial expectation was that different languages would experience different problems. This assumption could not be verified: the 20 language pairs considered in this study behaved consistently on many levels and, owing to the specific research environment, no definite conclusions could be reached regarding the role of the language family criterion for problem identification. The analysis of the 'Problem Unit' component highlighted automated support for translating Named Entities as a possible area for further research in translation technology and the development of computer-based translation support tools. Finally, the study proposes (concordance) search logs as an additional data type to be used in experiments on the translation process and for triangulation purposes, while drawing attention to the concordancer as a type of translation aid to be further fine-tuned for the needs of professional translators.
XXV Ciclo198
Collaboration in Designing a Pedagogical Approach in Information Literacy
This Open Access book combines expertise in information literacy with expertise in education and teaching to share tips and tricks for the development of good information literacy teaching and training in universities and libraries. It draws on research, knowledge and pedagogical practice from academia, to teach students how to sift through information to be able to distinguish the important and correct from the unusable. It discusses basic concepts and models of information literacy, as well as strategies for accessing, locating and retrieving information and methods suitable for the assessment and management of information. The book explains many concepts connected to information literacy and discusses pedagogical issues with a view to supporting the practitioner. Each chapter examines one aspect of information literacy, discusses the pedagogical challenges involved and provides suggestions for best practice
On the Mono- and Cross-Language Detection of Text Re-Use and Plagiarism
Barrón Cedeño, LA. (2012). On the Mono- and Cross-Language Detection of Text Re-Use and Plagiarism [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/16012
Palanci