Search CORE

173 research outputs found

The InFile project: a crosslingual filtering systems evaluation campaign

Author: Besancon Romaric
Chaudiron Stéphane
Choukri Khalid
Mostefa Djamel
Timimi Ismaïl
Publication venue: HAL CCSD
Publication date: 28/05/2008

The InFile project (INformation, FILtering, Evaluation) is a cross-language adaptive filtering evaluation campaign, sponsored by the French National Research Agency. The campaign is organized by the CEA LIST, ELDA and the University of Lille3-GERiiCO. It has an international scope, as it is a pilot track of the CLEF 2008 campaigns. The corpus is built from a collection of about 1.4 million newswires (10 GB) in three languages (Arabic, English and French), provided by Agence France-Presse (AFP) and selected from a three-year period. The profiles corpus is made up of 50 profiles, of which 30 concern general news and events (national and international affairs, politics, sports...) and 20 concern scientific and technical subjects.

HAL-CEA

A road map for interoperable language resource metadata

Author: Calzolari Nicoletta
Choukri Khalid
Cieri Christopher
Ide Nancy
Langendoen D. Terence
Leveling Johannes
Palmer Martha
Pustejovsky James
Publication venue: European Language Resources Association
Publication date: 01/01/2010

LRs remain expensive to create and thus rare relative to demand across languages and technology types. The accidental re-creation of an LR that already exists is a nearly unforgivable waste of scarce resources that is unfortunately not so easy to avoid. The number of catalogs the HLT researcher must search, with their different formats, makes it possible to overlook an existing resource. This paper sketches the sources of this problem and outlines a proposal to rectify it, along with a new vision of LR cataloging that will facilitate the documentation and exploitation of a much wider range of LRs than previously considered.

CiteSeerX

DCU Online Research Access Service

Software Defined Networking (SDN): Etat de L'art

Author: Bouragba Khalid
Choukri Ihssane
Ouzzif Mohammed
Publication venue: HAL CCSD
Publication date: 17/06/2019

The Internet has been enormously successful and has become an indispensable universal tool for businesses and most individuals. However, despite their widespread adoption, traditional networks are complex and difficult to manage. One reason for this difficulty lies in the architecture of current networks, in which the control plane and the data plane are vertically integrated in each network device. SDN is a new networking paradigm that simplifies network management and innovation by separating the network control logic from the interconnection devices, promoting centralized control and the ability to program the network. In this article, we present an overview of SDN. We begin by introducing SDN, its architecture, and its communication interfaces. We then describe the OpenFlow protocol, how it works, and the main SDN controllers. We also examine the problems SDN faces, focusing on the main control-plane challenges such as performance, scalability, security, and reliability, and we discuss existing solutions for overcoming these challenges.

The MGB-5 Challenge: Recognition and Dialect Identification of Dialectal Arabic Speech

Author: Abdelali Ahmed
Ali Ahmed
Choukri Khalid
Glass James
Mubarak Hamdy
Renals Steve
Samih Younes
Shon Suwon
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 20/02/2020

Crossref

Edinburgh Research Explorer

Lessons Learned in ATCO2: 5000 hours of Air Traffic Control Communications for Robust Automatic Speech Recognition and Understanding

Author: Choukri Khalid
Khalil Driss
Lenders Vincent
Madikeri Srikanth
Motlicek Petr
Nigmatulina Iuliia
Prasad Amrutha
Rigault Mickael
Szoke Igor
Tart Allan
Zuluaga-Gomez Juan
Publication date: 01/05/2023

Voice communication between air traffic controllers (ATCos) and pilots is critical for ensuring safe and efficient air traffic control (ATC). This task requires high levels of awareness from ATCos and can be tedious and error-prone. Recent attempts have been made to integrate artificial intelligence (AI) into ATC in order to reduce the workload of ATCos. However, the development of data-driven AI systems for ATC demands large-scale annotated datasets, which are currently lacking in the field. This paper explores the lessons learned from the ATCO2 project, which aimed to develop a unique platform to collect and preprocess large amounts of ATC data from airspace in real time. Audio and surveillance data were collected from publicly accessible radio frequency channels with VHF receivers owned by a community of volunteers and later uploaded to OpenSky Network servers, which can be considered an "unlimited source" of data. In addition, this paper reviews previous work from ATCO2 partners, including (i) robust automatic speech recognition, (ii) natural language processing, (iii) English language identification of ATC communications, and (iv) the integration of surveillance data such as ADS-B. We believe that the pipeline developed during the ATCO2 project, along with the open-sourcing of its data, will encourage research in the ATC field. A sample of the ATCO2 corpus is available at https://www.atco2.org/data, while the full corpus can be purchased through ELDA at http://catalog.elra.info/en-us/repository/browse/ELRA-S0484. We demonstrated that ATCO2 is an appropriate dataset for developing ASR engines when little to no in-domain ATC data is available. For instance, with the CNN-TDNNf Kaldi model, we reached WERs as low as 17.9% and 24.9% on public ATC datasets, which is 6.6/7.6% better than an "out-of-domain" but supervised CNN-TDNNf model. Comment: Manuscript under review

arXiv.org e-Print Archive

Final FLaReNet deliverable: Language Resources for the Future - The Future of Language Resources

Author: Bel N.
Calzolari N.
Choukri Khalid
LS OZ Taal en spraaktechnologie
Mariani J.
Monachini M.
Odijk J.E.J.M.
Piperidis S.
Quochi V.
Soria C.
UiL OTS LLI
Publication date: 01/01/2011

Language Technologies (LT), together with their backbone, Language Resources (LR), provide essential support to the challenge of Multilingualism and ICT of the future. The main task of language technologies is to bridge language barriers and to help create a new environment where information flows smoothly across frontiers and languages, regardless of the country and language of origin. To achieve this goal, all players involved need to act as a community able to join forces on a set of shared priorities. However, the field of Language Resources and Technology has long suffered from an excess of individuality and fragmentation, with a lack of coherence concerning the priorities for the field and the direction to move in, not to mention a common timeframe. The context encountered by the FLaReNet project was thus that of an active field needing a coherence that can only be given by sharing common priorities and endeavours. FLaReNet has contributed to the creation of this coherence by gathering a wide community of experts and making them participate in the definition of an exhaustive set of recommendations.

PUblication MAnagement

Utrecht University Repository

The European Language Resources and Technologies Forum: Shaping the Future of the Multilingual Digital Europe

Author: Baroni Paola
Bel Núria
Budin Gerhard
Calzolari Nicoletta
Choukri Khalid
Goggi Sara
Mariani Joseph
Monachini Monica
Odijk Jan
Piperidis Stelios
Quochi Valeria
Soria Claudia
Toral Antonio
Publication venue: Istituto di Linguistica Computazionale del CNR - Pisa, ITALY

Proceedings of the 1st FLaReNet Forum on the European Language Resources and Technologies, held in Vienna at the Austrian Academy of Sciences on 12-13 February 2009.

PUblication MAnagement

The European language technology landscape in 2020: Language-centric and human-centric AI for cross-cultural communication in multilingual Europe

Author: Backfried Gerhard
Bontcheva Kalina
Choukri Khalid
De Smedt Koenraad
Gómez-Pérez José Manuel
Hajič Jan
Hegele Stefanie
Irgens Morten
Marheinecke Katrin
Piperidis Stelios
Prinz Christoph
Rehm Georg
Vasiļjevs Andrejs
Yvon François
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2020

Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties, including full language equality. However, language barriers impacting business, cross-lingual and cross-cultural communication are still omnipresent. Language Technologies (LTs) are a powerful means to break down these barriers. While the last decade has seen various initiatives that created a multitude of approaches and technologies tailored to Europe’s specific needs, there is still an immense level of fragmentation. At the same time, AI has become an increasingly important concept in the European Information and Communication Technology area. For a few years now, AI, including many opportunities and synergies but also misconceptions, has been overshadowing every other topic. We present an overview of the European LT landscape, describing funding programmes, activities, actions and challenges in the different countries with regard to LT, including the current state of play in industry and the LT market. We present a brief overview of the main LT-related activities on the EU level in the last ten years and develop strategic guidance with regard to four key dimensions.

University of Bergen

NORA - Norwegian Open Research Archives

ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications

Author: Cevenini Claudia
Choukri Khalid
Kocour Martin
Kolčárek Pavel
Motlicek Petr
Nigmatulina Iuliia
Prasad Amrutha
Rigault Mickael
Sarfjoo Seyyed Saeed
Szöke Igor
Tart Allan
Veselý Karel
Zuluaga-Gomez Juan
Černocký Jan
Publication date: 08/11/2022

Personal assistants, automatic speech recognizers and dialogue understanding systems are becoming more critical in our interconnected digital world. A clear example is air traffic control (ATC) communications. ATC aims at guiding aircraft and controlling the airspace in a safe and optimal manner. These voice-based dialogues are carried out between an air traffic controller (ATCO) and pilots via very-high-frequency radio channels. In order to incorporate these novel technologies into ATC (a low-resource domain), large-scale annotated datasets are required to develop the data-driven AI systems. Two examples are automatic speech recognition (ASR) and natural language understanding (NLU). In this paper, we introduce the ATCO2 corpus, a dataset that aims at fostering research in the challenging ATC field, which has lagged behind due to a lack of annotated data. The ATCO2 corpus covers 1) data collection and pre-processing, 2) pseudo-annotations of speech data, and 3) extraction of ATC-related named entities. The ATCO2 corpus is split into three subsets. 1) The ATCO2-test-set corpus contains 4 hours of ATC speech with manual transcripts and a subset with gold annotations for named-entity recognition (callsign, command, value). 2) The ATCO2-PL-set corpus consists of 5281 hours of unlabeled ATC data enriched with automatic transcripts from an in-domain speech recognizer, contextual information, speaker turn information, signal-to-noise ratio estimate and English language detection score per sample. Both are available for purchase through ELDA at http://catalog.elra.info/en-us/repository/browse/ELRA-S0484. 3) The ATCO2-test-set-1h corpus is a one-hour subset of the original test set corpus, which we are offering for free at https://www.atco2.org/data.
We expect the ATCO2 corpus will foster research on robust ASR and NLU not only in the field of ATC communications but also in the general research community. Comment: Manuscript under review; the code will be available at https://github.com/idiap/atco2-corpu

arXiv.org e-Print Archive

ECP-2007-LANG-617001 FLaReNet: Action Plan

Author: Baroni Paola
Bel Núria
Budin Gerhard
Calzolari Nicoletta
Caselli Tommaso
Choukri Khalid
Goggi Sara
Mariani Joseph
Monachini Monica
Odijk Jan
Piperidis Stelios
Quochi Valeria
Soria Claudia
Toral Antonio

Action plan of the FLaReNet project

PUblication MAnagement