
    Comparison and Adaptation of Automatic Evaluation Metrics for Quality Assessment of Re-Speaking

    Re-speaking is a mechanism for obtaining high-quality subtitles for use in live broadcasts and other public events. Because it relies on humans performing the actual re-speaking, estimating the quality of the results is non-trivial. Most organisations rely on humans to perform the actual quality assessment, but purely automatic methods have been developed for similar problems, such as Machine Translation. This paper compares several of these methods: BLEU, EBLEU, NIST, METEOR, METEOR-PL, TER and RIBES. These are then matched against the human-derived NER metric, commonly used in re-speaking
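    As an illustration of how such automatic metrics operate, the following is a minimal sketch (not the paper's implementation) of modified n-gram precision, the core quantity behind BLEU, in pure Python:

    ```python
    from collections import Counter

    def ngrams(tokens, n):
        """Return the multiset of n-grams in a token list."""
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def modified_precision(candidate, reference, n):
        """Clipped n-gram precision: each candidate n-gram counts at most
        as often as it occurs in the reference, so repeating a correct
        word is not rewarded."""
        cand = ngrams(candidate.split(), n)
        ref = ngrams(reference.split(), n)
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        return clipped / total if total else 0.0
    ```

    For live subtitling, the candidate would be the re-spoken transcript and the reference a verified subtitle text; the full metrics add brevity penalties, stemming, or reordering terms on top of this idea.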

    Interpreter identification in the Polish Interpreting Corpus

    This paper describes the automated identification of interpreter voices in the Polish Interpreting Corpus (PINC). After collecting a set of voice samples from the interpreters, a deep neural network model was used to match every utterance in the corpus to a specific individual. The final result is highly accurate and saves considerable time compared with human judgment
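    The matching step can be illustrated with a minimal sketch (the paper's actual model and embeddings are not reproduced here): given a fixed-length voice embedding per utterance, each utterance is assigned to the enrolled interpreter whose reference embedding has the highest cosine similarity.

    ```python
    import math

    def cosine(a, b):
        """Cosine similarity between two equal-length vectors."""
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    def identify(utterance_emb, enrolled):
        """Return the enrolled speaker whose reference embedding is
        closest (by cosine similarity) to the utterance embedding."""
        return max(enrolled, key=lambda name: cosine(utterance_emb, enrolled[name]))
    ```

    In practice a similarity threshold would also be needed to reject utterances from speakers outside the enrolled set.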

    Pre-trained Deep Neural Network using Sparse Autoencoders and Scattering Wavelet Transform for Musical Genre Recognition

    Research described in this paper combines Deep Neural Networks (DNN) with novel audio features extracted using the Scattering Wavelet Transform (SWT) to classify musical genres. The SWT uses a sequence of Wavelet Transforms to compute modulation spectrum coefficients of multiple orders, which have already been shown to be promising for this task. The DNN in this work uses layers pre-trained with Sparse Autoencoders (SAE). Data obtained from the Creative Commons website jamendo.com is used to augment the well-known GTZAN database, a standard benchmark for this task. The final classifier is tested using 10-fold cross-validation, achieving results comparable to other state-of-the-art approaches
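    The 10-fold cross-validation used for the final evaluation can be sketched as follows (a generic illustration, not the paper's exact protocol):

    ```python
    def k_fold_indices(n_items, k=10):
        """Split item indices into k contiguous, near-equal folds and
        yield (train, test) index lists; every item appears in exactly
        one test fold across the k iterations."""
        indices = list(range(n_items))
        fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
        folds, start = [], 0
        for size in fold_sizes:
            folds.append(indices[start:start + size])
            start += size
        for i, test in enumerate(folds):
            train = [idx for j, f in enumerate(folds) if j != i for idx in f]
            yield train, test
    ```

    The classifier is trained k times, each time on nine folds and tested on the held-out tenth, and the k accuracies are averaged.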

    EP-Poland : building a bilingual parallel corpus for interpreting research

    This paper reports on the process of building the EP-Poland corpus and on its first empirical applications. This extensive bidirectional English-Polish corpus of original parliamentary contributions paired with professional simultaneous interpretations covers 11 European Parliament debates held between January 2016 and February 2020. The main topic of these debates is the rule-of-law crisis triggered by the Law and Justice government in Poland. The corpus contains over 157,000 tokens and about 20 h 45 min of recordings, counting both source and target texts. The two interpreting directions (English-Polish and Polish-English) are represented almost evenly. The annotation completed so far includes mark-up information, POS tagging, and the labelling of disfluency phenomena and all forms of explicitating shifts. Manual annotation of personal deixis is in progress. An additional interesting feature is speaker identification performed with the X-vector method, which allowed us to identify 36 interpreters. We begin with an overview of existing interpreting corpora, then explain the design features of EP-Poland and report on two completed empirical studies analysing idiosyncratic interpreting behaviour. We conclude by outlining future development pathways and offering some remarks on the corpus's significance and limitations
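    The near-even balance between the two interpreting directions can be checked with a trivial sketch (hypothetical data format, not the corpus's actual schema):

    ```python
    from collections import Counter

    def direction_balance(segments):
        """Count tokens per interpreting direction for a list of
        (direction, text) pairs and return each direction's share
        of the total token count."""
        tokens = Counter()
        for direction, text in segments:
            tokens[direction] += len(text.split())
        total = sum(tokens.values())
        return {d: n / total for d, n in tokens.items()}
    ```

    Run over the whole corpus, shares near 0.5 for "en-pl" and "pl-en" would confirm the reported balance.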

    Aspects of information management in the design of a voice portal

    This article describes the design principles of the Voice Portal used for public-transport telephone information and the role of Information Architecture (IA) in its construction. IA is most often discussed in the context of designing websites and graphical applications, but many of its aspects can be applied to other design tasks. The article analyses how IA can be used in building a system whose only mode of user interaction is speech. The system's main task is to provide users with public-transport information over the telephone. This goal was achieved with a dialogue system employing speech recognition and speech synthesis. The IA procedure was applied in several phases of dialogue design, with particular attention to the various constraints of this unusual approach to user interaction. The main objective is to design a system that delivers the requested information quickly and conveniently. This is difficult for several reasons: cost and time constraints; users' lack of experience with similar systems; the lack of insight into the overall structure of the system, which forces users to visualise that structure in their own minds; and the need to present a substantial amount of information concisely. Typical solutions to such problems, involving navigation, browsing and search, cannot be applied directly in this medium, but all have interesting counterparts of their own
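    A speech-only dialogue of this kind is often managed as a finite-state flow. The following is a hypothetical, minimal sketch of such a flow (the portal's actual dialogue design is not reproduced here):

    ```python
    # Hypothetical dialogue states and the transitions the flow allows.
    FLOW = {
        "greeting": ["ask_stop"],
        "ask_stop": ["ask_line", "not_understood"],
        "not_understood": ["ask_stop"],
        "ask_line": ["give_departures"],
        "give_departures": ["goodbye", "ask_stop"],
        "goodbye": [],
    }

    def advance(state, next_state):
        """Move to next_state if the flow allows the transition,
        otherwise stay in the current state (e.g. re-prompt)."""
        return next_state if next_state in FLOW.get(state, []) else state
    ```

    Constraining transitions this way keeps the caller oriented despite having no visual overview of the system's structure.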

    ASR training dataset for Croatian ParlaSpeech-HR v1.0

    The ParlaSpeech-HR dataset is built from parliamentary proceedings available in the Croatian part of the ParlaMint corpus and the parliamentary recordings available from the Croatian Parliament's YouTube channel. The corpus consists of segments 8-20 seconds in length. Two transcripts are available: the original one, and one normalised via a simple rule-based normaliser. Each transcript contains word-level alignments to the recordings. Each segment has a reference to the ParlaMint 2.1 corpus (http://hdl.handle.net/11356/1432) via utterance IDs. If a segment is based on a single utterance, speaker information for that segment is available as well. Speaker information is available for 381,849 segments, i.e., 95% of all segments, and consists of everything available from the ParlaMint 2.1 corpus (name, party, gender, age, status, role). Altogether there are 309 speakers in the dataset. The dataset is divided into a training, a development, and a testing subset. The development data consist of 500 segments from the 5 most frequent speakers, chosen so that speaker variety is not lost from the remaining data. The test data consist of 513 segments from 3 male (258 segments) and 3 female (255 segments) speakers. No segments from the 6 test speakers appear in the two remaining subsets. The 22,076 instances without speaker information are not assigned to any of the three subsets; the remaining 380,836 instances form the training set
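    The speaker-disjoint split described above can be sketched as follows (hypothetical field layout, not the dataset's actual file format):

    ```python
    def split_by_speaker(segments, test_speakers):
        """Assign each (speaker, segment_id) pair to 'test' if its
        speaker is held out, to 'unassigned' if the speaker is unknown
        (None), and to 'train' otherwise, so no test speaker's segments
        leak into the training data."""
        splits = {"train": [], "test": [], "unassigned": []}
        for speaker, seg_id in segments:
            if speaker is None:
                splits["unassigned"].append(seg_id)
            elif speaker in test_speakers:
                splits["test"].append(seg_id)
            else:
                splits["train"].append(seg_id)
        return splits
    ```

    Splitting by speaker rather than by segment is what guarantees that ASR accuracy on the test set reflects generalisation to unseen voices.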

    PolEval 2019 : the next chapter in evaluating Natural Language Processing tools for Polish

    PolEval is a SemEval-inspired evaluation campaign for natural language processing tools for Polish. Submitted tools compete against one another within tasks selected by the organizers, using available data, and are evaluated according to pre-established procedures. The campaign has been organized since 2017, and each year the winning systems become the state of the art in Polish language processing for their respective tasks. In 2019 we organized six different tasks, creating an even greater opportunity for NLP researchers to evaluate their systems in an objective manner