6 research outputs found

    Evaluation of a text-based language analyzer on the error-laden output of automatic speech recognition

    The study outlined in this paper asks how much syntactic analysis the "magyarlánc" language analyzer can perform on the error-laden text emitted by a speech recognizer, how closely that analysis resembles the one run on the error-free reference text, and whether some level or partial result of the analysis correlates strongly with that of the error-free text. For the task we used a 535-sentence subset of a broadcast news database. We ran syntactic analysis on it with the "magyarlánc" analyzer, which assigned part-of-speech and dependency labels to the sentences. The syntactic / semantic analyses were then identified and decomposed into elementary units (words), and a bag-of-words representation built over the set of these units was examined and used to measure correlation and similarity. A further comparison computed distances between the extracted part-of-speech and dependency tags, analogously to the computation of word error rate. The results indicate that the analysis performed on text obtained by speech-to-text conversion correlates strongly with the analysis performed on the error-free reference transcript.
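The tag-distance comparison mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the distance is a unit-cost Levenshtein distance over tag sequences, normalized by the reference length as in word error rate; the abstract does not give the exact formula, and the function names and example tags below are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two label sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def tag_error_rate(ref_tags: Sequence[str], hyp_tags: Sequence[str]) -> float:
    """Edit distance normalized by reference length, analogous to WER."""
    if not ref_tags:
        raise ValueError("reference tag sequence must not be empty")
    return edit_distance(ref_tags, hyp_tags) / len(ref_tags)


# Hypothetical part-of-speech tags for a reference transcript and for the
# recognizer output of the same sentence (illustrative values only).
ref_pos = ["DET", "NOUN", "VERB", "DET", "NOUN"]
hyp_pos = ["DET", "NOUN", "VERB", "NOUN"]
print(tag_error_rate(ref_pos, hyp_pos))
```

The same function applies unchanged to dependency labels, since it only compares sequences of strings.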

    Large-vocabulary speech recognition using a morpheme-based recurrent language model

    Agglutinative languages pose a major challenge for classical speech recognition systems, because inflection and compounding mean that very large vocabularies are needed to obtain accurate results. The problem mainly affects the language model component of the recognizer: with an overly large vocabulary the training phase becomes extremely difficult, which can lead to a suboptimal model. One possible solution is to use units smaller than words, namely morphemes, in language modeling. The paper presents a speech recognizer that uses a morpheme-based recurrent neural network language model, with which we achieved significantly better results on a Hungarian speech corpus than with the conventional word-level approach.

    Automatic captioning of live Hungarian-language public affairs and news programs

    In this paper we present a real-time automatic speech-to-text system with low resource requirements, developed primarily for captioning conversational speech in television public affairs programs. We also compare our solution with the decoder of Kaldi, the most widely used open-source framework in the field. In addition, we run recognition experiments with different database sizes and with respeaking. With our experimental system, trained on a text corpus of more than 70 million words and a speech database of nearly 500 hours, we achieved the lowest word error rate published so far for Hungarian television news and public affairs conversational speech.

    Morphologically motivated word classes for very large vocabulary speech recognition of Finnish and Estonian

    We study class-based n-gram and neural network language models for very large vocabulary speech recognition of two morphologically rich languages: Finnish and Estonian. Due to morphological processes such as derivation, inflection and compounding, the models need to be trained with vocabulary sizes of several millions of word types. Class-based language modelling is in this case a powerful approach to alleviate the data sparsity and reduce the computational load. For a very large vocabulary, bigram statistics may not be an optimal way to derive the classes. We thus study utilizing the output of a morphological analyzer to achieve efficient word classes. We show that efficient classes can be learned by refining the morphological classes to smaller equivalence classes using merging, splitting and exchange procedures with suitable constraints. This type of classification can improve the results, particularly when language model training data is not very large. We also extend the previous analyses by rescoring the hypotheses obtained from a very large vocabulary recognizer using class-based neural network language models. We show that despite the fixed vocabulary, carefully constructed classes for word-based language models can in some cases result in lower error rates than subword-based unlimited vocabulary language models. Peer reviewed.

    XII. Magyar Számítógépes Nyelvészeti Konferencia

    XVI. Magyar Számítógépes Nyelvészeti Konferencia
