437 research outputs found
Real-time Regular Expression Matching
This paper is devoted to finite state automata, regular expression matching,
pattern recognition, and the exponential blow-up problem, which is the growing
complexity of automata exponentially depending on regular expression length.
This paper presents a theoretical and hardware solution to the exponential
blow-up problem for some complicated classes of regular languages, which caused
severe limitations in Network Intrusion Detection Systems work. The article
supports the solution with theorems on correctness and complexity.Comment: 17 pages, 11 figure
Happy-GLL: modular, reusable and complete top-down parsers for parameterized nonterminals
Parser generators and parser combinator libraries are the most popular tools
for producing parsers. Parser combinators use the host language to provide
reusable components in the form of higher-order functions with parsers as
parameters. Very few parser generators support this kind of reuse through
abstraction and even fewer generate parsers that are as modular and reusable as
the parts of the grammar for which they are produced. This paper presents a
strategy for generating modular, reusable and complete top-down parsers from
syntax descriptions with parameterized nonterminals, based on the FUN-GLL
variant of the GLL algorithm.
The strategy is discussed and demonstrated as a novel back-end for the Happy
parser generator. Happy grammars can contain `parameterized nonterminals' in
which parameters abstract over grammar symbols, granting an abstraction
mechanism to define reusable grammar operators. However, the existing Happy
back-ends do not deliver on the full potential of parameterized nonterminals as
parameterized nonterminals cannot be reused across grammars. Moreover, the
parser generation process may fail to terminate or may result in exponentially
large parsers generated in an exponential amount of time.
The GLL back-end presented in this paper implements parameterized
nonterminals successfully by generating higher-order functions that resemble
parser combinators, inheriting all the advantages of top-down parsing. The
back-end is capable of generating parsers for the full class of context-free
grammars, generates parsers in linear time and generates parsers that find all
derivations of the input string. To our knowledge, the presented GLL back-end
makes Happy the first parser generator that combines all these features.
This paper describes the translation procedure of the GLL back-end and
compares it to the LALR and GLR back-ends of Happy in several experiments.Comment: 15 page
Large Language Models
Artificial intelligence is making spectacular progress, and one of the best
examples is the development of large language models (LLMs) such as OpenAI's
GPT series. In these lectures, written for readers with a background in
mathematics or physics, we give a brief history and survey of the state of the
art, and describe the underlying transformer architecture in detail. We then
explore some current ideas on how LLMs work and how models trained to predict
the next word in a text are able to perform other tasks displaying
intelligence.Comment: 46 page
University of Windsor Undergraduate Calendar 2023 Spring
https://scholar.uwindsor.ca/universitywindsorundergraduatecalendars/1023/thumbnail.jp
CLARIN
The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure – CLARIN – for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published 10 years after establishing CLARIN as an Europ. Research Infrastructure Consortium
University of Windsor Undergraduate Calendar 2023 Winter
https://scholar.uwindsor.ca/universitywindsorundergraduatecalendars/1020/thumbnail.jp
GREC: Multi-domain Speech Recognition for the Greek Language
Μία από τις κορυφαίες προκλήσεις στην Αυτόματη Αναγνώριση Ομιλίας είναι η ανάπτυξη ικανών συστημάτων που μπορούν να έχουν ισχυρή απόδοση μέσα από διαφορετικές συνθήκες ηχογράφησης. Στο παρόν έργο κατασκευάζουμε και αναλύουμε το GREC, μία μεγάλη πολυτομεακή συλλογή δεδομένων για αυτόματη αναγνώριση ομιλίας στην ελληνική γλώσσα. Το GREC αποτελείται από τρεις βάσεις δεδομένων στους θεματικούς τομείς των «εκπομπών ειδήσεων», «ομιλίας από δωρισμένες εγγραφές φωνής», «ηχητικών βιβλίων» και μιας νέας συλλογής δεδομένων στον τομέα των «πολιτικών ομιλιών». Για τη δημιουργία του τελευταίου, συγκεντρώνουμε δεδομένα ομιλίας από ηχογραφήσεις των επίσημων συνεδριάσεων της Βουλής των Ελλήνων, αποδίδοντας ένα σύνολο δεδομένων που αποτελείται από 120 ώρες ομιλίας πολιτικού περιεχομένου. Περιγράφουμε με λεπτομέρεια την καινούρια συλλογή δεδομένων, την προεπεξεργασία και την ευθυγράμμιση ομιλίας, τα οποία βασίζονται στο εργαλείο ανοιχτού λογισμικού Kaldi. Επιπλέον, αξιολογούμε την απόδοση των μοντέλων Gaussian Mixture (GMM) - Hidden Markov (HMM) και Deep Neural Network (DNN) - HMM όταν εφαρμόζονται σε δεδομένα από διαφορετικούς τομείς. Τέλος, προσθέτουμε τη δυνατότητα αυτόματης δεικτοδότησης ομιλητών στο Kaldi-gRPC-Server, ενός εργαλείου γραμμένο σε Python που βασίζεται στο PyKaldi και στο gRPC για βελτιωμένη ανάπτυξη μοντέλων αυτόματης αναγνώρισης ομιλίας.One of the leading challenges in Automatic Speech Recognition (ASR) is the development of robust systems that can perform well under multiple settings. In this work we construct and analyze GREC, a large, multi-domain corpus for automatic speech recognition for the Greek language. GREC is a collection of three available subcorpora over the domains of “news casts”, “crowd-sourced speech”, “audiobooks”, and a new corpus in the domain of “public speeches”. For the creation of the latter, HParl, we collect speech data from recordings of the official proceedings of the Hellenic Parliament, yielding, a dataset which consists of 120 hours of political speech segments. We describe our data collection, pre-processing and alignment setup, which are based on Kaldi toolkit. Furthermore, we perform extensive ablations on the recognition performance of Gaussian Mixture (GMM) - Hidden Markov (HMM) models and Deep Neural Network (DNN) - HMM models over the different domains. Finally, we integrate speaker diarization features to Kaldi-gRPC-Server, a modern, pythonic tool based on PyKaldi and gRPC for streamlined deployment of Kaldi based speech recognition
2001 July, University of Memphis bulletin
Vol. 88 of the University of Memphis bulletin containing the graduate catalog for 2001-2003.https://digitalcommons.memphis.edu/speccoll-ua-pub-bulletins/1189/thumbnail.jp
- …