
    Recurrent neural network language model training with noise contrastive estimation for speech recognition

    In recent years, recurrent neural network language models (RNNLMs) have been successfully applied to a range of tasks including speech recognition. However, an important issue that limits the quantity of data that can be used, and their possible application areas, is the computational cost of training. A significant part of this cost is associated with the softmax function at the output layer, as it requires the normalization term to be calculated explicitly. This impacts both training and testing speed, especially when a large output vocabulary is used. To address this problem, noise contrastive estimation (NCE) is used for RNNLM training in this paper. It does not require explicit normalization during either training or testing and is insensitive to the output layer size. On a large-vocabulary conversational telephone speech recognition task, a doubling in training speed and a 56-times speed-up in test-time evaluation were obtained. Xie Chen is supported by Toshiba Research Europe Ltd, Cambridge Research Lab. The research leading to these results was also supported by EPSRC grant EP/I031022/1 (Natural Speech Technology) and by DARPA under the Broad Operational Language Translation (BOLT) and RATS programs. The paper does not necessarily reflect the position or the policy of the US Government and no official endorsement should be inferred. The authors would also like to thank Ashish Vaswani from USC for suggestions and discussion on training NNLMs with NCE. This is the author accepted manuscript. The final version is available from IEEE via http://dx.doi.org/10.1109/ICASSP.2015.717900
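As a rough illustration of the cost argument, here is a minimal NumPy sketch (with made-up sizes and a uniform noise distribution, neither taken from the paper) contrasting a full-softmax log-probability, whose normalizer touches every output row, with an NCE-style binary objective that scores only the target word and k noise samples.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H, k = 50_000, 512, 10                 # vocabulary, hidden size, noise samples (illustrative)
W = rng.standard_normal((V, H)) * 0.01    # output word embeddings
b = np.zeros(V)                           # output biases
h = rng.standard_normal(H)                # hypothetical RNN hidden state for one time step
target = 1234                             # index of the true next word

# Full softmax: the normalizer log_Z touches every row of W, O(V*H) per word.
logits = W @ h + b
m = logits.max()
log_Z = m + np.log(np.exp(logits - m).sum())
log_prob_softmax = logits[target] - log_Z

# NCE: treat exp(score) as an unnormalized probability (normalizer fixed to 1) and
# train a binary classifier to separate the true word from k samples drawn from a
# noise distribution q. Only k + 1 rows of W are touched, independent of V.
q = np.full(V, 1.0 / V)                   # uniform noise distribution, for simplicity
noise = rng.choice(V, size=k, p=q)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def nce_logit(w):
    # log p_model(w) - log(k * q(w)), with the normalization term assumed to be 1
    return (W[w] @ h + b[w]) - np.log(k * q[w])

nce_loss = -np.log(sigmoid(nce_logit(target))) \
           - sum(np.log(1.0 - sigmoid(nce_logit(w))) for w in noise)
print(log_prob_softmax, nce_loss)
```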

    Vector representation of Internet domain names using Word embedding techniques

    Word embedding is a well-known set of techniques widely used in natural language processing (NLP). This thesis explores the use of word embeddings in a new scenario. A vector space model (VSM) for Internet domain names (DNS) is created by taking core ideas from NLP techniques and applying them to real anonymized DNS log queries from a large Internet Service Provider (ISP). The main goal is to find semantically similar domains using only the information in DNS queries, without any other knowledge about the content of those domains. A set of transformations, organized as a detailed preprocessing pipeline with eight specific steps, is defined to move the original problem into the NLP field. Once the preprocessing pipeline is applied and the DNS log files are transformed into a standard text corpus, we show that state-of-the-art word embedding techniques can be successfully applied to build what we call a DNS-VSM (a vector space model for Internet domain names). Different word embedding techniques are evaluated in this work: Word2Vec (with Skip-Gram and CBOW architectures), App2Vec (with a CBOW architecture and adding time gaps between DNS queries), and FastText (which includes sub-word information). The results are compared using various metrics from Information Retrieval theory, and the quality of the learned vectors is validated against a third-party source, namely the similar sites service offered by Alexa Internet, Inc. Due to intrinsic characteristics of domain names, we find that FastText is the best option for building a vector space model for DNS. Furthermore, its performance (considering the top 3 most similar learned vectors to each domain) is compared against two baseline methods: Random Guessing (returning any domain name from the dataset at random) and Zero Rule (always returning the same most popular domains), outperforming both of them considerably. The results presented in this work can be useful in many engineering activities, with practical applications in many areas. Some examples include website recommendations based on similar sites, competitive analysis, identification of fraudulent or risky sites, parental-control systems, UX improvements (based on recommendations, spell correction, etc.), click-stream analysis, representation and clustering of user navigation profiles, and optimization of cache systems in recursive DNS resolvers, among others. Finally, as a contribution to the research community, a set of vectors of the DNS-VSM trained on a dataset similar to the one used in this thesis is released and made available for download through the GitHub page in [1]. With this we hope that further work and research can be done using these vectors.
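As a minimal sketch of the final embedding-training step, assuming the eight-step preprocessing pipeline has already turned the DNS logs into "sentences" of domain names queried close together, here is how a FastText DNS-VSM could be trained with the gensim library; the toy corpus and hyperparameters are illustrative, and gensim is not necessarily the tooling used in the thesis.

```python
from gensim.models import FastText

# Each "sentence" is one client's sequence of queried domains within a time window,
# as produced by a preprocessing pipeline like the one described in the thesis.
corpus = [
    ["google.com", "gstatic.com", "youtube.com"],
    ["facebook.com", "fbcdn.net", "instagram.com"],
    ["google.com", "youtube.com", "ytimg.com"],
]

# FastText with sub-word (character n-gram) information, the variant the thesis
# found to work best for domain names.
model = FastText(
    sentences=corpus,
    vector_size=100,    # dimensionality of the DNS-VSM (illustrative)
    window=5,
    min_count=1,
    sg=1,               # Skip-Gram
    min_n=3, max_n=6,   # character n-gram range
    epochs=10,
)

# Query the learned vector space for semantically similar domains.
print(model.wv.most_similar("youtube.com", topn=3))
```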

    Towards Efficient Hardware Acceleration of Deep Neural Networks on FPGA

    Deep neural networks (DNNs) have achieved remarkable success in many applications because of their powerful capability for data processing. Their performance in computer vision has matched, and in some areas even surpassed, human capabilities. Deep neural networks can capture complex nonlinear features; however, this ability comes at the cost of high computational and memory requirements. State-of-the-art networks require billions of arithmetic operations and millions of parameters. The brute-force computing model of DNNs often requires extremely large hardware resources, raising severe concerns about its scalability on traditional von Neumann architectures. The well-known memory wall, and the latency introduced by the long-range connectivity and communication of DNNs, severely constrain computation efficiency. Acceleration techniques for DNNs, whether in software or hardware, often suffer from poor hardware execution efficiency of the simplified model (software) or from inevitable accuracy degradation and a limited set of supported algorithms (hardware). In order to preserve inference accuracy and make the hardware implementation more efficient, a close investigation of hardware/software co-design methodologies for DNNs is needed. The proposed work first presents an FPGA-based implementation framework for Recurrent Neural Network (RNN) acceleration. At the architectural level, we improve the parallelism of the RNN training scheme and reduce the computing resource requirement to enhance computation efficiency. The hardware implementation primarily targets reducing the data communication load. Secondly, we propose a data locality-aware sparse matrix-vector multiplication (SpMV) kernel. At the software level, we reorganize a large sparse matrix into many modest-sized blocks by adopting hypergraph-based partitioning and clustering. Available hardware constraints are taken into consideration for memory allocation and data access regularization. Thirdly, we present a holistic acceleration of sparse convolutional neural networks (CNNs). During network training, data locality is regularized to ease the hardware mapping. The distributed architecture enables high computation parallelism and data reuse. The proposed research results in a hardware/software co-design methodology for fast and accurate DNN acceleration, achieved through innovations in algorithm optimization, hardware implementation, and the interactive design process across these two domains.
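A rough software-level sketch of the locality idea behind the blocked SpMV kernel: the matrix is split into modest-sized blocks and each block's product is accumulated separately, so that the corresponding slices of the input and output vectors can stay resident in fast memory. The simple row/column striping below stands in for the hypergraph-based partitioning described above, and all sizes are arbitrary.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)

n, block = 4096, 512                             # matrix size and block size (illustrative)
A = sp.random(n, n, density=0.01, format="csr", random_state=0)
x = rng.standard_normal(n)

# Reference result: one large SpMV over the whole matrix.
y_ref = A @ x

# Blocked SpMV: process the matrix block by block so that each x-slice and
# y-slice could fit in on-chip memory on a hardware target.
y = np.zeros(n)
for i in range(0, n, block):
    for j in range(0, n, block):
        A_blk = A[i:i + block, j:j + block]      # one modest-sized sparse block
        y[i:i + block] += A_blk @ x[j:j + block]

assert np.allclose(y, y_ref)
```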

    Tackling Sequence to Sequence Mapping Problems with Neural Networks

    In Natural Language Processing (NLP), it is important to detect the relationship between two sequences or to generate a sequence of tokens given another observed sequence. We refer to this type of problem of modelling sequence pairs as sequence-to-sequence (seq2seq) mapping problems. A lot of research has been devoted to finding ways of tackling these problems, with traditional approaches relying on a combination of hand-crafted features, alignment models, segmentation heuristics, and external linguistic resources. Although great progress has been made, these traditional approaches suffer from various drawbacks, such as complicated pipelines, laborious feature engineering, and difficulty with domain adaptation. Recently, neural networks have emerged as a promising solution to many problems in NLP, speech recognition, and computer vision. Neural models are powerful because they can be trained end to end, generalise well to unseen examples, and the same framework can easily be adapted to a new domain. The aim of this thesis is to advance the state of the art in seq2seq mapping problems with neural networks. We explore solutions from three major aspects: investigating neural models for representing sequences, modelling interactions between sequences, and using unpaired data to boost the performance of neural models. For each aspect, we propose novel models and evaluate their efficacy on various tasks of seq2seq mapping. Comment: PhD thesis
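As a minimal, self-contained illustration of the seq2seq setup the thesis builds on, here is a toy PyTorch encoder-decoder; the architecture, dimensions, and vocabulary sizes are placeholders rather than any model proposed in the thesis.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal GRU encoder-decoder for mapping one token sequence to another."""

    def __init__(self, src_vocab, tgt_vocab, emb=64, hidden=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt_in):
        # Encode the source sequence into a fixed-size state.
        _, state = self.encoder(self.src_emb(src))
        # Decode conditioned on that state (teacher forcing with tgt_in).
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), state)
        return self.out(dec_out)           # logits over the target vocabulary

# Toy usage: a batch of 2 source sequences of length 5, target length 6.
model = Seq2Seq(src_vocab=1000, tgt_vocab=800)
src = torch.randint(0, 1000, (2, 5))
tgt_in = torch.randint(0, 800, (2, 6))
logits = model(src, tgt_in)
print(logits.shape)                        # torch.Size([2, 6, 800])
```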

    Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information

    This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages which lack resources for speech and language processing. We focus on finding approaches which allow using data from multiple languages to improve performance for those languages at different levels, such as feature extraction, acoustic modeling, and language modeling. Under application aspects, this thesis also includes research work on non-native and code-switching speech.
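One common way to use data from multiple languages at the acoustic-modeling level is to share hidden layers across languages while keeping language-specific output layers. The sketch below illustrates that general idea only; it is not the specific architecture or toolkit used in this thesis, and all dimensions and language choices are made up.

```python
import torch
import torch.nn as nn

class MultilingualAcousticModel(nn.Module):
    """Shared hidden layers with one output layer per language (illustrative)."""

    def __init__(self, feat_dim, hidden, targets_per_lang):
        super().__init__()
        # Hidden layers trained on data pooled from all languages.
        self.shared = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One output layer per language (e.g. over context-dependent states).
        self.heads = nn.ModuleDict(
            {lang: nn.Linear(hidden, n) for lang, n in targets_per_lang.items()}
        )

    def forward(self, features, lang):
        return self.heads[lang](self.shared(features))

# Toy usage: 40-dim features, two well-resourced languages plus a low-resource one.
model = MultilingualAcousticModel(40, 512, {"en": 3000, "de": 2800, "sw": 1200})
frames = torch.randn(8, 40)
print(model(frames, "sw").shape)   # (8, 1200) logits for the low-resource language
```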

    Design and Implementation of a Domain Specific Language for Deep Learning

    Deep Learning (DL) has recently found great success in well-diversified areas such as machine vision, speech recognition, big data analysis, and multimedia understanding. However, the existing state-of-the-art DL frameworks, e.g. Caffe2, Theano, TensorFlow, MXNet, Torch7, and CNTK, are programming libraries with fixed user interfaces, internal representations, and execution environments. Modifying the code of DL layers or data structures is very challenging without an in-depth understanding of the underlying implementation. The optimization of code and execution in these tools is often limited and relies on specific computation-graph manipulation and scheduling that lack systematic and universal strategies. Furthermore, most of these tools demand many dependencies besides the tool itself and must be built for specific platforms for DL training or inference. This dissertation presents DeepDSL, a domain specific language (DSL) embedded in Scala, that compiles DL networks encoded with DeepDSL to efficient, compact, and portable Java source programs for DL training and inference. DeepDSL represents DL networks as abstract tensor functions, performs symbolic gradient derivations to generate the Intermediate Representation (IR), optimizes the IR expressions, and compiles the optimized IR expressions to cross-platform Java code that is easily modifiable and debuggable. Also, the code runs directly on GPU without additional dependencies except a small set of JNI (Java Native Interface) wrappers for invoking the underlying GPU libraries. Moreover, DeepDSL provides static analysis for memory consumption and error detection. DeepDSL (our previous results are reported in [zhao2017]; design and implementation details are summarized in [Zhao2018]) has been evaluated with many current state-of-the-art DL networks (e.g. AlexNet, GoogLeNet, VGG, Overfeat, and Deep Residual Network). While the DSL code is highly compact, with less than 100 lines for each of these networks, the Java source code generated by the DeepDSL compiler is highly efficient. Our experiments show that the generated Java source has very competitive runtime performance and memory efficiency compared to the existing DL frameworks.
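DeepDSL itself is embedded in Scala and emits Java, so the following is only a language-agnostic toy: a few lines of Python showing what it means to represent a computation as composed abstract functions and to derive gradient expressions symbolically before any data or device is involved. It is purely conceptual and does not reflect DeepDSL's actual IR or compiler.

```python
# Toy symbolic expressions: each node can print itself and produce the symbolic
# expression of its derivative with respect to a variable (unsimplified).

class Var:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

    def grad(self, wrt):
        return Const(1.0) if self is wrt else Const(0.0)

class Const:
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return str(self.value)

    def grad(self, wrt):
        return Const(0.0)

class Add:
    def __init__(self, a, b):
        self.a, self.b = a, b

    def __repr__(self):
        return f"({self.a} + {self.b})"

    def grad(self, wrt):
        return Add(self.a.grad(wrt), self.b.grad(wrt))

class Mul:
    def __init__(self, a, b):
        self.a, self.b = a, b

    def __repr__(self):
        return f"({self.a} * {self.b})"

    def grad(self, wrt):
        # product rule, applied symbolically rather than numerically
        return Add(Mul(self.a.grad(wrt), self.b), Mul(self.a, self.b.grad(wrt)))

# "Network": loss = w * x + b, written as an abstract function of its inputs.
w, x, b = Var("w"), Var("x"), Var("b")
loss = Add(Mul(w, x), b)

# Symbolic gradient expressions; a compiler could simplify these and emit target code.
print("dloss/dw =", loss.grad(w))   # (((1.0 * x) + (w * 0.0)) + 0.0)
print("dloss/db =", loss.grad(b))   # (((0.0 * x) + (w * 0.0)) + 1.0)
```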

    Ensembles of Text and Time-Series Models for Automatic Generation of Financial Trading Signals

    Event Studies in Finance have focused on traditional news headlines to assess the impact an event has on a traded company. The increased proliferation of news and information produced by social media content has disrupted this trend. Although researchers have begun to identify trading opportunities from social media platforms, such as Twitter, almost all techniques use a general sentiment derived from large collections of tweets. Though useful, general sentiment does not point to the specific events that are likely to affect stock prices. This work presents an event clustering algorithm, utilizing natural language processing techniques, to generate newsworthy events from Twitter that have the potential to influence stock prices in the same manner as traditional news headlines. The event clustering method addresses the effects of pre-news and lagged news, two peculiarities that appear when connecting trading and news, regardless of the medium. Pre-news signifies a finding where stock prices move in advance of a news release. Lagged news refers to follow-up or late-arriving news, which adds redundancy when making trading decisions. For events generated by the proposed clustering algorithm, we have designed and implemented novel language and time-series techniques, incorporating Event Studies and Machine Learning, to produce an actionable system that can guide trading decisions. Of the various methods considered, the emphasis was particularly on established state-of-the-art methods versus modern Deep Learning techniques. The recommended prediction algorithms provide investing strategies with profitable risk-adjusted returns. The suggested language models achieve Annualized Sharpe Ratios (risk-adjusted returns) in the 5 to 11 range, while the time-series models produce ratios in the 2 to 3 range (without transaction costs). A close investigation of the distribution of returns confirms the encouraging Sharpe Ratios by identifying most outliers as significant positive gains. Additionally, Machine Learning metrics of precision, recall, and accuracy are discussed alongside the financial metrics in the hope of bridging the gap between academia and industry in the field of Computational Finance.
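For reference, the annualized Sharpe ratio quoted above is conventionally computed as the mean periodic excess return divided by its standard deviation, scaled by the square root of the number of periods per year. The small sketch below uses made-up daily returns and a zero risk-free rate; it is not the returns or code from this work.

```python
import numpy as np

def annualized_sharpe(returns, periods_per_year=252, risk_free=0.0):
    """Annualized Sharpe ratio of a series of periodic strategy returns."""
    excess = np.asarray(returns) - risk_free / periods_per_year
    return excess.mean() / excess.std(ddof=1) * np.sqrt(periods_per_year)

# Made-up daily returns of a hypothetical signal-driven strategy (no transaction costs).
rng = np.random.default_rng(1)
daily_returns = rng.normal(loc=0.002, scale=0.01, size=252)
print(round(annualized_sharpe(daily_returns), 2))
```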