Productivity Measurement of Call Centre Agents using a Multimodal Classification Approach
Call centre channels play a cornerstone role in business communications and transactions, especially in challenging business situations. Operational efficiency, service quality, and resource productivity are core aspects of call centres' competitive advantage in rapid market competition. Performance evaluation in call centres is challenging because of subjective human judgement, the manual sorting of massive call volumes, and inconsistency between different raters. These challenges reduce operational efficiency and lead to frustrated customers. This study aims to automate performance evaluation in call centres using various deep learning approaches. Calls recorded in a call centre are modelled and classified into high- or low-performance evaluations, categorised as productive or nonproductive calls.
The proposed conceptual model uses deep learning networks to model the recorded calls as text and speech. It is based on the following: 1) a focus on the technical part of agent performance, 2) objective evaluation of the corpus, 3) extended feature sets for both text and speech, and 4) combination of the best-performing text and speech models in a multimodal structure. Accordingly, a diarisation algorithm separates the portions of each call where the agent is speaking from those where the customer is speaking. Manual annotation is then required to divide the modelling corpus into productive and nonproductive calls for supervised training; Krippendorff's alpha was applied to guard against subjectivity in the manual annotation. Arabic speech recognition is then developed to transcribe the speech into text. The text features are word embeddings produced by an embedding layer. For the speech features, several configurations of Mel Frequency Cepstral Coefficients (MFCC) augmented with Low-Level Descriptors (LLD) are evaluated to improve classification accuracy. The data-modelling architectures for speech and text are based on CNNs, BiLSTMs, and an attention layer. Finally, a multimodal approach concatenates the text and speech models using the joint-representation methodology to further improve accuracy.
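Krippendorff's alpha, used above to check the reliability of the manual productive/nonproductive annotation, can be computed directly from per-call rater counts. A minimal sketch for nominal data follows; the ratings at the bottom are invented for illustration and are not from the thesis:

```python
from collections import Counter

def krippendorff_alpha(units):
    """Krippendorff's alpha for nominal data.

    units: list of lists, each inner list holding the labels the raters
    assigned to one call. Units with fewer than two ratings are skipped,
    since the coefficient requires pairable values.
    """
    units = [u for u in units if len(u) >= 2]
    n = sum(len(u) for u in units)                 # total pairable values
    totals = Counter(v for u in units for v in u)  # overall label counts

    # Observed disagreement: within-unit label mismatches.
    d_o = sum(
        c * (len(u) - c) / (len(u) - 1)
        for u in units
        for c in Counter(u).values()
    ) / n

    # Expected disagreement: mismatches under random pairing of all values.
    d_e = sum(nc * (n - nc) for nc in totals.values()) / (n * (n - 1))
    return 1.0 - d_o / d_e

# Two hypothetical raters labelling four calls (1 = productive):
ratings = [[1, 1], [0, 0], [1, 1], [1, 0]]
alpha = krippendorff_alpha(ratings)  # one disagreement -> alpha = 8/15
```

Values near 1.0 indicate reliable annotation; the thesis's actual threshold and rater setup are not specified here.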
The main contributions of this thesis are:
• Developing an Arabic speech recognition method for the automatic transcription of speech into text.
• Drawing several DNN architectures to improve performance evaluation using speech features based on MFCC and LLD.
• Developing a Max Weight Similarity (MWS) function to outperform the SoftMax function used in the attention layer.
• Proposing a multimodal approach that combines the text and speech models for the best performance evaluation.
Exploiting Cross-Lingual Representations For Natural Language Processing
Traditional approaches to supervised learning require a generous amount of labeled data for good generalization. While such annotation-heavy approaches have proven useful for some Natural Language Processing (NLP) tasks in high-resource languages (like English), they are unlikely to scale to languages where collecting labeled data is difficult and time-consuming. Translating supervision available in English is also not a viable solution, because developing a good machine translation system requires expensive-to-annotate resources that are not available for most languages.
In this thesis, I argue that cross-lingual representations are an effective means of extending NLP tools to languages beyond English without resorting to generous amounts of annotated data or expensive machine translation. These representations can be learned in an inexpensive manner, often from signals completely unrelated to the task of interest. I begin with a review of different ways of inducing such representations using a variety of cross-lingual signals, and study algorithmic approaches to using them in a diverse set of downstream tasks. Examples of such tasks covered in this thesis include learning representations to transfer a trained model across languages for document classification, to assist in monolingual lexical semantics such as word sense induction, to identify asymmetric lexical relationships such as hypernymy between words in different languages, and to combine supervision across languages through a shared feature space for cross-lingual entity linking. In all these applications, the representations make information expressed in other languages available in English, while requiring minimal additional supervision in the language of interest.
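The transfer idea can be illustrated with a toy shared embedding space: a classifier fit on labelled English vectors applies unchanged to words of another language, because both languages live in one space. All words, vectors, and labels below are invented for illustration:

```python
from math import sqrt

# Toy cross-lingual space: English and Spanish words are assumed to
# already be embedded in the same 2-d space (vectors are made up).
emb = {
    "dog":  (0.9, 0.1), "cat":  (0.8, 0.2),
    "bank": (0.1, 0.9), "loan": (0.2, 0.8),
    "perro": (0.85, 0.15),   # Spanish "dog"
    "banco": (0.12, 0.88),   # Spanish "bank"
}
labeled_english = {"dog": "animal", "cat": "animal",
                   "bank": "finance", "loan": "finance"}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def classify(word):
    # 1-nearest-neighbour over English training words only: no Spanish
    # supervision is used at any point.
    nearest = max(labeled_english, key=lambda w: cosine(emb[w], emb[word]))
    return labeled_english[nearest]
```

Here `classify("perro")` returns "animal" purely because the shared space places it near "dog"; this is the cheap-transfer pattern the thesis exploits, not its actual models.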
A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena
Word reordering is one of the most difficult aspects of statistical machine
translation (SMT), and an important factor of its quality and efficiency.
Despite the vast amount of research published to date, the interest of the
community in this problem has not decreased, and no single method appears to be
strongly dominant across language pairs. Instead, the choice of the optimal
approach for a new translation task still seems to be mostly driven by
empirical trials. To orientate the reader in this vast and complex research
area, we present a comprehensive survey of word reordering viewed as a
statistical modeling challenge and as a natural language phenomenon. The survey
describes in detail how word reordering is modeled within different
string-based and tree-based SMT frameworks and as a stand-alone task, including
systematic overviews of the literature in advanced reordering modeling. We then
question why some approaches are more successful than others in different
language pairs. We argue that, besides measuring the amount of reordering, it
is important to understand which kinds of reordering occur in a given language
pair. To this end, we conduct a qualitative analysis of word reordering
phenomena in a diverse sample of language pairs, based on a large collection of
linguistic knowledge. Empirical results in the SMT literature are shown to
support the hypothesis that a few linguistic facts can be very useful to
anticipate the reordering characteristics of a language pair and to select the
SMT framework that best suits them.
Comment: 44 pages, to appear in Computational Linguistics.
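One common way to "measure the amount of reordering" in a language pair is the fraction of swapped word pairs in the source-to-target alignment permutation, a Kendall-tau-style distance. A minimal sketch, with invented alignments:

```python
def reordering_amount(perm):
    """Fraction of discordant pairs in a source-to-target permutation:
    0.0 for a monotone alignment, 1.0 for a fully inverted one."""
    n = len(perm)
    discordant = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if perm[i] > perm[j]
    )
    return discordant / (n * (n - 1) / 2)

reordering_amount([0, 1, 2, 3])   # monotone alignment -> 0.0
reordering_amount([3, 2, 1, 0])   # fully inverted -> 1.0
reordering_amount([0, 2, 1, 3])   # one local swap -> 1/6
```

As the survey argues, such a single number says how much reordering there is but not which kinds occur, which is why the qualitative analysis is needed.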
Cross-view Embeddings for Information Retrieval
In this dissertation, we deal with the cross-view tasks related to information retrieval
using embedding methods. We study existing methodologies and propose new methods to overcome their limitations. We formally introduce the concept of mixed-script
IR, which deals with the challenges faced by an IR system when a language is written
in different scripts because of various technological and sociological factors. Mixed-script terms are represented in a small, finite feature space composed of character n-grams. We propose the cross-view autoencoder (CAE) to model such terms in an abstract space, where it achieves state-of-the-art performance.
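The "small and finite feature space" of character n-grams can be sketched as follows; the overlap between spelling variants of the same transliterated term is the signal such representations build on (the example terms are invented, not from the dissertation):

```python
def char_ngrams(term, n=2):
    """Character n-grams of a term, padded with boundary markers."""
    padded = f"#{term}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def overlap(a, b):
    """Jaccard overlap of two terms' character n-gram sets."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    return len(ga & gb) / len(ga | gb)

# Romanised spelling variants of one word share most bigrams, while
# unrelated words share almost none:
overlap("dhanyavad", "danyavad")   # high overlap (same word, variant spelling)
overlap("dhanyavad", "shukriya")   # low overlap (different words)
```

The CAE itself then maps these sparse n-gram vectors into a dense abstract space; the featurization above is only the input side of that pipeline.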
We study a wide variety of models for cross-language information retrieval (CLIR)
and propose a model based on compositional neural networks (XCNN) which overcomes the limitations of the existing methods and achieves the best results for many
CLIR tasks such as ad-hoc retrieval, parallel sentence retrieval and cross-language
plagiarism detection. We empirically test the proposed models for these tasks on
publicly available datasets and present the results with analyses.
In this dissertation, we also explore an effective method to incorporate contextual similarity for lexical selection in machine translation. Concretely, we investigate a feature based on the context available in the source sentence, computed using deep autoencoders. The proposed feature yields statistically significant improvements over strong baselines for English-to-Spanish and English-to-Hindi translation tasks.
Finally, we explore methods to evaluate the quality of autoencoder-generated representations of text data and analyse the autoencoders' architectural properties. For this,
we propose two metrics based on reconstruction capabilities of the autoencoders:
structure preservation index (SPI) and similarity accumulation index (SAI). We also
introduce a concept of critical bottleneck dimensionality (CBD) below which the
structural information is lost, and present analyses linking CBD and language perplexity.
Gupta, PA. (2017). Cross-view Embeddings for Information Retrieval [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/78457
Generative Non-Markov Models for Information Extraction
Learning from unlabeled data is a long-standing challenge in machine learning. A
principled solution involves modeling the full joint distribution over inputs
and the latent structure of interest, and imputing the missing data via
marginalization. Unfortunately, such marginalization is expensive for most
non-trivial problems, which places practical limits on the expressiveness of
generative models. As a result, joint models often encode strict assumptions
about the underlying process such as fixed-order Markovian assumptions and
employ simple count-based features of the inputs. In contrast, conditional
models, which do not directly model the observed data, are free to incorporate
rich overlapping features of the input in order to predict the latent structure
of interest. It would be desirable to develop expressive generative models that
retain tractable inference. This is the topic of this thesis. In particular, we
explore joint models which relax fixed-order Markov assumptions, and investigate
the use of recurrent neural networks for automatic feature induction in the
generative process.
We focus on two structured prediction problems: (1) imputing labeled segmentations
of input character sequences, and (2) imputing directed spanning trees relating
strings in text corpora. These problems arise in many applications of practical
interest, but we are primarily concerned with named-entity recognition and
cross-document coreference resolution in this work.
For named-entity recognition, we propose a generative model in which the
observed characters originate from a latent non-Markov process over words, and
where the characters are themselves produced via a non-Markov process: a
recurrent neural network (RNN). We propose a sampler for the proposed model in
which sequential Monte Carlo is used as a transition kernel for a Gibbs sampler.
The kernel is amenable to a fast parallel implementation, and results in fast
mixing in practice.
For cross-document coreference resolution, we move beyond sequence modeling to
consider string-to-string transduction. We stipulate a generative process for a
corpus of documents in which entity names arise from copying---and optionally
transforming---previous names of the same entity. Our proposed model is
sensitive to both the context in which the names occur as well as their
spelling. The string-to-string transformations correspond to systematic
linguistic processes such as abbreviation, typos, and nicknaming, and by analogy
to biology, we think of them as mutations along the edges of a phylogeny. We
propose a novel block Gibbs sampler for this problem that alternates between
sampling an ordering of the mentions and a spanning tree relating all mentions
in the corpus.
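The copy-and-transform view of entity names can be illustrated with one of the systematic transformations named above, abbreviation. The rule below is a simplified stand-in for exposition, not the thesis's learned string-to-string model:

```python
def abbreviate(name):
    """One hypothetical mutation: reduce all but the last token of a
    name to initials, as when a mention copies an earlier full mention."""
    *given, family = name.split()
    return " ".join([f"{tok[0]}." for tok in given] + [family])

def is_plausible_mutation(parent, child):
    """Could `child` have arisen from `parent` by copying the name
    verbatim or by applying the abbreviation transformation above?"""
    return child == parent or child == abbreviate(parent)

is_plausible_mutation("Barack Obama", "B. Obama")     # True: abbreviation
is_plausible_mutation("Barack Obama", "George Bush")  # False: unrelated name
```

In the thesis, many such transformations (typos, nicknames, abbreviations) define edge probabilities in a phylogeny over mentions, and the block Gibbs sampler searches over orderings and spanning trees rather than checking rules one at a time.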