Multi language Email Classification Using Transfer learning
In recent years, Artificial Intelligence has become a core part of many businesses: from manufacturers to service providers, AI can be found improving business processes as well as providing customized experiences and support for customers. Natural language processing gives computers the ability to understand human language, and recent breakthroughs in multilingual models bring us closer to overcoming language barriers and achieving various tasks regardless of the language. This gives companies the opportunity to process data and provide services to customers regardless of their language. In this dissertation, we review the progress of NLP towards multilingual text classification. Our results suggest that using machine translation to augment our corpora is a suitable approach to fine-tuning multi-language models like XLM-RoBERTa, obtaining better results than zero-shot approaches. Our results also suggest that in-domain pre-training can help to increase classification performance for both monolingual and multi-language classifiers.
Dissertation presented as a partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence
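The "translate-train" augmentation the abstract describes can be sketched as follows: labelled emails in one language are machine-translated into the other target languages, and the union of originals and translations is used to fine-tune a multilingual classifier such as XLM-RoBERTa. This is only an illustrative sketch; `translate` is a stub standing in for any real MT system, and the corpus and labels are invented.

```python
def translate(text, src, tgt):
    # Placeholder for a real machine-translation call (e.g. a Marian
    # model or a translation API); here it just tags the target language.
    return f"[{tgt}] {text}"

def augment_corpus(corpus, target_langs):
    """corpus: list of (text, label, lang) tuples.
    Returns the original examples plus one machine-translated copy per
    target language, keeping the original label."""
    augmented = list(corpus)
    for text, label, lang in corpus:
        for tgt in target_langs:
            if tgt != lang:
                augmented.append((translate(text, lang, tgt), label, tgt))
    return augmented

corpus = [("Where is my refund?", "billing", "en")]
train_set = augment_corpus(corpus, ["en", "de", "es"])
print(len(train_set))  # 3: one original plus two translations
```

The augmented `train_set` would then be fed to a standard fine-tuning loop; the key point is that every translated copy inherits its source label for free.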
Invernet: An Adversarial Attack Framework to Infer Downstream Context Distribution through Word Embedding Inversion
Word embedding has become a popular form of data representation used to train deep neural networks in many natural language processing tasks, such as machine translation, named entity recognition, and information retrieval. Through embedding, each word is represented as a dense vector which captures its semantic relationship with others and can better empower machine learning models to achieve state-of-the-art performance. Due to the data- and computation-intensive nature of learning word embeddings from scratch, an affordable way is to borrow an existing general embedding trained on large-scale text corpora by a third party (i.e., pre-training) and further specialize the embedding by training on a downstream domain-specific dataset (i.e., fine-tuning). However, a privacy issue that can arise during this process is that adversarial parties who have the pre-training datasets may be able to infer key information, such as the context distribution of the downstream datasets, by analyzing the fine-tuned embeddings. In this study, we aim to propose an effective way to infer the context distribution (i.e., the word co-occurrences in downstream corpora revealing particular domain information) in order to demonstrate the above-mentioned privacy concerns. Specifically, we propose a focused selection method along with a novel model inversion architecture, “Invernet”, to invert word embeddings into the word-to-word context information of the fine-tuned dataset. We consider the popular word2vec models, including the CBOW, SkipGram, and GloVe algorithms, with various unsupervised settings. We conduct an extensive experimental study, from both quantitative and qualitative perspectives, on two real-world news datasets: Antonio Gulli’s News Dataset from the Hugging Face repository and a New York Times dataset. Results show that “Invernet” has been able to achieve an average F1 score of 0.70 and an average AUC score of 0.79 in an attack scenario.
A concerning pattern from our experiments reveals that embedding models generally considered superior in different tasks tend to be more vulnerable to model inversion. Our results suggest that a significant amount of context distribution information from the downstream dataset can potentially leak if an attacker gains access to the pre-trained and fine-tuned word embeddings. As a result, attacks using “Invernet” can jeopardize the privacy of the users whose data might have been used to fine-tune the word embedding model.
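The intuition behind this class of attack can be illustrated with a deliberately tiny toy model (this is not the actual Invernet architecture, which is a learned inversion network): fine-tuning on a domain corpus tends to pull the vectors of co-occurring words toward each other, so an attacker who holds both the pre-trained and fine-tuned embeddings can threshold the *change* in pairwise similarity to guess which word pairs co-occurred downstream. All vectors and the co-occurrence simulation below are invented for illustration.

```python
import numpy as np

vocab = ["market", "stocks", "goal", "match"]
# Pre-trained vectors: orthogonal basis vectors, so every pre-training
# pairwise similarity is exactly zero.
pre = {w: np.eye(4)[i] for i, w in enumerate(vocab)}

# Simulate fine-tuning on a finance corpus in which "market" and
# "stocks" co-occur: their vectors are pulled toward each other, while
# the unrelated pair is left untouched.
fine = {w: v.copy() for w, v in pre.items()}
fine["market"] += 0.5 * (pre["stocks"] - pre["market"])
fine["stocks"] += 0.5 * (pre["market"] - pre["stocks"])

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def attacker_guess(w1, w2, threshold=0.1):
    """Guess that (w1, w2) co-occurred in the downstream corpus if their
    cosine similarity rose noticeably between the two embedding spaces."""
    return cos(fine[w1], fine[w2]) - cos(pre[w1], pre[w2]) > threshold

print(attacker_guess("market", "stocks"))  # True
print(attacker_guess("goal", "match"))     # False
```

A learned inversion model like the one the abstract describes replaces this hand-set threshold with a network trained to map embedding differences to co-occurrence predictions, which is what makes the attack effective at scale.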
On Polysemy: A Philosophical, Psycholinguistic, and Computational Study
Most words in natural languages are polysemous, that is, they have related but different meanings in different contexts. These polysemous meanings (senses) are marked by their structuredness, flexibility, productivity, and regularity. Previous theories have focused on some of these features but not all of them together. Thus, I propose a new theory of polysemy, which has two components. First, word meaning is actively modulated by broad contexts in a continuous fashion. Second, clustering arises from contextual modulations of a word and is then entrenched in our long-term memory to facilitate future production and processing. Hence, polysemous senses are entrenched clusters in the contextual modulation of word meaning, and a word is polysemous if and only if it has entrenched clustering in its contextual modulation. I argue that this theory explains all the features of polysemous senses. In order to demonstrate more thoroughly how clusters emerge from meaning modulation during processing and to provide evidence for this new theory, I implement the theory by training a recurrent neural network (RNN) that learns distributional information through exposure to a large corpus of English. Clusters of contextually modulated meanings emerge from how the model processes individual words in sentences. This trained model is validated against a human-annotated corpus of polysemy, focusing on the gradedness and flexibility of polysemous sense individuation; a human-annotated corpus of regular polysemy, focusing on the regularity of polysemy; and behavioral findings of offline sense relatedness ratings and online sentence processing. Last, the implications of this new theory of polysemy for philosophy are discussed. I focus on the debate between semantic minimalism and semantic contextualism. I argue that the phenomenon of polysemy poses a severe challenge to semantic minimalism.
No solution is foreseeable if the minimalist thesis is kept and the existence of contextual modulation within the literal truth condition of an utterance is denied.
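The core computational claim, that sense clusters emerge from contextually modulated occurrence vectors, can be sketched in miniature. In the thesis the modulated representations come from an RNN's hidden states over a large corpus; here they are hand-built two-dimensional stand-ins for four occurrences of the polysemous word "paper", clustered with a minimal k-means whose initialization is fixed for determinism.

```python
import numpy as np

# Hypothetical contextually modulated vectors for "paper": two
# material-sense contexts and two publication-sense contexts.
material = np.array([1.0, 0.0])
publication = np.array([0.0, 1.0])
contexts = np.stack([
    material + 0.1, material - 0.05,        # "fold the paper", "paper bag"
    publication + 0.1, publication - 0.05,  # "publish a paper", "cite it"
])

def kmeans2(points, n_iter=10):
    """Minimal 2-means: deterministic init from the 1st and 3rd points."""
    centroids = points[[0, 2]].copy()
    for _ in range(n_iter):
        labels = np.argmin(
            np.linalg.norm(points[:, None] - centroids[None], axis=2), axis=1)
        for k in range(2):
            centroids[k] = points[labels == k].mean(axis=0)
    return labels

print(kmeans2(contexts))  # [0 0 1 1]: two entrenched sense clusters
```

On this toy data the two clusters fall out immediately; the thesis's point is that the same clustering structure, graded and flexible rather than hand-built, emerges from distributional learning at corpus scale.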
An Exploration of Word Embedding Initialization in Deep-Learning Tasks
Word embeddings are the interface between the world of discrete units of text processing and the continuous, differentiable world of neural networks. In this work, we examine various random and pretrained initialization methods for embeddings used in deep networks and their effect on the performance on four NLP tasks with both recurrent and convolutional architectures. We confirm that pretrained embeddings are a little better than random initialization, especially considering the speed of learning. On the other hand, we do not see any significant difference between various methods of random initialization, as long as the variance is kept reasonably low. High-variance initialization prevents the network from using the space of embeddings and forces it to use other free parameters to accomplish the task. We support this hypothesis by observing the performance in learning lexical relations and by the fact that the network can learn to perform reasonably in its task even with fixed random embeddings.
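The initialization regimes the abstract contrasts are easy to make concrete. The sketch below (illustrative shapes and scales, not the paper's exact settings) builds a small-variance uniform, a small-variance normal, and a high-variance normal embedding table; the first two are the family the abstract finds interchangeable, while the third is the pathological case whose large row norms dominate the downstream layers.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 1000, 64

# Three random initializations differing mainly in variance.
inits = {
    "uniform_small": rng.uniform(-0.1, 0.1, (vocab_size, dim)),
    "normal_small": rng.normal(0.0, 0.1, (vocab_size, dim)),
    "normal_high": rng.normal(0.0, 10.0, (vocab_size, dim)),
}
for name, emb in inits.items():
    # Row norms show the scale of the inputs the rest of the network sees.
    norms = np.linalg.norm(emb, axis=1)
    print(f"{name}: var={emb.var():.4f}, mean row norm={norms.mean():.2f}")
```

The two small-variance tables have nearly identical statistics, consistent with the finding that the choice of distribution matters little; the high-variance table feeds inputs two orders of magnitude larger into the network, which gradient updates of ordinary size cannot quickly reshape.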
Machine learning methods for vector-based compositional semantics
Rich semantic representations of linguistic data are an essential component to the development of machine learning algorithms for natural language processing. This thesis explores techniques to model the meaning of phrases and sentences as dense vectors, which can then be further analysed and manipulated to perform any number of tasks involving the understanding of human language. Rather than seeing this task purely as an engineering problem, this thesis will focus on linguistically-motivated approaches, based on the principle of compositionality.
The first half of the thesis will be dedicated to categorial compositional models, which are based on the observation that certain types of grammars share the structure of the algebra of vector spaces. This leads to an approach where the meanings of words are modelled as multilinear maps, encoded as tensors. In this framework, the meaning of a composite linguistic phrase can be computed via the tensor multiplication of its constituents, according to the phrase's syntactic structure. I contribute two categorial compositional models: the first, an extension of a popular method for learning semantic representations of words, models the meanings of adjective-noun phrases as matrix-vector multiplications; the second uses higher-order tensors to represent the meaning of relative clauses.
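A minimal instance of the categorial approach just described: a noun is a vector, an adjective is a multilinear map encoded as a matrix, and the adjective-noun phrase is their matrix-vector product. The two-dimensional "semantic space" and the particular matrix below are invented for illustration, not learned representations from the thesis.

```python
import numpy as np

# Hypothetical 2-d feature space, e.g. [animacy, size].
noun_dog = np.array([1.0, 0.5])

# "big" as a multilinear map: leaves animacy alone, doubles size.
adj_big = np.array([[1.0, 0.0],
                    [0.0, 2.0]])

# Composition is tensor (here: matrix-vector) multiplication,
# following the phrase's syntactic structure.
big_dog = adj_big @ noun_dog
print(big_dog)  # [1. 1.]
```

A relative clause, being a higher-arity function of its arguments, would correspondingly be an order-3 or higher tensor contracted against the vectors of the words it combines with.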
In contrast, the models presented in the second half of the thesis do away with traditional syntactic structures. Rather than using the standard syntax trees of linguistics to drive the compositional process, these models treat the compositional structure as a latent variable. I contribute two models that automatically induce trees for a downstream task, without ever being shown a 'real' syntax tree: one model based on chart parsing, and one based on shift-reduce parsing. While these proposed approaches induce trees that do not resemble traditional syntax trees, they do lead to models with higher performance on downstream tasks – opening up avenues for future research.
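The latent-tree idea can be sketched with a greedy merger: no gold syntax tree is supplied; instead, adjacent constituents are repeatedly merged wherever a scoring function is highest, and a binary composition tree falls out as a by-product. In the thesis the scorer and composition function are learned networks trained end-to-end on a downstream task; in this sketch the score is a fixed dot product and composition is the mean, so the code only illustrates the control flow, not the learning.

```python
import numpy as np

def induce_tree(leaves, vecs):
    """Greedily merge the best-scoring adjacent pair until one tree remains."""
    nodes = list(leaves)
    vecs = [np.asarray(v, dtype=float) for v in vecs]
    while len(nodes) > 1:
        # Score each adjacent pair (stand-in scorer: dot product).
        scores = [vecs[i] @ vecs[i + 1] for i in range(len(vecs) - 1)]
        i = int(np.argmax(scores))
        # Merge the winning pair; compose vectors (stand-in: mean).
        nodes[i:i + 2] = [(nodes[i], nodes[i + 1])]
        vecs[i:i + 2] = [(vecs[i] + vecs[i + 1]) / 2]
    return nodes[0]

leaves = ["the", "cute", "dog"]
vecs = [[0.1, 0.0], [0.0, 1.0], [0.2, 1.0]]
print(induce_tree(leaves, vecs))  # ('the', ('cute', 'dog'))
```

Because the merge order is chosen by the model rather than by a grammar, the induced bracketings need not resemble linguists' trees, which is exactly the behaviour the abstract reports.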