14 research outputs found

    Detecting the Intent of Email Using Embeddings, Deep Learning and Transfer Learning

    Over the years, several strategies and tools have been proposed and developed to help users cope with the problem of email overload, but each of these solutions has its own limitations and, in some cases, contributes to further problems. One major theme that encapsulates many of these solutions is automatically classifying emails into predefined categories (e.g., Finance, Sport, Promotion) and then moving or tagging each incoming email into its category. In general, these solutions have two main limitations: 1) they need to adapt to changing user behavior, and 2) they require handcrafted feature engineering, which in turn needs a lot of time, effort, and domain knowledge to produce acceptable performance. This dissertation aims to explore the email phenomenon and provide a scalable solution that addresses the above limitations. Our proposed system requires no handcrafted feature engineering and utilizes Speech Act Theory to design a classification system that detects whether an email requires an action (i.e., to do) or no action (i.e., to read). We automate both the feature extraction and the classification phases by using our own word embeddings, trained on the entire Enron Email dataset, to represent the input. We then use a convolutional layer to capture local trigram features, followed by an LSTM layer that considers the meaning of a given feature (trigram) in relation to some "memory" of words that could occur much earlier in the email. Our system detects the email intent with 89% accuracy, outperforming other related works. In developing this system, we followed the concept of Occam's razor (i.e., the law of parsimony), a problem-solving principle stating that entities should not be multiplied without necessity. Chapter four presents our efforts to simplify the above model by dropping the CNN layer and showing that fine-tuning a pre-trained language model on the Enron email dataset can achieve comparable results. To the best of our knowledge, this is the first attempt at using transfer learning to develop a deep learning model in the email domain. Finally, we show that we can even drop the LSTM layer by representing each email's sentences using contextual word or sentence embeddings. Our experimental results using three different types of embeddings, namely context-free word embeddings (word2vec and GloVe), contextual word embeddings (ELMo and BERT), and sentence embeddings (DAN-based Universal Sentence Encoder and Transformer-based Universal Sentence Encoder), suggest that ELMo embeddings produce the best result. We achieved an accuracy of 90.10% with ELMo, compared with word2vec (82.02%), BERT (58.08%), DAN-based USE (86.66%), and Transformer-based USE (88.16%).

    A Personalized Dense Retrieval Framework for Unified Information Access

    Developing a universal model that can efficiently and effectively respond to a wide range of information access requests -- from retrieval to recommendation to question answering -- has been a long-lasting goal in the information retrieval community. This paper argues that the flexibility, efficiency, and effectiveness brought by the recent development in dense retrieval and approximate nearest neighbor search have smoothed the path towards achieving this goal. We develop a generic and extensible dense retrieval framework, called \framework, that can handle a wide range of (personalized) information access requests, such as keyword search, query by example, and complementary item recommendation. Our proposed approach extends the capabilities of dense retrieval models for ad-hoc retrieval tasks by incorporating user-specific preferences through the development of a personalized attentive network. This allows for a more tailored and accurate personalized information access experience. Our experiments on real-world e-commerce data suggest the feasibility of developing universal information access models by demonstrating significant improvements even compared to competitive baselines specifically developed for each of these individual information access tasks. This work opens up a number of fundamental research directions for future exploration. Comment: Accepted to SIGIR 202

    Chemical composition and source attribution of submicron aerosol particles in the summertime Arctic lower troposphere

    We use airborne measurements of aerosol particle composition to demonstrate the strong contrast between particle sources and composition within and above the summertime Arctic boundary layer. In-situ measurements from two complementary aerosol mass spectrometers, the ALABAMA and the HR-ToF-AMS, are presented together with black carbon measurements from an SP2. Particle composition analysis was complemented by trace gas measurements, satellite data, and air mass history modeling to attribute particle properties to particle origin and air mass source regions. Particle composition above the summertime Arctic boundary layer was dominated by chemically aged particles containing elemental carbon, nitrate, ammonium, sulfate, and organic matter. From our analysis, we conclude that the presence of these particles was driven by transport of aerosol and precursor gases from mid-latitudes to Arctic regions. In particular, elevated concentrations of nitrate, ammonium, and organic matter coincided with time spent over vegetation fires in northern Canada. In parallel, those particles were largely present in high-CO environments (> 90 ppbv). Additionally, we observed that the organic-to-sulfate ratio was enhanced with increasing influence from these fires. Besides vegetation fires, particle sources in mid-latitudes further include anthropogenic emissions in Europe, North America, and East Asia. The presence of particles in the Arctic lower free troposphere correlated with time spent over populated and industrial areas in these regions. Further, the size distribution of free tropospheric particles containing elemental carbon and nitrate was shifted to larger diameters compared to particles present within the boundary layer. Moreover, our analysis suggests that organic matter, when present in the Arctic free troposphere, can partly be identified as low-molecular-weight dicarboxylic acids (oxalic, malonic, and succinic acid). Particles containing dicarboxylic acids were largely present when the residence time of air masses outside Arctic regions was high. In contrast, particle composition within the marine boundary layer was largely driven by Arctic regional processes. Air mass history modeling demonstrated that, alongside primary sea spray particles, marine-biogenic sources contributed to secondary aerosol formation by trimethylamine, methanesulfonic acid, sulfate, and other organic species.

    Mono 1,4-diaza-2,3-diborinane and bicyclic species: Synthesis and structures

    Diazadiborinane derivatives, which contain a ring with two nitrogen atoms and two boron atoms, are a class of six-membered heterocyclic compounds. Previous studies on the synthesis of diazadiborinanes showed that their structures are highly unstable and that some prepared isomer mixtures have no defined mono structure. New monocyclic 1,4-diaza-2,3-diborinane derivatives B2{1,4-(NAr)2C2H4} 3ah-fh and the new bicyclic (B2{1,2-(NAr)2C2H4}2) isomer 6 were prepared. The structures of these new derivatives were characterized using nuclear magnetic resonance (NMR) spectroscopy. The molecular and crystal structures of 3ah, 3bh, 3ch, 3fa and the bicyclic species 6c were determined by single-crystal X-ray diffraction. © 2016 Elsevier Ltd. All rights reserved.