COTA: Improving the Speed and Accuracy of Customer Support through Ranking and Deep Networks
For a company looking to provide delightful user experiences, it is of
paramount importance to take care of any customer issues. This paper proposes
COTA, a system to improve the speed and reliability of customer support for end
users through automated ticket classification and answer selection for support
representatives. Two machine learning and natural language processing
techniques are demonstrated: one relying on feature engineering (COTA v1) and
the other exploiting raw signals through deep learning architectures (COTA v2).
COTA v1 employs a new approach that converts the multi-class classification task into
a ranking problem, demonstrating significantly better performance in the case
of thousands of classes. For COTA v2, we propose an Encoder-Combiner-Decoder, a
novel deep learning architecture that allows for heterogeneous input and output
feature types and injection of prior knowledge through network architecture
choices. This paper compares these models and their variants on the task of
ticket classification and answer selection, showing model COTA v2 outperforms
COTA v1, and analyzes their inner workings and shortcomings. Finally, an A/B
test is conducted in a production setting validating the real-world impact of
COTA in reducing issue resolution time by 10 percent without reducing customer
satisfaction.
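The classification-as-ranking idea behind COTA v1 can be illustrated with a minimal sketch: instead of a single softmax over thousands of classes, a shared scoring function scores each (ticket, candidate-class) pair and the candidates are ranked by score. The cosine-similarity scorer and toy feature vectors below are illustrative assumptions, not the paper's engineered features or learned model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_classes(ticket_vec, class_vecs):
    """Score every (ticket, class) pair and return class labels best-first."""
    scored = [(cosine(ticket_vec, vec), label) for label, vec in class_vecs.items()]
    return [label for _, label in sorted(scored, reverse=True)]

# Toy vectors standing in for engineered ticket/class features (hypothetical).
classes = {
    "payment_issue": [0.9, 0.1, 0.0],
    "lost_item":     [0.0, 0.8, 0.2],
    "app_crash":     [0.1, 0.0, 0.9],
}
ticket = [0.85, 0.15, 0.05]          # a ticket resembling "payment_issue"
ranking = rank_classes(ticket, classes)
print(ranking[0])                    # top-ranked class for this ticket
```

Ranking sidesteps the need for a fixed output layer over thousands of labels: new classes only require a new class vector, not retraining the whole classifier.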
Confusion Modelling - An Estimation by Semantic Embeddings
Few research works have approached the task of assessing the coherence of a conversation from its negative side, 'confusion', rather than from coherence itself. Training embeddings on similarity/dissimilarity measures, such as the distance or cosine similarity between two utterances, equips them with the semantics to distinguish a coherent conversation from an incoherent one by detecting the negative entity, 'confusion'. This research measures the coherence of a conversation between a human and a conversational agent by means of such semantic embeddings, trained from scratch by an architecture that centres learning on the distance between embeddings. The state-of-the-art general-purpose embeddings of BERT, the state-of-the-art conversation-specific embeddings of ConveRT, and GLOVE embeddings are also evaluated on the same architecture. Because confusion is a subjective quality, real human labelling performance is set as the baseline for evaluating the models. The base design alone did not score well against the human baseline, but plugging pre-trained embeddings into it produced performance boosts, in increasing order, for BERT, GLOVE and ConveRT. The soundness of the base conceptual design is supported by the ConveRT variant outperforming ConveRT's own state-of-the-art performance at generating similarity scores. Although none of the models matched human performance, the ConveRT variant's scores overlapped considerably with the human scores, a promising result given that human-level performance is the ultimate target.
Also, from these results, this research joins the group of works finding BERT unsuitable for conversation-specific modelling and embedding tasks.
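The distance-based intuition above can be sketched simply: embed each utterance, then treat low similarity between consecutive utterances as a signal of confusion. The bag-of-words `embed` function below is a deliberately crude stand-in for the trained BERT/GLOVE/ConveRT embeddings, and the two toy conversations are invented for illustration.

```python
from collections import Counter
import math

def embed(utterance, vocab):
    """Toy bag-of-words embedding; a stand-in for trained semantic embeddings."""
    counts = Counter(utterance.lower().split())
    return [counts[w] for w in vocab]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def coherence(conversation, vocab):
    """Mean similarity of consecutive utterances; low values flag confusion."""
    vecs = [embed(u, vocab) for u in conversation]
    sims = [cosine(a, b) for a, b in zip(vecs, vecs[1:])]
    return sum(sims) / len(sims)

coherent = ["where is my order", "your order ships tomorrow", "the order update helps"]
confused = ["where is my order", "olives are a pizza topping", "penguins live in antarctica"]
vocab = sorted({w for conv in (coherent, confused) for u in conv for w in u.lower().split()})
print(coherence(coherent, vocab), coherence(confused, vocab))
```

With learned embeddings, semantically related utterances score as similar even without shared words, which is precisely what the distance-centred training is meant to achieve.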
Meta learning with language models: Challenges and opportunities in the classification of imbalanced text
Detecting out-of-policy speech (OOPS) content is important but difficult.
While machine learning is a powerful tool to tackle this challenging task, it
is hard to break the performance ceiling due to factors like quantity and
quality limitations on training data and inconsistencies in OOPS definition and
data labeling. To realize the full potential of available limited resources, we
propose a meta learning technique (MLT) that combines individual models built
with different text representations. We analytically show that the resulting
technique is numerically stable and produces reasonable combining weights. We
combine the MLT with a threshold-moving (TM) technique to further improve the
performance of the combined predictor on highly-imbalanced in-distribution and
out-of-distribution datasets. We also provide computational results to show the
statistically significant advantages of the proposed MLT approach.
All authors contributed equally to this work. Comment: 22 pages, including 5 figures, 12 tables, 1 appendix.
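The combine-then-threshold idea can be illustrated with a minimal sketch (not the paper's exact MLT): average the base models' positive-class probabilities with combining weights, then apply threshold moving, shifting the decision threshold below 0.5 so the rare positive class is not swamped. The weights and probabilities below are invented for illustration.

```python
def combine(probs_per_model, weights):
    """Weighted average of per-model positive-class probabilities."""
    total = sum(weights)
    n = len(probs_per_model[0])
    return [sum(w * p[i] for w, p in zip(weights, probs_per_model)) / total
            for i in range(n)]

def classify(probs, threshold):
    """Threshold moving: predict positive iff probability >= threshold."""
    return [int(p >= threshold) for p in probs]

# Three base models' positive-class probabilities for four examples
# (e.g., models built on different text representations).
model_probs = [
    [0.40, 0.10, 0.70, 0.05],
    [0.35, 0.20, 0.60, 0.10],
    [0.45, 0.05, 0.80, 0.15],
]
weights = [0.3, 0.3, 0.4]   # hypothetical, e.g. proportional to validation skill
combined = combine(model_probs, weights)
print(classify(combined, 0.5))   # default threshold misses example 0
print(classify(combined, 0.3))   # moved threshold recovers it
```

On highly imbalanced data the default 0.5 cut-off optimizes accuracy at the expense of recall on the minority class; moving the threshold trades a little precision for substantially better minority-class detection.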
Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
One of the main challenges of multimodal learning is the need to combine
heterogeneous modalities (e.g., video, audio, text). For example, video and
audio are obtained at much higher rates than text and are roughly aligned in
time. They are often not synchronized with text, which comes as a global
context, e.g., a title, or a description. Furthermore, video and audio inputs
are of much larger volumes, and grow as the video length increases, which
naturally requires more compute dedicated to these modalities and makes
modeling of long-range dependencies harder.
We here decouple the multimodal modeling, dividing it into separate, focused
autoregressive models, processing the inputs according to the characteristics
of the modalities. We propose a multimodal model, called Mirasol3B, consisting
of an autoregressive component for the time-synchronized modalities (audio and
video), and an autoregressive component for the context modalities which are
not necessarily aligned in time but are still sequential. To address the
long sequences of the video-audio inputs, we propose to further partition the
video and audio sequences into consecutive snippets and autoregressively process
their representations. To that end, we propose a Combiner mechanism, which
models the audio-video information jointly within a timeframe. The Combiner
learns to extract audio and video features from raw spatio-temporal signals,
and then learns to fuse these features producing compact but expressive
representations per snippet.
Our approach achieves state-of-the-art results on well-established multimodal
benchmarks, outperforming much larger models. It effectively addresses the high
computational demand of media inputs by learning compact representations,
controlling the sequence length of the audio-video feature representations, and
modeling their dependencies in time.
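The snippet-partitioning idea can be sketched under simplifying assumptions: split the time-aligned feature stream into consecutive fixed-length snippets and let a combiner compress each snippet into one compact token, so the sequence length seen by the autoregressive model is controlled. Mirasol3B learns this fusion from raw spatio-temporal signals; the mean pooling below is only a stand-in to show the sequence-length mechanics.

```python
def partition(frames, snippet_len):
    """Split a frame-feature sequence into consecutive snippets."""
    return [frames[i:i + snippet_len] for i in range(0, len(frames), snippet_len)]

def combiner(snippet):
    """Fuse a snippet's frame features into one compact vector (mean pooling
    here; the real Combiner is a learned joint audio-video module)."""
    dim = len(snippet[0])
    return [sum(f[d] for f in snippet) / len(snippet) for d in range(dim)]

# 8 frames of 2-D features -> 4 snippets of length 2 -> 4 compact tokens.
frames = [[float(t), float(t % 2)] for t in range(8)]
tokens = [combiner(s) for s in partition(frames, 2)]
print(len(tokens))   # the autoregressive model now sees 4 tokens, not 8 frames
```

Because each snippet collapses to a fixed-size token, the token count grows with video length divided by the snippet length, rather than with the raw frame count, which is what keeps long-range modeling tractable.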
Signal processing using spectrally phase encoded optical frequency combs
Methods, apparatus and systems for an optical system for data harvesting and pattern recognition. The system includes a mode-locked laser for producing a comb of optical frequencies that is split into two identical combs, and a wavelength division demultiplexer that separates the individual optical frequency components of one comb and modulates each optical frequency component with a different one of plural target objects. A second modulator modulates an input signal with the second comb, and an optical splitter splits the modulated signal into plural optical frequency components, each containing the input signal. An optical combiner simultaneously combines the components containing the real-time signal with one of the components containing a target object to produce a temporally modulated interferogram, and a comparator simultaneously compares the two on a comb-by-comb basis using balanced differential detection to determine whether any of the plural target objects are present in the input signal.
Machine Learning to Predict Advertisement Targeting Solutions
Generally, the present disclosure is directed to using machine learning to predict advertisement targeting solutions. In particular, in some implementations, the systems and methods of the present disclosure can include or otherwise leverage one or more machine-learned models to predict optimal advertisement targeting solutions such as, for example, keyword sets, negative-keyword sets, location restrictions, bid adjustments, and/or schedules based on product data such as, for example, advertisement content (e.g., ad creative text), seed keywords, images of the product, and/or advertiser metadata.