Analysing symbolic music with probabilistic grammars
Recent developments in computational linguistics offer ways to approach the analysis of musical structure by inducing probabilistic models (in the form of grammars) over a corpus of music. These models can generate idiomatic sentences in the musical language they model and thus offer explanations of its structures. This chapter surveys historical and current work in musical analysis using grammars, based on computational linguistic approaches. We outline the theory of probabilistic grammars and illustrate their implementation in Prolog using PRISM. Our experiments on learning the probabilities of simple grammars from pitch sequences in two kinds of symbolic musical corpora are summarized. The results support our claim that probabilistic grammars are a promising framework for computational music analysis, but also indicate that further work is required to establish their superiority over Markov models.
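The generative side of such grammars can be illustrated with a toy example. The sketch below samples pitch sequences from a hand-written probabilistic context-free grammar in Python rather than the Prolog/PRISM setting the chapter uses; the nonterminals, rules, and probabilities are invented purely for illustration.

```python
import random

# Toy probabilistic grammar over pitch symbols (illustrative only, not from
# the chapter). Each nonterminal maps to (probability, expansion) pairs.
GRAMMAR = {
    "Phrase": [(0.6, ["Motif", "Phrase"]), (0.4, ["Motif"])],
    "Motif":  [(0.5, ["C", "E", "G"]), (0.5, ["D", "F", "A"])],
}

def sample(symbol, rng):
    """Recursively expand a symbol into a flat pitch sequence."""
    if symbol not in GRAMMAR:          # terminal: a pitch name
        return [symbol]
    r, acc = rng.random(), 0.0
    for prob, expansion in GRAMMAR[symbol]:
        acc += prob
        if r <= acc:                   # pick this rule with probability `prob`
            return [pitch for s in expansion for pitch in sample(s, rng)]
    return []

rng = random.Random(0)
print(sample("Phrase", rng))           # one sampled phrase, e.g. three pitches
```

Because `Phrase` recurses with probability 0.6 < 1, the expansion terminates with probability one; learning would amount to re-estimating the rule probabilities from a corpus.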
BOSS: Bayesian Optimization over String Spaces
This article develops a Bayesian optimization (BO) method which acts directly
over raw strings, proposing the first uses of string kernels and genetic
algorithms within BO loops. Recent applications of BO over strings have been
hindered by the need to map inputs into a smooth and unconstrained latent
space. Learning this projection is computationally and data-intensive. Our
approach instead builds a powerful Gaussian process surrogate model based on
string kernels, naturally supporting variable length inputs, and performs
efficient acquisition function maximization for spaces with syntactical
constraints. Experiments demonstrate considerably improved optimization over
existing approaches across a broad range of constraints, including the popular
setting where syntax is governed by a context-free grammar.
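To give a flavor of the string-kernel ingredient, here is a minimal p-spectrum kernel: an inner product over counts of shared length-p substrings, with cosine normalization. It is a simpler member of the string-kernel family than the subsequence kernels typically used in GP surrogates, and the parameter choices below are illustrative.

```python
from collections import Counter

def spectrum_features(s, p=3):
    """Counts of each contiguous substring of length p in s."""
    return Counter(s[i:i + p] for i in range(len(s) - p + 1))

def spectrum_kernel(a, b, p=3):
    """p-spectrum kernel: inner product of substring-count vectors."""
    fa, fb = spectrum_features(a, p), spectrum_features(b, p)
    return sum(fa[k] * fb[k] for k in fa if k in fb)

def normalized_kernel(a, b, p=3):
    """Cosine-normalized variant, so k(s, s) == 1 for any string s."""
    return spectrum_kernel(a, b, p) / (
        spectrum_kernel(a, a, p) * spectrum_kernel(b, b, p)) ** 0.5
```

Because the kernel is defined directly on raw strings of any length, a Gaussian process built on it needs no fixed-dimensional latent embedding of the inputs.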
SynJax: Structured Probability Distributions for JAX
The development of deep learning software libraries enabled significant
progress in the field by allowing users to focus on modeling, while letting the
library take care of the tedious and time-consuming task of optimizing
execution for modern hardware accelerators. However, this has benefited only
particular types of deep learning models, such as Transformers, whose
primitives map easily to vectorized computation. Models that explicitly
account for structured objects, such as trees and segmentations, did not
benefit equally because they require custom algorithms that are difficult to
implement in a vectorized form.
SynJax directly addresses this problem by providing an efficient vectorized
implementation of inference algorithms for structured distributions covering
alignment, tagging, segmentation, constituency trees and spanning trees. With
SynJax we can build large-scale differentiable models that explicitly model
structure in the data. The code is available at
https://github.com/deepmind/synjax
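To make concrete the kind of inference algorithm SynJax vectorizes, the sketch below computes the log-partition function of a linear-chain tagging model with the forward algorithm, in plain Python; this shows only the underlying dynamic program, not SynJax's actual API or its accelerator-friendly vectorized form.

```python
import math

def _logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_partition(emissions, transitions):
    """Log-partition of a linear-chain tagging model (forward algorithm).

    emissions:   [T][K] per-position log-scores for each of K tags
    transitions: [K][K] log-scores for moving from tag i to tag j
    """
    alpha = list(emissions[0])                 # forward scores at position 0
    for t in range(1, len(emissions)):
        alpha = [
            emissions[t][j] + _logsumexp(
                [alpha[i] + transitions[i][j] for i in range(len(alpha))])
            for j in range(len(alpha))
        ]
    return _logsumexp(alpha)                   # sum over all K**T tag sequences
```

With all scores zero, the partition function just counts tag sequences: for 2 tags over 3 positions it equals log(2**3). Differentiating this quantity with respect to the scores yields marginals, which is what makes such distributions usable inside end-to-end differentiable models.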
Text analysis of handwritten production deviations
Companies want to understand the latest trends and to summarize product status or
public opinion from social media data. Because such data is abundant and very diverse,
there is a need for automated, real-time opinion polling and data mining. This need
has driven the huge popularity of text analysis, which is now being developed and
applied in more and more industries, well beyond the evaluation of consumer feedback.
Natural language processing (NLP) is a subfield of linguistics, computer science,
and artificial intelligence focused on enabling computers to understand and
interpret human language. Its goal, and its strength, is specifically programming
computers to process and analyze large amounts of natural language. NLP technology
can extract data accurately from text and classify and organize it. Machine learning
methods make text analysis much faster and more efficient than manual processing,
and they can be used to reduce labor costs and speed up the handling of texts
without compromising quality.
The main focus of this thesis is to study the textual material received from the client
and to develop a prediction model for it using natural language processing
(NLP) techniques. A case study was used as the research strategy. The obtained
text data, about 9,000 sentences, covers the period 2016/11-2018/9 and comes from
production deviations observed in the welding and assembly process. Text sentences,
i.e. user comments, were available for all stages from the detection of a deviation to
its resolution. This study focuses on the first observational comment written about
each deviation. Based on these comments, a predictive model was trained that can
predict, from the first comment alone, the likely root cause of the deviation.
The research material was analyzed using both traditional machine learning
methods and more advanced deep learning methods, namely the pre-trained FinBERT and
multilingual BERT models. Accuracy was the key measure for comparing the models.
The result is a reliable prediction model that can be used to predict
whether a deviation falls into class 100 (missing part) or class 200 (other deviations).
The best accuracy of the traditional machine learning models was 85.7%, and of the
transformer models 82.6%. The most common word across all the Finnish sentences
was "puuttua" ("to be missing"), in its various inflected forms.
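As an illustration of the kind of traditional bag-of-words baseline compared against the BERT models, the sketch below trains a multinomial Naive Bayes classifier for the two deviation classes. The comments and labels are invented English stand-ins (the thesis data is Finnish and proprietary), and the thesis does not specify this exact classifier.

```python
import math
from collections import Counter, defaultdict

# Hypothetical first comments with deviation classes (illustrative only).
TRAIN = [
    ("part missing from assembly", "100"),
    ("bolt missing at station", "100"),
    ("weld seam missing on bracket", "100"),
    ("paint scratch on surface", "200"),
    ("wrong torque on bolt", "200"),
    ("surface scratch near weld", "200"),
]

def train_nb(samples, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing over whitespace tokens."""
    word_counts = defaultdict(Counter)   # per-class word frequencies
    class_counts = Counter()             # class priors (as counts)
    vocab = set()
    for text, label in samples:
        words = text.split()
        word_counts[label].update(words)
        class_counts[label] += 1
        vocab.update(words)
    return word_counts, class_counts, vocab, alpha

def predict(model, text):
    """Return the class with the highest log posterior for the comment."""
    word_counts, class_counts, vocab, alpha = model
    total = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for label in class_counts:
        n = sum(word_counts[label].values())
        lp = math.log(class_counts[label] / total)
        for w in text.split():
            lp += math.log((word_counts[label][w] + alpha)
                           / (n + alpha * len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = train_nb(TRAIN)
print(predict(model, "bolt missing"))
```

In the thesis setting the same pipeline would tokenize and lemmatize Finnish text (so the inflected forms of "puuttua" collapse to one feature) before fitting the classifier.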