Memory-Based Shallow Parsing
We present memory-based learning approaches to shallow parsing and apply
these to five tasks: base noun phrase identification, arbitrary base phrase
recognition, clause detection, noun phrase parsing and full parsing. We use
feature selection techniques and system combination methods for improving the
performance of the memory-based learner. Our approach is evaluated on standard
data sets and the results are compared with those of other systems. This reveals
that our approach works well for base phrase identification, while its
application to recognizing embedded structures leaves some room for
improvement.
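Memory-based learning here means a nearest-neighbour classifier: all training instances are stored, and each new token is tagged like its closest stored neighbours. A minimal sketch of base-NP chunking in this style, using scikit-learn in place of a TiMBL-style learner and invented toy tokens rather than a standard chunking corpus:

```python
# Memory-based (k-nearest-neighbour) base-NP chunking sketch.
# The toy feature windows and IOB tags below are illustrative assumptions.
from sklearn.feature_extraction import DictVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Each token is described by a small feature window (word plus POS tags)
# and labelled with an IOB chunk tag: B begins a base NP, I is inside, O outside.
train = [
    ({"w": "the", "pos": "DT", "prev_pos": "-"},  "B"),
    ({"w": "cat", "pos": "NN", "prev_pos": "DT"}, "I"),
    ({"w": "sat", "pos": "VB", "prev_pos": "NN"}, "O"),
    ({"w": "a",   "pos": "DT", "prev_pos": "VB"}, "B"),
    ({"w": "mat", "pos": "NN", "prev_pos": "DT"}, "I"),
]

vec = DictVectorizer()
X = vec.fit_transform([f for f, _ in train])
y = [t for _, t in train]

# "Memory-based" learning: store every training instance and classify
# new tokens by their nearest stored neighbour (k=1 here).
knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)

test_tok = {"w": "dog", "pos": "NN", "prev_pos": "DT"}
print(knn.predict(vec.transform([test_tok]))[0])   # "I": nearest stored token is a noun inside an NP
```

Feature selection and system combination, as described above, would then operate on which window features are kept and on how several such learners vote.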
Source separation and music transcription with deep learning
The goal of this work is to obtain a MIDI transcription from an MP3 file of polyphonic melodies (several notes sounding at the same time) with several instruments playing at once, using deep learning. To do so, each instrument is first separated using the Demucs library, and then a model is trained to detect complete notes (frames). Deep learning is a tool that has advanced considerably in recent years thanks to progress in technology.
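The frame-detection step can be sketched independently of the network itself: given a pitch-by-time activation matrix (which, in the pipeline described, a trained model would produce for each Demucs-separated stem), threshold it into a binary piano roll and merge consecutive active frames into notes. The threshold and the toy matrix below are illustrative assumptions, not the thesis's actual model output.

```python
import numpy as np

# Frame-level note detection sketch: threshold a (pitches x frames)
# activation matrix into a binary piano roll, then merge consecutive
# active frames into (pitch, start_frame, end_frame) notes.

def frames_to_notes(activations, threshold=0.5):
    roll = activations >= threshold              # binary piano roll
    notes = []
    for pitch, row in enumerate(roll):
        start = None
        for t, on in enumerate(row):
            if on and start is None:
                start = t                        # note onset
            elif not on and start is not None:
                notes.append((pitch, start, t))  # note offset
                start = None
        if start is not None:                    # note still sounding at the end
            notes.append((pitch, start, len(row)))
    return notes

# Two hypothetical pitches over six frames.
acts = np.array([
    [0.9, 0.8, 0.1, 0.0, 0.0, 0.0],   # pitch 0 sounds in frames 0-1
    [0.0, 0.0, 0.7, 0.9, 0.6, 0.0],   # pitch 1 sounds in frames 2-4
])
print(frames_to_notes(acts))   # [(0, 0, 2), (1, 2, 5)]
```

Converting these (pitch, start, end) triples to MIDI is then a matter of mapping frame indices to times and writing note-on/note-off events.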
A feasibility study on compiling reactive problem solution methods for an AI domain
This paper investigates the feasibility of compiling the functionality of a decision-theoretic problem-solving engine into a set of rules or a functionally similar construct. The decision-theoretic engine runs in exponential time, while the rule set runs in linear time at worst. The main question determining feasibility is whether the size of the rule set is small enough to be of practical use. Based on the tests run, size does not appear to be a limiting factor in compiling rule sets.
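The trade described above, paying an expensive offline computation once to get cheap online behaviour, can be sketched as follows. The toy utility function and tiny state space are invented for illustration; the paper's engine is decision-theoretic and far more elaborate.

```python
from itertools import product

def utility(state, action):
    # Stand-in utility: acting pays off in "large" states, waiting is a safe default.
    x, y = state
    return (x + y) if action == "act" else 1

def slow_best_action(state):
    # Stand-in for the exponential-time engine: evaluate every action.
    return max(["wait", "act"], key=lambda a: utility(state, a))

# Offline "compilation": enumerate the states once (the expensive step)
# and record the engine's choice for each as a rule table.
rules = {s: slow_best_action(s) for s in product(range(3), range(3))}

# Online use: a constant-time table lookup replaces the engine.
print(rules[(2, 2)])   # "act"
print(len(rules))      # 9 rules: the size of this table is the feasibility question
```

The paper's feasibility question is exactly whether the analogue of `rules` stays small enough to be practical as the real state space grows.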
Evolving visual routines
Thesis (M.S.), Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1995. Includes bibliographical references (leaves 114-117). By Michael Patrick Johnson.
Criminal data analysis based on low rank sparse representation
Finding effective clustering methods for high-dimensional datasets is challenging due to the curse of dimensionality. These challenges usually cause the most basic common algorithms to fail in high-dimensional spaces when tackling problems such as large numbers of groups and overlapping clusters. Most domains use some parameters to describe the appearance, geometry and dynamics of a scene. This has motivated the development of several techniques for finding a low-dimensional space embedded in high-dimensional data. Many proposed methods fail to overcome these challenges, especially when the input data are high-dimensional and the clusters have a complex structure.
Frequently in high-dimensional data, many of the dimensions are irrelevant and may hide the existing clusters in noise. High-dimensional data often reside on low-dimensional subspaces. The task of subspace clustering algorithms is to uncover how objects related in one subset of dimensions are related in different subsets of dimensions. The state-of-the-art methods for subspace segmentation include Low-Rank Representation (LRR) and Sparse Representation (SR). The former seeks the globally lowest-rank representation but restrictively assumes independence among subspaces, whereas the latter seeks to cluster disjoint or overlapping subspaces through a locality measure, which, however, fails in the presence of large noise.
This thesis aims to identify the key problems and obstacles that have challenged researchers in recent years in clustering high-dimensional data, and then to implement an effective subspace clustering method for high-dimensional crime data, covering both real events and synthetic data with a complex structure spanning 168 different offence types, as well as to overcome the disadvantages of existing subspace clustering techniques. To this end, a Low-Rank Sparse Representation (LRSR) theory, referred to hereafter as Criminal Data Analysis Based on LRSR, is examined and used to recover and segment embedded subspaces. The results of these methods are discussed and compared with previous approaches such as k-means and PCA followed by k-means segmentation; these earlier approaches helped us choose the right subspace clustering method. The proposed method is based on a subspace segmentation method named Low-Rank Sparse Representation (LRSR), which not only recovers the low-rank subspaces but also obtains a relatively sparse segmentation with respect to disjoint or even overlapping subspaces.
Both the UCI Machine Learning Repository and a crime database are used to find and compare the subspace clustering algorithms best suited to high-dimensional data. We used several open-source machine learning frameworks and tools for our machine learning tasks, including preparing, transforming, clustering and visualizing the high-dimensional crime dataset. In particular, we used the scikit-learn library for the Python programming language, and we used R and MATLAB in earlier experiments.
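The sparse-representation half of the approach can be sketched with scikit-learn: express each data point as a sparse combination of the other points, then spectrally cluster the resulting affinity graph. The toy data below are two clean 1-D subspaces, not the crime dataset, and the Lasso-based self-representation stands in for the full LRSR optimisation, which also carries a low-rank term.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)

# Two independent 1-D subspaces (lines) in 5-D, 10 points on each.
b1, b2 = rng.normal(size=(5, 1)), rng.normal(size=(5, 1))
X = np.hstack([b1 @ rng.normal(size=(1, 10)), b2 @ rng.normal(size=(1, 10))])

n = X.shape[1]
C = np.zeros((n, n))
for i in range(n):
    # Sparse self-representation: write point i as a combination of the others.
    others = np.delete(X, i, axis=1)
    coef = Lasso(alpha=0.01, max_iter=5000).fit(others, X[:, i]).coef_
    C[np.arange(n) != i, i] = coef

# Points tend to pick representers from their own subspace, so the
# symmetrised coefficient magnitudes form a block-structured affinity.
W = np.abs(C) + np.abs(C).T
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(W)
print(labels)
```

With noisy, overlapping subspaces this plain SR step degrades, which is precisely the gap the combined low-rank plus sparse (LRSR) formulation described above is meant to close.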
3-D Content-Based Retrieval and Classification with Applications to Museum Data
There is an increasing number of multimedia collections arising in areas once only the domain of text and 2-D images. Richer types of multimedia such as audio, video and 3-D objects are becoming more and more commonplace. However, current retrieval techniques in these areas are not as sophisticated as textual and 2-D image techniques and in many cases rely upon textual searching through associated keywords. This thesis is concerned with the retrieval of 3-D objects and with the application of these techniques to the problem of 3-D object annotation. The majority of the work in this thesis has been driven by the European project, SCULPTEUR. This thesis provides an in-depth analysis of a range of 3-D shape descriptors for their suitability for general-purpose and specific retrieval tasks using a publicly available data set, the Princeton Shape Benchmark, and using real-world museum objects evaluated using a variety of performance metrics. This thesis also investigates the use of 3-D shape descriptors as inputs to popular classification algorithms, and a novel classifier agent for use with the SCULPTEUR system is designed and developed and its performance analysed. Several techniques are investigated to improve individual classifier performance. One set of techniques combines several classifiers, whereas the other set of techniques aims to find the optimal training parameters for a classifier. The final chapter of this thesis explores a possible application of these techniques to the problem of 3-D object annotation.
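To make the idea of a 3-D shape descriptor concrete, here is a sketch of one classic descriptor of the kind such evaluations compare: the D2 shape distribution, a histogram of distances between random pairs of surface points. The random point clouds below stand in for sampled mesh surfaces; whether this particular descriptor is among those the thesis analyses is not stated here.

```python
import numpy as np

rng = np.random.default_rng(1)

def d2_descriptor(points, n_pairs=20000, bins=32, max_dist=2.0):
    # Histogram of distances between randomly chosen point pairs:
    # a compact, rotation-invariant signature of the shape.
    i = rng.integers(0, len(points), n_pairs)
    j = rng.integers(0, len(points), n_pairs)
    d = np.linalg.norm(points[i] - points[j], axis=1)
    hist, _ = np.histogram(d, bins=bins, range=(0, max_dist), density=True)
    return hist

sphere = rng.normal(size=(2000, 3))
sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)   # points on a unit sphere
cube = rng.uniform(-0.5, 0.5, size=(2000, 3))             # points in a solid cube

# Retrieval then ranks objects by a simple distance between descriptors.
dist = np.abs(d2_descriptor(sphere) - d2_descriptor(cube)).sum()
print(round(dist, 2))
```

The same fixed-length descriptor vectors are what would be fed to the classification algorithms mentioned above.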
Genetic programming for cephalometric landmark detection
The domain of medical imaging analysis has burgeoned in recent years due to the availability and affordability of digital radiographic imaging equipment and associated algorithms and, as such, there has been significant activity in the automation of the medical diagnostic process. One such process, cephalometric analysis, is manually intensive and it can take an experienced orthodontist thirty minutes to analyse one radiology image. This thesis describes an approach, based on genetic programming, neural networks and machine learning, to automate this process. A cephalometric analysis involves locating a number of points in an X-ray and determining the linear and angular relationships between them. If the points can be located accurately enough, the rest of the analysis is straightforward. The investigative steps undertaken were as follows: Firstly, a previously published method, which was claimed to be domain independent, was implemented and tested on a selection of landmarks, ranging from easy to very difficult. These included the menton, upper lip, incisal upper incisor, nose tip and sella landmarks. The method used pixel values, and pixel statistics (mean and standard deviation) of pre-determined regions as inputs to a genetic programming detector. This approach proved unsatisfactory and the second part of the investigation focused on alternative handcrafted feature sets and fitness measures. This proved to be much more successful and the third part of the investigation involved using pulse coupled neural networks to replace the handcrafted features with learned ones. The fourth and final stage involved an analysis of the evolved programs to determine whether reasonable algorithms had been evolved and not just random artefacts learnt from the training images.
A significant finding from the investigative steps was that the new domain-independent approach, using pulse coupled neural networks and genetic programming to evolve programs, was as good as or even better than one using the handcrafted features. The advantage of this finding is that little domain knowledge is required, thus obviating the requirement to manually generate handcrafted features. The investigation revealed that some of the easy landmarks could be found with 100% accuracy, while the accuracy of finding the most difficult ones was around 78%. An extensive analysis of evolved programs revealed underlying regularities that were captured during the evolutionary process. Even though the evolutionary process took different routes and a diverse range of programs was evolved, many of the programs with an acceptable detection rate implemented algorithms with similar characteristics. The major outcome of this work is that the method described in this thesis could be used as the basis of an automated system. The orthodontist would be required to manually correct a few errors before completing the analysis.
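The accuracy figures quoted above come from the standard evaluation used for landmark detectors: a landmark counts as found when the prediction lies within a fixed distance of the annotated point (2 mm is the usual cephalometric criterion; the coordinates below are invented for illustration).

```python
import numpy as np

def detection_rate(pred, truth, tol_mm=2.0):
    # Fraction of landmarks whose predicted position falls within
    # tol_mm of the ground-truth annotation.
    errors = np.linalg.norm(pred - truth, axis=1)
    return float((errors <= tol_mm).mean())

# Four hypothetical landmarks (x, y in mm): annotations and detector output.
truth = np.array([[10.0, 20.0], [35.0, 42.0], [60.0, 15.0], [80.0, 55.0]])
pred  = np.array([[10.5, 20.5], [34.0, 41.0], [66.0, 15.0], [80.2, 54.8]])

print(detection_rate(pred, truth))   # 0.75: three of four landmarks within 2 mm
```

A per-landmark version of this rate is what distinguishes the "easy" landmarks found at 100% from the difficult ones found around 78% of the time.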
Active learning with committees: an approach to efficient learning in text categorization using linear threshold algorithms
We developed and investigated machine learning methods that require
minimal preprocessing of the input data, use few training examples, run fast, and
still obtain high levels of accuracy.
Most approaches to designing machine learning programs are based on the
supervised learning paradigm – training examples are chosen randomly and given
to the learner. We explore the "active learning" paradigm – the learner
automatically selects the more informative training examples. Our domain of
interest is text categorization, but most of the methods developed are quite general.
The purpose of text categorization is to assign each document in a collection
to appropriate categories. Most existing text categorization methods require large
amounts of time to prepare the documents for learning and large numbers of
examples for training. Humans must assign correct categories to documents before
they can be used for training; this costs time and money. Our goal is to develop
machine learning methods that, when compared to other methods currently available, are more efficient in time and space, use fewer training documents, and
are as accurate.
We developed the Active Learning with Committees (ALC) framework –
inspired by the Query by Committee approach of Freund, Seung, et al. A
"committee" is a group of learners that jointly participate in learning and in
predicting the classes of new examples. We perform minimal preprocessing of the
documents and thus the domain is noisy, high dimensional, and has large numbers
of irrelevant attributes. We use linear threshold learning algorithms to obtain
computational efficiency with respect to these large numbers of attributes, with
specific algorithms being chosen because they also generalize well when large
numbers of attributes are irrelevant.
We developed and analyzed several ALC systems. Our results show that it is
possible to design active learning systems that scale up to large numbers of features
and obtain accuracies comparable to the supervised learning methods while using
an order of magnitude fewer examples and an order of magnitude less time. The
ALC methods developed have run times on the order of seconds, typically use only
5-7% of the training documents, and are as accurate as their supervised
counterparts.
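The selection step at the heart of ALC can be sketched as query-by-committee: a committee of linear threshold learners is trained on variants of the labelled set, and the pool example they disagree on most is the one worth labelling next. The toy 2-D data and perceptron committee below are illustrative assumptions standing in for the high-dimensional text features and the specific algorithms the thesis uses.

```python
import numpy as np
from sklearn.linear_model import Perceptron

rng = np.random.default_rng(0)

X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy labelling rule

# Small labelled seed set containing both classes; the rest is the unlabelled pool.
labeled = np.concatenate([np.where(y == 0)[0][:5], np.where(y == 1)[0][:5]])
pool = np.setdiff1d(np.arange(200), labeled)

committee = []
for seed in range(5):
    # Bootstrap the labelled set, forcing one example of each class
    # so every linear threshold learner can be trained.
    idx = np.concatenate([labeled[:1], labeled[-1:],
                          rng.choice(labeled, size=8, replace=True)])
    committee.append(Perceptron(random_state=seed).fit(X[idx], y[idx]))

# Disagreement = variance of the members' 0/1 votes on each pool example.
votes = np.array([m.predict(X[pool]) for m in committee])
disagreement = votes.var(axis=0)
query = pool[disagreement.argmax()]   # the most informative example to label next
print(query)
```

Repeating the loop (label the query, retrain, re-query) is what lets such systems reach supervised-level accuracy from a small fraction of the training documents.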
Pharmacovigilance Decision Support: The value of Disproportionality Analysis Signal Detection Methods, the development and testing of Covariability Techniques, and the importance of Ontology
The cost of adverse drug reactions to society, in the form of deaths, chronic illness, foetal malformation, and many other effects, is quite significant. For example, in the United States of America, adverse reactions to prescribed drugs are around the fourth leading cause of death. In Australia, the reporting of adverse drug reactions is spontaneous and voluntary. Many methods have been used for the analysis of adverse drug reaction data, mostly using a statistical approach as a basis for clinical analysis in drug safety surveillance decision support. This thesis examines new approaches that may be used in the analysis of drug safety data. These methods differ significantly from the statistical methods in that they use covariability methods of association to define drug-reaction relationships. Covariability algorithms were developed in collaboration with Musa Mammadov to discover drugs associated with adverse reactions and possible drug-drug interactions. This method uses the system organ class (SOC) classification in the Australian Adverse Drug Reaction Advisory Committee (ADRAC) data to stratify reactions. The text categorization algorithm BoosTexter was found to work with the same drug safety data, and its performance and modus operandi were compared to our algorithms. These alternative methods were compared to standard disproportionality analysis methods for signal detection in drug safety data, including the Bayesian multi-item gamma Poisson shrinker (MGPS), which was found to have problems with similar reaction terms in a report and with innocent-bystander drugs. A classification of drug terms was made using the anatomical-therapeutic-chemical (ATC) codes, reducing the number of drug variables from 5081 drug terms to 14 main drug classes. The ATC classification is structured as a hierarchy of five levels.
Exploitation of the ATC hierarchy allows the drug safety data to be stratified in such a way as to make them accessible to powerful existing tools. A data mining method that uses association rules, grouping them on the basis of content, was used as a basis for applying the ATC and SOC ontologies to the ADRAC data. This allows different views of these associations (even very rare ones). A signal detection method was developed using these association rules, which also incorporates critical reaction terms.
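The simplest of the disproportionality statistics such comparisons start from is the proportional reporting ratio (PRR), computed from a 2x2 table of report counts. The counts below are invented for illustration; real signal detection would compute this over every drug-reaction pair in the database.

```python
# Proportional reporting ratio (PRR) sketch, the baseline
# disproportionality statistic for spontaneous-report data.

def prr(a, b, c, d):
    """a: reports with the drug and the reaction,
       b: reports with the drug, without the reaction,
       c: reports with the reaction, without the drug,
       d: reports with neither."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical counts: 20 reports pair the drug with the reaction and 380
# mention the drug without it; among other reports, 80 mention the
# reaction and 9520 mention neither.
signal = prr(20, 380, 80, 9520)
print(round(signal, 1))   # 6.0; a PRR above 2 is a common signalling threshold
```

Methods such as MGPS refine this same ratio with Bayesian shrinkage for rare counts, while the covariability methods described above replace the ratio itself with association measures over the SOC-stratified data.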