
    Memory-Based Shallow Parsing

    We present memory-based learning approaches to shallow parsing and apply them to five tasks: base noun phrase identification, arbitrary base phrase recognition, clause detection, noun phrase parsing and full parsing. We use feature selection techniques and system combination methods to improve the performance of the memory-based learner. Our approach is evaluated on standard data sets and the results are compared with those of other systems. This reveals that our approach works well for base phrase identification, while its application to recognizing embedded structures leaves some room for improvement.
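    A memory-based learner is essentially nearest-neighbour classification over stored training instances. The sketch below shows base-NP chunking in that spirit with scikit-learn's k-NN on a toy sentence; the features, data and k value are illustrative assumptions, not the paper's actual setup.

```python
# A minimal sketch of memory-based (k-NN) base-phrase chunking.
# The sentence, feature template and IOB tags are toy data.
from sklearn.feature_extraction import DictVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Tokens with part-of-speech tags and gold IOB chunk labels (toy data).
train = [
    ("The", "DT", "B-NP"), ("cat", "NN", "I-NP"), ("sat", "VBD", "O"),
    ("on", "IN", "O"), ("the", "DT", "B-NP"), ("mat", "NN", "I-NP"),
]

def window_features(tokens, i):
    """Features for token i: its POS plus the POS of its neighbours."""
    pos = [t[1] for t in tokens]
    return {
        "pos": pos[i],
        "pos-1": pos[i - 1] if i > 0 else "BOS",
        "pos+1": pos[i + 1] if i < len(pos) - 1 else "EOS",
    }

X_dicts = [window_features(train, i) for i in range(len(train))]
y = [t[2] for t in train]

vec = DictVectorizer()
X = vec.fit_transform(X_dicts)

# k=1: classify a token by its single most similar stored example.
knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(knn.predict(vec.transform([{"pos": "NN", "pos-1": "DT", "pos+1": "VBD"}])))
# → ['I-NP']
```

    Feature selection and system combination, as used in the paper, would then operate on top of such a base learner, e.g. by varying the window features or voting over several k-NN configurations.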

    Source Separation and Music Transcription with Deep Learning

    The aim of this work is to obtain a MIDI transcription from an MP3 file of polyphonic melodies (several notes sounding at the same time) with several instruments playing at once, using deep learning. To do so, each instrument is first separated using the Demucs library, and then a model is trained to detect complete notes (frames). Deep learning is a tool that has evolved considerably in recent years thanks to advances in technology.
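    Once the stems have been separated (the thesis uses the Demucs library for that step) and a model produces per-frame note activations, those activations still have to be turned into note events. A minimal sketch of that post-processing step, with an invented activation matrix, threshold and frame duration rather than the thesis's trained model:

```python
# A minimal sketch of converting per-frame, per-pitch activations into
# MIDI-style note events via thresholding. All values are toy assumptions.
import numpy as np

def frames_to_notes(activations, threshold=0.5, frame_time=0.032):
    """activations: (n_frames, 128) array of per-pitch probabilities.
    Returns (pitch, onset_sec, offset_sec) tuples for each detected note."""
    active = activations >= threshold
    notes = []
    for pitch in range(active.shape[1]):
        onset = None
        for t in range(active.shape[0]):
            if active[t, pitch] and onset is None:
                onset = t                      # note starts
            elif not active[t, pitch] and onset is not None:
                notes.append((pitch, onset * frame_time, t * frame_time))
                onset = None                   # note ends
        if onset is not None:                  # note still sounding at the end
            notes.append((pitch, onset * frame_time, active.shape[0] * frame_time))
    return notes

# Toy activations: pitch 60 (middle C) sounds in frames 1-3.
acts = np.zeros((5, 128))
acts[1:4, 60] = 0.9
print(frames_to_notes(acts))
```

    A real pipeline would additionally merge short gaps and write the events out with a MIDI library; this sketch only shows the thresholding logic.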

    Evolving visual routines

    Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1995. Includes bibliographical references (leaves 114-117). By Michael Patrick Johnson.

    Criminal data analysis based on low rank sparse representation

    Finding effective clustering methods for high-dimensional datasets is challenging due to the curse of dimensionality. These challenges often cause basic common algorithms to fail in high-dimensional spaces when tackling problems such as a large number of groups and overlapping clusters. Most domains use a number of parameters to describe the appearance, geometry and dynamics of a scene, which has motivated several techniques for finding a low-dimensional space within high-dimensional data. Many proposed methods fail to overcome these challenges, especially when the input data are high-dimensional and the clusters have a complex structure. In high-dimensional data, many of the dimensions are often irrelevant and can hide the existing clusters in noise. High-dimensional data often reside on low-dimensional subspaces, and the task of a subspace clustering algorithm is to uncover the relationships between objects that are related in different subsets of the dimensions. The state-of-the-art methods for subspace segmentation include Low-Rank Representation (LRR) and Sparse Representation (SR). The former seeks the global lowest-rank representation but restrictively assumes independence among subspaces, whereas the latter seeks a clustering of disjoint or overlapping subspaces through a locality measure, which, however, fails in the presence of large noise.
    This thesis aims to identify the key problems and obstacles that have challenged researchers in recent years in clustering high-dimensional data, and then to implement an effective subspace clustering method for high-dimensional crime data, covering both real events and synthetic data with a complex structure spanning 168 different offence types, while overcoming the disadvantages of existing subspace clustering techniques. To this end, a Low-Rank Sparse Representation (LRSR) theory, referred to here as Criminal Data Analysis Based on LRSR, is examined and then used to recover and segment embedded subspaces. The results of these methods are discussed and compared with previously examined approaches such as K-means and PCA followed by K-means segmentation; these earlier experiments guided the choice of subspace clustering method. The proposed method, based on the subspace segmentation method Low-Rank Sparse Representation (LRSR), not only recovers the low-rank subspaces but also produces a relatively sparse segmentation with respect to disjoint or even overlapping subspaces. Both the UCI Machine Learning Repository and a crime database are used to find and compare the subspace clustering algorithm best suited to high-dimensional data. Several open-source machine learning frameworks and tools were used for the machine learning tasks, including preparing, transforming, clustering and visualising the high-dimensional crime dataset; in particular, the scikit-learn library for the Python programming language was used, with R and Matlab used in earlier experiments.
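    As an illustration of the sparse self-expressive idea behind the SR/LRSR family, the sketch below codes each point as a sparse combination of the other points and spectrally clusters the resulting affinity matrix. The toy data, Lasso penalty and cluster count are assumptions for illustration, not the thesis's LRSR solver or crime dataset.

```python
# A minimal sketch of sparse self-expressive subspace clustering:
# solve x_i ≈ sum_j c_ij x_j with an L1 penalty, then spectrally
# cluster the affinity |C| + |C|^T. All parameters are toy assumptions.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
# Two toy 1-D subspaces embedded in 3-D (20 points each).
line1 = rng.normal(size=(20, 1)) @ np.array([[1.0, 0.0, 0.0]])
line2 = rng.normal(size=(20, 1)) @ np.array([[0.0, 1.0, 0.0]])
X = np.vstack([line1, line2])
n = X.shape[0]

# Sparse self-expression: each point coded over all the others.
C = np.zeros((n, n))
for i in range(n):
    others = np.delete(X, i, axis=0)
    coef = Lasso(alpha=0.01, max_iter=10000).fit(others.T, X[i]).coef_
    C[i, np.arange(n) != i] = coef

# Symmetrised affinity; points from different subspaces get zero weight.
A = np.abs(C) + np.abs(C).T

# Tiny jitter keeps the graph connected for the spectral step.
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(A + 1e-6)
print(labels)
```

    The low-rank side of LRSR replaces the L1 penalty with a nuclear-norm term on C, which requires a dedicated optimiser and is omitted here.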

    3-D Content-Based Retrieval and Classification with Applications to Museum Data

    There is an increasing number of multimedia collections arising in areas that were once only the domain of text and 2-D images. Richer types of multimedia such as audio, video and 3-D objects are becoming more and more commonplace. However, current retrieval techniques in these areas are not as sophisticated as textual and 2-D image techniques, and in many cases rely upon textual searching through associated keywords. This thesis is concerned with the retrieval of 3-D objects and with the application of these techniques to the problem of 3-D object annotation. The majority of the work in this thesis has been driven by the European project SCULPTEUR. This thesis provides an in-depth analysis of a range of 3-D shape descriptors for their suitability for general-purpose and specific retrieval tasks, using a publicly available data set, the Princeton Shape Benchmark, and real-world museum objects, evaluated with a variety of performance metrics. It also investigates the use of 3-D shape descriptors as inputs to popular classification algorithms; a novel classifier agent for use with the SCULPTEUR system is designed and developed, and its performance analysed. Several techniques are investigated to improve individual classifier performance: one set combines several classifiers, whereas the other aims to find the optimal training parameters for a classifier. The final chapter of this thesis explores a possible application of these techniques to the problem of 3-D object annotation.
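    The abstract does not say which descriptors were analysed, so as a generic illustration of the descriptor-plus-distance retrieval pattern, the sketch below computes the classic D2 shape distribution (a histogram of distances between randomly sampled point pairs) on synthetic point clouds; the shapes, sample sizes and bin count are assumptions, not museum data.

```python
# A minimal sketch of the D2 shape distribution descriptor: a normalised
# histogram of distances between random pairs of surface points.
# Shapes and parameters here are toy assumptions.
import numpy as np

def d2_descriptor(points, n_pairs=2000, bins=16, rng=None):
    """Histogram of pairwise distances between randomly sampled points."""
    rng = rng if rng is not None else np.random.default_rng(0)
    i = rng.integers(0, len(points), n_pairs)
    j = rng.integers(0, len(points), n_pairs)
    d = np.linalg.norm(points[i] - points[j], axis=1)
    hist, _ = np.histogram(d, bins=bins, range=(0, d.max() + 1e-9))
    return hist / hist.sum()       # normalise so descriptors are comparable

rng = np.random.default_rng(1)
sphere = rng.normal(size=(500, 3))
sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)   # unit sphere
cube = rng.uniform(-1, 1, size=(500, 3))                  # solid cube

# Retrieval reduces to comparing descriptor histograms, e.g. by L1 distance;
# the histograms can equally serve as feature vectors for a classifier.
a, b = d2_descriptor(sphere), d2_descriptor(cube)
print("L1 distance between sphere and cube descriptors:", np.abs(a - b).sum())
```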

    Genetic programming for cephalometric landmark detection

    The domain of medical imaging analysis has burgeoned in recent years due to the availability and affordability of digital radiographic imaging equipment and associated algorithms, and, as such, there has been significant activity in the automation of the medical diagnostic process. One such process, cephalometric analysis, is manually intensive, and it can take an experienced orthodontist thirty minutes to analyse one radiology image. This thesis describes an approach, based on genetic programming, neural networks and machine learning, to automate this process. A cephalometric analysis involves locating a number of points in an X-ray and determining the linear and angular relationships between them. If the points can be located accurately enough, the rest of the analysis is straightforward. The investigative steps undertaken were as follows. Firstly, a previously published method, which was claimed to be domain independent, was implemented and tested on a selection of landmarks ranging from easy to very difficult, including the menton, upper lip, incisal upper incisor, nose tip and sella landmarks. The method used pixel values and pixel statistics (mean and standard deviation) of pre-determined regions as inputs to a genetic programming detector. This approach proved unsatisfactory, and the second part of the investigation focused on alternative handcrafted feature sets and fitness measures. This proved much more successful, and the third part of the investigation involved using pulse-coupled neural networks to replace the handcrafted features with learned ones. The fourth and final stage involved an analysis of the evolved programs to determine whether reasonable algorithms had been evolved, and not just random artefacts learnt from the training images.
    A significant finding from the investigative steps was that the new domain-independent approach, using pulse-coupled neural networks and genetic programming to evolve programs, was as good as or even better than the one using handcrafted features. The advantage of this finding is that little domain knowledge is required, obviating the need to manually generate handcrafted features. The investigation revealed that some of the easy landmarks could be found with 100% accuracy, while the accuracy of finding the most difficult ones was around 78%. An extensive analysis of evolved programs revealed underlying regularities that were captured during the evolutionary process. Even though the evolutionary process took different routes and a diverse range of programs was evolved, many of the programs with an acceptable detection rate implemented algorithms with similar characteristics. The major outcome of this work is that the method described in this thesis could be used as the basis of an automated system; the orthodontist would be required to manually correct a few errors before completing the analysis.
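    The 100% and 78% figures imply a tolerance-based scoring rule: a predicted landmark counts as found if it lies within some fixed distance of the expert-marked position. A minimal sketch of such a rule, with an assumed 2 mm tolerance and invented coordinates (the thesis's actual tolerance is not stated in this abstract):

```python
# A minimal sketch of tolerance-based landmark-detection scoring.
# The 2 mm tolerance and all coordinates are illustrative assumptions.
import math

def detection_accuracy(predicted, gold, tolerance_mm=2.0):
    """Fraction of landmarks predicted within tolerance_mm of the gold point."""
    hits = sum(
        1 for p, g in zip(predicted, gold)
        if math.dist(p, g) <= tolerance_mm
    )
    return hits / len(gold)

gold = [(10.0, 20.0), (35.5, 12.0), (50.0, 48.0)]
pred = [(10.5, 20.5), (30.0, 12.0), (50.2, 47.9)]   # the second is 5.5 mm off
print(detection_accuracy(pred, gold))  # → 0.6666666666666666
```

    In a genetic programming setting, a score like this (or a smoother distance-based variant) serves as the fitness function that drives the evolution of detector programs.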

    Pharmacovigilance Decision Support: The value of Disproportionality Analysis Signal Detection Methods, the development and testing of Covariability Techniques, and the importance of Ontology

    The cost of adverse drug reactions to society, in the form of deaths, chronic illness, foetal malformation and many other effects, is significant. In the United States of America, for example, adverse reactions to prescribed drugs rank around fourth among the leading causes of death. In Australia, the reporting of adverse drug reactions is spontaneous and voluntary. Many methods have been used for the analysis of adverse drug reaction data, mostly taking a statistical approach as a basis for clinical analysis in drug safety surveillance decision support. This thesis examines new approaches that may be used in the analysis of drug safety data. These methods differ significantly from the statistical methods in that they use covariability measures of association to define drug-reaction relationships. Covariability algorithms were developed in collaboration with Musa Mammadov to discover drugs associated with adverse reactions and possible drug-drug interactions. This method uses the system organ class (SOC) classification in the Australian Adverse Drug Reaction Advisory Committee (ADRAC) data to stratify reactions. The text categorization algorithm BoosTexter was found to work with the same drug safety data, and its performance and modus operandi were compared with those of our algorithms. These alternative methods were compared with standard disproportionality analysis methods for signal detection in drug safety data, including the Bayesian multi-item gamma Poisson shrinker (MGPS), which was found to have problems with similar reaction terms in a report and with innocent-bystander drugs. A classification of drug terms was made using the Anatomical Therapeutic Chemical (ATC) classification codes, which reduced the number of drug variables from 5081 drug terms to 14 main drug classes. The ATC classification is structured as a hierarchy of five levels.
    Exploiting the ATC hierarchy allows the drug safety data to be stratified in such a way as to make them accessible to powerful existing tools. A data mining method that uses association rules, grouping them on the basis of content, was used as a basis for applying the ATC and SOC ontologies to the ADRAC data. This allows different views of these associations, even very rare ones. A signal detection method was developed using these association rules, which also incorporates critical reaction terms. Doctor of Philosophy.
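    The disproportionality analysis methods used as a baseline are built on 2x2 tables of spontaneous-report counts. A minimal sketch of one standard measure, the proportional reporting ratio (PRR), with invented counts (a common screening convention, not stated in this abstract, flags PRR values of 2 or more for review):

```python
# A minimal sketch of the proportional reporting ratio (PRR) from a 2x2
# table of spontaneous reports. All counts are invented for illustration.

def prr(a, b, c, d):
    """a: reports with drug & reaction of interest;
    b: reports with drug, other reactions;
    c: reports with other drugs, this reaction;
    d: reports with other drugs, other reactions."""
    return (a / (a + b)) / (c / (c + d))

# Toy table: 20 of 400 reports for the drug mention the reaction,
# versus 100 of 9600 reports for all other drugs.
value = prr(a=20, b=380, c=100, d=9500)
print(round(value, 2))  # → 4.8
```

    Bayesian shrinkage methods such as MGPS address the instability of ratios like this when the counts are small, at the cost of the innocent-bystander and similar-term issues noted above.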