11 research outputs found

    ConQueSt: a Constraint-based Querying System for Exploratory Pattern Discovery

    The contribution of this thesis is the design and development of a knowledge discovery system named ConQueSt. Based on the constraint-based pattern discovery paradigm, ConQueSt follows the Inductive Database vision: mining is seen as a more complex form of querying; the system is therefore equipped with a data mining query language and tightly coupled with a DBMS; and the patterns extracted by mining queries become first-class citizens that, following the closure principle, are materialised alongside the data in the DBMS. ConQueSt has already been presented successfully at the international workshop of the IDB community and at the prestigious IEEE International Conference on Data Engineering (ICDE 2006). In June it will be presented at the Italian database conference (SEBD 2006). Submission to a prestigious journal is currently under way.

    Data mining query language design and implementation.

    Xiaolei Yuan. Thesis (M.Phil.), Chinese University of Hong Kong, 2004. Thesis submitted in December 2003. Includes bibliographical references (leaves 95-101). Abstracts in English and Chinese.

    Chapter 1: Introduction
        1.1 Background
            1.1.1 Data Mining: A New Wave of Database Applications
            1.1.2 Association Rule Mining
        1.2 Motivation
        1.3 Main Contribution
        1.4 Thesis Organization
    Chapter 2: Literature Review
        2.1 Data mining and association rule mining
        2.2 Integration data mining with DBMS
        2.3 Query language design for association rule mining
        2.4 Unified data mining models
        2.5 Other topics
    Chapter 3: A New Data Mining Query Language M2MQL
        3.1 Simple item-based association rule
            3.1.1 One rule set
            3.1.2 Rule set and Source data set
            3.1.3 New rule sets from existing ones
        3.2 Generalized item-based association rules
        3.3 CREATE RULE and SELECT RULE Primitive
    Chapter 4: The Algebra in M2MQL
        4.1 Review of nested relations
            4.1.1 Concepts of nested relation
            4.1.2 Nested relation and association rule mining
        4.2 Nested relational algebra
        4.3 Specific data mining algebra
            4.3.1 POWERSET p
            4.3.2 SET-CONTAINMENT-JOIN xc
            4.3.3 Functional operators
    Chapter 5: Mining On Top of M2MQL
        5.1 Problem statement
        5.2 Frequency Counting Phase
        5.3 Frequent Itemset Generation Phase
        5.4 Rule Generation Phase
        5.5 Summary
    Chapter 6: Conclusions and Future Work
        6.1 What we have achieved
        6.2 What is ahead
            6.2.1 Issues of Query Optimization
            6.2.2 Issues of Expanding Table Forms
    Chapter A: General Syntax of M2MQL
    Chapter B: Syntax and Example for MSQL
        B.1 Syntax of MSQL
        B.2 Example
    Chapter C: Syntax and Example for MINE RULE
        C.1 Syntax of MINE RULE
        C.2 Example
            C.2.1 Counting Groups
            C.2.2 Making Couples of Clusters
            C.2.3 Extracting Bodies
            C.2.4 Extracting Rules
    Bibliography
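
    The Chapter 5 outline names three phases: frequency counting, frequent itemset generation, and rule generation. The following is a minimal Python sketch of a generic level-wise (Apriori-style) miner with those three phases. It is not M2MQL and not the thesis's algebra; the function names `frequent_itemsets` and `association_rules`, the absolute `min_support` count, and the `min_confidence` threshold are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for itemsets seen in at least
    min_support transactions (hypothetical helper, Apriori-style)."""
    transactions = [frozenset(t) for t in transactions]

    # Phase 1: frequency counting for single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    # Phase 2: frequent itemset generation, one level (size k) at a time.
    k = 2
    while current:
        # Join frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Phase 3: rule generation. Returns (body, head, confidence) triples."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            # sorted() assumes the items are mutually comparable (e.g. strings).
            for body in combinations(sorted(itemset), r):
                body = frozenset(body)
                # Every subset of a frequent itemset is itself in `frequent`.
                confidence = support / frequent[body]
                if confidence >= min_confidence:
                    rules.append((body, itemset - body, confidence))
    return rules
```

    Calling `association_rules(frequent_itemsets(baskets, 2), 0.9)` on a small list of item sets returns the rules whose confidence is at least 0.9. A query language such as the one outlined above would express the same thresholds declaratively rather than as function arguments.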

    Systems between information and knowledge: In a memory management model of an extended enterprise

    The research question of this thesis was how knowledge can be managed with information systems. Information systems can support but not replace knowledge management. Systems can mainly store epistemic organisational knowledge included in content, and process data and information. Some value can be achieved by adding communication technology to systems. Not all communication, however, can be managed. A new layer between communication and manageable information was named knowformation. Knowledge management literature was surveyed, together with information species from philosophy, physics, communication theory, and information system science. Positivism, post-positivism, and critical theory were studied, but knowformation in extended organisational memory seemed to be socially constructed. A memory management model of an extended enterprise (M3.exe) and the knowformation concept were findings from iterative case studies, covering data, information and knowledge management systems. The cases ranged from groups to the extended organisation. Systems were investigated, and administrators, users (knowledge workers) and managers were interviewed. The model building required alternative sets of data, information and knowledge, instead of the traditional pyramid. The explicit-tacit dichotomy was also reconsidered. As human knowledge is the final aim of all data and information in the systems, the distinction between management of information and management of people was harmonised. Information systems were classified as the core of organisational memory. The content of the systems is in practice between communication and presentation. Firstly, the epistemic criterion of knowledge is required neither in the knowledge management literature nor of the content of the systems. Secondly, systems deal mostly with containers, and the knowledge management literature with applied knowledge. 
The construction of reality based on the system content and communication also supports the knowformation concept. Knowformation belongs to the memory management model of an extended enterprise (M3.exe), which is divided into horizontal and vertical key dimensions. Vertically, processes deal with content that can be managed, whereas communication can be supported, mainly by infrastructure. Horizontally, the right-hand side of the model contains systems and the left-hand side content, which should be independent of each other. A strategy based on the model was defined.
    Abstract in Finnish, translated: The aim of the research was to determine how information systems can be used to manage organisational knowledge. The conclusion is that systems can support but not replace knowledge management. Information systems can be used mainly as a memory for the organisation's epistemic knowledge, for storing information that is to be processed. Substantial added value is gained if communication technology is used to support information systems. Communication, however, cannot be managed, because it is not based on processes but at most on workflow and on freer forms of communication. Between managed information and communication a layer named knowformation emerges, mainly in the short-term memory of organisations. The new knowformation concept is a result of practical case studies. Nothing comparable has been presented in earlier knowledge management research. As background to the knowledge management literature, classifications from physics, philosophy, communication and computer science were studied. The case studies examined several data management, documentation and knowledge management systems in groups within an organisation, across the organisation, and with the organisation's partners. Both the properties of the systems and the experiences of the various stakeholders were studied. The study classified information systems as the core of organisational memory. The knowformation layer is needed, on the one hand, because the epistemic criterion of philosophical knowledge is not required of the content of systems (nor in the concept definitions of the knowledge management literature) and, on the other hand, because meaning changes when knowledge is reconstructed. A new perspective is needed for the design of future systems, because the hierarchy of data, information and knowledge levels is not discernible in the socially constructed reality of the users of different system types. The scale of the philosophy of science from positivist to constructivist was essential in building the model, and after its validity had been verified the explicit-tacit dichotomy was remodelled with the help of the knowformation concept. The new knowledge model and the knowformation concept are needed in the main result of the work, the memory management model of an extended organisation. At one end of the model is communication, which is supported, and at the other end are processes, which are managed. The two other entities, systems and their content, should be independent of each other. Knowformation lives on the implicit boundaries of these entities, in the grey area between information and knowledge.

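    Generación de métodos basados en inteligencia artificial para el análisis de datos medioambientales. Aplicaciones prácticas [Generating artificial-intelligence-based methods for environmental data analysis. Practical applications]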

    Sastre Merlín, Antonio, tutor. In recent times the great importance of data analysis for finding models and inferring new and relevant information has become evident. In the environmental sciences in particular, these analysis tasks are especially important because of the gradual degradation of our environment, which calls for urgent and highly precise action. The research presented in this thesis results from integrating two well-established fields, artificial intelligence and environmental science, with the aim of designing and developing methods of analysis or model inference that make it possible to explore new aspects of environmental problems from a set of observations. These problems are usually highly complex, which in many cases limits the effectiveness of statistical inference techniques for extracting information or knowledge. The proposed methodology is intended as a useful complement to statistical studies. The thesis presents all phases of the design and development of a knowledge discovery in databases (KDD) system, implemented with the specific characteristics of environmental data and sampling in mind. Among the main contributions is a model inference system that uses a machine learning procedure, specifically learning from examples. The system generates easily interpretable models, since the knowledge is represented by a set of if-then rules. Within this model inference system, a genetic algorithm has been implemented as the method for searching for the best rule sets, which avoids the biased exploration of the space of possible solutions (models) that other search procedures exhibit. 
In addition, as part of the KDD system developed, a tool has been implemented to support georeferenced data collection in the field. It stores the data in real time in a relational database, in a format that allows the stored information to be processed later with a Geographic Information System. The set of tools developed is applied to an environmental problem: weed control in agricultural systems, one of the central lines of so-called precision agriculture, an area that seeks, from both ecological and economic perspectives, optimal management of the agrochemical products used in plant protection treatments. Specifically, the analysis presented in the thesis aims to obtain, from a set of data, rule-based models that explain, in terms of environmental parameters and for a single field, why some zones of the crop have more weeds than others. The knowledge contained in the extracted models provides useful information that can be expressed as a risk map, which helps advise on applying herbicide precisely, only in the zones of the crop that need it and at a dose adjusted to each infestation situation. The data used to obtain the models come from several winter cereal plots located in the Community of Madrid and in the province of Barcelona, and cover two weed species (Avena sterilis L. and Lolium rigidum G.). The rule sets obtained with the proposed methodology were also compared with the models generated, for the same data set, by commercial algorithms such as C&RT and C5.0. The comparison shows an improvement in the quality of the models induced with the methods developed; that is, our models describe the original observations with greater accuracy and confidence.

    Neural Networks for Analysing Music and Environmental Audio

    PhD thesis. In this thesis, we consider the analysis of music and environmental audio recordings with neural networks. Recently, neural networks have been shown to be an effective family of models for speech recognition, computer vision, natural language processing and a number of other statistical modelling problems. The composite layer-wise structure of neural networks allows for flexible model design, where prior knowledge about the domain of application can be used to inform the design and architecture of the neural network models. Additionally, it has been shown that when trained on sufficient quantities of data, neural networks can be directly applied to low-level features to learn mappings to high-level concepts like phonemes in speech and object classes in computer vision. In this thesis we investigate whether neural network models can be usefully applied to processing music and environmental audio. With regard to music signal analysis, we investigate two different problems. The first problem, automatic music transcription, aims to identify the score or the sequence of musical notes that comprise an audio recording. We also consider the problem of automatic chord transcription, where the aim is to identify the sequence of chords in a given audio recording. For both problems, we design neural network acoustic models which are applied to low-level time-frequency features in order to detect the presence of notes or chords. Our results demonstrate that the neural network acoustic models perform similarly to state-of-the-art acoustic models, without the need for any feature engineering. The networks are able to learn complex transformations from time-frequency features to the desired outputs, given sufficient amounts of training data. Additionally, we use recurrent neural networks to model the temporal structure of sequences of notes or chords, similar to language modelling in speech. 
Our results demonstrate that the combination of the acoustic and language model predictions yields improved performance over the acoustic models alone. We also observe that convolutional neural networks yield better performance compared to other neural network architectures for acoustic modelling. For the analysis of environmental audio recordings, we consider the problem of acoustic event detection. Acoustic event detection has a similar structure to automatic music and chord transcription, where the system is required to output the correct sequence of semantic labels along with onset and offset times. We compare the performance of neural network architectures against Gaussian mixture models and support vector machines. In order to account for the fact that such systems are typically deployed on embedded devices, we compare performance as a function of the computational cost of each model. We evaluate the models on two large datasets of real-world recordings of baby cries and smoke alarms. Our results demonstrate that the neural networks clearly outperform the other models and they are able to do so without incurring a heavy computation cost.

    Virtuosity in Computationally Creative Musical Performance for Bass Guitar

    This thesis focuses on the development and implementation of a theory for a computationally creative musical performance system aimed at producing virtuosic interpretations of musical pieces for performance on bass guitar. This theory has been developed and formalised using Wiggins’ Creative Systems Framework (CSF) and uses case-based reasoning (CBR) and an engagement-reflection cycle to adorn monophonic musical note sequences with explicit performance directions, selected to maximise the virtuosity when performed using a bass guitar. A survey of 497 bass players’ playing competences was conducted and used to develop a playing complexity rating for adorned musical pieces. Measures of musical similarity used within the case-based reasoning were assessed by a listening test of 12 participants. A study into the perceived difficulty of bass performances was also conducted and an appropriate model of perceived bass playing difficulty determined. The complexity rating and perceived playing difficulties are utilised within the heuristic used by the system to determine what performances are considered to be virtuosic. The output of the system was rendered on a digital waveguide model of an electric bass, which was updated with newly developed digital waveguide synthesis methods for advanced bass guitar playing techniques. These audio renderings were evaluated with a perceptual study of 60 participants, the results of which were used to validate the heuristic used within the system. This research makes contributions to the fields of Computational Creativity (CC), AI Music Creativity, Music Information Retrieval and Musicology. 
It demonstrates how the CSF can be used as a tool to aid in designing computationally creative musical performance systems, provides a method to assess musical complexity and perceived difficulty of bass guitar performances, tests a suitable musical similarity measure for use within creative systems, and makes advances in bass guitar digital waveguide synthesis methods.