
    Design of Interactive Feature Space Construction Protocol

    Machine learning deals with designing systems that learn from data, i.e., that automatically improve with experience. Systems gain experience by detecting patterns or regularities and using them to make predictions. These predictions are based on the properties that the system learns from the data. Thus, when we say a machine learns, we mean it has changed in a way that allows it to perform more efficiently than before. Machine learning is emerging as an important technology in a number of application areas, including natural language processing, medical diagnosis, game playing and finance. A wide variety of machine learning approaches have been developed and used for these applications. We first review the work done in the field of machine learning and analyze the machine learning concepts that are applicable to the work presented in this thesis. Next, we examine active machine learning for pipelining an important natural language application, information extraction. In this application, prediction is carried out in successive stages and the output of each stage serves as the input to the next. Although many machine learning algorithms have been developed for different applications, no single algorithm is appropriate for all learning problems. A general learner for all problems is not possible, because real-world datasets vary too widely to be handled by a single learner, so the algorithms need to be evaluated. We present an experiment evaluating various state-of-the-art machine learning algorithms with an interactive machine learning tool called WEKA (Waikato Environment for Knowledge Analysis). The evaluation aims to find an optimal solution for a real-world learning problem: credit approval in banks, which is a classification problem.
Finally, we present an approach that combines various learners with the aim of increasing their efficiency. We present two experiments that evaluate the machine learning algorithms for efficiency and compare their performance with that of the new combined approach on the same classification problem. We then show the effects of feature selection on the efficiency of our combined approach and of other machine learning techniques. The aim of this work is to analyze techniques that increase the efficiency of learners.
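The abstract does not say how the learners are combined. One common way to combine classifiers is majority voting, sketched below in Python purely as an illustration, not as the thesis's method. The function `majority_vote` and the example labels are hypothetical. Each learner is assumed to be already trained and to return one label per example.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Combine per-learner predictions by majority vote.

    predictions[i][j] is learner i's label for example j (hypothetical layout).
    On a tie, Counter.most_common returns the label encountered first, i.e.
    the label from the earliest-listed learner among the tied ones.
    """
    if not predictions:
        raise ValueError("need at least one learner")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all learners must predict the same number of examples")
    combined = []
    for j in range(n):
        votes = Counter(p[j] for p in predictions)  # count labels for example j
        combined.append(votes.most_common(1)[0][0])
    return combined


# Hypothetical outputs of three credit-approval classifiers on four applicants.
tree = ["approve", "reject", "approve", "reject"]
bayes = ["approve", "approve", "reject", "reject"]
knn = ["reject", "reject", "approve", "approve"]
print(majority_vote([tree, bayes, knn]))  # ['approve', 'reject', 'approve', 'reject']
```

Voting needs only each learner's predicted labels, so it works with heterogeneous learners. Weighted voting or stacking are alternatives when some learners are known to be more reliable.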

    Text miner's little helper: scalable self-tuning methodologies for knowledge exploration

    The abstract is in the attachment.

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use information from automatic speech recognition. These include speaker identification and tracking, prosody modeling in emotion-detection systems, and other applications able to operate in real-world environments, such as mobile communication services and smart homes.

    Acta Cybernetica: Volume 19, Number 4

    A Data Mining Toolbox for Collaborative Writing Processes

    Collaborative writing (CW) is an essential skill in academia and industry. Supporting the CW process can help not only to produce better-quality documents but also to improve the writers' CW skills. To support collaborative writing properly, it is essential to understand how ideas and concepts develop during the writing process, which consists of a series of writing-activity steps. These steps can be considered as sequence patterns comprising both time events and the semantics of the changes made during those steps. Two techniques can be combined to examine these patterns:
    - process mining, which extracts process-related knowledge from event logs recorded by an information system; and
    - semantic analysis, which extracts knowledge about what the student wrote or edited.

    This thesis contributes (i) techniques to automatically extract process models of collaborative writing processes and (ii) visualisations that describe aspects of collaborative writing. Together, these contributions form a data mining toolbox for collaborative writing built on process mining, probabilistic graphical models and text mining. First, I created WriteProc, a framework for investigating collaborative writing processes that is integrated with the existing cloud-based writing tools in Google Docs. Second, I created a new heuristic to extract the semantic nature of the text edits that occur in document revisions and to automatically identify the corresponding writing activities. Third, based on sequences of writing activities, I propose methods to discover writing process models and transitional state diagrams using a process mining algorithm, Heuristics Miner, and Hidden Markov Models, respectively. Finally, I designed three types of visualisations and contributed to their underlying techniques for analysing writing processes.
All components of the toolbox are validated against annotated writing activities of real documents and a synthetic dataset. I also illustrate how the automatically discovered process models and visualisations are used to analyse the writing processes behind real documents written by groups of graduate students. I discuss how these analyses can give further insight into how students work and create their collaborative documents.
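The transitional state diagrams in this thesis are learned with Hidden Markov Models. As a simpler illustration of the idea behind such diagrams, the Python sketch below estimates first-order transition probabilities from fully observed sequences of writing activities. This is a plain Markov chain with no hidden states, not the thesis's HMM. The function name `transition_probabilities` and the activity labels (`outline`, `draft`, `revise`, `edit`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def transition_probabilities(
    sequences: Iterable[List[str]],
) -> Dict[str, Dict[str, float]]:
    """Maximum-likelihood first-order transition probabilities.

    Counts every consecutive (current, next) activity pair across all
    sequences, then normalises each state's outgoing counts so they sum to 1.
    States that never precede another activity get no row.
    """
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs: Dict[str, Dict[str, float]] = {}
    for state, row in counts.items():
        total = sum(row.values())
        probs[state] = {nxt: c / total for nxt, c in row.items()}
    return probs


# Hypothetical activity sequences from two document revision histories.
docs = [
    ["outline", "draft", "draft", "revise", "edit"],
    ["draft", "revise", "draft", "edit"],
]
for state, row in transition_probabilities(docs).items():
    print(state, row)
```

Each row of the result corresponds to the outgoing edges of one node in a state diagram. An HMM additionally separates hidden states from the observed activities and is typically fitted with the Baum-Welch algorithm rather than by direct counting.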

    Self-directed learning of the statistical association between speech and text

    One of the key challenges in artificial cognitive systems is to develop effective algorithms that learn, without human supervision, to understand qualitatively different realisations of the same abstraction. Such algorithms should also acquire the ability to transcribe a sensory data stream into a completely different modality. The same holds for the so-called Big Data problem. By learning associations between multiple types of data about the same phenomenon, it is possible to capture the hidden dynamics that govern the processes that produced the measured data. This master's thesis proposes a methodological framework for automatically discovering statistical associations between two qualitatively different data streams. The simulations are run on a noisy, high-bit-rate sensory signal (speech) and temporally discrete categorical data (text). Unlike traditional automatic speech recognition systems, the approach does not use any phonetic or linguistic knowledge in recognition. It merely learns statistically sound units of speech and text, and the mappings between them, in an unsupervised manner. Experiments on child-directed speech with a limited vocabulary show that, after a period of learning, the method acquires a promising ability to transcribe continuous speech into its textual representation.