Design of Interactive Feature Space Construction Protocol
Machine learning deals with designing systems that learn from data, i.e. automatically improve
with experience. Systems gain experience by detecting patterns or regularities and using them for
making predictions. These predictions are based on the properties that the system learns from the
data. Thus when we say a machine learns, it means it has changed in a way that allows it to
perform more efficiently than before. Machine learning is emerging as an important technology
for a number of applications, including natural language processing, medical
diagnosis, game playing and finance. A wide variety of machine learning approaches
have been developed and used for these applications.
We first review the work done in the field of machine learning and analyze the machine learning
concepts that are applicable to the work presented in this thesis. Next, we examine
active machine learning for pipelining an important natural language application,
information extraction, in which the task of prediction is carried out in different stages and the
output of each stage serves as an input to the next stage.
A number of machine learning algorithms have been developed for different applications.
However, no single machine learning algorithm can be used appropriately for all learning
problems. It is not possible to create a general learner for all problems because there are varied
types of real-world datasets that cannot be handled by a single learner. For this reason, an
evaluation of the machine learning algorithms is needed. We present an experiment for the
evaluation of various state-of-the-art machine learning algorithms using an interactive machine
learning tool called WEKA (Waikato Environment for Knowledge Analysis). The evaluation is
carried out with the purpose of finding an optimal solution for a real-world learning problem: credit
approval as used in banks. It is a classification problem.
Finally, we present an approach for combining various learners with the aim of increasing their
efficiency. We present two experiments that evaluate the machine learning algorithms for
efficiency and compare their performance with the new combined approach, for the same
classification problem. We then show the effects of feature selection on the efficiency of our
combined approach as well as on other machine learning techniques. The aim of this work is to
analyze the techniques that increase the efficiency of the learners.
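The abstract does not say how the learners are combined. As a minimal sketch of one common option, assuming simple majority voting, the Python below combines three learners on a toy credit-approval decision. The rule-based "learners", their thresholds and the applicant record are all hypothetical and are not taken from the thesis or from WEKA.

```python
from collections import Counter


def majority_vote(classifiers, sample):
    """Return the label predicted by most classifiers for one sample."""
    votes = [clf(sample) for clf in classifiers]
    # most_common(1) yields the single (label, count) pair with the highest count
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for trained learners; thresholds are illustrative only.
def by_income(applicant):
    return "approve" if applicant["income"] >= 40000 else "reject"


def by_debt(applicant):
    return "approve" if applicant["debt"] <= 10000 else "reject"


def by_history(applicant):
    return "approve" if applicant["years_employed"] >= 2 else "reject"


# Hypothetical applicant record
applicant = {"income": 52000, "debt": 15000, "years_employed": 5}
decision = majority_vote([by_income, by_debt, by_history], applicant)
print(decision)
```

Here two of the three learners vote to approve, so the combined decision is to approve even though the debt rule disagrees. Using an odd number of voters avoids ties in a two-class problem.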
Text miner's little helper: scalable self-tuning methodologies for knowledge exploration
The abstract is in the attachment.
Speech Recognition
Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes.
A Data Mining Toolbox for Collaborative Writing Processes
Collaborative writing (CW) is an essential skill in academia and industry. Providing support during the process of CW can be useful not only for achieving better quality documents, but also for improving the CW skills of the writers. In order to properly support collaborative writing, it is essential to understand how ideas and concepts are developed during the writing process, which consists of a series of steps of writing activities. These steps can be considered as sequence patterns comprising both time events and the semantics of the changes made during those steps. Two techniques can be combined to examine those patterns: process mining, which focuses on extracting process-related knowledge from event logs recorded by an information system; and semantic analysis, which focuses on extracting knowledge about what the student wrote or edited. This thesis contributes (i) techniques to automatically extract process models of collaborative writing processes and (ii) visualisations to describe aspects of collaborative writing. These two contributions form a data mining toolbox for collaborative writing by using process mining, probabilistic graphical models, and text mining. First, I created a framework, WriteProc, for investigating collaborative writing processes, integrated with the existing cloud computing writing tools in Google Docs. Secondly, I created a new heuristic to extract the semantic nature of text edits that occur in the document revisions and automatically identify the corresponding writing activities. Thirdly, based on sequences of writing activities, I propose methods to discover the writing process models and transitional state diagrams using a process mining algorithm, Heuristics Miner, and Hidden Markov Models, respectively. Finally, I designed three types of visualisations and made contributions to their underlying techniques for analysing writing processes.
All components of the toolbox are validated against annotated writing activities of real documents and a synthetic dataset. I also illustrate how the automatically discovered process models and visualisations are used in the process analysis with real documents written by groups of graduate students. I discuss how the analyses can be used to gain further insight into how students work and create their collaborative documents.
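To make the idea of a transitional state diagram concrete, the sketch below estimates transition probabilities between writing activities from observed activity sequences. It is a deliberate simplification: it fits a plain first-order Markov chain with observable states, not the Hidden Markov Models or the Heuristics Miner algorithm used in the thesis. The activity names and sessions are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next activity | current activity) from activity sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count each adjacent pair (current, next) within one sequence
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    # Normalise the counts of each state's followers into probabilities
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical writing sessions, each a sequence of identified activities
sessions = [
    ["outline", "draft", "revise", "draft", "revise"],
    ["outline", "draft", "draft", "revise"],
]
probs = transition_probabilities(sessions)
for state, followers in probs.items():
    print(state, followers)
```

Each entry of `probs` corresponds to the outgoing edges of one node in a state diagram, with the probabilities as edge weights.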
Self-directed learning of the statistical association between speech and text (Puheen ja tekstin välisen tilastollisen assosiaation itseohjautuva oppiminen)
One of the key challenges in artificial cognitive systems is to develop effective algorithms that learn, without human supervision, to understand qualitatively different realisations of the same abstraction and thereby also acquire the ability to transcribe a sensory data stream into a completely different modality. The same challenge arises in the so-called Big Data problem. By learning associations between multiple types of data about the same phenomenon, it is possible to capture the hidden dynamics that govern the processes that yielded the measured data.
In this thesis, a methodological framework for automatic discovery of statistical associations between two qualitatively different data streams is proposed. The simulations are run on a noisy, high bit-rate sensory signal (speech) and temporally discrete categorical data (text). In contrast to traditional automatic speech recognition systems, the approach does not utilize any phonetic or linguistic knowledge in the recognition. It merely learns statistically sound units of speech and text and their mutual mappings in an unsupervised manner. The experiments on child-directed speech with limited vocabulary show that, after a period of learning, the method acquires a promising ability to transcribe continuous speech to its textual representation.
- …