
    Toward Learning Systems: A Validated Method for Classifying Knowledge Queries

    Organizations currently possess a vast and rapidly growing amount of information--much of it residing in corporate databases, generated as a by-product of transaction automation. Although this information is a potentially rich source of knowledge about production processes, few companies have begun to implement learning systems that leverage the value of this stored data (Bohn 1994). At the same time, there is a broad and expanding array of technologies, tools, and models to assist in deriving knowledge from data, spanning areas such as knowledge discovery in databases, machine learning, statistics, neural networks, expert systems, and case-based reasoning (Piatetsky-Shapiro and Frawley 1991). These technologies support inductive, deductive, and analogical reasoning approaches to the creation and validation of knowledge. The literature of computer science, information systems, and statistics contains many studies comparing closely related learning algorithms from a technical perspective (Curram and Mingers 1994; Kodratoff 1988; Weiss and Kulikowski 1991), and specialists exist in each technology area. However, organizations planning learning systems cannot normally assemble a team with expertise in every potentially relevant area. They must start by identifying critical questions and learning goals, which are driven by business context and unconstrained by the capabilities of particular technologies. The premise of this research is that there is a growing need for frameworks and methods that help organizations assess their learning needs--starting from the broadened perspective of business knowledge requirements--and match them with the most suitable categories of learning technologies, models, tools, and specialists (Keen 1994).
    This report describes research that is underway to develop a classification theory to serve as the foundation for the technology-selection stages of such a method.
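    The matching step the abstract envisions--from a classified knowledge query to candidate technology categories--can be caricatured as a lookup. The query classes and pairings below are invented for illustration; only the technology categories themselves come from the abstract, not the paper's validated classification.

```python
# Hypothetical mapping from knowledge-query classes to candidate
# technology categories. The class names and pairings are invented;
# the categories are those listed in the abstract above.
TECHNOLOGY_MAP = {
    "predict_outcome": ["machine learning", "neural networks", "statistics"],
    "explain_anomaly": ["expert systems", "case-based reasoning"],
    "find_patterns":   ["knowledge discovery in databases", "statistics"],
}

def candidate_technologies(query_class):
    """Return the technology categories suited to a query class."""
    return TECHNOLOGY_MAP.get(query_class, [])

print(candidate_technologies("explain_anomaly"))
```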

    Learning Features that Predict Cue Usage

    Our goal is to identify the features that predict the occurrence and placement of discourse cues in tutorial explanations, in order to aid the automatic generation of explanations. Previous attempts to devise rules for text generation were based on intuition or small numbers of constructed examples. We apply a machine learning program, C4.5, to induce decision trees for cue occurrence and placement from a corpus of data coded for a variety of features previously thought to affect cue usage. Our experiments identify the features with the most predictive power and show that machine learning can induce decision trees useful for text generation.
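    The core step C4.5 repeats when inducing such trees is choosing the most informative feature to split on. A minimal sketch of that selection step, on invented coded examples (the feature names and values are illustrative, not the paper's coding scheme):

```python
import math
from collections import Counter

# Toy coded instances: discourse features plus whether a cue occurred.
# Features and values are invented for illustration only.
data = [
    {"relation": "cause",     "position": "medial",  "cue": True},
    {"relation": "cause",     "position": "initial", "cue": True},
    {"relation": "sequence",  "position": "initial", "cue": True},
    {"relation": "elaborate", "position": "medial",  "cue": False},
    {"relation": "elaborate", "position": "initial", "cue": False},
    {"relation": "sequence",  "position": "medial",  "cue": False},
]

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(data, feature):
    """Reduction in label entropy from splitting on `feature`."""
    base = entropy([d["cue"] for d in data])
    remainder = 0.0
    for value in {d[feature] for d in data}:
        subset = [d["cue"] for d in data if d[feature] == value]
        remainder += len(subset) / len(data) * entropy(subset)
    return base - remainder

# C4.5 recursively picks the best feature; this shows the root split only.
best = max(("relation", "position"), key=lambda f: information_gain(data, f))
print(best, round(information_gain(data, best), 3))
```

    On this toy data the rhetorical relation is far more predictive of cue occurrence than position, so it would become the root of the induced tree. (Full C4.5 uses gain ratio rather than raw gain and adds pruning.)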

    Seimo posėdžių stenogramų tekstynas autorystės nustatymo bei autoriaus profilio sudarymo tyrimams (A Corpus of Seimas Session Transcripts for Authorship Attribution and Author Profiling Research)

    In our paper we present a corpus of transcribed Lithuanian parliamentary speeches, prepared in a format appropriate for a range of authorship identification tasks. The corpus consists of approximately 111 thousand texts (24 million words). Each text corresponds to one parliamentary speech produced during an ordinary session, spanning 7 parliamentary terms from March 10, 1990 to December 23, 2013. The texts are grouped into 147 categories corresponding to individual authors, so they can be used for authorship attribution; they are also grouped by age, gender, and political views, making them suitable for author profiling. Because short texts obscure an author's speaking style and are ambiguous with respect to the styles of other authors, only texts containing at least 100 words were included in the corpus. To make each category as comprehensive and representative as possible, we included only authors who produced speeches at least 200 times. All texts are lemmatized, morphologically and syntactically annotated, and tokenized into character n-grams; statistical information about the corpus is also available. We demonstrate that the corpus can be used effectively for authorship attribution and author profiling with supervised machine learning methods. Its structure also allows the use of unsupervised machine learning methods, supports the creation of rule-based methods, and can serve various linguistic analyses.
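    The two inclusion criteria stated above (texts of at least 100 words; authors with at least 200 qualifying speeches) are straightforward to express in code. A minimal sketch, with invented speech records standing in for the real transcripts:

```python
# Corpus inclusion criteria from the abstract above; the thresholds
# are the paper's, the speech records in any caller are illustrative.
MIN_WORDS = 100
MIN_SPEECHES = 200

def filter_corpus(speeches):
    """speeches: iterable of (author, text) pairs -> {author: [texts]}."""
    by_author = {}
    for author, text in speeches:
        # Keep only speeches long enough to reveal authorial style.
        if len(text.split()) >= MIN_WORDS:
            by_author.setdefault(author, []).append(text)
    # Keep only authors represented comprehensively enough.
    return {a: ts for a, ts in by_author.items() if len(ts) >= MIN_SPEECHES}
```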

    Abductive Reasoning in Multiple Fault Diagnosis

    Abductive reasoning involves generating an explanation for a given set of observations about the world. Abduction provides a good reasoning framework for many AI problems, including diagnosis, plan recognition, and learning. This paper focuses on the use of abductive reasoning in diagnostic systems in which there may be more than one underlying cause for the observed symptoms. In exploring this topic, we review and compare several different approaches, including Binary Choice Bayesian, Sequential Bayesian, Causal Model Based Abduction, Parsimonious Set Covering, and the use of First Order Logic. Throughout the paper we use as a running example a simple diagnostic problem involving automotive troubleshooting.
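    Of the approaches named above, parsimonious set covering is the easiest to sketch: find the smallest set of causes whose combined effects cover all observed symptoms. The automotive fault/symptom table below is invented for illustration, not taken from the paper:

```python
from itertools import combinations

# Hypothetical causal knowledge: each fault -> the symptoms it produces.
causes = {
    "dead_battery": {"no_start", "dim_lights"},
    "bad_starter":  {"no_start"},
    "alternator":   {"dim_lights", "battery_light"},
    "blown_fuse":   {"battery_light"},
}

def parsimonious_covers(observed, causes):
    """All minimum-cardinality sets of causes that explain every symptom."""
    names = list(causes)
    for size in range(1, len(names) + 1):
        covers = [set(combo) for combo in combinations(names, size)
                  if set().union(*(causes[c] for c in combo)) >= observed]
        if covers:
            return covers  # smallest explanations; a multiple-fault diagnosis
    return []

print(parsimonious_covers({"no_start", "dim_lights", "battery_light"}, causes))
```

    No single fault covers all three symptoms, so the parsimonious explanations are two-fault diagnoses--exactly the multiple-fault setting the paper addresses.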

    Memory-Based Shallow Parsing

    We present memory-based learning approaches to shallow parsing and apply them to five tasks: base noun phrase identification, arbitrary base phrase recognition, clause detection, noun phrase parsing, and full parsing. We use feature selection techniques and system combination methods to improve the performance of the memory-based learner. Our approach is evaluated on standard data sets and the results are compared with those of other systems. This reveals that our approach works well for base phrase identification, while its application to recognizing embedded structures leaves some room for improvement.
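    Memory-based learning, as used above, stores all training instances and classifies a new one by majority vote over its most similar stored neighbours. A minimal sketch with a feature-overlap similarity; the features and chunk tags below are invented toy data, not the paper's actual feature set:

```python
from collections import Counter

def overlap(a, b):
    """Similarity = number of matching feature positions."""
    return sum(x == y for x, y in zip(a, b))

def knn_classify(memory, instance, k=3):
    """Majority label among the k stored instances most similar to `instance`."""
    ranked = sorted(memory, key=lambda ex: overlap(ex[0], instance), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

# Toy memory for base-NP chunking: (word, POS, previous POS) -> chunk tag
# (I = inside a base NP, O = outside). Illustrative only.
memory = [
    (("the", "DT", "VB"), "I"),
    (("cat", "NN", "DT"), "I"),
    (("sat", "VB", "NN"), "O"),
    (("a",   "DT", "VB"), "I"),
    (("dog", "NN", "DT"), "I"),
    (("ran", "VB", "NN"), "O"),
]
print(knn_classify(memory, ("mat", "NN", "DT")))
```

    An unseen noun in a determiner context lands nearest the stored NN/DT instances and is tagged inside a base NP, even though the word itself was never seen--the generalization behaviour that makes lazy, memory-based learners competitive for chunking.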