11 research outputs found

    Enhanced services for targeted information retrieval by event extraction and data mining

    Where Information Retrieval (IR) and Text Categorization deliver a set of (ranked) documents in response to a query, users of large document collections would rather receive answers. Question answering from text was already the goal of the Message Understanding Conferences. Since then, the task of text understanding has been broken down into several more tractable tasks, most prominently Named Entity Recognition (NER) and Relation Extraction. Now these pieces can be put together to form enhanced services built on top of an IR system. In this paper, we present a framework that combines standard IR with machine learning and (pre-)processing for NER in order to extract events from a large document collection. Some questions can already be answered by individual events; others require an analysis of a set of events. Hence, the extracted events become the input to another machine learning process, which delivers the final answer to the user's question. Our case study is the public collection of minutes of plenary sessions of the German parliament and of petitions to the German parliament.

    Handling Tree-Structured Values in RapidMiner

    Attribute value types play an important role in almost every data mining task. Most learners, for instance, are restricted to particular value types and can only be used after special forms of preprocessing. RapidMiner mainly distinguishes between nominal and numerical values, which are well known to every RapidMiner user. Although they cover a large fraction of the attribute types present in today's data mining tasks, nominal and numerical attribute values are not sufficient for every type of feature. In this work, we focus on attribute values containing a tree structure. We present how such values are handled and, in particular, how tree-structured data can be used for modelling. Additionally, we introduce particular tasks that provide tree-structured data and might benefit from using those structures for modelling. All methods presented in this paper are contained in the Information Extraction Plugin for RapidMiner.
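The abstract does not describe how the plugin stores tree values internally. As a rough illustration only, the following Python sketch assumes a tree-structured attribute value arrives as a bracketed parse string; `Tree`, `parse_bracketed` and `production_features` are hypothetical names, not part of RapidMiner or the Information Extraction Plugin. It parses one such value and then flattens it into a bag of grammar productions, i.e. the kind of preprocessing a learner restricted to nominal or numerical values would need.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tree:
    """A node of an ordered, labelled tree (hypothetical representation of a
    tree-structured attribute value, e.g. a parse tree)."""
    label: str
    children: List["Tree"] = field(default_factory=list)


def parse_bracketed(text: str) -> Tree:
    """Parse a bracketed string such as '(S (NP John) (VP sleeps))'."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            # A bare token is a leaf (e.g. a word).
            leaf = Tree(tokens[pos])
            pos += 1
            return leaf
        pos += 1  # skip "("
        node = Tree(tokens[pos])  # the token after "(" is the node label
        pos += 1
        while tokens[pos] != ")":
            node.children.append(parse())
        pos += 1  # skip ")"
        return node

    return parse()


def production_features(tree: Tree) -> Counter:
    """Flatten a tree into a bag of productions such as 'S -> NP VP'."""
    bag = Counter()
    if tree.children:
        bag[tree.label + " -> " + " ".join(c.label for c in tree.children)] += 1
        for child in tree.children:
            bag += production_features(child)
    return bag
```

For `(S (NP John) (VP sleeps))` this yields the three productions `S -> NP VP`, `NP -> John` and `VP -> sleeps`, each counted once. The flattening discards how the productions are connected, which is exactly the information a structure-aware method can keep.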

    Tree Kernel Usage in Naive Bayes Classifiers

    We present a novel approach in machine learning that combines naive Bayes classifiers with tree kernels. Tree kernel methods produce promising results in machine learning tasks involving tree-structured attribute values. These kernel methods compare two tree-structured attribute values recursively. Until now, tree kernels have only been used in kernel machines such as Support Vector Machines or Perceptrons. In this paper, we show that tree kernels can be utilized in a naive Bayes classifier, enabling the classifier to handle tree-structured values. We evaluate our approach on three datasets containing tree-structured values. We show that our approach using tree structures delivers significantly better results than approaches using non-structured (flat) features extracted from the tree. Additionally, we show that our approach is significantly faster than comparable kernel machines in several settings, which makes it more useful in resource-aware settings such as mobile devices.
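The abstract does not say which tree kernel the approach uses or how kernel values enter the naive Bayes model, so neither is reproduced here. As a hedged illustration of what comparing two tree-structured values recursively can mean, the following Python sketch implements the well-known subset-tree kernel of Collins and Duffy, which counts the tree fragments two trees share; the helper names `T`, `subset_tree_kernel`, `normalized_kernel` and the decay parameter `lam` are this sketch's own choices.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Tree:
    label: str
    children: Tuple["Tree", ...] = ()


def T(label: str, *children: Tree) -> Tree:
    """Convenience constructor: T('NP', T('John'))."""
    return Tree(label, tuple(children))


def internal_nodes(tree: Tree) -> List[Tree]:
    """All nodes that have children; leaves (words) are not fragment roots."""
    if not tree.children:
        return []
    nodes = [tree]
    for child in tree.children:
        nodes.extend(internal_nodes(child))
    return nodes


def production(node: Tree) -> Tuple[str, Tuple[str, ...]]:
    return node.label, tuple(c.label for c in node.children)


def subset_tree_kernel(t1: Tree, t2: Tree, lam: float = 1.0) -> float:
    """Sum over all node pairs of the (decayed) number of common fragments
    rooted at both nodes, computed recursively with memoization."""
    memo = {}

    def common(n1: Tree, n2: Tree) -> float:
        # Leaves are not fragment roots, so they contribute 0.
        if not n1.children or not n2.children:
            return 0.0
        key = (id(n1), id(n2))
        if key not in memo:
            if production(n1) != production(n2):
                memo[key] = 0.0
            else:
                # For a pre-terminal (all children are leaves) every factor
                # is 1, so the result is simply lam.
                value = lam
                for c1, c2 in zip(n1.children, n2.children):
                    value *= 1.0 + common(c1, c2)
                memo[key] = value
        return memo[key]

    return sum(common(a, b)
               for a in internal_nodes(t1) for b in internal_nodes(t2))


def normalized_kernel(t1: Tree, t2: Tree, lam: float = 1.0) -> float:
    """Kernel value scaled to [0, 1] by the trees' self-similarities."""
    return subset_tree_kernel(t1, t2, lam) / math.sqrt(
        subset_tree_kernel(t1, t1, lam) * subset_tree_kernel(t2, t2, lam))
```

For `(S (NP John) (VP sleeps))` and `(S (NP Mary) (VP sleeps))` the kernel value is 3 (the shared fragments are `(S NP VP)`, `(S NP (VP sleeps))` and `(VP sleeps)`), and the normalized value is 0.5. Setting `lam` below 1 down-weights large fragments so that the score is not dominated by tree size; memoizing node pairs keeps the cost roughly proportional to the product of the two trees' node counts.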

    About the exploration of data mining techniques using structured features for information extraction

    The World Wide Web is a huge source of information, and the amount of information available on it grows every day. It is impossible to handle this amount of information by hand. Special techniques have to be used to deliver smaller, manageable excerpts of information. Unfortunately, these techniques, such as search engines, deliver only a certain view of the information's original appearance. The delivered information is present in various types of files, such as websites, text documents, video clips, audio files and the like. Extracting relevant and interesting pieces of information from these files is very complex and time-consuming. This work analyzes special techniques that allow interesting informational units to be extracted automatically. Such techniques are based on Machine Learning methods. In contrast to traditional Machine Learning tasks, processing text documents in this context requires specific techniques. The structure of natural language contained in text documents poses constraints that should be respected by the Machine Learning method. These constraints and the specially tuned methods respecting them are another important aspect of this work. After defining all the Machine Learning formalisms used in this work, I present multiple Machine Learning approaches applicable to the field of Information Extraction. I describe the historical development from the first approaches to Information Extraction, via Named Entity Recognition, to Relation Extraction. The possibilities of using linguistic resources to create feature sets for Information Extraction are presented. I show how Relation Extraction is formally defined and which kinds of methods are used for Relation Extraction in Machine Learning.
I focus on Relation Extraction techniques that benefit, on the one hand, from minimum optimization and, on the other hand, from efficient data structures. Most of the experiments and implementations described in this work were done using RapidMiner, an open source framework for Data Mining. To apply this framework to Information Extraction tasks, I developed an extension called the Information Extraction Plugin, which is described in detail. Finally, I present applications that explicitly benefit from the collaboration of Data Mining and Information Extraction.

    Towards Adjusting Mobile Devices To User's Behaviour

    Mobile devices are a special class of resource-constrained embedded devices. Computing power, memory, available energy, and network bandwidth are often severely limited. These constrained resources require more extensive optimization of a mobile system than of larger systems. Any needless operation has to be avoided, and time-consuming operations have to be started early. For instance, loading a file ideally starts before the user wants to access it. So-called prefetching strategies optimize the system's operation. Our goal is to adjust such strategies on the basis of logged system data. Optimization is then achieved by predicting an application's behavior based on facts learned from earlier runs on the same system. In this paper, we analyze system calls at the operating-system level and compare two paradigms, namely server-based and device-based learning. The results could be used to optimize the runtime behaviour of mobile devices.
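The abstract does not specify the learning method. A minimal sketch of the prediction step, assuming the system-call log has already been reduced to the sequence of file paths passed to `open` calls, is a first-order Markov model: count which file followed which in earlier runs and prefetch the most frequent successor. `NextFilePredictor` and the example paths are hypothetical; this is not the paper's method.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional


class NextFilePredictor:
    """Hypothetical first-order Markov model over file accesses: counts how
    often file B was opened directly after file A in earlier runs."""

    def __init__(self) -> None:
        self.transitions: Dict[str, Counter] = defaultdict(Counter)

    def learn(self, accesses: Iterable[str]) -> None:
        """Update counts from one logged run (a sequence of opened paths)."""
        previous = None
        for path in accesses:
            if previous is not None:
                self.transitions[previous][path] += 1
            previous = path

    def predict(self, current: str) -> Optional[str]:
        """Return the file most often opened after `current` (the prefetch
        candidate), or None if `current` was never seen."""
        followers = self.transitions.get(current)
        if not followers:
            return None
        return followers.most_common(1)[0][0]
```

The same counting could run on a server over uploaded logs or directly on the device; the model is small either way, but the abstract's comparison of the two paradigms is not reproduced here.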

    Providing Information by Resource-Constrained Data Analysis

    The Collaborative Research Center SFB 876 (Providing Information by Resource-Constrained Data Analysis) brings together the research fields of data analysis (Data Mining, Knowledge Discovery in Databases, Machine Learning, Statistics) and embedded systems, and enhances their methods such that information from distributed, dynamic masses of data becomes available anytime and anywhere. The research center approaches these problems with new algorithms that respect the resource constraints of the different scenarios. This Technical Report presents the work of the members of the integrated graduate school.