3 research outputs found

    Financial information extraction using pre-defined and user-definable templates in the Lolita system

    Financial operators today have access to an extremely large amount of data, both quantitative and qualitative, real-time or historical, and can use this information to support their decision-making process. Quantitative data are largely processed by automatic computer programs, often based on artificial intelligence techniques, that produce quantitative analysis, such as historical price analysis or technical analysis of price behaviour. By contrast, little progress has been made in the processing of qualitative data, which mainly consists of financial news articles from financial newspapers or on-line news providers. As a result, financial market players are overloaded with qualitative information that is potentially extremely useful but, for lack of time, is often ignored. The goal of this work is to reduce the qualitative data overload of financial operators. The research involves identifying the information in the source financial articles that is relevant to the financial operators' investment decision-making process and implementing the associated templates in the LOLITA system. The system should process a large number of source articles and extract specific templates according to the relevant information located in the source articles. The project also involves the design and implementation in LOLITA of a user-definable template interface that allows users to easily design new templates using sentences in natural language. This allows user-defined information extraction from source texts, and differs from most existing information extraction systems, which require the developers to code the templates directly in the system. The results of the research have shown that the system performed well in the extraction of financial templates from source articles, which would allow financial operators to reduce their qualitative data overload. The results have also shown that the user-definable template interface is a viable approach to user-defined information extraction. A trade-off has been identified between the ease of use of the user-definable template interface and the loss of performance compared to hand-coded templates.

    Research in the Language, Information and Computation Laboratory of the University of Pennsylvania

    This report takes its name from the Computational Linguistics Feedback Forum (CLiFF), an informal discussion group for students and faculty. However, the scope of the research covered in this report is broader than the title might suggest; this is the yearly report of the LINC Lab, the Language, Information and Computation Laboratory of the University of Pennsylvania. It may at first be hard to see the threads that bind together the work presented here, work by faculty, graduate students and postdocs in the Computer Science and Linguistics Departments, and the Institute for Research in Cognitive Science. It includes prototypical Natural Language fields such as Combinatorial Categorial Grammars, Tree Adjoining Grammars, syntactic parsing and the syntax-semantics interface, but it extends to statistical methods, plan inference, instruction understanding, intonation, causal reasoning, free word order languages, geometric reasoning, medical informatics, connectionism, and language acquisition. Naturally, this introduction cannot spell out all the connections between these abstracts; we invite you to explore them on your own. In fact, with this issue it’s easier than ever to do so: this document is accessible on the “information superhighway”. Just call up http://www.cis.upenn.edu/~cliff-group/94/cliffnotes.html. In addition, you can find many of the papers referenced in the CLiFF Notes on the net. Most can be obtained by following links from the authors’ abstracts in the web version of this report. The abstracts describe the researchers’ many areas of investigation, explain their shared concerns, and present some interesting work in Cognitive Science. We hope its new online format makes the CLiFF Notes a more useful and interesting guide to Computational Linguistics activity at Penn.

    Design of Interactive Feature Space Construction Protocol

    Machine learning deals with designing systems that learn from data, i.e., automatically improve with experience. Systems gain experience by detecting patterns or regularities and using them to make predictions. These predictions are based on the properties that the system learns from the data. Thus, when we say a machine learns, we mean that it has changed in a way that allows it to perform more efficiently than before. Machine learning is emerging as an important technology for a number of applications, including natural language processing, medical diagnosis, game playing, and finance. A wide variety of machine learning approaches has been developed and used for these applications. We first review the work done in the field of machine learning and analyze various machine learning concepts that are applicable to the work presented in this thesis. Next we examine active machine learning for pipelining an important natural language application, information extraction, in which the task of prediction is carried out in different stages and the output of each stage serves as an input to the next stage. A number of machine learning algorithms have been developed for different applications. However, no single machine learning algorithm can be used appropriately for all learning problems. It is not possible to create a general learner for all problems because there are varied types of real-world datasets that cannot be handled by a single learner. For this reason, an evaluation of the machine learning algorithms is needed. We present an experiment for the evaluation of various state-of-the-art machine learning algorithms using an interactive machine learning tool called WEKA (Waikato Environment for Knowledge Analysis). The evaluation is carried out with the purpose of finding an optimal solution for a real-world learning problem: credit approval as used in banks. It is a classification problem. Finally, we present an approach that combines various learners with the aim of increasing their efficiency. We present two experiments that evaluate the machine learning algorithms for efficiency and compare their performance with the new combined approach on the same classification problem. Later we show the effects of feature selection on the efficiency of our combined approach as well as on other machine learning techniques. The aim of this work is to analyze the techniques that increase the efficiency of the learners.