Financial information extraction using pre-defined and user-definable templates in the Lolita system
Financial operators today have access to an extremely large amount of data, both quantitative and qualitative, real-time or historical, and can use this information to support their decision-making process. Quantitative data are largely processed by automatic computer programs, often based on artificial intelligence techniques, that produce quantitative analysis, such as historical price analysis or technical analysis of price behaviour. By contrast, little progress has been made in the processing of qualitative data, which mainly consists of financial news articles from financial newspapers or on-line news providers. As a result, financial market players are overloaded with qualitative information which is potentially extremely useful but, due to lack of time, is often ignored. The goal of this work is to reduce the qualitative data-overload of financial operators. The research involves identifying the information in the source financial articles that is relevant to the financial operators' investment decision-making process and implementing the associated templates in the LOLITA system. The system should process a large number of source articles and extract specific templates according to the relevant information located in the source articles. The project also involves the design and implementation in LOLITA of a user-definable template interface that allows users to easily design new templates using sentences in natural language. This allows user-defined information extraction from source texts, and differs from most existing information extraction systems, which require the developers to code the templates directly in the system. The results show that the system performed well in extracting financial templates from source articles, which would allow financial operators to reduce their qualitative data-overload.
The results have also shown that the user-definable template interface is a viable approach to user-defined information extraction. A trade-off has been identified between the ease of use of the user-definable template interface and the loss of performance compared to hand-coded templates.
Research in the Language, Information and Computation Laboratory of the University of Pennsylvania
This report takes its name from the Computational Linguistics Feedback Forum (CLiFF), an informal discussion group for students and faculty. However, the scope of the research covered in this report is broader than the title might suggest; this is the yearly report of the LINC Lab, the Language, Information and Computation Laboratory of the University of Pennsylvania.
It may at first be hard to see the threads that bind together the work presented here, work by faculty, graduate students and postdocs in the Computer Science and Linguistics Departments, and the Institute for Research in Cognitive Science. It includes prototypical Natural Language fields such as: Combinatorial Categorial Grammars, Tree Adjoining Grammars, syntactic parsing and the syntax-semantics interface; but it extends to statistical methods, plan inference, instruction understanding, intonation, causal reasoning, free word order languages, geometric reasoning, medical informatics, connectionism, and language acquisition.
Naturally, this introduction cannot spell out all the connections between these abstracts; we invite you to explore them on your own. In fact, with this issue it’s easier than ever to do so: this document is accessible on the “information superhighway”. Just call up http://www.cis.upenn.edu/~cliff-group/94/cliffnotes.html
In addition, you can find many of the papers referenced in the CLiFF Notes on the net. Most can be obtained by following links from the authors’ abstracts in the web version of this report.
The abstracts describe the researchers’ many areas of investigation, explain their shared concerns, and present some interesting work in Cognitive Science. We hope its new online format makes the CLiFF Notes a more useful and interesting guide to Computational Linguistics activity at Penn.
Design of Interactive Feature Space Construction Protocol
Machine learning deals with designing systems that learn from data, i.e., automatically improve
with experience. Systems gain experience by detecting patterns or regularities and using them to
make predictions. These predictions are based on the properties that the system learns from the
data. Thus, when we say a machine learns, we mean it has changed in a way that allows it to
perform more efficiently than before. Machine learning is emerging as an important technology
for applications such as natural language processing, medical
diagnosis, game playing, and finance. A wide variety of machine learning approaches
has been developed and used for these applications.
We first review the work done in the field of machine learning and analyze various concepts
about machine learning that are applicable to the work presented in this thesis. Next we examine
active machine learning for pipelining an important natural language application, namely
information extraction, in which the task of prediction is carried out in different stages and the
output of each stage serves as an input to the next stage.
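
The staged arrangement described above, in which each stage's output becomes the next stage's input, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis's implementation: the stage names (`tokenize`, `tag_entities`, `extract_entities`) and their toy rules are hypothetical stand-ins for the learned predictors a real information extraction pipeline would use.

```python
from functools import reduce

def tokenize(text):
    # Stage 1 (hypothetical): split raw text into whitespace-separated tokens.
    return text.split()

def tag_entities(tokens):
    # Stage 2 (hypothetical): flag capitalized tokens as candidate entities.
    return [(tok, tok[:1].isupper()) for tok in tokens]

def extract_entities(tagged):
    # Stage 3 (hypothetical): keep only the tokens flagged as entities.
    return [tok for tok, is_entity in tagged if is_entity]

def run_pipeline(stages, data):
    # Feed the output of each stage into the next stage, in order.
    return reduce(lambda acc, stage: stage(acc), stages, data)

stages = [tokenize, tag_entities, extract_entities]
result = run_pipeline(stages, "Acme acquired Widget Corp yesterday")
print(result)
```

Because every stage consumes the previous stage's predictions, an error made early propagates to all later stages, which is one reason the quality of each stage matters in a pipelined design.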
A number of machine learning algorithms have been developed for different applications.
However, no single machine learning algorithm is appropriate for all learning
problems. It is not possible to create a general learner for all problems because real-world
datasets are too varied to be handled by a single learner. For this reason, an
evaluation of machine learning algorithms is needed. We present an experiment that
evaluates various state-of-the-art machine learning algorithms using an interactive machine
learning tool called WEKA (Waikato Environment for Knowledge Analysis). The evaluation is
carried out with the purpose of finding an optimal solution for a real-world learning problem: credit
approval in banks, which is a classification problem.
Finally, we present an approach that combines various learners with the aim of increasing their
efficiency. We present two experiments that evaluate the machine learning algorithms for
efficiency and compare their performance with the new combined approach on the same
classification problem. We then show the effects of feature selection on the efficiency of our
combined approach as well as on other machine learning techniques. The aim of this work is to
analyze the techniques that increase the efficiency of the learners.
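
The abstract does not say how the learners are combined. As one common possibility, the sketch below uses simple majority voting over three classifiers for a credit-approval decision. Everything in it is an assumption made for illustration: the rule-based classifiers (`by_income`, `by_debt`, `by_history`), their thresholds, and the applicant fields are hypothetical and stand in for trained models.

```python
from collections import Counter

def by_income(applicant):
    # Hypothetical learner 1: approve when income meets an assumed threshold.
    return "approve" if applicant["income"] >= 30000 else "reject"

def by_debt(applicant):
    # Hypothetical learner 2: approve when debt stays under an assumed limit.
    return "approve" if applicant["debt"] <= 10000 else "reject"

def by_history(applicant):
    # Hypothetical learner 3: approve after an assumed minimum employment period.
    return "approve" if applicant["years_employed"] >= 2 else "reject"

def majority_vote(classifiers, sample):
    # Collect one prediction per classifier and return the most frequent label.
    votes = [clf(sample) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

learners = [by_income, by_debt, by_history]
applicant = {"income": 45000, "debt": 15000, "years_employed": 5}
decision = majority_vote(learners, applicant)
print(decision)
```

Using an odd number of learners on a two-class problem avoids tied votes. Other combination schemes, such as weighted voting or stacking, follow the same pattern of aggregating individual predictions.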