2 research outputs found

    Information filtering in high velocity text streams using limited memory - An event-driven approach to text stream analysis

    Get PDF
    This dissertation is concerned with the processing of high velocity text streams using event processing means. It comprises a scientific approach for combining the area of information filtering and event processing. In order to be able to process text streams within event driven means, an event reference model was developed that allows for the conversion of unstructured or semi-structured text streams into discrete event types on which event processing engines can operate. Additionally, a set of essential reference processes in the domain of information filtering and text stream analysis were described using event-driven concepts. In a second step, a reference architecture was designed that described essential architectural components required for the design of information ltering and text stream analysis systems in an event-driven manner. Further to this, a set of architectural patterns for building event driven text analysis systems was derived that support the design and implementation of such systems. Subsequently, a prototype was built using the theoretic foundations. This system was initially used to study the effect of sliding window sizes on the properties of dynamic sub-corpora. It could be shown that small sliding window based corpora are similar to larger sliding windows and thus can be used as a resource-saving alternative. Next, a study of several linguistic aspects of text streams was undertaken that showed that event stream summary statistics can provide interesting insights into the characteristics of high velocity text streams. Finally, four essential information filtering and text stream analysis components were studied, viz. filter policies, term weighting, thresholds and query expansion. These were studied using three temporal search profile types and were evaluated using standard information retrieval performance measures. The goal was to study the efficiency of traditional as well as new algorithms within the given context of high velocity text stream data, in order to provide advise which methods work best. The results of this dissertation are intended to provide software architects and developers with valuable information for the design and implementation of event-driven text stream analysis systems

    Corpus Expansion for Neural CWS on Microblog-Oriented Data with <i>λ</i>-Active Learning Approach

    No full text
    corecore