4 research outputs found

    Search Agent Model: a Conceptual Framework for Search by Algorithms and Agent Systems

    Get PDF
    No abstract available

    Search Agent Model: a Conceptual Framework for Search by Algorithms and Agent Systems

    Get PDF
    No abstract available

    Improving Robustness and Scalability of Available Ner Systems

    Get PDF
    The focus of this research is to study and develop techniques to adapt existing NER resources to serve the needs of a broad range of organizations without expert NLP manpower. My methods emphasize usability, robustness and scalability of existing NER systems to ensure maximum functionality to a broad range of organizations. Usability is facilitated by ensuring that the methodologies are compatible with any available open-source NER tagger or data set, thus allowing organizations to choose resources that are easy to deploy and maintain and fit their requirements. One way of making use of available tagged data would be to aggregate a number of different tagged sets in an effort to increase the coverage of the NER system. Though, generally, more tagged data can mean a more robust NER model, extra data also introduces a significant amount of noise and complexity into the model as well. Because adding in additional training data to scale up an NER system presents a number of challenges in terms of scalability, this research aims to address these difficulties and provide a means for multiple available training sets to be aggregated while reducing noise, model complexity and training times. In an effort to maintain usability, increase robustness and improve scalability, I designed an approach to merge document clustering of the training data with open-source or available NER software packages and tagged data that can be easily acquired and implemented. Here, a tagged training set is clustered into smaller data sets, and models are then trained on these smaller clusters. This is designed not only to reduce noise by creating more focused models, but also to increase scalability and robustness. Document clustering is used extensively in information retrieval, but has never been used in conjunction with NER

    Passage Retrieval for Incorporating Global Evidence in Sequence Labeling

    No full text
    Many forms of linguistic analysis, such as part of speech tagging, named entity recognition, and other sequence labeling tasks are performed on short spans of text and assume statistical dependence within a window of only a few tokens. We propose using passage retrieval to induce non-local dependencies in structured classification that generalizes earlier work in context aggregation for namedentity recognition. We introduce a new method for feature expansion inspired by psuedo-relevance feedback (PRF). Our results on the CoNLL 2003 task show that features from cross-document feature expansion improves NER effectiveness over previous aggregation models. Utilizing all the tokens in a sentence for query context consistently perform best on both intrinsic and extrinsic evaluations. Tagging models incorporating feature expansion outperform the leading NER system when evaluated on out of domain data, a collection of publicly available scanned books on the topic of historic Deerfield, MA. Finally, the results show that retrieval based feature expansion using an external collection of unlabeled text can result in further effectiveness improvements
    corecore