4 research outputs found

    An enhanced sequential exception technique for semantic-based text anomaly detection

    Get PDF
    The detection of semantic-based text anomaly is an interesting research area which has gained considerable attention from the data mining community. Text anomaly detection identifies deviating information from general information contained in documents. Text data are characterized by having problems related to ambiguity, high dimensionality, sparsity and text representation. If these challenges are not properly resolved, identifying semantic-based text anomaly will be less accurate. This study proposes an Enhanced Sequential Exception Technique (ESET) to detect semantic-based text anomaly by achieving five objectives: (1) to modify Sequential Exception Technique (SET) in processing unstructured text; (2) to optimize Cosine Similarity for identifying similar and dissimilar text data; (3) to hybridize modified SET with Latent Semantic Analysis (LSA); (4) to integrate Lesk and Selectional Preference algorithms for disambiguating senses and identifying text canonical form; and (5) to represent semantic-based text anomaly using First Order Logic (FOL) and Concept Network Graph (CNG). ESET performs text anomaly detection by employing optimized Cosine Similarity, hybridizing LSA with modified SET, and integrating it with Word Sense Disambiguation algorithms specifically Lesk and Selectional Preference. Then, FOL and CNG are proposed to represent the detected semantic-based text anomaly. To demonstrate the feasibility of the technique, four selected datasets namely NIPS data, ENRON, Daily Koss blog, and 20Newsgroups were experimented on. The experimental evaluation revealed that ESET has significantly improved the accuracy of detecting semantic-based text anomaly from documents. When compared with existing measures, the experimental results outperformed benchmarked methods with an improved F1-score from all datasets respectively; NIPS data 0.75, ENRON 0.82, Daily Koss blog 0.93 and 20Newsgroups 0.97. The results generated from ESET has proven to be significant and supported a growing notion of semantic-based text anomaly which is increasingly evident in existing literatures. Practically, this study contributes to topic modelling and concept coherence for the purpose of visualizing information, knowledge sharing and optimized decision making

    Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval

    Get PDF
    Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from these other application areas. A common form of IR involves ranking of documents--or short passages--in response to keyword-based queries. Effective IR systems must deal with query-document vocabulary mismatch problem, by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms--such as a person's name or a product model number--not seen during training, and to avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, the retrieval involves extremely large collections--such as the document index of a commercial Web search engine--containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as inverted index, to efficiently retrieve from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives, besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks.Comment: PhD thesis, Univ College London (2020
    corecore