    Effect of calculating Pointwise Mutual Information using a Fuzzy Sliding Window in Topic Modeling

    Topic modeling is a popular method for analysing large amounts of unstructured text data and extracting meaningful insights. The coherence of the generated topics is a critical metric for determining model quality, measuring the semantic relatedness of the words in a topic. The distributional hypothesis, a fundamental theory in linguistics, states that words occurring in the same contexts tend to have similar meanings. Based on this theory, word co-occurrence in a given context is often used to reflect word association in coherence scores. To this end, many coherence scores use Normalised Pointwise Mutual Information (NPMI), which uses a sliding window to describe the neighbourhood that defines the context. It is assumed that there is no other structure in the neighbourhood except for the presence of words. Inspired by the distributional hypothesis, we hypothesise that word distance is relevant for determining word association. Hence, we propose using a fuzzy sliding window to define a neighbourhood in which the association between words depends on the membership of the words in the fuzzy sliding window. To this end, we propose Fuzzy Normalised Pointwise Mutual Information (FNPMI) to calculate fuzzy coherence scores. We implement two different neighbourhood structures through the definition of the membership function of the sliding window. In the first implementation, the association between two words correlates positively with their distance, whereas the correlation is negative in the second. We compare the correlation of our proposed coherence metrics with human judgment. We find that a fuzzy sliding window correlates less with human judgment than a crisp sliding window. This finding indicates that word distance within a window is less important than the definition of the window size itself.
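A minimal sketch of the window-based (F)NPMI idea described above. The estimator details here are illustrative assumptions, not the paper's exact formulation: one boolean count per sliding window, with each co-occurrence weighted by the membership of the distance between the two words' first positions in the window.

```python
import math

def npmi(tokens, w1, w2, window=10, membership=None):
    # Estimate (F)NPMI of two words from sliding windows over `tokens`.
    # `membership` maps the word distance inside a window to a weight in
    # [0, 1]; None gives the crisp window (weight 1 everywhere).
    if membership is None:
        membership = lambda d: 1.0
    n_windows = c1 = c2 = c12 = 0.0
    for start in range(len(tokens) - window + 1):
        win = tokens[start:start + window]
        n_windows += 1
        c1 += w1 in win
        c2 += w2 in win
        if w1 in win and w2 in win:
            # weight the co-occurrence by the membership of the word distance
            c12 += membership(abs(win.index(w1) - win.index(w2)))
    if c12 == 0:
        return -1.0  # never co-occur: minimal association
    p1, p2, p12 = c1 / n_windows, c2 / n_windows, c12 / n_windows
    if p12 >= 1.0:
        return 1.0  # the words co-occur in every window
    pmi = math.log(p12 / (p1 * p2))
    return pmi / (-math.log(p12))
```

Because the membership weight never exceeds 1, the weighted co-occurrence probability stays below the marginals, so the score remains in [-1, 1] for both the crisp and the fuzzy window.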

    SCRE: Special cargo relation extraction using representation learning

    The airfreight industry of shipping goods with special handling needs, also known as special cargo, often deals with non-transparent data and outdated technology, resulting in significant inefficiency. A special cargo ontology is a means of extracting, structuring, and storing domain knowledge and representing the concepts and relationships that can be processed by computers. This ontology can serve as the base of semantic data retrieval in many artificial intelligence applications, such as planning for special cargo shipments. Domain information extraction is an essential task in implementing and maintaining the special cargo ontology. However, the absence of domain information makes instantiating the cargo ontology challenging. We propose a relation representation learning approach based on a hierarchical attention-based multi-task model and leverage it in the special cargo domain. The proposed relation representation learning architecture is applied to identifying and categorizing samples of various relation types in the special cargo ontology. The model is trained with domain-specific documents on a number of semantic tasks that vary from lightweight tasks in the bottom layers to heavyweight tasks in the top layers of the model in a hierarchical setting. It therefore conveys complementary input features and learns a rich representation. We also train a domain-specific relation representation model that relies only on an entity-linked corpus of the cargo shipment domain. These two relation representation models are then employed in a supervised multi-class classifier called Special Cargo Relation Extractor (SCRE). The results of the experiments show that the proposed relation representation models can represent the complex semantic information of the special cargo domain efficiently.

    A Comparative Study of Fuzzy Topic Models and LDA in terms of Interpretability

    In many domains that employ machine learning models, both high-performing and interpretable models are needed. A typical machine learning task is text classification, where models are hardly interpretable. Topic models, used as topic embeddings, carry the potential to better explain the decisions made by text classification algorithms. With this goal in mind, we propose two new fuzzy topic models: FLSA-W and FLSA-V. Both models are derived from the topic model Fuzzy Latent Semantic Analysis (FLSA). After training each model ten times, we use the mean coherence score to compare the different models with the benchmark models Latent Dirichlet Allocation (LDA) and FLSA. Our proposed models generally lead to higher coherence scores and lower standard deviations than the benchmark models. The proposed models are specifically useful as topic embeddings in text classification, since their coherence scores do not drop for a high number of topics, as opposed to the decay that occurs with LDA and FLSA.
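A minimal sketch of an FLSA-W-style pipeline, under assumptions made purely for illustration (a truncated-SVD projection of the word-document count matrix followed by fuzzy C-means clustering of the word vectors; the actual FLSA-W and FLSA-V derivations differ in detail):

```python
import numpy as np

def flsa_w_sketch(counts, n_topics=3, n_dims=2, m=2.0, iters=50, seed=0):
    """Illustrative, simplified FLSA-W-style pipeline:
    1) project the word-document count matrix with truncated SVD,
    2) fuzzy C-means cluster the *words* in the projected space,
    3) return the word-topic membership matrix (rows sum to 1)."""
    rng = np.random.default_rng(seed)
    # 1) SVD projection of the words (rows of `counts`)
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    x = u[:, :n_dims] * s[:n_dims]
    # 2) fuzzy C-means on the word vectors
    w = rng.random((x.shape[0], n_topics))
    w /= w.sum(axis=1, keepdims=True)
    for _ in range(iters):
        wm = w ** m                                  # fuzzified memberships
        centers = (wm.T @ x) / wm.sum(axis=0)[:, None]
        d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2) + 1e-9
        w = 1.0 / (d ** (2.0 / (m - 1.0)))           # standard FCM update
        w /= w.sum(axis=1, keepdims=True)
    return w
```

The fuzzy memberships, unlike LDA's hard word-topic assignments, give every word a graded degree of belonging to each topic, which is what makes the resulting topic embeddings smooth.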

    Knowledge Modelling and Incident Analysis for Special Cargo

    The airfreight industry of shipping goods with special handling needs, also known as special cargo, suffers from non-transparent shipping processes, resulting in inefficiency. The LARA project (Lane Analysis and Route Advisor) aims to address these limitations and bring innovation to special cargo route planning, so as to remedy operational deficiencies and improve customer services. In this chapter, we discuss the elicitation of special cargo domain knowledge and its modeling into an ontology. We also present research into cargo incidents, namely the automatic classification of incidents in free-text reports and experiments in detecting significant features associated with specific cargo incident types. Our work mainly addresses two of the main technical priority areas defined by the European Big Data Value (BDV) Strategic Research and Innovation Agenda: the application of data analytics to improve data understanding, and the provision of optimized architectures for analytics of data-at-rest and data-in-motion. The overall goal is to develop technologies contributing to the data value chain in the logistics sector. It addresses the horizontal concerns Data Analytics, Data Processing Architectures, and Data Management of the BDV Reference Model, as well as the vertical dimension Big Data Types and Semantics.

    UvT: The UvT Term extraction system in the keyphrase extraction task

    The UvT system is based on a hybrid linguistic and statistical approach originally proposed for the recognition of multiword terminological phrases, the C-value method (Frantzi et al., 2000). In the UvT implementation, we use an extended noun-phrase rule set and take into consideration orthographic and morphological variation, term abbreviations and acronyms, and basic document structure information.
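The C-value score combines a candidate term's length and frequency while discounting occurrences that are explained by longer candidate terms containing it. A hedged sketch of the basic formula from Frantzi et al. (2000), without the UvT extensions described above:

```python
import math

def c_value(candidates):
    # `candidates` maps a multiword candidate term (a tuple of words) to its
    # corpus frequency; nesting is contiguous sub-sequence containment.
    def contains(longer, shorter):
        n, k = len(longer), len(shorter)
        return n > k and any(longer[i:i + k] == shorter
                             for i in range(n - k + 1))

    scores = {}
    for a, fa in candidates.items():
        # frequencies of longer candidates that contain `a` as a nested term
        supers = [f for b, f in candidates.items() if contains(b, a)]
        if supers:
            # discount occurrences accounted for by the containing terms
            score = math.log2(len(a)) * (fa - sum(supers) / len(supers))
        else:
            score = math.log2(len(a)) * fa
        scores[a] = score
    return scores
```

For example, with candidates {("contact", "lens"): 4, ("soft", "contact", "lens"): 2}, the nested term "contact lens" is discounted by the frequency of its containing term, while "soft contact lens" is scored on raw frequency alone.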