
    A framework for clustering and adaptive topic tracking on evolving text and social media data streams.

    Recent advances in and widespread usage of online web services and social media platforms, coupled with ubiquitous low-cost devices, mobile technologies, and the increasing capacity of low-cost storage, have led to a proliferation of Big Data, ranging from news, e-commerce clickstreams, and online business transactions to continuous event logs and social media expressions. These large amounts of online data are often referred to as data streams because they are generated at extremely high throughput or velocity, and they can render conventional, classical data analytics methodologies obsolete. For these reasons, the management and analysis of data streams have been researched extensively in recent years. The special case of social media Big Data brings additional challenges, particularly because of the unstructured nature of the data, specifically free text. One classical approach to mining text data has been topic modeling. Topic models are statistical models that can be used to discover the abstract "topics" that occur in a corpus of documents. They have emerged as a powerful technique in machine learning and data science, providing a good balance between simplicity and complexity, and they deliver sophisticated insight without the need for real natural language understanding. However, they were not designed to cope with the type of text data that is abundant on social media platforms, but rather for traditional medium-sized corpora consisting of longer documents that adhere to a specific language and typically span a stable set of topics. Unlike traditional document corpora, social media messages tend to be very short, sparse, and noisy, and they do not adhere to a standard vocabulary, linguistic patterns, or stable topic distributions. They are also generated at a high velocity that imposes heavy demands on topic modeling, and their evolving, dynamic nature quickly makes any set of topic modeling results stale as the textual content and topics discussed within social media streams change.

    In this dissertation, we propose an integrated topic modeling framework, built on top of an existing stream-clustering framework called Stream-Dashboard, which can extract, isolate, and track topics over any given time period. In this new framework, Stream-Dashboard first clusters the data stream points into homogeneous groups; data from each group is then passed to the topic modeling component, which extracts finer topics from the group. The framework tracks the evolution of the clusters over time to detect milestones corresponding to changes in topic evolution, and to trigger an adaptation of the learned groups and topics at each milestone. This approach differs from generic topic modeling because it works in a compartmentalized fashion: the input document stream is split into distinct compartments, and topic modeling is applied to each compartment separately. Furthermore, we propose extensions to existing topic modeling and stream clustering methods, including: an adaptive query reformulation approach to help focus topic discovery over time; a topic modeling extension with adaptive hyperparameters and an infinite vocabulary; and an adaptive stream clustering algorithm that incorporates the automated estimation of dynamic, cluster-specific temporal scales for adaptive forgetting, to facilitate clustering in a fast-evolving data stream.

    Our experimental results show that the proposed adaptive-forgetting clustering algorithm mines clusters of better quality; that the proposed compartmentalized framework mines topics of better quality than competitive baselines; and that the framework can automatically adapt to focus on changing topics using the proposed query reformulation strategy.
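    To make the compartmentalized, adaptive-forgetting design concrete, below is a minimal, self-contained sketch of the two-stage pipeline in Python. It is an illustration under stated assumptions, not Stream-Dashboard itself: clustering is reduced to cosine similarity over bag-of-words centroids, "topic extraction" is reduced to the top decay-weighted terms per compartment, and a single fixed temporal scale tau stands in for the dissertation's dynamic, cluster-specific scales.

```python
import math
from collections import Counter

def forgetting_weight(t_now, t_point, tau):
    """Exponential decay; older points carry less weight.
    tau is the temporal scale (larger tau = slower forgetting)."""
    return math.exp(-(t_now - t_point) / tau)

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[w] * b[w] for w in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def process_batch(batch, clusters, t_now, sim_thresh=0.2, tau=50.0):
    """One pass of the two-stage pipeline: route each document to the
    nearest compartment (or open a new one), prune points whose
    forgetting weight has decayed away, then re-extract each
    compartment's 'topic' as its top decay-weighted terms."""
    for doc, t_doc in batch:
        vec = Counter(doc.lower().split())
        best = max(clusters,
                   key=lambda c: cosine(vec, c["centroid"]),
                   default=None)
        if best is None or cosine(vec, best["centroid"]) < sim_thresh:
            best = {"centroid": Counter(), "members": []}
            clusters.append(best)
        best["members"].append((vec, t_doc))
        best["centroid"] += vec
    topics = []
    for c in clusters:
        # adaptive forgetting: drop points whose weight has decayed away
        c["members"] = [(v, t) for v, t in c["members"]
                        if forgetting_weight(t_now, t, tau) > 0.01]
        terms = Counter()
        for v, t in c["members"]:
            w = forgetting_weight(t_now, t, tau)
            for word, n in v.items():
                terms[word] += n * w
        topics.append([word for word, _ in terms.most_common(5)])
    return topics

clusters = []
print(process_batch([("bushfire smoke near town", 1.0),
                     ("town evacuated after bushfire", 2.0),
                     ("new phone release announced", 2.5)],
                    clusters, t_now=3.0))
```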

    Requirements-oriented methodology for evaluating ontologies

    Ontologies play key roles in many applications today. Therefore, whether an ontology is newly specified or reused from elsewhere, it is important to determine its suitability for the application at hand. This need is addressed by carrying out ontology evaluation, which determines the qualities of an ontology using methodologies, criteria, or measures. However, addressing the ontology requirements of a given application first requires determining the appropriate set of criteria and measures. In this thesis, we propose a Requirements-Oriented Methodology for Evaluating Ontologies (ROMEO). ROMEO outlines a methodology for determining appropriate methods for ontology evaluation that incorporates a suite of existing ontology evaluation criteria and measures. ROMEO helps ontology engineers determine relevant ontology evaluation measures for a given set of ontology requirements by linking these requirements to existing ontology evaluation measures through a set of questions. There are three main parts to ROMEO. First, ontology requirements are elicited from a given application and form the basis for an appropriate evaluation of ontologies. Second, appropriate questions are mapped to each ontology requirement. Third, relevant ontology evaluation measures are mapped to each of those questions. From the ontology requirements of an application, ROMEO thus determines appropriate methods for ontology evaluation by mapping applicable questions to the requirements and mapping those questions to appropriate measures. In this thesis, we apply the ROMEO methodology to obtain appropriate ontology evaluation methods for ontology-driven applications through case studies of Lonely Planet and Wikipedia.

    Since the mappings determined by ROMEO depend on the analysis of the ontology engineer, they need to be validated. In addition to proposing the ROMEO methodology, this thesis therefore proposes a method for the empirical validation of ROMEO mappings. We report on two empirical validation experiments, carried out in controlled environments, that examine the performance of a set of ontologies over a range of tasks; the ontologies vary in the specific quality or measure being examined. The experiments validate two mappings between questions and their associated measures, drawn from the case studies of Lonely Planet and Wikipedia; as these mappings are application-independent, they may be reusable in subsequent applications of the ROMEO methodology. Using a ROMEO mapping from the Lonely Planet case study, we validate a mapping of a coverage question to the F-measure; this experiment was inconclusive and requires further analysis. Using a ROMEO mapping from the Wikipedia case study, we carry out a separate validation experiment examining a mapping between an intersectedness question and the tangledness measure; the results showed this mapping to be valid. For future work, we propose additional validation experiments for further mappings that have been identified between questions and measures.
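    The requirement-to-question-to-measure chain at the heart of ROMEO can be pictured as two lookup tables composed together. The sketch below is illustrative only: the requirement and question wordings are invented, and only the question-to-measure pairs (coverage to F-measure, intersectedness to tangledness) are taken from the case studies above.

```python
# Illustrative sketch of ROMEO's two-level mapping: ontology requirements
# are linked to questions, and questions to evaluation measures. The
# requirement and question wordings here are invented for illustration;
# only the question->measure pairs are drawn from the case studies.

REQUIREMENT_TO_QUESTIONS = {
    "cover the destinations in the travel guide": [
        "Does the ontology cover the domain vocabulary?",
    ],
    "organise categories without excessive cross-linking": [
        "How intersected are the ontology's categories?",
    ],
}

QUESTION_TO_MEASURES = {
    "Does the ontology cover the domain vocabulary?": ["F-measure"],
    "How intersected are the ontology's categories?": ["tangledness"],
}

def measures_for(requirement):
    """Compose the two mappings: requirement -> questions -> measures."""
    measures = []
    for q in REQUIREMENT_TO_QUESTIONS.get(requirement, []):
        measures.extend(QUESTION_TO_MEASURES.get(q, []))
    return measures

print(measures_for("cover the destinations in the travel guide"))
# ['F-measure']
```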

    Using semantic technologies to resolve heterogeneity issues in sustainability and disaster management knowledge bases

    This thesis examines issues of semantic heterogeneity in the domains of sustainability indicators and disaster management. We propose a model that links the two domains with the following logic: while disaster management implies a proper and efficient response to a risk that has materialised as a disaster, sustainability can be defined as preparedness for unexpected situations, achieved by applying measurements such as sustainability indicators. As a step in this direction, we investigate how semantic technologies can tackle the issues of heterogeneity in these domains.

    First, we consider approaches to resolving the heterogeneity issues in representing the key concepts of sustainability indicator sets. To develop a knowledge base, we apply the METHONTOLOGY approach to guide the construction of two candidate ontology designs: generic and specific. Of the two, the generic design is more abstract, with fewer classes and properties. Documents describing two indicator systems, the Global Reporting Initiative and the Organisation for Economic Co-operation and Development, are used in the design of both candidate ontologies. We then evaluate both designs using the ROMEO approach, calculating their level of coverage against the seen indicators as well as against an unseen third indicator set (from the United Nations Statistics Division). We also show that using existing structured approaches like METHONTOLOGY and ROMEO can reduce ambiguity in ontology design and evaluation for domain-level ontologies. We conclude that where an ontology must be designed for both seen and unseen indicator systems, a generic and reusable design is preferable.

    Second, having addressed the heterogeneity issues at the data level of sustainability indicators, we develop a software application for a sustainability reporting framework, Circles of Sustainability, which provides two mechanisms for browsing heterogeneous sustainability indicator sets: a Tabular view and a Circular view. In particular, the generic ontology design developed during the first phase of the research is applied in this application. We then evaluate the overall usefulness and ease of use of the application and its user interfaces through a user study. The analysis of the study's quantitative and qualitative results concludes that most participants prefer the Circular view for browsing semantically heterogeneous indicators.

    Third, in the context of disaster management, we present a geotagger method for the OzCrisisTracker application that automatically detects and disambiguates the georeferences mentioned in a tweet's content, with three possible outcomes: definite, ambiguous, and no-location. Our method semantically annotates the tweet components using existing and new ontologies. We also conclude that the geographic-focus accuracy of our geotagger is considerably higher than that of other systems.

    From a more general perspective, the research contributions can be articulated as follows. The knowledge bases developed in this research have been applied to the two domain applications. The thesis therefore demonstrates how semantic technologies, such as ontology design patterns, browsing tools, and geocoding, can untangle the data representation and navigation issues of semantic heterogeneity in the sustainability and disaster management domains.
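    The geotagger's three-way outcome (definite, ambiguous, no-location) can be sketched as a lookup of candidate coordinates for each place mention. The toy gazetteer and the `geotag` function below are hypothetical stand-ins, not the OzCrisisTracker implementation, which resolves mentions via ontology-based semantic annotation of tweet components.

```python
# Hypothetical sketch of the geotagger's three-way decision. A tiny
# in-memory gazetteer mapping place names to candidate coordinates
# stands in for the real ontology- and gazetteer-backed machinery.

GAZETTEER = {
    "melbourne": [(-37.81, 144.96), (28.08, -80.61)],  # AU and FL, USA
    "ballarat": [(-37.56, 143.85)],
}

def geotag(tweet_text):
    """Classify a tweet's geographic focus as 'definite', 'ambiguous',
    or 'no-location' based on gazetteer candidates for its tokens."""
    candidates = []
    for token in tweet_text.lower().split():
        candidates.extend(GAZETTEER.get(token.strip(".,!?"), []))
    if not candidates:
        return "no-location", None
    if len(set(candidates)) == 1:  # all mentions resolve to one place
        return "definite", candidates[0]
    return "ambiguous", candidates

print(geotag("Fires reported near Ballarat"))  # ('definite', (-37.56, 143.85))
print(geotag("Stay safe Melbourne"))           # ('ambiguous', [...two candidates])
print(geotag("Please donate if you can"))      # ('no-location', None)
```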

    E-commerce website personalisation based on ontological profiling

    Electronic commerce has become an important part of our consumer lives, and we increasingly choose to do more and more of our shopping online. Along with the growth of online sales, the number of e-commerce retailers has also increased. This has inevitably put additional demands on existing companies as well as new market entrants to ensure that their growth (if not mere survival) and competitiveness are sustainable. Web personalisation has been adopted as a means to support business sustainability and competitiveness; it is now increasingly common and is recognised by e-commerce businesses and consumers alike as functionality expected to be offered as ‘standard’. Recent World Wide Web technology advances have greatly improved the way e-commerce websites are designed and deployed. However, analysis of the academic literature and professional practice shows that these advances are not used to their full potential. This research gap is an opportunity to consider how techniques such as ontologies could be used to enhance the personalisation of e-commerce websites.

    This thesis presents a novel approach to e-commerce website personalisation (PERSONTO), and in particular to the personalisation of content presentation. Personalisation is achieved by means of ontology-based e-shopper profiling. For this purpose, a reusable, extendible, Semantic Web-compatible customer profiling ontology, OntoProfi, is designed and implemented. A ‘proof-of-concept’ prototype of PERSONTO confirmed the feasibility of the proposed approach. Analysis of the research objectives and outcomes showed that the approach is flexible, extendible, and reusable, and that this was achieved by using systematic methods in the design and implementation of the prototype. The evaluation of the proposed approach suggests a high level of acceptance by prospective end users and e-commerce developers.
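    As an illustration of what an ontology-based shopper profile might look like in practice, here is a minimal sketch using the rdflib Python library. The namespace, class, and property names are invented for illustration; they are not the actual OntoProfi vocabulary.

```python
# A minimal sketch of an ontology-based shopper profile, in the spirit
# of OntoProfi. The namespace and all class/property names below are
# invented for illustration; they are not the actual OntoProfi terms.
from rdflib import Graph, Literal, Namespace, RDF

PROF = Namespace("http://example.org/ontoprofi#")  # hypothetical namespace

g = Graph()
g.bind("prof", PROF)

shopper = PROF["shopper42"]
g.add((shopper, RDF.type, PROF.Shopper))
g.add((shopper, PROF.prefersCategory, PROF.HikingGear))
g.add((shopper, PROF.viewedProduct, PROF["product_1138"]))
g.add((shopper, PROF.priceSensitivity, Literal("high")))

# A personalisation layer could query the profile to adapt content:
for _, _, category in g.triples((shopper, PROF.prefersCategory, None)):
    print(f"Promote content from category: {category}")

print(g.serialize(format="turtle"))
```

    Representing the profile as RDF triples, rather than rows in a fixed schema, is what makes the profile reusable and extendible: new preference properties can be added without migrating existing data, and the profile can be shared across Semantic Web-aware applications.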

    Adaptive and personalized semantic web
