49 research outputs found

    Incremental clustering of news reports

    When an event occurs in the real world, numerous news reports describing it start to appear on different news sites within a few minutes. This may result in a huge amount of information for users, and automated processes may be required to help manage this information. In this paper, we describe a clustering system that can cluster news reports from disparate sources into event-centric clusters, i.e., clusters of news reports describing the same event. A user can identify any RSS feed as a source of news he/she would like to receive, and our clustering system can cluster reports received from the separate RSS feeds as they arrive without knowing the number of clusters in advance. Our clustering system was designed to function well in an online incremental environment. In evaluating our system, we found that it is very good at fine-grained clustering, but performs rather poorly at coarser-grained clustering.

    URECA – the research ethics and data protection online review platform used by the University of Malta

    Nowadays, research ethics and data protection are given very high importance, and research organizations, including universities, need to safeguard their level of professionalism and integrity by providing the necessary guidelines. Moreover, they need to ensure that these guidelines are being adhered to by their affiliated researchers, including students. This is needed for the protection of the research subjects, the researchers, and the organization (university) itself. However, care must be taken to streamline the research ethics review process as much as possible and minimize bureaucracy, as otherwise such guidelines would be viewed as a barrier to research. This study describes URECA, the online review platform developed in-house by the University of Malta to streamline its research ethics review process, thus simplifying matters for the researchers/students, reviewers, and auditing committees. This platform is utilized by researchers and students to submit information regarding their research and related data collection, by supervisors to endorse their students’ research, by Faculty Research Ethics Committee members to review research proposals as required, and by the University Research Ethics Committee to manage and audit the overall process.

    Emergent realities for social wellbeing : environmental, spatial and social pathways

    Multiple sectors of modern society have become dependent on accurate and regular weather forecasts. These allow them to make strategic and informed decisions in order to preserve and maintain their assets. Weather forecasts have also become an integral part of various systems and services, such as Decision Support Systems and Early Warning Systems, all of which play a crucial role in modern societies. Numerical weather prediction (NWP) models are used to accurately compute synoptic weather conditions. One of the most commonly used NWP atmospheric simulators is the Weather Research and Forecasting (WRF) model (Lu, Zhong, Charney, Bian, & Liu, 2012; Evan, Alexander, & Dudhia, 2012; Giannaros, Melas, Daglis, Keramitsoglou, & Kourtidis, 2013). This model is a collaborative design effort between research and operational meteorological communities. It offers a state-of-the-art system which is continuously maintained to represent this critical body of knowledge within the scientific community. The WRF model is freeware with a wide variety of applications, and it can be downloaded and installed on a variety of platforms. It is often used for both research and operational applications (Skamarock et al., 2008).

    Performing fusion of news reports through the construction of conceptual graphs

    As events occur around the world, different reports about them are posted on various web portals. Each news agency writes its own report based on the information obtained by its reporters on site or through its contacts; thus each report may have its own ‘unique’ information. A person interested in a particular event may read various reports about that event from different sources to get all the available information. In our research, we are attempting to fuse all the different pieces of information found in the different reports about the same event into one report, thus providing the user with one document where he/she can find all the information related to the event in question. We attempt to do this by constructing conceptual graph representations of the different news reports, and then merging those graphs together. To evaluate our system, we are building an operational system which will display on a web portal fused reports on events which are currently in the news. Web users can then grade the system on its effectiveness.

    Automatic clustering of news reports

    The automatic clustering of news reports from various web-based news sites into clusters according to the event they cover serves not only to facilitate browsing of news reports by users but may also serve as an initial stage in other complex systems such as Multi-Document Summarization systems or Document Fusion systems. In contrast to the usual scenarios of document clustering whereby the document collections are static or quasi-static, news sites are continuously updated with reports concerning new events. Here, we present a News Report Clustering system which is able to receive a stream of news reports which it clusters on the fly according to the event they cover. New clusters are automatically created as necessary for news reports which are covering ‘new’, previously unreported events. We compare the results of our system to the results produced by a standard K-Means clustering system, and we show that our system performs significantly better than the standard K-Means system even though the K-Means system was supplied with the correct number of clusters that should be produced. In fact, our clustering system obtained an average of 11.95% better recall, 28.68% better precision and 0.89% less fallout than the standard K-Means clustering system.
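
The idea of clustering a stream on the fly, opening a new cluster whenever a report matches no existing one, can be illustrated with a minimal single-pass sketch. This is not the authors' system: the bag-of-words representation, the cosine similarity measure, the summed-vector centroids, the `threshold` value of 0.3, and the names `tokenize`, `cosine`, and `IncrementalClusterer` are all assumptions made purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IncrementalClusterer:
    """Single-pass clusterer: the number of clusters is not fixed in advance."""

    def __init__(self, threshold=0.3):  # hypothetical value, not from the paper
        self.threshold = threshold
        self.centroids = []  # summed term vectors, one per cluster
        self.members = []    # report ids, one list per cluster

    def add(self, report_id, text):
        """Assign a report to the most similar cluster, or open a new one."""
        vec = tokenize(text)
        best, best_sim = None, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is None or best_sim < self.threshold:
            # No existing cluster is close enough: treat this as a new event.
            self.centroids.append(Counter(vec))
            self.members.append([report_id])
            return len(self.centroids) - 1
        # Otherwise fold the report into the best-matching cluster.
        self.centroids[best].update(vec)
        self.members[best].append(report_id)
        return best


clusterer = IncrementalClusterer(threshold=0.3)
first = clusterer.add("r1", "Earthquake strikes city centre, buildings damaged")
second = clusterer.add("r2", "Buildings damaged as earthquake strikes the city")
third = clusterer.add("r3", "Football team wins the championship final")
```

In this toy run the two earthquake reports share a cluster and the football report opens a second one. Unlike K-Means, the sketch never needs the number of clusters up front; the similarity threshold alone controls how fine-grained the clusters are.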

    Oil spill risk assessment on the Maltese coastal areas

    A significant percentage of global oil transport passes through the Mediterranean Sea. Most of the maritime traffic carrying oil and other dangerous liquid substances travels across the Malta Channel. The risk of marine spillages within the stretch of sea between Malta and Sicily is very high, and beaching on the Maltese shores can cause irreversible environmental damage to the detriment of important economic resources. The aim of this work is to determine the probability and volume percentage of oil that would reach the coast in case of an accident in the proximity of the Maltese Islands. Various spill scenarios are considered to obtain as realistic an estimate as possible.

    An automatic participant detection framework for event tracking on twitter

    Topic Detection and Tracking (TDT) on Twitter emulates humans identifying developments in events from a stream of tweets, but while event participants are important for humans to understand what happens during events, machines have no knowledge of them. Our evaluation on football matches and basketball games shows that identifying event participants from tweets is a difficult problem exacerbated by Twitter’s noise and bias. As a result, traditional Named Entity Recognition (NER) approaches struggle to identify participants from the pre-event Twitter stream. To overcome these challenges, we describe Automatic Participant Detection (APD) to detect an event’s participants before the event starts and improve the machine understanding of events. We propose a six-step framework to identify participants and present our implementation, which combines information from Twitter’s pre-event stream and Wikipedia. In spite of the difficulties associated with Twitter and NER in the challenging context of events, our approach manages to restrict noise and consistently detects the majority of the participants. By empowering machines with some of the knowledge that humans have about events, APD lays the foundation not just for improved TDT.

    Predicting customer behavioural patterns using a virtual credit card transactions dataset

    Nowadays, many businesses are applying data mining techniques to their data to save costs and time, as well as to understand customers’ needs. Analysing such data can lead to higher profits and higher customer satisfaction. This paper presents a data mining study applied to millions of transactional records collected over a number of years by a leading virtual credit card company based in Malta. In this study, two machine learning techniques, namely Artificial Neural Networks (ANN) and Gradient Boosting (GBM), are analysed to identify the best modelling framework for predicting the churning behaviour of this company’s customers. Apart from helping the marketing department of this firm by providing a model that predicts churning customers, we contribute to the literature by identifying the minimum amount of customer activity needed to predict churn. In addition, we also analyse the “cold start” problem by performing a time-series experiment based on the limited data available at the beginning of the customer purchase history.

    The myth of reproducibility : a review of event tracking evaluations on Twitter

    Event tracking literature based on Twitter does not have a state-of-the-art. What it does have is a plethora of manual evaluation methodologies and inventive automatic alternatives: incomparable and irreproducible studies incongruous with the idea of a state-of-the-art. Many researchers blame Twitter's data sharing policy for the lack of common datasets and a universal ground truth – for the lack of reproducibility – but many other issues stem from the conscious decisions of those same researchers. In this paper, we present the most comprehensive review yet on event tracking literature's evaluations on Twitter. We explore the challenges of manual experiments, the insufficiencies of automatic analyses and the misguided notions on reproducibility. Crucially, we discredit the widely-held belief that reusing tweet datasets could induce reproducibility. We reveal how tweet datasets self-sanitize over time; how spam and noise become unavailable at much higher rates than legitimate content, rendering downloaded datasets incomparable with the original. Nevertheless, we argue that Twitter's policy can be a hindrance without being an insurmountable barrier, and propose how the research community can make its evaluations more reproducible. A state-of-the-art remains attainable for event tracking research.

    Development of a novel tool to predict different water quality scenarios within a Marine Protected Area (MPA) in the Maltese Islands : the 2D SHYFEM-BFM model

    Effective operational marine conservation and management is thwarted by a lack of financial and human resources. A coupled 2D hydrodynamic (SHYFEM) and ecological (BFM) model was developed in the current study as a Decision Support System (DSS) to spearhead good governance of a Marine Protected Area (MPA) in Dwejra (Maltese Islands) in the Central Mediterranean. Two scenarios were considered: one with the current levels of nutrient runoff from land, and one in which such levels are increased as a result of greater human activity within the area. Although the developed numerical modeling platform needs to be refined and run for a longer time frame, its output suggests that it is a promising tool to assist in the operational management of an MPA.