49 research outputs found
Incremental clustering of news reports
When an event occurs in the real world, numerous news reports describing this
event start to appear on different news sites within a few minutes of the event occurrence.
This may result in a huge amount of information for users, and automated processes may be
required to help manage this information. In this paper, we describe a clustering system that
can cluster news reports from disparate sources into event-centric clusters—i.e., clusters of
news reports describing the same event. A user can identify any RSS feed as a source of news
he/she would like to receive and our clustering system can cluster reports received from the
separate RSS feeds as they arrive without knowing the number of clusters in advance. Our
clustering system was designed to function well in an online incremental environment. In
evaluating our system, we found that it is very good at fine-grained
clustering, but performs rather poorly at coarser-grained clustering. (Peer-reviewed)
URECA – the research ethics and data protection online review platform used by the University of Malta
Nowadays, research ethics and data protection are given very high importance, and research
organizations, including universities, need to safeguard their level of professionalism and
integrity by providing the necessary guidelines. Moreover, they need to ensure that these
guidelines are being adhered to by their affiliated researchers, including students. This is needed
for protection of the research subjects, researchers, and the organization (university) itself.
However, care must be taken so that the research ethics review process is streamlined as much as
possible to minimize bureaucracy, as otherwise such guidelines would be viewed as a research barrier.
This study describes URECA, the online review platform developed in-house by the University
of Malta to streamline its research ethics review process, thus simplifying matters for the
researchers/students, reviewers, and auditing committees. This platform is utilized by researchers
and students to submit information regarding their research and related data collection, by
supervisors to endorse their students’ research, by Faculty Research Ethics Committee members
to review research proposals as required, and by the University Research Ethics Committee to
manage and audit the overall process. (Peer-reviewed)
Emergent realities for social wellbeing : environmental, spatial and social pathways
Multiple sectors of modern society have become dependent on accurate and regular weather forecasts. These allow them to make strategic and informed decisions in order to preserve and maintain their assets. Weather forecasts have also become an integral part of various systems and services, such as Decision Support Systems and Early Warning Systems, all of which play a crucial role in modern societies. Numerical weather prediction (NWP) models are used to accurately compute synoptic weather conditions. One of the most commonly used NWP atmospheric simulators is the Weather Research and Forecasting (WRF) model (Lu, Zhong, Charney, Bian, & Liu, 2012; Evan, Alexander, & Dudhia, 2012; Giannaros, Melas, Daglis, Keramitsoglou, & Kourtidis, 2013). This model is a collaborative design effort between research and operational meteorological communities. It offers a state-of-the-art system which is continuously maintained to represent this critical body of knowledge within the scientific community. The WRF model is freeware with a wide variety of applications, and it can be downloaded and run on a variety of platforms. It is often used for both research and operational applications (Skamarock et al., 2008). (Peer-reviewed)
Performing fusion of news reports through the construction of conceptual graphs
As events occur around the world, different reports about them will be posted on various web portals. Each news agency writes its own report based on the information obtained by its reporters on site or through its contacts – thus each report may have its own ‘unique’ information. A person interested in a particular event may read various reports about that event from different sources to get all the available information. In our research, we are attempting to fuse all the different pieces of information found in the different reports about the same event into one report – thus providing the user with one document where he/she can find all the information related to the event in question. We attempt to do this by constructing conceptual graph representations of the different news reports, and then merging those graphs together. To evaluate our system, we are building an operational system which will display on a web portal fused reports on events which are currently in the news. Web users can then grade the system on its effectiveness. (Peer-reviewed)
Automatic clustering of news reports
The automatic clustering of news reports from various web-based news sites into clusters according to the event they cover serves not only to facilitate browsing of news reports by users but may also serve as an initial stage in other complex systems such as Multi-Document Summarization systems or Document Fusion systems. In contrast to the usual scenarios of document clustering whereby the document collections are static or quasi-static, news sites are continuously updated with reports concerning new events. Here, we present a News Report Clustering system which is able to receive a stream of news reports which it clusters on the fly according to the event they cover. New clusters are automatically created as necessary for news reports which are covering ‘new’, previously unreported events. We compare the results of our system to the results produced by a standard K-Means clustering system, and we show that our system performs significantly better than the standard K-Means system even though the K-Means system was supplied with the correct number of clusters that should be produced. In fact, our clustering system obtained an average of 11.95% better recall, 28.68% better precision and 0.89% less fallout than the standard K-Means clustering system. (Peer-reviewed)
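The idea of clustering a stream of reports on the fly, opening a new cluster whenever a report matches no existing one, can be illustrated with a minimal single-pass sketch. This is not the authors' system: the bag-of-words representation, the cosine similarity measure, the summed-count centroids, the class name `IncrementalClusterer` and the threshold value of 0.3 are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a report into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IncrementalClusterer:
    """Hypothetical single-pass clusterer; the number of clusters is not fixed in advance."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed similarity cut-off
        self.centroids = []         # one summed term-count vector per cluster
        self.members = []           # the reports assigned to each cluster

    def add(self, report):
        """Assign one incoming report to a cluster and return the cluster index."""
        vec = vectorize(report)
        best, best_sim = None, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            # Similar enough: fold the report's term counts into the centroid.
            self.centroids[best].update(vec)
            self.members[best].append(report)
            return best
        # No sufficiently similar cluster: treat the report as a new event.
        self.centroids.append(vec)
        self.members.append([report])
        return len(self.centroids) - 1
```

Calling `add` once per arriving report returns the index of the cluster it joined. Unlike K-Means, no cluster count is supplied; the threshold alone controls how fine-grained the resulting clusters are.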
Oil spill risk assessment on the Maltese coastal areas
A significant percentage of the global oil transport goes through the Mediterranean Sea. Most of the maritime traffic carrying oil
and other dangerous liquid substances travels across the Malta Channel. The risk of marine spillages within the stretch of sea
between Malta and Sicily is very high and beaching on the Maltese shores can cause irreversible environmental damage to the
detriment of important economic resources. The aim of this work is to determine the probability and volume percentage of oil that
would reach the coast in case of an accident in the proximity of the Maltese Islands. Various spill scenarios are considered to obtain
as realistic an estimate as possible. (Peer-reviewed)
An automatic participant detection framework for event tracking on twitter
Topic Detection and Tracking (TDT) on Twitter emulates how humans identify developments in events from a stream of tweets, but while event participants are important for humans to understand what happens during events, machines have no knowledge of them. Our evaluation on football matches and basketball games shows that identifying event participants from tweets is a difficult problem exacerbated by Twitter’s noise and bias. As a result, traditional Named Entity Recognition (NER) approaches struggle to identify participants from the pre-event Twitter stream. To overcome these challenges, we describe Automatic Participant Detection (APD) to detect an event’s participants before the event starts and improve the machine understanding of events. We propose a six-step framework to identify participants and present our implementation, which combines information from Twitter’s pre-event stream and Wikipedia. In spite of the difficulties associated with Twitter and NER in the challenging context of events, our approach manages to restrict noise and consistently detects the majority of the participants. By empowering machines with some of the knowledge that humans have about events, APD lays the foundation not just for improved TDT. (Peer-reviewed)
Predicting customer behavioural patterns using a virtual credit card transactions dataset
Nowadays, many businesses are resorting to data mining techniques on their data, to save costs and time, as well as to understand customers’ needs. Analysing such data can lead to higher profits and higher customer satisfaction. This paper presents a data mining study that is applied to millions of transactional records collected over a number of years by a leading virtual credit card company based in Malta. In this study, two machine learning techniques, namely Artificial Neural Networks (ANN) and Gradient Boosting (GBM), are analysed to identify the best modelling framework that predicts the churning behaviour of this company’s customers. Apart from helping the marketing department of this firm by providing a model that predicts churning customers, we contribute to literature by identifying the minimum amount of customer activity needed to predict churn. In addition, we also analyse the “cold start” problem by performing a time-series experiment based on the limited data available at the beginning of the customer purchase history. (Peer-reviewed)
The myth of reproducibility : a review of event tracking evaluations on Twitter
Event tracking literature based on Twitter does not have a state-of-the-art. What it does have is a plethora of manual evaluation methodologies and inventive automatic alternatives: incomparable and irreproducible studies incongruous with the idea of a state-of-the-art. Many researchers blame Twitter's data sharing policy for the lack of common datasets and a universal ground truth – for the lack of reproducibility – but many other issues stem from the conscious decisions of those same researchers. In this paper, we present the most comprehensive review yet on event tracking literature's evaluations on Twitter. We explore the challenges of manual experiments, the insufficiencies of automatic analyses and the misguided notions on reproducibility. Crucially, we discredit the widely-held belief that reusing tweet datasets could induce reproducibility. We reveal how tweet datasets self-sanitize over time; how spam and noise become unavailable at much higher rates than legitimate content, rendering downloaded datasets incomparable with the original. Nevertheless, we argue that Twitter's policy can be a hindrance without being an insurmountable barrier, and propose how the research community can make its evaluations more reproducible. A state-of-the-art remains attainable for event tracking research. (Peer-reviewed)
Development of a novel tool to predict different water quality scenarios within a Marine Protected Area (MPA) in the Maltese Islands : the 2D SHYFEM-BFM model
Effective operational marine conservation and management is thwarted by a lack of financial and human resources. A coupled 2D
hydrodynamic (SHYFEM) and ecological (BFM) model was developed in the current study as a Decision Support System (DSS)
to spearhead good governance of a Marine Protected Area (MPA) in Dwejra (Maltese Islands) in the Central Mediterranean.
Two scenarios were considered – one with the current levels of nutrient runoff from land and one in which such levels are
increased as a result of greater human activity within the area. Although the developed numerical modeling platform needs to be
refined and to be run for a longer time-frame, its output suggests that it is a promising tool to assist in the operational management
of an MPA. (Peer-reviewed)