A Semantic-Based Framework for Summarization and Page Segmentation in Web Mining
This chapter addresses two crucial issues that arise when one applies Web-mining techniques to extract relevant information. The first is the acquisition of useful knowledge from textual data; the second stems from the fact that a web page often contains a considerable amount of ‘noise’ with respect to the sections that are truly informative for the user's purposes. The novel contribution of this work is a framework that tackles both tasks at the same time, supporting text summarization and page segmentation. The approach exploits semantic networks to map natural language into an abstract representation, which in turn supports the identification of the topics addressed in a text source. A heuristic algorithm uses this abstract representation to highlight the relevant segments of text in the original document. The effectiveness of the approach was verified on a publicly available benchmark, the DUC 2002 dataset, where it achieved satisfactory results.
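The abstract does not spell out the heuristic, so the following is only a minimal sketch of the general idea under stated assumptions: words are mapped to abstract concepts, the most frequent concepts are taken as the document's topics, and the sentences densest in those topics are highlighted. The `CONCEPTS` dictionary stands in for a semantic network; it, the function names and the scoring rule are all hypothetical and not the chapter's actual algorithm.

```python
import re
from collections import Counter

# Hypothetical toy "semantic network": maps surface words to abstract concepts.
# A real system would query a lexical/semantic resource; this dict is an assumption.
CONCEPTS = {
    "virus": "pathogen", "viruses": "pathogen", "pathogen": "pathogen",
    "crop": "agriculture", "crops": "agriculture", "farm": "agriculture",
    "trade": "commerce", "export": "commerce",
}

def to_concepts(sentence: str) -> list[str]:
    """Map the words of a sentence to abstract concepts, dropping unknown words."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [CONCEPTS[w] for w in words if w in CONCEPTS]

def highlight(text: str, top_k_topics: int = 2, top_n: int = 1) -> list[str]:
    """Return the top_n sentences that mention the document's dominant concepts most often."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Topics = the most frequent concepts across the whole document.
    counts = Counter(c for s in sentences for c in to_concepts(s))
    topics = {c for c, _ in counts.most_common(top_k_topics)}
    # Heuristic score: number of concept mentions in the sentence that belong to a topic.
    scored = [(sum(c in topics for c in to_concepts(s)), i, s) for i, s in enumerate(sentences)]
    best = sorted(scored, key=lambda t: (-t[0], t[1]))[:top_n]
    # Keep the highlighted segments in their original document order.
    return [s for _, _, s in sorted(best, key=lambda t: t[1])]
```

For example, `highlight("Viruses spread through crops. The weather was nice. Trade moves viruses and crops worldwide.")` selects the first sentence, since it is the earliest sentence made up entirely of topic concepts.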
Investigation of Third Party Rights Service and Shibboleth Modification to Introduce the Service
Shibboleth is an architecture that supports inter-institutional sharing of electronic resources that are subject to access control. Codifying copyright in Shibboleth authorization policies is difficult because copyright exceptions can be highly subjective. The Third Party Rights Service is a high-level concept that has been suggested as a way to approximate the exceptions of copyright law. In this thesis, I investigate the components of the Third Party Rights Service and design and analyze a modified Shibboleth architecture based on these components. The resulting architecture allows resources to be added in phases to make use of the Third Party Rights Service, while keeping the existing resources in Shibboleth.
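To illustrate what phased adoption could mean for an authorization decision, here is a minimal Python sketch under assumptions: resources carry a hypothetical `uses_tprs` flag, opted-in resources defer to a rights-service callable, and all other resources keep a plain attribute check. This is not Shibboleth configuration or its API; the `Resource` type, the `RightsService` signature and the `authorize` function are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Resource:
    name: str
    required_affiliation: str  # conventional attribute-based rule (assumption)
    uses_tprs: bool = False    # hypothetical flag: resource migrated to the rights service

# Hypothetical rights-service signature: (user attributes, resource, stated purpose) -> allow?
RightsService = Callable[[dict, Resource, str], bool]

def authorize(user: dict, resource: Resource, purpose: str, tprs: RightsService) -> bool:
    """Decide access, routing only opted-in resources through the rights service."""
    if resource.uses_tprs:
        # Copyright exceptions (e.g. use for a stated purpose) are approximated externally.
        return tprs(user, resource, purpose)
    # Existing resources keep their unchanged attribute-based policy.
    return user.get("affiliation") == resource.required_affiliation
```

The design choice shown is that migrating a resource only flips its flag, so legacy resources continue to work while new ones are added one at a time.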
Open data and the semantic web - together towards a smarter internet
The explosive growth in the amount of data brought about by the digital revolution has revealed challenges but also opportunities for exploiting data. At the same time, the ongoing rise of the open ideology and the development of technical methods for exploiting data are changing our attitude towards data. Data is becoming the next resource of the internet. The World Wide Web Consortium (W3C), which works on standardising the internet, aims to support this development and for this purpose produces specifications intended to improve the quality of data. The most central of these are the specifications for describing data, which enable semantic data and the semantic web. The ideology of open data has led public institutions to open up their data, and in this context requirements are placed on data quality. Assessing the quality of public open data on the five-star scale introduced for this purpose, I conclude that the quality of this data does not meet the requirements of the semantic web.
Keywords: open data, semantic data, semantic web
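The five-star scale mentioned above is commonly attributed to Tim Berners-Lee and is cumulative: each star requires all lower ones. A minimal Python sketch of such a rating, assuming each level can be reduced to a yes/no property (the `Dataset` fields are a simplification chosen for illustration, not part of the thesis), might look like this:

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    open_licence: bool      # 1 star: on the web under an open licence
    machine_readable: bool  # 2 stars: structured data (e.g. a spreadsheet, not a scanned image)
    open_format: bool       # 3 stars: non-proprietary format (e.g. CSV rather than XLS)
    uses_uris: bool         # 4 stars: things identified with URIs (e.g. RDF)
    linked: bool            # 5 stars: linked to other data to provide context

def stars(d: Dataset) -> int:
    """Cumulative rating: a level counts only if every lower level is also met."""
    levels = [d.open_licence, d.machine_readable, d.open_format, d.uses_uris, d.linked]
    score = 0
    for met in levels:
        if not met:
            break
        score += 1
    return score
```

An openly licensed CSV file without URIs rates three stars, whereas a dataset that uses URIs but a proprietary format stops at two, which is why semantic web requirements (four and five stars) are hard to reach.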
Incident Prioritisation for Intrusion Response Systems
The landscape of security threats continues to evolve, with attacks becoming more serious and the number of vulnerabilities rising. To manage these threats, many security studies have been undertaken in recent years, mainly focusing on improving the efficiency of detection, prevention and response. Although security tools such as antivirus software and firewalls are available to counter them, Intrusion Detection Systems and related tools such as Intrusion Prevention Systems remain among the most popular approaches. Hundreds of published works on intrusion detection aim to increase the efficiency and reliability of detection, prevention and response systems. Although intrusion detection technologies have advanced, there are still areas to explore, particularly the process of selecting appropriate responses.
Supporting a variety of response options, such as proactive, reactive and passive responses, enables security analysts to select the most appropriate response in different contexts. To make this possible, a methodical approach that distinguishes important incidents from trivial ones is first needed. However, with thousands of incidents identified every day, relying on manual processes to determine their importance and urgency is complicated, error-prone and time-consuming, so prioritising them automatically would help security analysts focus on the most critical ones. Existing approaches provide various ways to prioritise incidents, but less attention has been given to integrating them into an automated response system. Although some studies have recognised the advantages of prioritisation, they were not followed by further work investigating the effectiveness of the process.
This study aims to enhance the incident prioritisation scheme so that critical incidents are identified based upon their criticality and urgency, in order to facilitate an autonomous mode for the response selection process in Intrusion Response Systems. To achieve this aim, the study proposes a novel framework that combines models and strategies identified from a comprehensive literature review. A model to estimate the risk level of incidents, named the Risk Index Model (RIM), is established. Based on the level of risk, the Response Strategy Model (RSM) dynamically maps incidents to different types of response: serious incidents are mapped to active responses in order to minimise their impact, while incidents with less impact receive passive responses. The combination of these models provides a seamless way to map incidents automatically; however, its effectiveness and performance need to be evaluated. To this end, a four-stage evaluation study was undertaken: a feasibility study of the RIM; comparison studies with industrial standards such as the Common Vulnerability Scoring System (CVSS) and Snort; an examination of the effect of different strategies on the rating and ranking process; and a test of the effectiveness and performance of the RSM. With promising results gathered, a proof-of-concept study was conducted to demonstrate the framework using a live traffic network simulation in online assessment mode via the Security Incident Prioritisation Module (SIPM); this study investigated the framework's effectiveness and practicality.
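The two-stage idea of scoring risk and then mapping risk to a response type can be sketched in Python as follows. This is a minimal illustration under assumptions: the three normalised factors, their weights and the 0.6 threshold are hypothetical placeholders, not the factors or parameters of the thesis's Risk Index Model or Response Strategy Model.

```python
from dataclasses import dataclass

@dataclass
class Incident:
    ident: str
    severity: float           # 0..1, e.g. a normalised alert score (assumption)
    asset_criticality: float  # 0..1, importance of the targeted asset (assumption)
    urgency: float            # 0..1, how quickly the incident must be handled (assumption)

# Hypothetical weights and cut-off; the real RIM/RSM define their own.
WEIGHTS = {"severity": 0.5, "asset_criticality": 0.3, "urgency": 0.2}
ACTIVE_THRESHOLD = 0.6

def risk_index(inc: Incident) -> float:
    """Weighted sum of normalised factors, giving a risk score in [0, 1]."""
    return (WEIGHTS["severity"] * inc.severity
            + WEIGHTS["asset_criticality"] * inc.asset_criticality
            + WEIGHTS["urgency"] * inc.urgency)

def response_strategy(inc: Incident) -> str:
    """Map high-risk incidents to an active response and the rest to a passive one."""
    return "active" if risk_index(inc) >= ACTIVE_THRESHOLD else "passive"

def prioritise(incidents: list[Incident]) -> list[tuple[str, float, str]]:
    """Rank incidents by risk (highest first), breaking ties by urgency."""
    ranked = sorted(incidents, key=lambda i: (risk_index(i), i.urgency), reverse=True)
    return [(i.ident, round(risk_index(i), 2), response_strategy(i)) for i in ranked]
```

With these placeholder weights, an incident scored (0.9, 0.8, 0.9) gets a risk of 0.87 and an active response, while one scored (0.2, 0.5, 0.1) gets 0.27 and a passive response.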
Through the results gathered, this study has demonstrated that the prioritisation process can feasibly be used to facilitate the response selection process in Intrusion Response Systems. The main contribution of this study is the proposal, design, evaluation and simulation of a framework to support the incident prioritisation process for Intrusion Response Systems.
Ministry of Higher Education in Malaysia and University of Malaya
Systematics of Clematis in Nepal, the evolution of tribe Anemoneae DC. (Ranunculaceae) and phylogeography and the dynamics of speciation in the Himalaya
The genus Clematis L. (Ranunculaceae) was used as a new model group to assess the role of
the Himalayan orogeny in the generation of biodiversity through investigations of its
phylogeny, phylogeography and taxonomy.
Although existing checklists include 28 species of Clematis from Nepal, a comprehensive
taxonomic revision of available material in herbaria and additional sampling from fieldwork
during this study has led to the recognition of 21 species of Clematis in Nepal, including one
species (C. kilungensis) not previously recorded from Nepal.
Existing phylogenetic and taxonomic concepts were tested with the addition of new samples
from Nepal. The results highlight the shortcomings of the previous studies, which were
poorly resolved, and indicate the need for a thorough revision of the sectional classification.
Despite the increased sampling, the results are still equivocal due to poor statistical support
along the backbone of the phylogeny. Groups of species in well supported terminal clades
are broadly comparable with results from previous studies although there are fewer clearly
recognisable and well supported clades.
The published dates for the evolution of Clematis were tested and the methodology of the
previous study critically reappraised. The results indicate that the genus Clematis is
approximately twice as old as previously reported and evolved in the middle Miocene. The
phylogeny also demonstrates that, even allowing for poor support for the relationships
between groups of species within Clematis, the extant Nepalese species must have multiple
independent origins from at least six different colonisations. Occurring in the
Pliocene and Pleistocene, these events are relatively recent in relation to the Himalayan
orogeny, and may be linked more to the dispersal ability of Clematis than to the direct effects
of the orogeny.
Additional Nepalese samples of Koenigia and Meconopsis were added to existing datasets
and these were reanalysed. The results from Clematis, Koenigia and Meconopsis were
appraised in light of the geoscientific literature and previously published phylogeographic
studies to create an overview of the drivers behind speciation in the Himalaya.
Exploring attributes, sequences, and time in Recommender Systems: From classical to Point-of-Interest recommendation
Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Date of defense: 08-07-2021
Since the emergence of the Internet and the spread of digital communications
throughout the world, the amount of data stored on the Web has been
growing exponentially. In this new digital era, a large number of companies
have emerged with the purpose of filtering the information available on the
web and providing users with interesting items. The algorithms and models
used to recommend these items are called Recommender Systems. These
systems are applied to a large number of domains, from music, books, or
movies to dating or Point-of-Interest (POI) recommendation, an increasingly popular
domain where users receive recommendations of different places when
they arrive in a city.
In this thesis, we focus on exploiting contextual information, especially
temporal and sequential data, and apply it in novel ways in both
traditional and Point-of-Interest recommendation. We believe that this type
of information can be used not only for creating new recommendation models
but also for developing new metrics for analyzing the quality of these
recommendations. In one of our first contributions, we propose different
metrics, some of them derived from previously existing frameworks, that use
this contextual information. In addition, we propose an intuitive algorithm
that provides recommendations to a target user by exploiting the
last common interactions with other similar users of the system.
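One plausible reading of "exploiting the last common interactions" is: for each neighbour who shares items with the target user, find the latest point in the neighbour's timeline that the target has also visited, and recommend what the neighbour consumed afterwards. The Python sketch below implements that reading under assumptions (chronologically ordered histories, Jaccard similarity as the neighbour weight); the function name and scoring are hypothetical, not the thesis's algorithm.

```python
from collections import defaultdict

def recommend_after_last_common(histories: dict[str, list[str]], target: str, k: int = 3) -> list[str]:
    """Recommend items that similar users consumed after their last item shared with the target.

    histories maps each user to their items in chronological order (assumption).
    """
    seen = set(histories[target])
    scores: dict[str, float] = defaultdict(float)
    for user, items in histories.items():
        if user == target:
            continue
        common = seen.intersection(items)
        if not common:
            continue  # no shared interactions, so not treated as a similar user
        similarity = len(common) / len(seen | set(items))  # Jaccard similarity (assumption)
        # Index of the last interaction in the neighbour's timeline that the target also has.
        last = max(i for i, item in enumerate(items) if item in seen)
        for item in items[last + 1:]:
            if item not in seen:
                scores[item] += similarity
    # Highest score first; ties broken alphabetically for a deterministic result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]
```

For instance, if the target has visited `a, b, c` and a neighbour's timeline is `a, c, d, e`, the neighbour's last common item is `c`, so `d` and `e` become candidates weighted by the neighbour's similarity.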
At the same time, we conduct a comprehensive review of the algorithms
that were proposed in the area of POI recommendation between 2011
and 2019, identifying their common characteristics and methodologies.
Once this classification of the algorithms proposed to date is completed, we
design a mechanism to recommend complete routes (not only independent
POIs) to users, making use of reranking techniques. In addition, due to the
great difficulty of making recommendations in the POI domain, we propose
data aggregation techniques that use information from different
cities to generate POI recommendations in a given target city.
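Turning a ranked list of independent POIs into a route by reranking can be illustrated with a simple greedy scheme: at each step, pick the candidate that best balances its recommender score against the distance from the current position. The Python sketch below assumes planar coordinates and a hypothetical trade-off weight `lam`; it is a generic greedy reranker, not the specific mechanism proposed in the thesis.

```python
import math

def rerank_route(candidates: dict[str, tuple[float, tuple[float, float]]],
                 start: tuple[float, float], length: int, lam: float = 0.5) -> list[str]:
    """Greedily turn a scored POI list into a route.

    candidates maps POI id -> (relevance from a base recommender, (x, y) position).
    Each step picks the POI maximising relevance - lam * distance from the current
    position, so nearby relevant places are preferred. lam is a hypothetical weight.
    """
    route: list[str] = []
    remaining = dict(candidates)
    here = start
    while remaining and len(route) < length:
        best = max(remaining, key=lambda p: remaining[p][0] - lam * math.dist(here, remaining[p][1]))
        route.append(best)
        here = remaining.pop(best)[1]  # move to the chosen POI
    return route
```

Setting `lam=0` reduces this to the base recommender's order, while larger values trade some relevance for shorter walks between consecutive POIs.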
In the experimental work, we test our approaches on different datasets
belonging to both classical and POI recommendation. The results obtained
in these experiments confirm the usefulness of our recommendation proposals
in terms of ranking accuracy and other dimensions such as novelty, diversity,
and coverage, as well as the appropriateness of our metrics for analyzing temporal
information and biases in the recommendations produced.
Plant Virus Emergence
This compilation of articles elaborates on plant virus diseases that are among the most recent epidemiological concerns. The chapters explore several paradigms in plant virus epidemiology, including outbreaks, epidemics, and pandemics that parallel those of zoonotic viruses and can have consequences for global food security. There is evidence that local, regional, national, and global trade in agricultural products has aided the global dispersal of plant virus diseases. Expanding farmland into pristine natural areas has created opportunities for viruses in native landscapes to invade crops, while the movement of food and food products disseminates viruses, creating epidemics or pandemics. Moreover, plant virus outbreaks not only have a direct impact on the food supply but also incidentally affect human health.
Automated Realistic Test Input Generation and Cost Reduction in Service-centric System Testing
Service-centric System Testing (ScST) is more challenging than testing traditional software due to the complexity of service technologies and the limitations imposed by the SOA environment. One of the most important problems in ScST is realistic test data generation. Realistic test data is often generated manually or taken from an existing source, so it is hard to automate and laborious to produce. Another limitation that makes ScST challenging is the cost of invoking services during the testing process. This thesis aims to provide solutions to these two problems: automated realistic input generation and cost reduction in ScST. To automate realistic test data generation, the concept of Service-centric Test Data Generation (ScTDG) is presented, in which existing services are used as realistic data sources. ScTDG minimises the need for tester input and the dependence on existing data sources by automatically generating service compositions that can produce the required test data. In experimental analysis, our approach achieved success rates between 93% and 100% in generating realistic data, while state-of-the-art automated test data generation achieved only between 2% and 34%. The thesis addresses cost concerns at the test data generation level by enabling data source selection in ScTDG. Source selection in ScTDG has many dimensions, such as cost, reliability and availability. This thesis formulates source selection as an optimisation problem and presents a multi-objective characterisation of service selection in ScTDG, aiming to reduce the cost of test data generation. A cost-aware Pareto-optimal test suite minimisation approach, addressing testing cost concerns during test execution, is also presented. The approach adapts traditional multi-objective minimisation approaches to the ScST domain by formulating ScST concerns such as invocation cost and test case reliability.
In experimental analysis, the approach achieved reductions of between 69% and 98.6% in the monetary cost of service invocations during testing.
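To make the notion of Pareto-optimal test suite minimisation concrete, the Python sketch below enumerates every subset of a small suite and keeps those not dominated on two objectives: total invocation cost (minimised) and number of covered requirements (maximised). Using requirement coverage as the second objective is an assumption for illustration (the thesis also considers reliability), and brute-force enumeration is only feasible for tiny suites; it is not the thesis's algorithm.

```python
from itertools import combinations

def pareto_minimise(tests: dict[str, tuple[float, set[str]]]) -> list[tuple[float, int, tuple[str, ...]]]:
    """Exhaustively find Pareto-optimal test subsets.

    tests maps test id -> (monetary cost of its service invocations, requirements it covers).
    Objectives: minimise total cost, maximise the number of covered requirements.
    """
    points = []
    names = sorted(tests)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            cost = sum(tests[t][0] for t in subset)
            covered = len(set().union(*(tests[t][1] for t in subset)))
            points.append((cost, covered, subset))

    def dominated(p, q):
        # q dominates p: no worse on both objectives and strictly better on at least one.
        return q[0] <= p[0] and q[1] >= p[1] and (q[0] < p[0] or q[1] > p[1])

    front = [p for p in points if not any(dominated(p, q) for q in points)]
    return sorted(front)
```

With tests `t1` (cost 1, covers r1), `t2` (cost 2, covers r1 and r2) and `t3` (cost 5, covers r3), the front is `{t1}`, `{t2}` and `{t2, t3}`: each is the cheapest way to reach its coverage level, and a tester picks among them according to budget.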