The iCrawl Wizard -- Supporting Interactive Focused Crawl Specification
Collections of Web documents about specific topics are needed for many areas
of current research. Focused crawling enables the creation of such collections
on demand. Current focused crawlers require the user to manually specify
starting points for the crawl (seed URLs). These are also used to describe the
expected topic of the collection. The choice of seed URLs influences the
quality of the resulting collection and requires a lot of expertise. In this
demonstration we present the iCrawl Wizard, a tool that assists users in
defining focused crawls efficiently and semi-automatically. Our tool uses major
search engines and Social Media APIs as well as information extraction
techniques to find seed URLs and a semantic description of the crawl intent.
Using the iCrawl Wizard even non-expert users can create semantic
specifications for focused crawlers interactively and efficiently.
Comment: Published in the Proceedings of the European Conference on
Information Retrieval (ECIR) 201
Analysing entity context in multilingual Wikipedia to support entity-centric retrieval applications
The representation of influential entities, such as famous people and multinational corporations, on the Web can vary across languages, reflecting language-specific entity aspects as well as divergent views on these entities in different communities. A systematic analysis of language-specific entity contexts can provide a better overview of the existing aspects and support entity-centric retrieval applications over multilingual Web data. An important source of cross-lingual information about influential entities is Wikipedia, an online community-created encyclopaedia containing more than 280 language editions. In this paper we focus on the extraction and analysis of the language-specific entity contexts from different Wikipedia language editions over multilingual data. We discuss alternative ways such contexts can be built, including graph-based and article-based contexts. Furthermore, we analyse the similarities and the differences in these contexts in a case study including 80 entities and five Wikipedia language editions.
A First Step Towards Keyword-Based Searching for Recommendation Systems
Due to the high availability of data, users are frequently overloaded with a huge number of alternatives when they need to choose a particular item. This has motivated an increased interest in research on recommendation systems, which filter the options and provide users with suggestions about specific elements (e.g., movies, restaurants, hotels, news, etc.) that are estimated to be potentially relevant for the user.
Recommendation systems are still an active area of research, and in recent years context-aware recommendation systems have become popular because of the interest in considering the context of the user in the recommendation process. In this paper, we describe our work-in-progress concerning pull-based recommendations (i.e., recommendations about certain types of items that are explicitly requested by the user). In particular, we focus on the problem of detecting the type of item the user is interested in. Due to its popularity, we consider a keyword-based user interface: the user types a few keywords and the system must determine what the user is searching for. Whereas there is extensive work in the field of keyword-based search, which is still a very active research area, keyword searching has not been applied so far in most recommendation contexts.
Topic detection in multichannel Italian newspapers
Nowadays, any person, company or public institution uses and exploits different channels to share private or public information with other people (friends, customers, relatives, etc.) or institutions. This has changed journalism: major newspapers now report news not just on their own web sites but also on several social media platforms such as Twitter or YouTube. The use of multiple communication media creates a need to integrate and analyse the content published globally, not just at the level of a single medium. An analysis is needed that gives a comprehensive overview of the information reaching end users and of how they consume it. This analysis should identify the main topics in the news flow and reveal the mechanisms by which news is published on different media (e.g. the news timeline). Currently, most work in this area still focuses on a single medium, so an analysis across different media (channels) should improve the results of topic detection. This paper shows the application of a graph analytical approach, called Keygraph, to a set of very heterogeneous documents such as the news published on various media. A preliminary evaluation on the news published in a 5-day period was able to identify the main topics within the publications of a single newspaper, and also within the publications of 20 newspapers on several on-line channels.
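As a rough illustration of the graph-based idea in the abstract above, the following sketch links keywords that co-occur in several documents and reads each connected group of keywords as a candidate topic. It is a simplification under stated assumptions, not the Keygraph method the paper applies: the function names, the `min_count` threshold, the use of connected components, and the sample keyword lists are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(docs, min_count=2):
    """Link two keywords when they appear together in at least min_count documents."""
    pair_counts = defaultdict(int)
    for keywords in docs:
        # Count each unordered keyword pair once per document.
        for a, b in combinations(sorted(set(keywords)), 2):
            pair_counts[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), n in pair_counts.items():
        if n >= min_count:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def topic_clusters(graph):
    """Return the connected components of the graph as sorted keyword lists."""
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical keyword lists, one per document, mixing channels.
docs = [
    ["election", "senate", "vote"],   # e.g. a web site article
    ["election", "vote", "poll"],     # e.g. a tweet
    ["football", "derby", "goal"],
    ["derby", "goal", "referee"],
]
print(topic_clusters(cooccurrence_graph(docs)))
```

Because documents from all channels feed the same graph, a topic covered on several media contributes more co-occurrences than it would in any single channel, which is the intuition behind analysing the channels together.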
No users no dataspaces! Query-driven dataspace orchestration
Data analysis in rich spaces of heterogeneous data sources
is an increasingly common activity. Examples include querying the web
of linked data and personal information management. Such analytics on
dataspaces is often iterative and dynamic, in an open-ended interaction
between discovery and data orchestration. The current state of the art in
integration and orchestration in dataspaces is primarily geared towards
close-ended analysis, targeting the discovery of stable data mappings or
one-time, pay-as-you-go ad hoc data mappings. The perspective here is
dataspace-centric.
In this paper, we propose a shift to a user-centric perspective on dataspace
orchestration. We outline basic conceptual and technical challenges
in supporting data analytics which is open-ended and always evolving,
as users respond to new discoveries and connections
TRAFAIR: Understanding Traffic Flow to Improve Air Quality
Environmental impacts of traffic are of major concern throughout many European metropolitan areas. Air pollution causes 400 000 deaths per year, making it the first environmental cause of premature death in Europe. Among the main sources of air pollution in Europe are road traffic, domestic heating, and industrial combustion. The TRAFAIR project brings together 9 partners from two European countries (Italy and Spain) to develop innovative and sustainable services combining air quality, weather conditions, and traffic flow data to produce new information for the benefit of citizens and government decision-makers. The project started in November 2018 and lasts two years. It is motivated by the large number of deaths caused by air pollution. Nowadays, the situation is particularly critical in some member states of Europe. In February 2017, the European Commission warned five countries, among them Spain and Italy, of continued air pollution breaches. In this context, public administrations and citizens suffer from the lack of comprehensive and fast tools to estimate the level of pollution on an urban scale resulting from varying traffic flow conditions, tools that would allow them to optimize control strategies and increase air quality awareness. The goals of the project are twofold: monitoring urban air quality by using sensors in 6 European cities and making urban air quality predictions using simulation models. The project is co-financed by the European Commission under the CEF TELECOM call on Open Data.
A software processing chain for evaluating thesaurus quality
Thesauri are knowledge models commonly used for information classification and retrieval whose structure is defined by standards that describe the main features the concepts and relations must have. However, following these standards requires a deep knowledge of the field the thesaurus is going to cover and experience in thesaurus creation. To help in this task, this paper describes a software processing chain that provides different validation components that evaluate the quality of the main thesaurus features.
Semantic URL Analytics to Support Efficient Annotation of Large Scale Web Archives
Long-term Web archives comprise Web documents gathered over longer time periods and can easily reach hundreds of terabytes in size. Semantic annotations such as named entities can facilitate intelligent access to the Web archive data. However, the annotation of the entire archive content on this scale is often infeasible. The most efficient way to access the documents within Web archives is provided through their URLs, which are typically stored in dedicated index files. The URLs of the archived Web documents can contain semantic information and can offer an efficient way to obtain initial semantic annotations for the archived documents. In this paper, we analyse the applicability of semantic analysis techniques such as named entity extraction to the URLs in a Web archive. We evaluate the precision of the named entity extraction from the URLs in the Popular German Web dataset and analyse the proportion of the archived URLs from 1,444 popular domains in the time interval from 2000 to 2012 to which these techniques are applicable. Our results demonstrate that named entity recognition can be successfully applied to a large number of URLs in our Web archive and provide a good starting point to efficiently annotate large scale collections of Web documents
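To make the idea of mining URLs for semantic information concrete, here is a minimal sketch of a pre-processing step that splits a URL path into word tokens, which could then be passed to a named entity recogniser. It is an assumption about how such a step might look, not the pipeline evaluated in the paper; the function name `url_tokens`, the tokenisation rules, and the example URLs are hypothetical.

```python
import re
from urllib.parse import unquote, urlparse


def url_tokens(url: str) -> list[str]:
    """Split the path of a URL into candidate word tokens."""
    # Decode percent-escapes so that non-ASCII letters survive.
    path = unquote(urlparse(url).path)
    # Drop a trailing file extension such as ".html".
    path = re.sub(r"\.[A-Za-z0-9]{1,5}$", "", path)
    # Split on runs of non-letters: punctuation, slashes, digits, underscores.
    tokens = re.split(r"[\W\d_]+", path)
    # Discard empty strings and single characters.
    return [t for t in tokens if len(t) > 1]


example = "http://www.example.de/politik/angela-merkel-besucht-paris-2012.html"
print(url_tokens(example))
```

Only the path is tokenised here; the host name and query string are ignored, which is a design choice of this sketch rather than something the abstract specifies.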
From a research project to an Information System course: a professional approach
[EN] Nowadays, new business models are arising thanks to the development of ICT. In this context, the law is
constantly being adapted to guarantee the rights of individuals. Studying topics related to legislation without considering
their relation to a particular project is unattractive and generally does not motivate computer science students.
However, according to reports by the Instituto Nacional de Tecnologías de la Comunicación (INTECO), a high
percentage of Small and Medium Enterprises (SMEs) do not consider current legislation on issues related to ICT. For
these reasons, we developed a series of guides defining behaviour protocols, based on an active computer research project
oriented to SMEs; at the same time, we decided to make computer science students aware of the need to respect
the regulations that apply to the development of any software project (part of their future careers), making clear the relation
between their tasks in any project of this kind and the laws and norms that should be respected during this process through
the practical use of these laws in an Information Systems course. This last part is the work we present here.
Lozano Albalate, MT.; Trillo Lado, R. (2015). From a research project to an Information System course: a professional approach. In 1ST INTERNATIONAL CONFERENCE ON HIGHER EDUCATION ADVANCES (HEAD' 15). Editorial Universitat Politècnica de València. 83-89. https://doi.org/10.4995/HEAd15.2015.420
Consolidation of a professional approach experience on motivating Computer Engineering students to the application of legal issues
[EN] In previous courses, professors of the degree of Computer Science and
Software Engineering of the University of Zaragoza realised that students did
not like studying subjects related to Legislation and Information Systems.
However, these topics are key when computer scientists and software
engineers have to analyse, design, implement, and maintain Information
Systems in different environments such as an enterprise, a public entity, etc.
because the rights of the users/clients of these systems must be guaranteed.
So, a more appealing way to teach those topics, one that motivates the students to take
them into account, was designed.
This paper describes the methodology and the main activities designed in the
2014/2015 and 2015/2016 courses in order to get the attention of the students
on topics related to the current Spanish legislation and Information Systems.
Moreover, some indicators about the performance of the students and their
opinions about this new methodology are also described and analysed.
Lozano, M.; Trillo-Lado, R. (2016). Consolidation of a professional approach experience on motivating Computer Engineering students to the application of legal issues. In 2nd. International conference on higher education advances (HEAD'16). Editorial Universitat Politècnica de València. 295-301. https://doi.org/10.4995/HEAD16.2015.2713