Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provide a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques make it
possible to gather a large amount of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users, which
offers unprecedented opportunities to analyze human behavior at a very large
scale. We also discuss the potential of cross-fertilization, i.e., the
possibility of re-using Web Data Extraction techniques originally designed to
work in a given domain in other domains.
Comment: Knowledge-based System
Mobile Cloud Support for Semantic-Enriched Speech Recognition in Social Care
Nowadays, most users carry mobile devices with high computing power, and speech recognition is one of the main technologies available in every modern smartphone, although battery drain and application performance (resource shortage) have a big impact on the experienced quality. Shifting applications and services to the cloud may help to improve mobile user satisfaction, as demonstrated by several ongoing efforts in the mobile cloud area. However, the quality of speech recognition is still not sufficient in many complex cases to replace common handwritten text, especially when prompt reaction to short-term provisioning requests is required. To address this new scenario, this paper proposes a mobile cloud infrastructure to support the extraction of semantic information from speech recognition in the Social Care domain, where carers have to speak about their patients' conditions in order to have reliable notes that are used afterward to plan the best support. We present not only an architecture proposal, but also a real prototype that we have deployed and thoroughly assessed with different queries, accents, and in the presence of load peaks, in our experimental mobile cloud Platform as a Service (PaaS) testbed based on Cloud Foundry.
Elearning, Communication and Open-data: Massive Mobile, Ubiquitous and Open Learning
In MOOCs, learning analytics have to address the various types of learners that participate. This deliverable describes indicators that enable both teachers and learners to monitor progress and performance, as well as to identify whether there are learners at risk of dropping out. How these indicators should be computed and displayed to end users by means of dashboards is also explained. Furthermore, a proposal based on xAPI statements for storing relevant data and events is provided.
ECO D2.5 Learning analytics requirements and metrics report
In MOOCs, learning analytics have to address the various types of learners that participate. This deliverable describes indicators that enable both teachers and learners to monitor progress and performance, as well as to identify whether there are learners at risk of dropping out. How these indicators should be computed and displayed to end users by means of dashboards is also explained. Furthermore, a proposal based on xAPI statements for storing relevant data and events is provided. Part of the work carried out has been funded with support from the European Commission, under the ICT Policy Support Programme, as part of the Competitiveness and Innovation Framework Programme (CIP) in the ECO project under grant agreement n° 21127
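For readers unfamiliar with xAPI, the following is a minimal sketch of what a single statement recording a learner event might look like. It is not taken from the deliverable: the helper name `make_statement`, the learner details, and the activity URL are hypothetical, and only the general actor/verb/object shape of an xAPI statement is assumed.

```python
import json
from datetime import datetime, timezone


def make_statement(learner_email, learner_name, verb, activity_id,
                   activity_name, timestamp=None):
    """Build one xAPI-style statement as a plain dict (illustrative helper)."""
    # Assumption: the verb is one of the ADL vocabulary verbs, e.g. "completed".
    verb_iri = "http://adlnet.gov/expapi/verbs/" + verb
    ts = timestamp or datetime.now(timezone.utc)
    return {
        "actor": {
            "objectType": "Agent",
            "name": learner_name,
            "mbox": "mailto:" + learner_email,
        },
        "verb": {"id": verb_iri, "display": {"en-US": verb}},
        "object": {
            "objectType": "Activity",
            "id": activity_id,  # hypothetical activity IRI
            "definition": {"name": {"en-US": activity_name}},
        },
        "timestamp": ts.isoformat(),
    }


# Hypothetical event: a learner completes the first unit of a MOOC.
statement = make_statement(
    "ada@example.org", "Ada", "completed",
    "https://example.org/mooc/unit-1", "Unit 1",
    datetime(2015, 1, 1, tzinfo=timezone.utc),
)
print(json.dumps(statement, indent=2))
```

A store of such statements could then be aggregated per learner to compute progress or drop-out indicators; how the deliverable actually does this is not stated in the abstract.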
Exploiting Social Media Sources for Search, Fusion and Evaluation
The web contains heterogeneous information that is generated with different characteristics and is presented via different media. Social media, as one of the largest content carriers, has generated information from millions of users worldwide, creating material rapidly in many forms, such as comments, images, tags, videos and ratings. In social applications, the formation of online communities contributes to conversations of substantially broader aspects, as well as unfiltered opinions about subjects that are rarely covered in public media. Information accrued on social platforms, therefore, presents a unique opportunity to augment web sources such as Wikipedia or news pages, which are usually characterized as being more formal. The goal of this dissertation is to investigate in depth how social data can be exploited and applied in the context of three fundamental information retrieval (IR) tasks: search, fusion, and evaluation. Improving search performance has consistently been a major focus in the IR community. Given the in-depth discussions and active interactions contained in social media, we present approaches to incorporating this type of data to improve search on general web corpora. In particular, we propose two graph-based frameworks, social anchor and information network, to associate related web and social content, where information sources of diverse characteristics can be used to complement each other in a unified manner. We investigate how the enriched representation can potentially reduce vocabulary mismatch and improve retrieval effectiveness. Presenting social media content to users is valuable particularly for queries intended for time-sensitive events or community opinions. Current major search engines commonly blend results from different search services (or verticals) into core web results. Motivated by this real-world need, we explore ways to merge results from different web and social services into a single ranked list. We present an optimization framework for fusion, where the impact of documents, ranked lists, and verticals can be modeled simultaneously to maximize performance. Evaluating search system performance has largely relied on creating reusable test collections in IR. Traditional ways of creating evaluation sets can require substantial manual effort. To reduce such effort, we explore an approach to automating the process of collecting pairs of queries and relevance judgments, using a high-quality social media source, Community Question Answering (CQA). Our approach is based on the idea that CQA services provide platforms for users to raise questions and to share answers, therefore encoding the associations between real user information needs and real user assessments. To demonstrate the effectiveness of our approaches, we conduct extensive retrieval and fusion experiments, and verify the reliability of the new, CQA-based evaluation test sets.
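To make the idea of merging results from several services into one ranked list concrete, here is a minimal sketch using reciprocal rank fusion, a simple and widely used baseline. This is a swapped-in technique for illustration only and is not the optimization framework proposed in the dissertation; the document identifiers and the two input rankings are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is a conventional default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a web vertical and a social vertical.
web_results = ["w1", "w2", "w3"]
social_results = ["s1", "w2", "s2"]

fused = reciprocal_rank_fusion([web_results, social_results])
print(fused)  # "w2" comes first because it appears in both lists
```

Unlike this baseline, which treats every list equally, the abstract describes modeling the impact of documents, ranked lists, and verticals jointly.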
Learning Social Links and Communities from Interaction, Topical, and Spatio-Temporal Information
The immense popularity of today's social networks has led to the availability and accessibility of vast amounts of data created by users on a daily basis. Various types of
information can be extracted from such data, for example, interactions among users, topics of user postings,
and geographic locations of users. While most existing work on social network analysis, in particular work focusing on social links and communities, relies on explicit and static link structures among users,
extracting knowledge by exploiting more features embedded in user-generated data is another important direction that has only recently gained more attention.
Initial studies employing this approach show good results in terms of a better understanding of latent interactions among users.
In the context of this dissertation, multiple features embedded in user-generated data are investigated
to develop new models and algorithms for (1) revealing hidden social links between users and (2)
extracting and analyzing dynamic feature-based communities in social networks.
We introduce two approaches for extracting and measuring interpretable and meaningful
social links between users. One is based on the participation of users in threads
of discussions. The other one relies on the social characteristics of users as reflected in their postings.
A novel probabilistic model called rLinkTopic is developed to address the problem of extracting
a new type of feature-based community called regional LinkTopic: a community of users
that are geographically close to each other over time, have common interests indicated by the topical
similarity of their postings, and are contextually linked to each other.
Based on the rLinkTopic model, a comprehensive framework called ErLinkTopic is developed that makes it possible
to extract and capture complex changes in the features describing regional LinkTopic communities, for example, the community membership of users and topics of communities.
Our framework provides a novel basis for important studies such as exploring social characteristics of users in geographic regions
and predicting the evolution of user communities.
For each approach developed in this dissertation, extensive comparative experiments are conducted using data from real-world social networks to validate
the proposed models and algorithms in terms of effectiveness and efficiency. The experimental results are further discussed in detail to show improvements over
existing approaches and the applicability and advantages of our models in terms of learning social links and communities from user-generated data
Contributions to the cornerstones of interaction in visualization: strengthening the interaction of visualization
Visualization has become an accepted means for data exploration and analysis. Although interaction is an important component of visualization approaches, current visualization research pays less attention to interaction than to aspects of the graphical representation. Therefore, the goal of this work is to strengthen the interaction side of visualization. To this end, we establish a unified view on interaction in visualization. This unified view covers four cornerstones: the data, the tasks, the technology, and the human.
- …