Web Data Extraction, Applications and Techniques: A Survey

Author: Emilio Ferrara
Pasquale De Meo
Giacomo Fiumara
Robert Baumgartner
Publication venue: 'Elsevier BV'
Publication date: 09/06/2014

Web Data Extraction is an important problem that has been studied with a variety of scientific tools and across a broad range of applications. Many approaches to extracting data from the Web were designed to solve specific problems and operate in ad-hoc domains. Other approaches instead heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims to provide a structured and comprehensive overview of the literature on Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes: applications at the Enterprise level and applications at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for data analysis in Business and Competitive Intelligence systems, as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather the large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential for cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed for one domain in other domains.

Comment: Knowledge-based System

arXiv.org e-Print Archive

Crossref

Implementing a bank sales analytics solution and a predictive model for the next best offer

Author: Abbass Ziad El
Publication date: 18/01/2019

Internship Report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics

In the banking industry, a huge quantity of information is processed. Because clients' needs change constantly, companies must adapt their approaches to attract clients with the best offers. Machine learning and data mining techniques can help them understand their clients better. Internally, banks should also be equipped with fast and efficient processes that let them make the best decision quickly, which is why real-time reporting tools should be implemented as a layer on top of the data sources. With this in mind, this internship report presents two ambitious projects that aim to raise Millennium BCP bank to a higher level in Analytics and Data Science. The first builds a Sales Analytics Solution to track weekly sales of the bank's retail products. The second builds a mechanism that helps identify the most suitable product to recommend to each client.

Repositório da Universidade Nova de Lisboa

Algorithms for Academic Search and Recommendation Systems

Author: Amolochitis Emmanouil
Publication date: 01/01/2014

VBN

Towards platforms for improved recommender systems at social media scale

Author: Sowinski Christina Diedhiou
Publication date: 01/09/2019

Portsmouth University Research Portal (Pure)

Effects of motion on jet exhaust noise from aircraft

Author: Berman C. H.
Chun K. S.
Cowan S. J.

This work studied the problems involved in evaluating the jet noise field between an observer on the ground and an aircraft in flight during a typical takeoff or landing approach pattern. Areas examined include: (1) literature survey and preliminary investigation, (2) propagation effects, (3) source alteration effects, and (4) investigation of verification techniques. Sixteen problem areas were identified and studied. Six follow-up programs were recommended for further work. The results and the proposed follow-on programs provide a practical general technique for predicting flyover jet noise for conventional jet nozzles.

NASA Technical Reports Server

Recommender systems in industrial contexts

Author: Meyer Frank
Publication date: 25/01/2012

This thesis consists of four parts:
- An analysis of the core functions and the prerequisites for recommender systems in an industrial context: we identify four core functions for recommendation systems: Help to Decide, Help to Compare, Help to Explore, Help to Discover. The implementation of these functions has implications for the choices at the heart of algorithmic recommender systems.
- A state of the art, which deals with the main techniques used in automated recommendation systems: the two most commonly used algorithmic methods, the K-Nearest-Neighbor (KNN) methods and the fast factorization methods, are detailed. The state of the art also presents purely content-based methods, hybridization techniques, and the classical performance metrics used to evaluate recommender systems. It then gives an overview of several systems, both from academia and industry (Amazon, Google ...).
- An analysis of the performance and implications of a recommendation system developed during this thesis: this system, Reperio, is a hybrid recommender engine using KNN methods. We study the performance of the KNN methods, including the impact of the similarity functions used. Then we study the performance of the KNN method in critical use cases in cold-start situations.
- A methodology for analyzing the performance of recommender systems in an industrial context: this methodology assesses the added value of algorithmic strategies and recommendation systems according to their core functions.

Comment: version 3.30, May 201
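
As an illustration of the K-Nearest-Neighbor methods this abstract refers to, here is a minimal item-based KNN sketch. It is not the Reperio engine: the cosine similarity, the weighted-average prediction, the function names `cosine` and `predict`, and the toy `ITEM_RATINGS` data are all assumptions chosen for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse rating vectors (dict: user -> rating)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def predict(item_ratings, user, target, k=2):
    """Predict `user`'s rating of `target` from the k most similar items the user rated.

    Returns None when no rated item has positive similarity to the target,
    which is one simple way a cold-start case can surface.
    """
    neighbours = []
    for item, ratings in item_ratings.items():
        if item != target and user in ratings:
            sim = cosine(item_ratings[target], ratings)
            if sim > 0:
                neighbours.append((sim, ratings[user]))
    top = sorted(neighbours, reverse=True)[:k]
    if not top:
        return None
    # Similarity-weighted average of the user's ratings on the neighbour items.
    return sum(s * r for s, r in top) / sum(s for s, _ in top)

# Hypothetical toy data: item -> {user -> rating on a 1..5 scale}.
ITEM_RATINGS = {
    "A": {"u1": 5, "u2": 4, "u3": 1},
    "B": {"u1": 4, "u2": 5, "u3": 1, "u4": 5},
    "C": {"u1": 1, "u2": 2, "u3": 5, "u4": 1},
}

if __name__ == "__main__":
    print(predict(ITEM_RATINGS, "u4", "A", k=2))
```

Swapping `cosine` for another similarity function is the kind of change whose impact the thesis reports studying; the sketch keeps the similarity as a separate function for that reason.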

arXiv.org e-Print Archive

Theses.fr

Active caching for recommender systems

Author: Qasim Muhammad Umar
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2011

Web users are often overwhelmed by the amount of information available while carrying out browsing and searching tasks. Recommender systems substantially reduce this information overload by suggesting a list of similar documents that users might find interesting. However, generating these ranked lists requires an enormous amount of resources, which often results in access latency. Caching frequently accessed data has been a useful technique for reducing stress on limited resources and improving response time. Traditional passive caching techniques, which focus on answering queries based on temporal locality or popularity, achieve a very limited performance gain. In this dissertation, we propose an 'active caching' technique for recommender systems as an extension of the caching model. In this approach, estimation is used to generate an answer for queries whose results are not explicitly cached, where the estimation makes use of the partial order lists cached for related queries. By answering non-cached queries along with cached queries, the active caching system acts as a form of query processor and offers substantial improvement over traditional caching methodologies. Test results for several data sets and recommendation techniques show substantial improvement in the cache hit rate, byte hit rate and CPU costs, while achieving reasonable recall rates. To improve the performance of the proposed active caching solution, a shared neighbor similarity measure is introduced, which improves the recall rates by eliminating the dependence on monotonicity in the partial order lists. Finally, a greedy balancing cache selection policy is proposed to select the most appropriate data objects for the cache, which helps to improve the cache hit rate and recall further.
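
The idea of answering a non-cached query from lists cached for related queries can be sketched as follows. This is only an illustration under stated assumptions, not the dissertation's algorithm: the `ActiveCache` class, its `put`/`get` methods, and the vote-counting estimate (a crude stand-in for a shared-neighbor measure) are hypothetical.

```python
from collections import Counter

class ActiveCache:
    """Toy cache of ranked neighbour lists that also estimates answers on a miss."""

    def __init__(self):
        self.lists = {}  # query item -> ranked list of related items

    def put(self, query, ranked):
        self.lists[query] = list(ranked)

    def get(self, query, top_n=3):
        # Passive behaviour: an exact hit returns the cached ranked list.
        if query in self.lists:
            return self.lists[query][:top_n], "hit"
        # Active behaviour (assumed scheme): every cached list that mentions
        # the query votes for the other items it contains, owner included.
        votes = Counter()
        for owner, ranked in self.lists.items():
            group = [owner] + ranked
            if query in group:
                for item in group:
                    if item != query:
                        votes[item] += 1
        if not votes:
            return [], "miss"
        # Rank by number of shared lists, breaking ties by item name.
        estimate = sorted(votes, key=lambda i: (-votes[i], i))
        return estimate[:top_n], "estimated"

if __name__ == "__main__":
    cache = ActiveCache()
    cache.put("d1", ["d2", "d3", "d4"])
    cache.put("d5", ["d3", "d2", "d6"])
    print(cache.get("d1"))  # exact hit
    print(cache.get("d3"))  # estimated from the two cached lists
    print(cache.get("d9"))  # nothing related is cached
```

Because the estimate counts co-occurrence in cached lists rather than relying on the positions within them, it does not assume the cached orderings are monotone; that mirrors, very loosely, the motivation the abstract gives for the shared neighbor measure.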

Digital Commons @ New Jersey Institute of Technology (NJIT)

The Effect of Storage Temperature for the Detection of Silver Nanoparticles via Engineered Biomolecules

Author: Agustin Yuana Elly
Shen-Long Tsai
Publication venue: Universitas Indonesia, Fakultas Teknik
Publication date: 01/08/2015

Temperature plays an important role in biology as a way to regulate reactions. In this study, we report the effect of storage temperature (4, 25, and 37 °C) on the detection of silver nanoparticles via engineered biomolecules, monitored through fluorescence intensity. We genetically engineered a biomolecule consisting of a silver-binding peptide fused with a cellulose-binding domain and green fluorescent protein (GFP). This genetically designed modular protein interacts uniquely and specifically with cellulose, which serves as the immobilization matrix, and is able to capture silver nanoparticles from wastewater solution. Samples were instrumentally analysed every day. We aim to assess the long-term stability of our genetically engineered modular protein. This strategy demonstrated rapid and green environmental monitoring.

University of Surabaya Institutional Repository

Content Recommendation Through Linked Data

Author: Vagliano Iacopo
Publication venue: Politecnico di Torino
Publication date: 01/01/2017

Nowadays, people can easily obtain a huge amount of information from the Web, but they often have no criteria for judging it. This issue is known as information overload. Recommender systems are software tools that suggest interesting items to users and can help them deal with a vast amount of information. Linked Data is a set of best practices for publishing data on the Web, and it is the basis of the Web of Data, an interconnected global dataspace. This thesis discusses how to discover information useful to the user from the vast amount of structured data, notably Linked Data, available on the Web. The work addresses this issue by considering three research questions: how to exploit existing relationships between resources published on the Web to provide recommendations to users; how to represent the user and their context to generate better recommendations for the current situation; and how to effectively visualize the recommended resources and their relationships. To address the first question, the thesis proposes a new algorithm based on Linked Data which exploits existing relationships between resources to recommend related resources. The algorithm was integrated into a framework for deploying and evaluating Linked Data based recommendation algorithms, since a related problem is how to compare such algorithms and evaluate their performance on a given dataset. The user evaluation showed that our algorithm improves the rate of new recommendations while maintaining a satisfactory prediction accuracy. To represent the user and their context, this thesis presents the Recommender System Context ontology, which is exploited in a new context-aware approach that can be used with existing recommendation algorithms. The evaluation showed that this method can significantly improve the prediction accuracy.
As regards the problem of effectively visualizing the recommended resources and their relationships, this thesis proposes a visualization framework for DBpedia (the Linked Data version of Wikipedia) and mobile devices, which is designed to be extended to other datasets. In summary, this thesis shows how structured data available on the Web can be exploited to recommend useful resources to users. Linked Data were successfully exploited in recommender systems. Various proposed approaches were implemented and applied to use cases of Telecom Italia.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)
