3,501 research outputs found

    Web Data Extraction, Applications and Techniques: A Survey

    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool to perform data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather large amounts of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential of cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed to work in a given domain in other domains. Comment: Knowledge-based System

    Implementing a bank sales analytics solution and a predictive model for the next best offer

    Internship Report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. In the banking industry, the quantity of information processed is huge. Because clients' needs change constantly, companies must adapt their approaches to attract clients with the best offers. This can be done with various machine learning and data mining techniques that help them understand their clients better. Internally, banks should also be equipped with fast and efficient processes that enable them to make the best decisions quickly. That is why real-time reporting tools should be implemented as an upper layer on top of the data sources. With this in mind, this internship report presents two ambitious projects that aim to raise Millennium BCP bank to a higher level in Analytics and Data Science. The first is about building a Sales Analytics Solution to track weekly sales of retail products in the bank. The second is about building a mechanism that helps identify the most suitable product to recommend to each client.

    Algorithms for Academic Search and Recommendation Systems


    Effects of motion on jet exhaust noise from aircraft

    The various problems involved in the evaluation of the jet noise field prevailing between an observer on the ground and an aircraft in flight in a typical takeoff or landing approach pattern were studied. Areas examined include: (1) literature survey and preliminary investigation, (2) propagation effects, (3) source alteration effects, and (4) investigation of verification techniques. Sixteen problem areas were identified and studied. Six follow-up programs were recommended for further work. The results and the proposed follow-on programs provide a practical general technique for predicting flyover jet noise for conventional jet nozzles

    Recommender systems in industrial contexts

    This thesis consists of four parts: - An analysis of the core functions and the prerequisites for recommender systems in an industrial context: we identify four core functions for recommendation systems: Help to Decide, Help to Compare, Help to Explore, Help to Discover. The implementation of these functions has implications for the choices at the heart of algorithmic recommender systems. - A state of the art, which deals with the main techniques used in automated recommendation systems: the two most commonly used algorithmic methods, the K-Nearest-Neighbor (KNN) methods and the fast factorization methods, are detailed. The state of the art also presents purely content-based methods, hybridization techniques, and the classical performance metrics used to evaluate recommender systems. It then gives an overview of several systems, both from academia and industry (Amazon, Google ...). - An analysis of the performance and implications of a recommendation system developed during this thesis: this system, Reperio, is a hybrid recommender engine using KNN methods. We study the performance of the KNN methods, including the impact of the similarity functions used. Then we study the performance of the KNN method in critical use cases in cold-start situations. - A methodology for analyzing the performance of recommender systems in an industrial context: this methodology assesses the added value of algorithmic strategies and recommendation systems according to their core functions. Comment: version 3.30, May 201

    Active caching for recommender systems

    Web users are often overwhelmed by the amount of information available while carrying out browsing and searching tasks. Recommender systems substantially reduce the information overload by suggesting a list of similar documents that users might find interesting. However, generating these ranked lists requires an enormous amount of resources, which often results in access latency. Caching frequently accessed data has been a useful technique for reducing stress on limited resources and improving response time. Traditional passive caching techniques, where the focus is on answering queries based on temporal locality or popularity, achieve a very limited performance gain. In this dissertation, we propose an ‘active caching’ technique for recommender systems as an extension of the caching model. In this approach, estimation is used to generate an answer for queries whose results are not explicitly cached, where the estimation makes use of the partial order lists cached for related queries. By answering non-cached queries along with cached queries, the active caching system acts as a form of query processor and offers substantial improvement over traditional caching methodologies. Test results for several data sets and recommendation techniques show substantial improvement in the cache hit rate, byte hit rate and CPU costs, while achieving reasonable recall rates. To improve the performance of the proposed active caching solution, a shared neighbor similarity measure is introduced, which improves the recall rates by eliminating the dependence on monotonicity in the partial order lists. Finally, a greedy balancing cache selection policy is also proposed to select the most appropriate data objects for the cache, which helps to improve the cache hit rate and recall further.

    The Effect of Storage Temperature for the Detection of Silver Nanoparticles via Engineered Biomolecules

    Temperature plays an important role in biology as a way to regulate reactions. In this study, we report the effect of storage temperature (4, 25, and 37 °C) on the detection of silver nanoparticles via engineered biomolecules by monitoring the fluorescence intensity. We genetically engineered a biomolecule consisting of a silver-binding peptide fused with a cellulose-binding domain and green fluorescent protein (GFP). This genetically designed modular protein interacts uniquely and specifically with cellulose, which serves as the immobilization matrix, and can capture silver nanoparticles from wastewater solution. Samples were instrumentally analysed every day. We aim to assess the long-term stability of our genetically engineered modular protein. This strategy demonstrated a rapid and green approach to environmental monitoring.

    Content Recommendation Through Linked Data

    Nowadays, people can easily obtain a huge amount of information from the Web, but often they have no criteria to discern it. This issue is known as information overload. Recommender systems are software tools that suggest interesting items to users and can help them deal with a vast amount of information. Linked Data is a set of best practices to publish data on the Web, and it is the basis of the Web of Data, an interconnected global dataspace. This thesis discusses how to discover information useful for the user from the vast amount of structured data, and notably Linked Data, available on the Web. The work addresses this issue by considering three research questions: how to exploit existing relationships between resources published on the Web to provide recommendations to users; how to represent the user and their context to generate better recommendations for the current situation; and how to effectively visualize the recommended resources and their relationships. To address the first question, the thesis proposes a new algorithm based on Linked Data which exploits existing relationships between resources to recommend related resources. The algorithm was integrated into a framework to deploy and evaluate Linked Data based recommendation algorithms, since a related problem is how to compare such algorithms and evaluate their performance when applied to a given dataset. The user evaluation showed that our algorithm improves the rate of new recommendations while maintaining a satisfactory prediction accuracy. To represent the user and their context, this thesis presents the Recommender System Context ontology, which is exploited in a new context-aware approach that can be used with existing recommendation algorithms. The evaluation showed that this method can significantly improve the prediction accuracy. As regards the problem of effectively visualizing the recommended resources and their relationships, this thesis proposes a visualization framework for DBpedia (the Linked Data version of Wikipedia) and mobile devices, which is designed to be extended to other datasets. In summary, this thesis shows how it is possible to exploit structured data available on the Web to recommend useful resources to users. Linked Data were successfully exploited in recommender systems. Various proposed approaches were implemented and applied to use cases of Telecom Italia.