1,292 research outputs found

    Web Data Extraction, Applications and Techniques: A Survey

    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims to provide a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather a large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential of cross-fertilization, i.e., the possibility of re-using Web Data Extraction techniques originally designed to work in a given domain in other domains. Comment: Knowledge-based System

    Academic writing for IT students

    This textbook is intended for Master's and PhD students in Information Technology (B1-C1 level of English proficiency). It gives instructions on how to write a research paper in English, together with relevant exercises, and presents the peculiarities of each section of a paper. The exercises are based on real scientific materials taken from peer-reviewed journals. The subject area covers a wide range of Information Technology domains.

    The Repeatability Experiment of SIGMOD 2008

    SIGMOD 2008 was the first database conference that offered to test submitters' programs against their data to verify the published experiments. This paper discusses the rationale for this effort, the community's reaction, our experiences, and advice for similar efforts in the future.

    Tapjacking Threats and Mitigation Techniques for Android Applications

    With the increased dependency on web applications through mobile devices, malicious attack techniques have shifted from traditional web applications running on desktops or laptops (allowing mouse click-based interactions) to mobile applications running on mobile devices (allowing touch-based interactions). Clickjacking is a type of malicious attack originating in web applications, where victims are lured to click on seemingly benign objects in web pages. However, when clicked, unintended actions are performed without the user's knowledge. In particular, it has been shown that users can be lured into touching an object in an application, triggering actions the victims did not intend. This new form of clickjacking on mobile devices is called tapjacking. There is little research that thoroughly investigates tapjacking attacks and mitigation techniques on mobile devices. In this thesis, we identify coding practices that can help software practitioners avoid malicious attacks, and we define a detection technique to prevent the consequences of malicious attacks for end users. We first determine where the tapjacking attack type falls within the broader literature on malware, in particular Android malware. In this direction, we propose a classification of Android malware. Then, we propose a novel technique based on Kullback-Leibler Divergence (KLD) to identify possible tapjacking behavior in applications. We validate the approach with a set of benign and malicious Android applications. We also implemented a prototype tool for detecting tapjacking attack symptoms using the KLD-based measurement. The evaluation results show that tapjacking can be detected effectively with KLD.

    Achieving the Potential: The Future of Federal e-Rulemaking: A Report to Congress and the President

    Federal regulations are among the most important and widely used tools for implementing the laws of the land – affecting the food we eat, the air we breathe, the safety of consumer products, the quality of the workplace, the soundness of our financial institutions, the smooth operation of our businesses, and much more. Despite the central role of rulemaking in executing public policy, both regulated entities (especially small businesses) and the general public find it extremely difficult to follow the regulatory process; actively participating in it is even harder. E-rulemaking is the use of technology (particularly, computers and the World Wide Web) to: (i) help develop proposed rules; (ii) make rulemaking materials broadly available online, along with tools for searching, analyzing, explaining and managing the information they contain; and (iii) enable more effective and diverse public participation. E-rulemaking has transformative potential to increase the comprehensibility, transparency and accountability of the regulatory process. Specifically, e-rulemaking – effectively implemented – can open the rulemaking process to a broader range of participants, offer easier access to rulemaking and implementation materials, facilitate dialogue among interested parties about policy and enforcement, enhance regulatory coordination, and help produce better decisions that lead to more effective, accepted and enforceable rules. If realized, this vision would greatly strengthen civic participation and our democratic form of government

    Supporting Multi-Domain Model Management

    Model-driven engineering has been used in different domains such as software engineering, robotics, and automotive. This approach has models as the primary artifacts, and it is expected to improve the quality of system specification and design, as well as communication within the development team. Managing models that belong to the same domain might not be a complex task because of the features provided by the available development tools. However, managing interrelated models of different domains is challenging. A robot is an example of such a multi-domain system. To develop it, one might need to combine models created by experts from the mechanics, electronics and software domains. These models might be created using the domain-specific tools of each domain, and a change in a model of one domain might impact a model from a different domain, causing inconsistency in the entire system. This thesis therefore aims to facilitate the evolution of models in this multi-domain setting. It starts with a systematic literature review to identify the open issues and the strategies used to manage models from different domains. We found that making the relationships between models from different domains explicit can support model maintenance, making it easy to recognize which models are affected by a change. The following step was to investigate ways of extracting information from different engineering models that were created using different modeling notations. For this goal, we required a uniform approach that would be independent of the peculiarities of the notations. This uniform approach can only be based on elements typically present in various modeling notations, i.e., text, boxes, and lines. Thus, we investigated the suitability of optical character recognition (OCR) for extracting textual elements from models from different domains. We also identified the common errors made by off-the-shelf OCR services, and we proposed two approaches to correct one of these errors. After that, we used name matching techniques on the textual elements extracted by OCR to identify relationships between models from different domains. To conclude, we created an infrastructure that combines all the previous elements into a single tool that can also store the relationships in a structured manner, making it easier to maintain the consistency of an entire system. We evaluated it by means of an observational study with a multidisciplinary team that builds autonomous robots designed to play football.
