
    Web crawler research methodology

    In economic and social sciences it is crucial to test theoretical models against reliable and sufficiently large databases. The general research challenge is to build a well-structured database that fits the given research question and is cost-efficient at the same time. In this paper we focus on crawler programs, which have proved to be an effective tool for database building in very different problem settings. First we explain how crawler programs work and illustrate a complex research process that maps business relationships using social media information sources; in this case we show how search robots can be used to collect data for mapping complex network relationships that characterize business relationships in a well-defined environment. After that we extend the case and present a framework of three structurally different research models in which crawler programs can be applied successfully: exploration, classification and time series analysis. For exploration, we present findings about the Hungarian web agency industry, for which no previous statistical data was available. For classification, we show how the most visited Hungarian web domains can be divided into predefined categories of e-business models. In the third study we used a crawler to gather the values of concrete, pre-defined records containing ticket prices of low-cost airlines from a single site. Based on these experiences we highlight some conceptual conclusions and opportunities of crawler-based research in e-business. --e-business research, web search, web crawler, Hungarian web, social network analysis
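A minimal sketch of the kind of crawler described above, written in Python with only the standard library. The seed URL, page limit, and politeness delay are illustrative assumptions, not the setup used in the paper's case studies.

```python
# Minimal breadth-first crawler sketch (illustrative only).
import time
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collect href targets from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, max_pages=50, delay=1.0):
    """Breadth-first crawl starting at `seed`, returning the URLs fetched."""
    seen, queue, collected = {seed}, deque([seed]), []
    while queue and len(collected) < max_pages:
        url = queue.popleft()
        try:
            html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        except Exception:
            continue
        collected.append(url)
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
        time.sleep(delay)  # stay polite toward the crawled site
    return collected

if __name__ == "__main__":
    print(crawl("https://example.com", max_pages=5))  # hypothetical seed URL
```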

    Hyp3rArmor: reducing web application exposure to automated attacks

    Web applications (webapps) are constantly subjected to automated, opportunistic attacks from autonomous robots (bots) engaged in reconnaissance to discover victims that may be vulnerable to specific exploits. This is typical behavior in botnet recruitment, worm propagation, large-scale fingerprinting and vulnerability scanning. Most anti-bot techniques are deployed at the application layer, thus leaving the network stack of the webapp's server exposed. In this paper we present a mechanism called Hyp3rArmor that addresses this vulnerability by minimizing the webapp's attack surface exposed to automated opportunistic attackers, for JavaScript-enabled web browser clients. Our solution uses port knocking to eliminate the webapp's visible network footprint. Clients of the webapp are directed to a visible static web server to obtain JavaScript that authenticates the client to the webapp server (using port knocking) before making any requests to the webapp. Our implementation of Hyp3rArmor, which is compatible with all webapp architectures, has been deployed and used to defend single- and multi-page websites on the Internet for 114 days. During this period the static web server observed 964 attempted attacks that were deflected from the webapp, which was only accessed by authenticated clients. Our evaluation shows that in most cases client-side overheads were negligible and that server-side overheads were minimal. Hyp3rArmor is ideal for critical systems and legacy applications that must be accessible on the Internet. Additionally, Hyp3rArmor is composable with other security tools, adding another layer to a defense-in-depth approach. This work has been supported by the National Science Foundation (NSF) awards #1430145, #1414119, and #1012798.
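The port-knocking step can be pictured with the following Python sketch of a knocking client. The knock sequence, host name, and timing are hypothetical assumptions; in Hyp3rArmor the equivalent knock is performed by JavaScript served from the static web server, not by a standalone script.

```python
# Port-knocking sketch from the client side (illustrative only; the sequence,
# ports, and host are assumptions, not Hyp3rArmor's actual configuration).
import socket
import time

KNOCK_SEQUENCE = [7000, 8000, 9000]  # assumed secret knock sequence
WEBAPP_PORT = 443

def knock(host, ports, pause=0.2):
    """Touch each knock port in order so the firewall opens the webapp port."""
    for port in ports:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(0.5)
        try:
            s.connect((host, port))   # the connection itself is expected to fail;
        except OSError:
            pass                      # only the arriving packet matters
        finally:
            s.close()
        time.sleep(pause)

def open_webapp(host):
    """Knock first, then connect to the now-visible webapp port."""
    knock(host, KNOCK_SEQUENCE)
    return socket.create_connection((host, WEBAPP_PORT), timeout=5)

if __name__ == "__main__":
    conn = open_webapp("webapp.example.org")  # hypothetical host
    conn.close()
```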

    PDFS: Practical Data Feed Service for Smart Contracts

    Smart contracts are a new paradigm that emerged with the rise of blockchain technology. They allow mutually distrusting parties to arrange agreements. These agreements are encoded in a programming language and deployed on a blockchain platform, where all participants execute them and maintain their state. Smart contracts are promising since they are automated and decentralized, thus limiting the involvement of trusted third parties, and can include monetary transfers. Due to these features, many believe that smart contracts will revolutionize the way we think of distributed applications, information sharing, financial services, and infrastructures. To realize the potential of smart contracts, it is necessary to connect the contracts with the outside world, so that they can understand and use information from other infrastructures. For instance, smart contracts would greatly benefit from access to web content. However, there are many challenges in realizing such a system, and despite the existence of many proposals, no solution is secure, provides easily parsable data, introduces small overheads, and is easy to deploy. In this paper we propose PDFS, a practical system for data feeds that combines the advantages of previous schemes and introduces new functionalities. PDFS extends content providers with new features for data transparency and consistency validation. This combination provides multiple benefits, such as content that is easy to parse and efficient authenticity verification without breaking natural trust chains. PDFS keeps content providers auditable, mitigates their malicious activities (like data modification or censorship), and allows them to create a new business model. We show how PDFS is integrated with existing web services, report on a PDFS implementation, and present results from case studies and experiments. Comment: Blockchain; Smart Contracts; Data Authentication; Ethereum
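The authenticity-verification idea can be illustrated with a small Python sketch of a content provider that tags each record and a consumer that checks the tag before use. The record format and the shared-key HMAC are assumptions for brevity; PDFS itself defines its own data formats and relies on blockchain-verifiable proofs rather than a shared secret.

```python
# Sketch of an authenticated data-feed record (illustrative only; a real
# provider would use asymmetric signatures, not this shared-key HMAC).
import hmac
import hashlib
import json
import time

PROVIDER_KEY = b"provider-secret"  # assumed shared key for the sketch

def publish(record: dict) -> dict:
    """Content provider: wrap a record with a timestamp and an authenticity tag."""
    payload = {"data": record, "timestamp": int(time.time())}
    body = json.dumps(payload, sort_keys=True).encode()
    payload["tag"] = hmac.new(PROVIDER_KEY, body, hashlib.sha256).hexdigest()
    return payload

def verify(payload: dict) -> bool:
    """Consumer (e.g. an off-chain relay for a contract): check the tag before use."""
    tag = payload.pop("tag")
    body = json.dumps(payload, sort_keys=True).encode()
    expected = hmac.new(PROVIDER_KEY, body, hashlib.sha256).hexdigest()
    payload["tag"] = tag
    return hmac.compare_digest(tag, expected)

if __name__ == "__main__":
    feed = publish({"pair": "EUR/USD", "rate": 1.0842})  # hypothetical feed record
    print(verify(feed))  # True for untampered data
```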

    Next Generation Repositories: Behaviours and Technical Recommendations of the COAR Next Generation Repositories Working Group

    In April 2016, the Confederation of Open Access Repositories (COAR) launched the Next Generation Repository Working Group to identify new functionalities and technologies for repositories. In this report, we are pleased to present the results of the work of this group, including recommendations for the adoption of new technologies, standards, and protocols that will help repositories become more integrated into the web environment and enable them to play a larger role in the scholarly communication ecosystem. At COAR, we believe the globally distributed network of more than 3000 repositories can be leveraged to create a more sustainable and innovative system for sharing and building on the results of research. Collectively, repositories can provide a comprehensive view of the research of the whole world, while also enabling each scholar and institution to participate in the global network of scientific and scholarly enquiry. Building additional services such as standardized usage metrics, peer review and social networking on top of a trusted global network of repositories has the potential to offer a viable alternative. The vision underlying the work of Next Generation Repositories is "to position repositories as the foundation for a distributed, globally networked infrastructure for scholarly communication, on top of which layers of value-added services will be deployed, thereby transforming the system, making it more research-centric, open to and supportive of innovation, while also collectively managed by the scholarly community."

    Research and Development Workstation Environment: the new class of Current Research Information Systems

    Against the backdrop of the development of modern technologies in the field of scientific research, a new class of Current Research Information Systems (CRIS) and related intelligent information technologies has arisen. It is called the Research and Development Workstation Environment (RDWE): comprehensive, problem-oriented information systems for supporting the scientific research and development lifecycle. This paper describes the design and development fundamentals of RDWE-class systems. The generalized information model of an RDWE-class system is represented in the article as a three-tuple composite web service that includes: a set of atomic web services, each of which can be designed and developed as a microservice or a desktop application, allowing it to be used separately as independent software; a set of functions, the functional filling of the Research and Development Workstation Environment; and, for each function of the composite web service, the subset of atomic web services required to implement it. In accordance with this fundamental information model of the RDWE class, a system was developed for supporting research in ontology engineering (the automated building of applied ontologies in an arbitrary domain area) and in scientific and technical creativity (the automated preparation of application documents for patenting inventions in Ukraine). It is called the Personal Research Information System. A distinctive feature of such systems is that they can be oriented to various types of scientific activity by combining a variety of functional services and adding new ones within the integrated cloud environment. The main results of our work are focused on enhancing the effectiveness of the scientist's research and development lifecycle in an arbitrary domain area. Comment: In English, 13 pages, 1 figure, 1 table, added references in Russian. Published. Prepared for a special issue (UkrPROG 2018 conference) of the scientific journal "Problems of Programming" (Founder: National Academy of Sciences of Ukraine, Institute of Software Systems of NAS Ukraine)
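A minimal Python data-model sketch of this three-tuple is given below; the class, function, and service names are illustrative assumptions, not the authors' API.

```python
# Sketch of the RDWE three-tuple: (atomic services S, functions F,
# required-services mapping F -> subset of S). Names are hypothetical.
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class CompositeWebService:
    atomic_services: Set[str] = field(default_factory=set)       # S
    functions: Set[str] = field(default_factory=set)             # F
    requires: Dict[str, Set[str]] = field(default_factory=dict)  # F -> subset of S

    def add_function(self, name: str, needed: Set[str]) -> None:
        """Register a function together with the atomic services it depends on."""
        missing = needed - self.atomic_services
        if missing:
            raise ValueError(f"unknown atomic services: {missing}")
        self.functions.add(name)
        self.requires[name] = needed

# Hypothetical instantiation for a Personal Research Information System.
rdwe = CompositeWebService(atomic_services={"ontology-builder", "patent-drafting", "storage"})
rdwe.add_function("build_applied_ontology", {"ontology-builder", "storage"})
rdwe.add_function("prepare_patent_application", {"patent-drafting", "storage"})
```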

    A SERVICE PROTOTYPE TO VALIDATE RESULT INTEGRITY OF OUTSOURCED FREQUENT ITEMSET MINING

    In this paper, we consider the problem of checking whether an outsourced server has returned correct and complete frequent itemsets. There are many possible reasons why such a server could return incorrect or incomplete results. Our aim is to design efficient and robust integrity verification techniques that can detect a server that returns incorrect or incomplete frequent itemsets. Our experiments show the effectiveness and efficiency of our approach. The server performs frequent itemset mining on the received dataset and returns the mining results to the client; we allow the client to use privacy-preserving frequent itemset mining algorithms. We optimize the evidence construction by reducing the amount of evidence needed for the correctness and completeness proofs. Correctness verification on the client side is simple: the client uses the server's evidence to verify, by running the set-intersection proof protocol, that the itemset matched by each MPB node is indeed frequent. A naive way to verify the completeness of the MPB nodes would be for the client to re-compute the MPB from FS, which can incur nearly the same time cost as mining itself; we therefore design a more efficient technique. We evaluate the verification overhead on the server side and on the client side, and explore several factors that affect the performance of our deterministic approach, including a range of error rates, frequent itemsets of various lengths, and a variety of dataset sizes.
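As a rough illustration of what the client is protecting against, the following Python sketch spot-checks reported frequent itemsets by recomputing their support; the actual protocol relies on cryptographic evidence supplied by the server rather than recomputation by the client, and the sample data below are assumptions.

```python
# Simplified spot-check sketch (illustrative only; not the paper's protocol).
def support(dataset, itemset):
    """Fraction of transactions containing every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in dataset if items <= set(t)) / len(dataset)

def spot_check(dataset, reported, min_support):
    """Flag reported itemsets whose recomputed support falls below the threshold."""
    return [iset for iset, _ in reported if support(dataset, iset) < min_support]

# Hypothetical transactions and a server response containing one bogus itemset.
dataset = [("a", "b", "c"), ("a", "b"), ("a", "c"), ("b", "c"), ("a", "b", "c")]
reported = [(("a", "b"), 0.6), (("a", "c"), 0.6), (("b", "d"), 0.6)]
print(spot_check(dataset, reported, min_support=0.5))  # -> [('b', 'd')]
```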

    Determining Unique Agents by Evaluating Web Form Interaction

    Because of the inherent risks in today's online activities, it is imperative to identify a malicious user masquerading as someone else. Incorporating biometric analysis enhances the confidence of authenticating valid users over the Internet while providing additional layers of security with no hindrance to the end user. Through the analysis of traffic patterns and HTTP headers, the detection and early rejection of robot agents play a significant role in reducing fraudulent login attempts.
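A toy Python sketch of header-and-timing screening in the spirit described above follows; the rules, thresholds, and header fields are assumptions for illustration, not the authors' detection model.

```python
# Toy bot-screening heuristic (illustrative assumptions throughout).
KNOWN_BOT_TOKENS = ("bot", "crawler", "spider", "curl", "python-requests")

def looks_automated(headers: dict, inter_request_seconds: float) -> bool:
    """Rough screen: suspicious User-Agent, missing browser headers,
    or a form submitted faster than a human plausibly could."""
    ua = headers.get("User-Agent", "").lower()
    if any(token in ua for token in KNOWN_BOT_TOKENS):
        return True
    if "Accept-Language" not in headers or "Accept" not in headers:
        return True                      # real browsers almost always send these
    return inter_request_seconds < 0.5   # assumed lower bound for human form entry

print(looks_automated({"User-Agent": "python-requests/2.31"}, 3.0))        # True
print(looks_automated({"User-Agent": "Mozilla/5.0", "Accept": "*/*",
                       "Accept-Language": "en-US"}, 4.2))                  # False
```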

    Introducing CARONTE: a Crawler for Adversarial Resources Over Non Trusted Environments

    The monitoring of underground criminal activities is often automated to maximize data collection and to train ML models that automatically adapt data-collection tools to different communities. On the other hand, sophisticated adversaries may adopt crawling-detection capabilities that can significantly jeopardize researchers' opportunities to perform data collection, for example by putting their accounts under the spotlight and getting them expelled from the community. This is particularly undesirable in prominent, high-profile criminal communities where entry costs are significant (either monetarily or, for example, in terms of background checks or other trust-building mechanisms). This work presents CARONTE, a tool that semi-automatically learns virtually any forum structure for parsing and data extraction, while maintaining a low profile during data collection and avoiding the need to collect massive datasets to keep the tool scalable. We showcase CARONTE against four underground forum communities and show that, from the adversary's perspective, CARONTE maintains a profile similar to a human's, whereas state-of-the-art crawling tools show clearly distinct and easy-to-detect patterns of automated activity.
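The low-profile pacing idea can be sketched in a few lines of Python; the delay distribution and helper names are assumptions, and CARONTE's real scheduler, session handling, and forum parsing are considerably more involved.

```python
# Sketch of human-like request pacing (illustrative only).
import random
import time

def human_like_delay(mean=12.0, jitter=0.6):
    """Sleep for a randomized, human-scale interval between page fetches."""
    pause = max(2.0, random.gauss(mean, mean * jitter))
    time.sleep(pause)

def fetch_thread_list(fetch_page, thread_urls):
    """Fetch forum threads one at a time, pacing requests like a human reader.
    `fetch_page` is a caller-supplied function (hypothetical here)."""
    pages = []
    for url in thread_urls:
        pages.append(fetch_page(url))
        human_like_delay()
    return pages
```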