3 research outputs found

    Staged event-driven architecture as a micro-architecture of distributed and pluginable crawling platform

    Many crawling systems are available on the market, but they are rather closed systems dedicated to performing a particular kind and class of tasks with a predefined scope, strategy, etc. In real life, however, there are meaningful groups of users (e.g. marketing, criminal or governmental analysts) who require more than yet another crawling system dedicated to predefined tasks. They need an easy-to-use, user-friendly, all-in-one studio not only for executing and running Internet robots and crawlers, but also for (graphically) (re)defining and (re)composing crawlers according to dynamically changing requirements and use cases. To realize this idea, the Cassiopeia framework has been designed and developed. One has to remember, however, that because of the enormous size and structural complexity of the WWW, developing effective Internet robots, and even more so a framework supporting graphical composition of robots, is a really challenging task from a technical and architectural point of view. The crucial aspect in the context of crawling efficiency and scalability is the concurrency model applied. The two most typical concurrency management models are classical concurrency based on a pool of threads and processes, and event-driven concurrency. Neither of them is an ideal approach. That is why research on alternative models is still being conducted, to propose an efficient and convenient architecture for concurrent and distributed applications. One promising model is the staged event-driven architecture, which to some extent mixes both of the above-mentioned classical approaches and provides additional benefits such as splitting an application into separate stages connected by event queues, which is of interest given the requirements for crawler (re)composition.
The goal of this paper is to present the idea and the PoC implementation of the Cassiopeia framework, with special attention paid to its crucial architectural element, i.e. the design, implementation and application of the staged event-driven architecture as the micro-architecture of Cassiopeia’s agents, i.e. its key computational and processing units.
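
To make the idea of stages connected by event queues more concrete, the following is a minimal sketch of a staged event-driven pipeline. It is not taken from Cassiopeia: the `Stage` class, the `run_pipeline` function, the three stage names and the fake `fetch` handler (which performs no network access) are all hypothetical and serve only to illustrate the general pattern, in which each stage owns a queue and a small thread pool and passes its results to the next stage.

```python
import queue
import threading

_STOP = object()  # sentinel telling a stage's workers to shut down


class Stage:
    """One stage: an event queue drained by a small private thread pool."""

    def __init__(self, name, handler, workers=2, downstream=None):
        self.name = name
        self.handler = handler          # function applied to every event
        self.downstream = downstream    # next stage, or None for the last one
        self.events = queue.Queue()
        self.threads = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(workers)
        ]

    def start(self):
        for t in self.threads:
            t.start()

    def put(self, event):
        self.events.put(event)

    def _run(self):
        while True:
            event = self.events.get()
            if event is _STOP:
                break
            result = self.handler(event)
            # Forward non-empty results to the next stage's queue.
            if self.downstream is not None and result is not None:
                self.downstream.put(result)

    def close(self):
        # One sentinel per worker; the FIFO queue guarantees that all
        # earlier events are taken before the sentinels are.
        for _ in self.threads:
            self.events.put(_STOP)
        for t in self.threads:
            t.join()


def run_pipeline(urls):
    """Hypothetical fetch -> parse -> store crawler built from three stages."""
    stored = []
    lock = threading.Lock()

    def fetch(url):
        return (url, "page:" + url)     # stand-in for a real HTTP request

    def parse(item):
        url, body = item
        return (url, len(body))         # stand-in for real content analysis

    def store(item):
        with lock:
            stored.append(item)         # returns None, so nothing is forwarded

    store_stage = Stage("store", store, workers=1)
    parse_stage = Stage("parse", parse, downstream=store_stage)
    fetch_stage = Stage("fetch", fetch, downstream=parse_stage)
    stages = [fetch_stage, parse_stage, store_stage]

    for s in stages:
        s.start()
    for u in urls:
        fetch_stage.put(u)
    for s in stages:                    # close upstream first so events drain
        s.close()
    return sorted(stored)
```

Because stages communicate only through queues, recomposing the crawler in this sketch amounts to creating `Stage` objects with different handlers or wiring them in a different order, which mirrors the (re)composition benefit the abstract attributes to the staged model.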

    Cassiopeia – Towards a Distributed and Composable Crawling Platform, Journal of Telecommunications and Information Technology, 2014, no. 2

    When it comes to designing and implementing crawling systems or Internet robots, it is of the utmost importance to first address efficiency and scalability issues (from a technical and architectural point of view), due to the enormous size and structural complexity of the World Wide Web. There is, however, a significant number of users for whom flexibility and ease of execution are as important as efficiency. Running, defining, and composing Internet robots and crawlers according to dynamically changing requirements and use cases in the easiest possible way (e.g. in a graphical, drag & drop manner) is necessary, especially for criminal analysts. The goal of this paper is to present the idea, design, crucial architectural elements, Proof-of-Concept (PoC) implementation, and preliminary experimental assessment of the Cassiopeia framework, i.e. an all-in-one studio addressing both of the above-mentioned aspects.