
    An Automated Algorithm for Extracting Website Skeleton

    The huge amount of information available on the Web has attracted many research efforts into developing wrappers that extract data from webpages. However, as most systems for generating wrappers focus on extracting data at page level, data extraction at site level remains a manual or semi-automatic process. In this paper, we study the problem of extracting a website skeleton, i.e. the underlying hyperlink structure used to organize the content pages in a given website. We propose an automated algorithm, called the Sew algorithm, to discover the skeleton of a website. Given a page, the algorithm examines hyperlinks in groups and identifies the navigation links that point to pages in the next level of the website structure. The entire skeleton is then constructed by recursively fetching the pages pointed to by the discovered links and analyzing them using the same process. Our experiments on real-life websites show that the algorithm achieves high recall with moderate precision.
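
    The recursive discovery described above can be sketched as follows. This is a hypothetical illustration, not the paper's Sew algorithm: the grouping heuristic (shared parent directory in the URL), the `SITE` dictionary standing in for fetched pages, and the depth limit are all assumptions made for the sake of a runnable example.

```python
# Hypothetical sketch of site-skeleton discovery: hyperlinks on a page are
# examined in groups, the group most likely to be navigation links is kept,
# and the process recurses one level down. The toy in-memory "site" replaces
# actual page fetching.
from collections import defaultdict

SITE = {  # toy website: page -> outgoing hyperlinks
    "/": ["/news/a", "/news/b", "/news/c", "/about"],
    "/news/a": ["/news/a/1", "/news/a/2", "/"],
    "/news/b": [], "/news/c": [], "/about": [],
    "/news/a/1": [], "/news/a/2": [],
}

def navigation_group(links):
    """Group links by parent directory; return the largest group,
    approximating 'links that point to the next level of the site'."""
    groups = defaultdict(list)
    for url in links:
        parent = url.rsplit("/", 1)[0] or "/"
        groups[parent].append(url)
    return max(groups.values(), key=len, default=[])

def extract_skeleton(page, depth=0, max_depth=3):
    """Recursively build the skeleton tree from the discovered nav links."""
    if depth >= max_depth:
        return {}
    nav = navigation_group(SITE.get(page, []))
    return {child: extract_skeleton(child, depth + 1, max_depth)
            for child in nav}

skeleton = extract_skeleton("/")
```

    Note how the largest link group ("/news/..." pages) is treated as navigation while the lone "/about" link is discarded, mirroring the idea that navigation links occur in structurally similar groups.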

    Querying websites using compact skeletons

    Several commercial applications, such as online comparison shopping and process automation, require integrating information that is scattered across multiple websites or XML documents. Much research has been devoted to this problem, resulting in several research prototypes and commercial implementations. Such systems rely on wrappers that provide relational or other structured interfaces to websites. Traditionally, wrappers have been constructed by hand on a per-website basis, constraining the scalability of the system. We introduce a website structure inference mechanism called compact skeletons that is a step in the direction of automated wrapper generation. Compact skeletons provide a transformation from websites or other hierarchical data, such as XML documents, to relational tables. We study several classes of compact skeletons and provide polynomial-time algorithms and heuristics for automated construction of compact skeletons from websites. Experimental results show that our heuristics work well in practice. We also argue that compact skeletons are a natural extension of commercially deployed techniques for wrapper construction.
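
    The core transformation, from hierarchical data to a relational table, can be illustrated with a toy example. This is a hedged sketch of the general idea only: the nested dict standing in for a website's pages, the column layout, and the data are invented for illustration, not the paper's actual compact-skeleton construction.

```python
# Sketch of mapping a hierarchical site to a flat relational table:
# each level of the hierarchy becomes one column, each leaf page one row.
def flatten(tree, path=()):
    """Walk the hierarchy depth-first, emitting one row per leaf page."""
    if not isinstance(tree, dict):      # leaf: actual page content
        yield path + (tree,)
        return
    for key, subtree in tree.items():
        yield from flatten(subtree, path + (key,))

# Toy website: category pages -> product pages -> price (all made up).
site = {
    "Books": {"Dune": "$9.99", "Emma": "$7.50"},
    "Music": {"Kind of Blue": "$12.00"},
}

rows = list(flatten(site))  # (category, title, price) tuples
```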

    Automated Video Analysis of Animal Movements Using Gabor Orientation Filters

    To quantify locomotory behavior, tools for determining the location and shape of an animal’s body are a first requirement. Video recording is a convenient technology for storing raw movement data, but extracting body coordinates from video recordings is a nontrivial task. The algorithm described in this paper solves this task for videos of leeches or other quasi-linear animals in a manner inspired by the mammalian visual processing system: the video frames are fed through a bank of Gabor filters, which locally detect segments of the animal at a particular orientation. The algorithm assumes that the image location with maximal filter output lies on the animal’s body and traces the body's shape out in both directions from there. The algorithm successfully extracted location and shape information from video clips of swimming leeches, as well as from still photographs of swimming and crawling snakes. A MATLAB implementation with a graphical user interface is available online and should make this algorithm conveniently usable in many other contexts.
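
    The first stage, finding the maximal Gabor response as a seed point on the body, can be sketched in a few lines. This is an assumed, minimal reconstruction: the kernel parameters, toy image, and single fixed orientation are illustrative; the paper uses a full filter bank and a subsequent shape-tracing step.

```python
# Minimal sketch of the Gabor-filter stage: build one small oriented Gabor
# kernel, convolve it with a toy image containing a diagonal "body", and
# take the location of maximal response as the seed point on the animal.
import math

def gabor_kernel(size=7, theta=3 * math.pi / 4, sigma=2.0, wavelength=4.0):
    """Real part of a Gabor filter oriented at angle theta (here chosen
    to respond maximally to a bar along the image's main diagonal)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            g = math.exp(-(x * x + y * y) / (2 * sigma ** 2))
            row.append(g * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel

def best_response(image, kernel):
    """Valid convolution; return the pixel with maximal filter output."""
    kh, kw = len(kernel), len(kernel[0])
    best, best_pos = float("-inf"), None
    for i in range(len(image) - kh + 1):
        for j in range(len(image[0]) - kw + 1):
            s = sum(kernel[u][v] * image[i + u][j + v]
                    for u in range(kh) for v in range(kw))
            if s > best:
                best, best_pos = s, (i + kh // 2, j + kw // 2)
    return best_pos

# Toy 15x15 image with a bright diagonal streak (like a quasi-linear body).
img = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(15)]
       for i in range(15)]
seed = best_response(img, gabor_kernel())
```

    In the real algorithm a bank of such kernels at many orientations is applied, and the tracing step then walks along the body in both directions from the seed point.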

    Smart Intrusion Detection System for DMZ

    Predicting network attacks and producing machine-understandable descriptions of security vulnerabilities are complex tasks for currently available Intrusion Detection Systems (IDSs). IDS software is important for an enterprise network: it logs security events that occur in the network. In addition, IDSs are useful for recognizing malicious hacking attempts and protecting the network without requiring changes to client software. Several machine learning approaches have been applied to make these IDSs better and smarter. In our work, we propose an approach for making IDSs more analytical using semantic technology. We establish a semantic connection between IDSs and National Vulnerability Databases (NVDs), so that the system semantically analyzes each logged attack and can make predictions about incoming attacks or services that might be in danger. We built our ontology skeleton based on standard network security concepts, and added classes and relations specific to DMZ network services. We also provide an option that allows the user to update the ontology skeleton automatically according to the network's needs. Our work is evaluated and validated using four different methods: we present a prototype that works over the web; we apply the KDDCup99 dataset to the prototype; we model the system as a queuing model; and we simulate it using the AnyLogic simulator. Validating the system against the KDDCup99 benchmark shows good results, with a low false-positive rate in attack prediction. Modeling the system as a queuing model allows us to predict its behavior under heavy network traffic in a multi-user setting.
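
    The semantic link between logged alerts and vulnerability records can be pictured with a toy join. This sketch is an assumption-heavy illustration, not the paper's ontology: the CVE identifiers, the record fields, and the severity ranking are made up, and a real system would query an actual NVD feed rather than a dict.

```python
# Illustrative sketch of connecting IDS alerts to NVD-style vulnerability
# records so that threatened DMZ services can be flagged before they are
# attacked. All identifiers and records below are fabricated examples.
NVD = {
    "CVE-2024-0001": {"service": "http", "severity": "HIGH"},
    "CVE-2024-0002": {"service": "dns",  "severity": "MEDIUM"},
}

def services_in_danger(alerts, nvd=NVD, min_severity="HIGH"):
    """Resolve each logged alert against the vulnerability database and
    return the set of services at or above the given severity."""
    ranks = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}
    danger = set()
    for alert in alerts:
        record = nvd.get(alert.get("cve"))
        if record and ranks[record["severity"]] >= ranks[min_severity]:
            danger.add(record["service"])
    return danger

alerts = [{"src": "10.0.0.5", "cve": "CVE-2024-0001"},
          {"src": "10.0.0.9", "cve": "CVE-2024-0002"}]
at_risk = services_in_danger(alerts)
```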

    Multi-Environment Model Estimation for Motility Analysis of Caenorhabditis elegans

    The nematode Caenorhabditis elegans is a well-known model organism used to investigate fundamental questions in biology. Motility assays of this small roundworm are designed to study the relationships between genes and behavior. Commonly, motility analysis is used to classify nematode movements and characterize them quantitatively. In recent years, C. elegans’ motility has been studied across a wide range of environments, including crawling on substrates, swimming in fluids, and locomoting through microfluidic substrates. However, each environment often requires customized image processing tools relying on heuristic parameter tuning. In the present study, we propose a novel Multi-Environment Model Estimation (MEME) framework for automated image segmentation that is versatile across various environments. The MEME platform is constructed around the concept of Mixture of Gaussians (MOG) models, where statistical models for both the background environment and the nematode appearance are explicitly learned and used to accurately segment a target nematode. Our method is designed to reduce the burden often imposed on users; here, only a single image that includes a nematode in its environment must be provided for model learning. In addition, our platform enables the extraction of nematode ‘skeletons’ for straightforward motility quantification. We test our algorithm on various locomotive environments and compare performance with an intensity-based thresholding method. Overall, MEME outperforms the threshold-based approach in the overwhelming majority of cases examined. Ultimately, MEME provides researchers with an attractive platform for C. elegans’ segmentation and ‘skeletonizing’ across a wide range of motility assays.
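
    The statistical (rather than fixed-threshold) segmentation idea can be sketched in greatly simplified form. This is not MEME itself: MEME learns full mixture models for both background and nematode, whereas this assumed sketch fits a single Gaussian to the background from one image's border pixels and labels low-likelihood pixels as the animal.

```python
# One-Gaussian-per-class sketch of statistical segmentation: learn a
# Gaussian intensity model of the background from the image border, then
# mark pixels that are unlikely under that model as nematode pixels.
import statistics

def segment(image, z_cutoff=3.0):
    """Return a boolean mask that is True where a pixel deviates strongly
    from the learned background intensity model."""
    h, w = len(image), len(image[0])
    border = [image[i][j] for i in range(h) for j in range(w)
              if i in (0, h - 1) or j in (0, w - 1)]
    mu = statistics.mean(border)
    sigma = statistics.pstdev(border) or 1e-6  # guard a degenerate model
    return [[abs(p - mu) / sigma > z_cutoff for p in row] for row in image]

# Toy image: bright background (~200) with a dark worm-like streak (~50).
img = [[200.0] * 9 for _ in range(9)]
for j in range(2, 7):
    img[4][j] = 50.0
mask = segment(img)
```

    Because the decision is likelihood-based rather than a fixed intensity cutoff, the same code works whether the animal is darker or brighter than its surroundings, which is the property that makes the model-based approach attractive across environments.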