
    Content-Aware DataGuides for Indexing Large Collections of XML Documents

    XML is well-suited for modelling structured data with textual content. However, most indexing approaches perform structure and content matching independently, combining the retrieved path and keyword occurrences in a third step. This paper shows that retrieval in XML documents can be accelerated significantly by processing text and structure simultaneously during all retrieval phases. To this end, the Content-Aware DataGuide (CADG) enhances the well-known DataGuide with (1) simultaneous keyword and path matching and (2) a precomputed content/structure join. Extensive experiments show the CADG to be 50-90% faster than the DataGuide for various kinds of queries and documents, including difficult cases such as poorly structured queries and recursive document paths. A new query classification scheme identifies precise query characteristics with a predominant influence on the performance of the individual indices. The experiments show that the CADG is applicable to many real-world applications, in particular large collections of heterogeneously structured XML documents.

    Workflow repository for providing configurable workflow in ERP

    Workflows in an ERP system that covers a large functional domain are prone to duplication. This research builds a workflow repository that stores the various workflows of ERP business processes so that they can be used to compose new workflows according to the needs of a new tenant. The proposed method consists of two stages, preprocessing and processing. The preprocessing stage finds the common and sub-variant workflows among the existing workflow variants. The workflow variants stored by users are Procure to Pay workflows. The variants are selected by similarity filtering and then merged to identify their common and sub-variant workflows, which are stored using metadata mapped onto a relational database. Detection of common and sub-variant workflows reached 92% accuracy. The common workflows comprise 3 common workflows derived from 8 workflow variants, and they are 10% less complex than the previous model. The processing stage provides the configurable workflow. The user submits a query model to search for the desired workflow, and similarity filtering returns the candidate common and/or sub-variant workflows. The user can then recompose the common workflow through a workflow designer. Provision of configurable workflows by the ERP reached 100%: whatever workflow the user requires can be provided by the ERP, either directly or as a basis for building another workflow. Based on the experiments, the workflow repository can be built with the proposed architecture and is able to store and provide workflows. The repository can detect common and sub-variant workflows, and it can provide configurable workflows in which users take common and sub-variant workflows as the basis for composing others.

    The Family of MapReduce and Large Scale Data Processing Systems

    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architecture and large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as data distribution, scheduling and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in follow-up works since its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both research and industrial communities. We also cover a set of systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions.
    Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
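    As an illustration of the programming model this abstract refers to, the following is a minimal single-process sketch of the map, shuffle and reduce phases applied to word counting. It is not taken from the surveyed article; the names `map_words`, `reduce_counts` and `run_mapreduce` are hypothetical, and a real framework would distribute these phases across a cluster and handle scheduling and fault tolerance.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_words(document: str) -> Iterator[tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce phase: combine all values emitted for one key."""
    return key, sum(values)


def run_mapreduce(
    inputs: Iterable[str],
    mapper: Callable[[str], Iterable[tuple[str, int]]],
    reducer: Callable[[str, list[int]], tuple[str, int]],
) -> dict[str, int]:
    """Hypothetical in-memory driver; stands in for the distributed runtime."""
    # Shuffle phase: group every emitted value under its key.
    groups: dict[str, list[int]] = defaultdict(list)
    for record in inputs:
        for key, value in mapper(record):
            groups[key].append(value)
    # Apply the reducer once per key.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


counts = run_mapreduce(
    ["big data big clusters", "data processing"],
    map_words,
    reduce_counts,
)
print(counts)
```

    The application code only supplies the two functions; grouping by key is left to the driver, which is the separation of concerns the abstract describes.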

    SQL Injection Detection Using Machine Learning Techniques and Multiple Data Sources

    SQL injection continues to be one of the most damaging security exploits in terms of personal information exposure as well as monetary loss. Injection attacks are the number one vulnerability in the most recent OWASP Top 10 report, and the number of these attacks continues to increase. Traditional defense strategies often involve static, signature-based IDS (Intrusion Detection System) rules, which are mostly effective only against previously observed attacks but not against unknown, or zero-day, attacks. Much current research involves the use of machine learning techniques, which are able to detect unknown attacks but, depending on the algorithm, can be costly in terms of performance. In addition, most current intrusion detection strategies involve collection of traffic coming into the web application either from a network device or from the web application host, while other strategies collect data from the database server logs. In this project, we are collecting traffic from two points: the web application host, and a Datiphy appliance node located between the webapp host and the associated MySQL database server. In our analysis of these two datasets, and another dataset that is correlated between the two, we have been able to demonstrate that the accuracy obtained on the correlated dataset with rule-based and decision tree algorithms is nearly the same as that obtained with a neural network algorithm, but with greatly improved performance.