Content-Aware DataGuides for Indexing Large Collections of XML Documents
XML is well-suited for modelling structured data with
textual content. However, most indexing approaches perform
structure and content matching independently, combining
the retrieved path and keyword occurrences in a third
step. This paper shows that retrieval in XML documents can
be accelerated significantly by processing text and structure
simultaneously during all retrieval phases. To this end,
the Content-Aware DataGuide (CADG) enhances the well-known
DataGuide with (1) simultaneous keyword and path
matching and (2) a precomputed content/structure join. Extensive
experiments prove the CADG to be 50-90% faster
than the DataGuide for various sorts of query and document,
including difficult cases such as poorly structured
queries and recursive document paths. A new query classification
scheme identifies precise query characteristics with
a predominant influence on the performance of the individual
indices. The experiments show that the CADG is applicable
to many real-world applications, in particular large
collections of heterogeneously structured XML documents.
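As a rough illustration of matching keywords and paths together, the following minimal Python sketch builds a DataGuide-style path summary in which each distinct label path also records the keywords that occur directly in its elements. It is not the CADG described in the abstract: the function names `build_summary` and `query`, the plain keyword sets, and the sample documents are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_summary(xml_docs):
    """Map each distinct label path to the keywords found in its element text."""
    summary = defaultdict(set)

    def walk(elem, path):
        path = path + "/" + elem.tag
        # Only the element's own leading text is indexed (tail text is ignored).
        summary[path].update((elem.text or "").lower().split())
        for child in elem:
            walk(child, path)

    for doc in xml_docs:
        walk(ET.fromstring(doc), "")
    return summary


def query(summary, path_suffix, keyword):
    """Return label paths that end with path_suffix and contain keyword.

    Structure and content are checked in the same pass over the summary,
    rather than being matched separately and joined afterwards.
    """
    keyword = keyword.lower()
    return sorted(
        path for path, words in summary.items()
        if path.endswith(path_suffix) and keyword in words
    )


# Hypothetical sample documents.
docs = [
    "<lib><book><title>XML indexing</title></book></lib>",
    "<lib><article><title>Graph search</title></article></lib>",
]
summary = build_summary(docs)
matches = query(summary, "/title", "xml")
```

Here `matches` contains only the path `/lib/book/title`, since the other `title` path does not hold the keyword. A real index would store compact keyword signatures instead of full sets.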
Workflow repository for providing configurable workflow in ERP
Workflows in an ERP system that covers a large functional domain are prone to
duplication. This research builds a workflow repository that stores the various
workflows of ERP business processes so that they can be used to compose new
workflows according to the needs of new tenants.
The proposed method consists of two stages, preprocessing and processing. The
preprocessing stage aims to find the common and sub variants of the existing
workflow variants. The workflow variants stored by users are Procure to Pay
workflows. These variants are selected by similarity using similarity filtering
and then merged to find their common and sub variants. The common and sub
variants are stored using metadata mapped onto a relational database. Detection
of common and sub variant workflows reaches an accuracy of 92%. The common
workflow consists of 3 common workflows from 8 workflow variants, and it is 10%
less complex than the previous model.
The processing stage provides the configurable workflow. The user submits a
query model to search for the desired workflow. Similarity filtering then
returns the candidate common and/or sub variants. The user can take a common
workflow into the workflow designer and recompose it. Provision of configurable
workflows by the ERP reaches 100%, meaning that any workflow the user wants can
be provided by the ERP, either directly or as the basis for building another
workflow.
Based on the experimental results, the workflow repository can be built with
the proposed architecture and is able to store and provide workflows. The
repository can detect which workflows are common and which are sub variants,
and it can provide configurable workflows in which users take common and sub
variant workflows as the basis for composing other workflows.
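The similarity filtering and merging steps could be sketched as follows. The abstract does not say which similarity measure is used, so this example assumes Jaccard similarity over sets of activity names. The workflow names, the activities, and the 0.5 threshold are hypothetical, and the common part is taken simply as the intersection of activities across the selected variants.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of activity names."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def filter_similar(query_model, variants, threshold=0.5):
    """Keep only the variants whose similarity to the query meets the threshold."""
    return {
        name: activities
        for name, activities in variants.items()
        if jaccard(query_model, activities) >= threshold
    }


def common_activities(variants):
    """Assumed merge step: activities shared by every selected variant."""
    sets = [set(activities) for activities in variants.values()]
    return set.intersection(*sets) if sets else set()


# Hypothetical stored workflow variants.
variants = {
    "p2p_a": ["request", "approve", "order", "receive", "pay"],
    "p2p_b": ["request", "order", "receive", "invoice", "pay"],
    "hr_leave": ["apply", "approve", "record"],
}
query_model = ["request", "order", "receive", "pay"]

selected = filter_similar(query_model, variants)
common = common_activities(selected)
```

With this data, the two Procure to Pay variants pass the filter and `common` holds the four activities they share, while the unrelated variant is discarded.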
The Family of MapReduce and Large Scale Data Processing Systems
In the last two decades, the continuous increase of computational power has
produced an overwhelming flow of data which has called for a paradigm shift in
the computing architecture and large scale data processing mechanisms.
MapReduce is a simple and powerful programming model that enables easy
development of scalable parallel applications to process vast amounts of data
on large clusters of commodity machines. It isolates the application from the
details of running a distributed program such as issues on data distribution,
scheduling and fault tolerance. However, the original implementation of the
MapReduce framework had some limitations that have been tackled by many
research efforts in several followup works after its introduction. This article
provides a comprehensive survey of a family of approaches and mechanisms for
large-scale data processing that have been implemented based on the
original idea of the MapReduce framework and are currently gaining a lot of
momentum in both research and industrial communities. We also cover a set of
introduced systems that have been implemented to provide declarative
programming interfaces on top of the MapReduce framework. In addition, we
review several large scale data processing systems that resemble some of the
ideas of the MapReduce framework for different purposes and application
scenarios. Finally, we discuss some of the future research directions for
implementing the next generation of MapReduce-like solutions.
Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
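The programming model summarized above can be illustrated with a single-process word count. This is only a sketch of the map, shuffle, and reduce phases in plain Python. It does not use any MapReduce framework, and it leaves out the data distribution, scheduling, and fault tolerance that such a framework handles. The function names are chosen for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in the document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big clusters", "data on clusters"])
```

The application code is limited to `map_phase` and `reduce_phase`. In a distributed setting the grouping done by `shuffle` would happen across machines.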
SQL Injection Detection Using Machine Learning Techniques and Multiple Data Sources
SQL Injection continues to be one of the most damaging security exploits in terms of personal information exposure as well as monetary loss. Injection attacks are the number one vulnerability in the most recent OWASP Top 10 report, and the number of these attacks continues to increase. Traditional defense strategies often involve static, signature-based IDS (Intrusion Detection System) rules, which are mostly effective only against previously observed attacks and not against unknown, or zero-day, attacks. Much current research involves machine learning techniques, which are able to detect unknown attacks but, depending on the algorithm, can be costly in terms of performance. In addition, most current intrusion detection strategies collect traffic coming into the web application, either from a network device or from the web application host, while other strategies collect data from the database server logs. In this project, we collect traffic from two points: the web application host, and a Datiphy appliance node located between the web application host and the associated MySQL database server. In our analysis of these two datasets, and of a third dataset that correlates the two, we demonstrate that the accuracy obtained on the correlated dataset with rule-based and decision tree algorithms is nearly the same as that obtained with a neural network algorithm, but with greatly improved performance.
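The idea of correlating the two collection points and then applying a rule-based check could be sketched as below. Everything specific here is an assumption for illustration: the abstract does not describe the record formats, how the two datasets are joined, or which rules are used. The `request_id` join key, the field names, and the three regular expressions are hypothetical and far simpler than a real detector.

```python
import re

# Hypothetical rules; a real rule set would be broader and tuned on data.
RULES = [
    re.compile(r"\bunion\b.+\bselect\b", re.IGNORECASE),           # UNION-based injection
    re.compile(r"\bor\b\s+'?\d+'?\s*=\s*'?\d+'?", re.IGNORECASE),  # tautology such as OR '1'='1'
    re.compile(r"(--|#|/\*)"),                                     # SQL comment markers
]


def correlate(web_events, db_events):
    """Join web-host records with database-side records on an assumed shared id."""
    db_by_id = {event["request_id"]: event for event in db_events}
    return [
        {**web, "query": db_by_id[web["request_id"]]["query"]}
        for web in web_events
        if web["request_id"] in db_by_id
    ]


def is_suspicious(record):
    """Flag a correlated record if any rule matches its parameters or its query."""
    text = record["params"] + " " + record["query"]
    return any(rule.search(text) for rule in RULES)


# Hypothetical sample records from the two collection points.
web_events = [
    {"request_id": 1, "params": "id=7"},
    {"request_id": 2, "params": "id=7' OR '1'='1"},
]
db_events = [
    {"request_id": 1, "query": "SELECT * FROM users WHERE id = '7'"},
    {"request_id": 2, "query": "SELECT * FROM users WHERE id = '7' OR '1'='1'"},
]

flags = [is_suspicious(record) for record in correlate(web_events, db_events)]
```

Only the second request is flagged in this example. Checks of this kind are cheap to evaluate, which fits the performance argument made for the simpler algorithms.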