18,183 research outputs found
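Pencarian Dokumen Berbasis Web pada Drive Lokal dan Off-line Web dengan Menggunakan Metode Suffix Tree Clustering (Web-Based Document Search on Local Drives and Off-line Web Using the Suffix Tree Clustering Method)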
As languages grow richer and collections of text documents grow larger, search systems become very important. However, most search engines do not find the desired information satisfactorily. Users sometimes have to do double work, sorting again through the retrieved documents in a long list, which takes time. Many methods have been developed for the search process; one of them uses a clustering model to group documents according to the desired results. In this thesis, the author uses the Suffix Tree Clustering method. The results obtained are a deeper exploration of the information and a search that is more concise, more effective, and less time-consuming. The author first imports the files, which are then processed offline. This avoids presenting search results as documents ordered by match rank in a long list. In short, the method treats a document not as a set of words but as a string. Likewise, the Quals Report (Journal) entitled "Fast and Intuitive Clustering of Web Documents" by Oren Zamir (1997) states that, when searching data on a local drive or an offline web, generally only file names can be searched. Users must find further details themselves and need to open each file, so the search process is often long and tedious. Based on the description above, the author titles this work "The Web-Based Document Search Local Drive and Off-line Web Method Using Suffix Tree Clustering".
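The idea of treating a document as a string rather than a set of words can be illustrated with a small sketch. This is not the thesis author's implementation: it replaces the suffix tree with a plain index of word n-grams (an assumption made for brevity), and it shows only the step of grouping documents that share a phrase, not the later merging of groups. The names `word_ngrams`, `base_clusters`, and the sample documents are hypothetical.

```python
from collections import defaultdict


def word_ngrams(text, n):
    """Return the set of n-word phrases that occur in text, preserving word order."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def base_clusters(docs, n=2, min_docs=2):
    """Map each phrase shared by at least min_docs documents to those document ids."""
    phrase_to_docs = defaultdict(set)
    for doc_id, text in docs.items():
        for phrase in word_ngrams(text, n):
            phrase_to_docs[phrase].add(doc_id)
    return {p: ids for p, ids in phrase_to_docs.items() if len(ids) >= min_docs}


# Hypothetical sample documents; "c" reuses the same words in a different order.
docs = {
    "a": "suffix tree clustering groups search results",
    "b": "search results from suffix tree clustering",
    "c": "tree results clustering search suffix",
}
clusters = base_clusters(docs)
for phrase, ids in sorted(clusters.items()):
    print(phrase, sorted(ids))
```

Document "c" contains nearly the same words as the others but shares no phrase with them, so a phrase-based grouping leaves it out, whereas a set-of-words comparison would consider it very similar.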
Adaptive content mapping for internet navigation
The Internet, the biggest human library ever assembled, keeps on growing. Although all kinds of information carriers (e.g. audio/video/hybrid file formats) are available, text-based documents dominate. It is estimated that about 80% of all information worldwide stored electronically exists in (or can be converted into) text form. More and more, all kinds of documents are generated by means of a text processing system and are therefore available electronically. Nowadays, many printed journals are also published online and may even cease to appear in print form tomorrow. This development has many convincing advantages: the documents are available both faster (cf. prepress services) and cheaper, they can be searched more easily, the physical storage needs only a fraction of the space previously necessary, and the medium will not age. For most people, fast and easy access is the most interesting feature of the new age; computer-aided search for specific documents or Web pages becomes the basic tool for information-oriented work. But this tool has problems. The current keyword-based search engines available on the Internet are not really appropriate for such a task; either (way) too many documents matching the specified keywords are presented or none at all. The problem lies in the fact that it is often very difficult to choose appropriate terms describing the desired topic in the first place. This contribution discusses the current state-of-the-art techniques in content-based searching (along with common visualization/browsing approaches) and proposes a particular adaptive solution for intuitive Internet document navigation, which not only enables the user to provide full texts instead of manually selected keywords (if available), but also allows him/her to explore the whole database.
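PENCARIAN DOKUMEN BERBASIS WEB PADA DRIVE LOKAL DAN OFF-LINE WEB DENGAN MENGGUNAKAN METODE SUFFIX TREE CLUSTERING (Web-Based Document Search on Local Drives and Off-line Web Using the Suffix Tree Clustering Method)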
Abstract: As languages grow richer and collections of text documents grow larger, search systems become very important. However, most search engines do not find the desired information satisfactorily. Users sometimes have to do double work, sorting again through the retrieved documents in a long list, which takes time. Many methods have been developed for the search process; one of them uses a clustering model to group document search results according to the desired results. In this thesis, the author uses the Suffix Tree Clustering method. The results obtained are a deeper exploration of the information and a search that is more concise, more effective, and less time-consuming. The author first imports the documents, which are then processed offline. This avoids presenting search results as documents ordered by match rank in a long list. In short, the method treats a document not as a set of words but as a string. Likewise, the Quals Report (Journal) entitled "Fast and Intuitive Clustering of Web Documents" by Oren Zamir (1997) states that, when searching data on a local drive or an offline web, generally only file names can be searched. Users must find further details themselves and need to open each file, so the search process is often long and tedious. Based on the description above, the author titles this work "The Web-Based Document Search Local Drive and Off-line Web Method Using Suffix Tree Clustering".
Keywords: Text Mining, Search Engine, Web-Offline, Suffix Tree Clustering, Document Clustering, String, Document Import
Automatic document classification of biological literature
Background: Document classification is a widespread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, a text-mining system for biological literature, which marks up full text according to a shallow ontology that includes terms of biological interest. This project investigates document classification in the context of biological literature, making use of the Textpresso markup of a corpus of Caenorhabditis elegans literature.
Results: We present a two-step text categorization algorithm to classify a corpus of C. elegans papers. Our classification method first uses a support vector machine-trained classifier, followed by a novel, phrase-based clustering algorithm. This clustering step autonomously creates cluster labels that are descriptive and understandable by humans. This clustering engine performed better on a standard test-set (Reuters 21578) compared to previously published results (F-value of 0.55 vs. 0.49), while producing cluster descriptions that appear more useful. A web interface allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept.
Conclusions: We have demonstrated a simple method to classify biological documents that embodies an improvement over current methods. While the classification results are currently optimized for Caenorhabditis elegans papers by human-created rules, the classification engine can be adapted to different types of documents. We have demonstrated this by presenting a web interface that allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept.
Datamining for Web-Enabled Electronic Business Applications
Web-enabled electronic business is generating massive amounts of data on customer purchases, browsing patterns, usage times and preferences at an increasing rate. Data mining techniques can be applied to all the data being collected to obtain useful information. This chapter presents issues associated with data mining for web-enabled electronic business.
ARM Wrestling with Big Data: A Study of Commodity ARM64 Server for Big Data Workloads
ARM processors have dominated the mobile device market in the last decade due
to their favorable computing to energy ratio. In this age of Cloud data centers
and Big Data analytics, the focus is increasingly on power efficient
processing, rather than just high throughput computing. ARM's first commodity
server-grade processor is the recent AMD A1100-series processor, based on a
64-bit ARM Cortex A57 architecture. In this paper, we study the performance and
energy efficiency of a server based on this ARM64 CPU, relative to a comparable
server running an AMD Opteron 3300-series x64 CPU, for Big Data workloads.
Specifically, we study these for Intel's HiBench suite of web, query and
machine learning benchmarks on Apache Hadoop v2.7 in a pseudo-distributed
setup, for data sizes up to files, web pages and tuples. Our
results show that the ARM64 server's runtime performance is comparable to the
x64 server for integer-based workloads like Sort and Hive queries, and only
lags behind for floating-point intensive benchmarks like PageRank, when they do
not exploit data parallelism adequately. We also see that the ARM64 server
takes the energy, and has an Energy Delay Product (EDP) that
is lower than the x64 server. These results hold promise for ARM64
data centers hosting Big Data workloads to reduce their operational costs,
while opening up opportunities for further analysis.
Comment: Accepted for publication in the Proceedings of the 24th IEEE
International Conference on High Performance Computing, Data, and Analytics
(HiPC), 201
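The Energy Delay Product mentioned in the abstract is, by its standard definition, the energy consumed by a run multiplied by its runtime, so a lower value is better. The sketch below shows how such a comparison between two servers could be computed. The function names are hypothetical, and the energy and runtime figures are made-up illustration values, not measurements from the paper.

```python
def energy_delay_product(energy_joules, runtime_seconds):
    """EDP = energy consumed by the run multiplied by its runtime."""
    return energy_joules * runtime_seconds


def relative_reduction(baseline, candidate):
    """Fraction by which candidate is lower than baseline (0.5 means 50% lower)."""
    return (baseline - candidate) / baseline


# Made-up illustration values, NOT results from the paper.
x64_edp = energy_delay_product(energy_joules=1000, runtime_seconds=10)
arm64_edp = energy_delay_product(energy_joules=400, runtime_seconds=12)

print(f"x64 EDP:   {x64_edp} J*s")
print(f"ARM64 EDP: {arm64_edp} J*s")
print(f"ARM64 EDP is {relative_reduction(x64_edp, arm64_edp):.0%} lower")
```

Because EDP weights energy and delay equally, a server that runs somewhat slower can still come out ahead if its energy use is sufficiently lower, as in the made-up figures above.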