
Pencarian Dokumen Berbasis Web pada Drive Lokal dan Off-line Web dengan Menggunakan Metode Suffix Tree Clustering

Author: Wiji Setyaningsih Zaenal Fanani
Publication venue: Universitas Kanjuruhan Malang
Publication date: 01/01/2014

As languages grow richer and collections of text documents grow larger, search systems become very important. Yet most search engines do not satisfy users looking for information. Users sometimes have to do double work, sorting again through the documents returned in a long list, which takes time. Many methods have been developed for the search process; one of them uses a clustering model to group documents according to the desired results. In this thesis, the author uses the Suffix Tree Clustering method. The results are searches that dig deeper into the information and are more concise, effective, and quick. The author first imports the files and then processes them offline. This avoids returning documents ordered by match rank in a long list. In short, the method treats a document not as a set of words but as a string. Likewise, the Quals Report (Journal) entitled "Fast and Intuitive Clustering of Web Documents" by Oren Zamir (1997) states that a search on a local drive or an offline web generally covers only file names. Users must find further details themselves and need to open each file, so the search process is often long and tedious. Based on the description above, the author titles this work "The Web-Based Document Search Local Drive and Off-line Web Method Using Suffix Tree Clustering".

Neliti

Universitas Kanjuruhan Malang: Journal Unikama

Adaptive content mapping for internet navigation

Author: Brause Rüdiger W., Ueberall Markus
Publication date: 08/09/2010

The Internet, as the biggest human library ever assembled, keeps on growing. Although all kinds of information carriers (e.g. audio/video/hybrid file formats) are available, text-based documents dominate. It is estimated that about 80% of all information stored electronically worldwide exists in (or can be converted into) text form. More and more, all kinds of documents are generated by means of a text processing system and are therefore available electronically. Nowadays, many printed journals are also published online and may even cease to appear in print tomorrow. This development has many convincing advantages: the documents are available both faster (cf. prepress services) and cheaper, they can be searched more easily, the physical storage needs only a fraction of the space previously necessary, and the medium will not age. For most people, fast and easy access is the most interesting feature of the new age; computer-aided search for specific documents or Web pages becomes the basic tool for information-oriented work. But this tool has problems. The current keyword-based search engines available on the Internet are not really appropriate for such a task; either (far) too many documents matching the specified keywords are presented, or none at all. The problem lies in the fact that it is often very difficult to choose appropriate terms describing the desired topic in the first place. This contribution discusses the current state-of-the-art techniques in content-based searching (along with common visualization/browsing approaches) and proposes a particular adaptive solution for intuitive Internet document navigation, which not only enables the user to provide full texts instead of manually selected keywords (if available), but also allows him/her to explore the whole database.

Hochschulschriftenserver - Universität Frankfurt am Main

PENCARIAN DOKUMEN BERBASIS WEB PADA DRIVE LOKAL DAN OFF-LINE WEB DENGAN MENGGUNAKAN METODE SUFFIX TREE CLUSTERING

Author: Wiji Setyaningsih Zaenal Fanani
Publication venue: Jurnal Fakultas Teknologi Informasi
Publication date: 19/03/2015

Abstract: As languages grow richer and collections of text documents grow larger, search systems become very important. Yet most search engines do not satisfy users looking for information. Users sometimes have to do double work, sorting again through the documents returned in a long list, which takes time. Many methods have been developed for the search process; one of them uses a clustering model to group document search results according to the desired results. In this thesis, the author uses the Suffix Tree Clustering method. The results are searches that dig deeper into the information and are more concise, effective, and quick. The author first imports the documents and then processes them offline. This avoids returning documents ordered by match rank in a long list. In short, the method treats a document not as a set of words but as a string. Likewise, the Quals Report (Journal) entitled "Fast and Intuitive Clustering of Web Documents" by Oren Zamir (1997) states that a search on a local drive or an offline web generally covers only file names. Users must find further details themselves and need to open each file, so the search process is often long and tedious. Based on the description above, the author titles this work "The Web-Based Document Search Local Drive and Off-line Web Method Using Suffix Tree Clustering".
Keywords: Text Mining, Search Engine, Web-Offline, Suffix Tree Clustering, Document Clustering, String, Document Import

Universitas Kanjuruhan Malang: Journal Unikama

Automatic document classification of biological literature

Author: Chen David, Muller Hans-Michael, Sternberg Paul W.
Publication date: 01/08/2006

Background: Document classification is a widespread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, a text-mining system for biological literature, which marks up full text according to a shallow ontology that includes terms of biological interest. This project investigates document classification in the context of biological literature, making use of the Textpresso markup of a corpus of Caenorhabditis elegans literature. Results: We present a two-step text categorization algorithm to classify a corpus of C. elegans papers. Our classification method first uses a support vector machine-trained classifier, followed by a novel, phrase-based clustering algorithm. This clustering step autonomously creates cluster labels that are descriptive and understandable by humans. This clustering engine performed better on a standard test set (Reuters 21578) than previously published results (F-value of 0.55 vs. 0.49), while producing cluster descriptions that appear more useful. A web interface allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept. Conclusions: We have demonstrated a simple method to classify biological documents that embodies an improvement over current methods. While the classification results are currently optimized for Caenorhabditis elegans papers by human-created rules, the classification engine can be adapted to different types of documents. We have demonstrated this by presenting a web interface that allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Caltech Authors

Datamining for Web-Enabled Electronic Business Applications

Author: Nayak Richi
Publication venue: Idea Group
Publication date: 01/01/2003

Web-enabled electronic business is generating massive amounts of data on customer purchases, browsing patterns, usage times and preferences at an increasing rate. Data mining techniques can be applied to all the data being collected to obtain useful information. This chapter presents issues associated with data mining for web-enabled electronic business.

Queensland University of Technology ePrints Archive

ARM Wrestling with Big Data: A Study of Commodity ARM64 Server for Big Data Workloads

Author: Kalyanasundaram Jayanth, Simmhan Yogesh
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 08/09/2017

ARM processors have dominated the mobile device market in the last decade due to their favorable computing to energy ratio. In this age of Cloud data centers and Big Data analytics, the focus is increasingly on power efficient processing, rather than just high throughput computing. ARM's first commodity server-grade processor is the recent AMD A1100-series processor, based on a 64-bit ARM Cortex A57 architecture. In this paper, we study the performance and energy efficiency of a server based on this ARM64 CPU, relative to a comparable server running an AMD Opteron 3300-series x64 CPU, for Big Data workloads. Specifically, we study these for Intel's HiBench suite of web, query and machine learning benchmarks on Apache Hadoop v2.7 in a pseudo-distributed setup, for data sizes up to 20GB files, 5M web pages and 500M tuples. Our results show that the ARM64 server's runtime performance is comparable to the x64 server for integer-based workloads like Sort and Hive queries, and only lags behind for floating-point intensive benchmarks like PageRank, when they do not exploit data parallelism adequately. We also see that the ARM64 server takes one-third the energy, and has an Energy Delay Product (EDP) that is 50-71% lower than the x64 server. These results hold promise for ARM64 data centers hosting Big Data workloads to reduce their operational costs, while opening up opportunities for further analysis.

Comment: Accepted for publication in the Proceedings of the 24th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), 201

arXiv.org e-Print Archive

Crossref

Open Access Repository of IISc Research Publications