18,183 research outputs found

    Pencarian Dokumen Berbasis Web pada Drive Lokal dan Off-line Web dengan Menggunakan Metode Suffix Tree Clustering

    As languages grow richer and collections of text documents grow larger, search systems become very important. However, most search engines do not give satisfactory results when users look for information. Users sometimes have to do double work, sorting again through the documents returned in a long list, which takes time. Many methods have been developed for the search process; one of them uses a clustering model to group documents according to the desired results. In this thesis, the author uses the Suffix Tree Clustering method. The results obtained dig deeper into the information and are more concise and effective, without requiring a long time. The author first imports the files and then processes them offline. This avoids presenting search results as documents ordered by ranked matches in a long list. In short, the method treats a document not as a set of words but as a string. Likewise, the Quals Report (Journal) entitled "Fast and Intuitive Clustering of Web Documents" by Oren Zamir (1997) states that when searching data on a local drive or an offline web, generally only file names can be searched. Users have to find further details themselves and need to open each file, so the search process is often long and tedious. Based on the description above, the author titles this work "The Web-Based Document Search Local Drive and Off-line Web Method Using Suffix Tree Clustering".

    Adaptive content mapping for internet navigation

    The Internet, the biggest human library ever assembled, keeps on growing. Although all kinds of information carriers (e.g. audio/video/hybrid file formats) are available, text-based documents dominate. It is estimated that about 80% of all information worldwide stored electronically exists in (or can be converted into) text form. More and more, all kinds of documents are generated by means of a text processing system and are therefore available electronically. Nowadays, many printed journals are also published online and may even cease to appear in print tomorrow. This development has many convincing advantages: the documents are available both faster (cf. prepress services) and cheaper, they can be searched more easily, the physical storage needs only a fraction of the space previously necessary, and the medium will not age. For most people, fast and easy access is the most interesting feature of the new age; computer-aided search for specific documents or Web pages becomes the basic tool for information-oriented work. But this tool has problems. The current keyword-based search engines available on the Internet are not really appropriate for such a task; either (far) too many documents matching the specified keywords are presented, or none at all. The problem lies in the fact that it is often very difficult to choose appropriate terms describing the desired topic in the first place. This contribution discusses the current state-of-the-art techniques in content-based searching (along with common visualization/browsing approaches) and proposes a particular adaptive solution for intuitive Internet document navigation, which not only enables the user to provide full texts instead of manually selected keywords (if available), but also allows him/her to explore the whole database.

    PENCARIAN DOKUMEN BERBASIS WEB PADA DRIVE LOKAL DAN OFF-LINE WEB DENGAN MENGGUNAKAN METODE SUFFIX TREE CLUSTERING

    ABSTRACT: As languages grow richer and collections of text documents grow larger, search systems become very important. However, most search engines do not give satisfactory results when users look for information. Users sometimes have to do double work, sorting again through the documents returned in a long list, which takes time. Many methods have been developed for the search process; one of them uses a clustering model to group document search results according to the desired results. In this thesis, the author uses the Suffix Tree Clustering method. The results obtained dig deeper into the information and are more concise and effective, without requiring a long time. The author first imports the documents and then processes them offline. This avoids presenting search results as documents ordered by ranked matches in a long list. In short, the method treats a document not as a set of words but as a string. Likewise, the Quals Report (Journal) entitled "Fast and Intuitive Clustering of Web Documents" by Oren Zamir (1997) states that when searching data on a local drive or an offline web, generally only file names can be searched. Users have to find further details themselves and need to open each file, so the search process is often long and tedious. Based on the description above, the author titles this work "The Web-Based Document Search Local Drive and Off-line Web Method Using Suffix Tree Clustering". Keywords: Text Mining, Search Engine, Web-Offline, Suffix Tree Clustering, Document Clustering, String, Document Import.

    Automatic document classification of biological literature

    Background: Document classification is a widespread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, a text-mining system for biological literature, which marks up full text according to a shallow ontology that includes terms of biological interest. This project investigates document classification in the context of biological literature, making use of the Textpresso markup of a corpus of Caenorhabditis elegans literature. Results: We present a two-step text categorization algorithm to classify a corpus of C. elegans papers. Our classification method first uses a support vector machine-trained classifier, followed by a novel, phrase-based clustering algorithm. This clustering step autonomously creates cluster labels that are descriptive and understandable by humans. This clustering engine performed better on a standard test set (Reuters 21578) compared to previously published results (F-value of 0.55 vs. 0.49), while producing cluster descriptions that appear more useful. A web interface allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept. Conclusions: We have demonstrated a simple method to classify biological documents that embodies an improvement over current methods. While the classification results are currently optimized for Caenorhabditis elegans papers by human-created rules, the classification engine can be adapted to different types of documents. We have demonstrated this by presenting a web interface that allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept.

    Datamining for Web-Enabled Electronic Business Applications

    Web-enabled electronic business is generating massive amounts of data on customer purchases, browsing patterns, usage times and preferences at an increasing rate. Data mining techniques can be applied to all the data being collected to obtain useful information. This chapter presents issues associated with data mining for web-enabled electronic business.

    ARM Wrestling with Big Data: A Study of Commodity ARM64 Server for Big Data Workloads

    ARM processors have dominated the mobile device market in the last decade due to their favorable computing to energy ratio. In this age of Cloud data centers and Big Data analytics, the focus is increasingly on power efficient processing, rather than just high throughput computing. ARM's first commodity server-grade processor is the recent AMD A1100-series processor, based on a 64-bit ARM Cortex A57 architecture. In this paper, we study the performance and energy efficiency of a server based on this ARM64 CPU, relative to a comparable server running an AMD Opteron 3300-series x64 CPU, for Big Data workloads. Specifically, we study these for Intel's HiBench suite of web, query and machine learning benchmarks on Apache Hadoop v2.7 in a pseudo-distributed setup, for data sizes up to 20 GB files, 5M web pages and 500M tuples. Our results show that the ARM64 server's runtime performance is comparable to the x64 server for integer-based workloads like Sort and Hive queries, and only lags behind for floating-point intensive benchmarks like PageRank, when they do not exploit data parallelism adequately. We also see that the ARM64 server takes one-third the energy, and has an Energy Delay Product (EDP) that is 50-71% lower than the x64 server. These results hold promise for ARM64 data centers hosting Big Data workloads to reduce their operational costs, while opening up opportunities for further analysis. Comment: Accepted for publication in the Proceedings of the 24th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), 201