Performance comparison of query evaluation techniques in parallel text retrieval systems

Author: Tokuç A Aylin
Publication venue: Bilkent University
Publication date: 01/01/2008

Ankara: The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2008. Thesis (Master's) -- Bilkent University, 2008. Includes bibliographical references leaves 47-52.

Today's state-of-the-art search engines utilize the inverted index data structure for fast text retrieval on large document collections. To parallelize the retrieval process, the inverted index should be distributed among multiple index servers. Generally the distribution of the inverted index is done in either a term-based or a document-based fashion. The performances of both schemes depend on the total number of disk accesses and the total volume of communication in the system. The classical approach for both distributions is to use the Central Broker Query Evaluation Scheme (CB) for parallel text retrieval. It is known that in this approach the central broker is heavily loaded and becomes a bottleneck. Recently, an alternative query evaluation technique, named Pipelined Query Evaluation Scheme (PPL), has been proposed to alleviate this problem by performing the merge operation on the index servers. In this study, we analyze the scalability and relative performances of the CB and PPL under various query loads to report the benefits and drawbacks of each method.
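The two distribution schemes named in this abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the function names, whitespace tokenization, and round-robin assignment are assumptions chosen for brevity, and only a central-broker style merge is modeled (PPL is not).

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def partition_by_document(docs, num_servers):
    """Document-based: each server indexes only its own subset of documents."""
    shards = [dict() for _ in range(num_servers)]
    for doc_id, text in docs.items():
        shards[doc_id % num_servers][doc_id] = text  # assumed round-robin
    return [build_inverted_index(shard) for shard in shards]


def partition_by_term(docs, num_servers):
    """Term-based: each server holds the complete posting lists of some terms."""
    full = build_inverted_index(docs)
    shards = [dict() for _ in range(num_servers)]
    for i, term in enumerate(sorted(full)):
        shards[i % num_servers][term] = full[term]  # assumed round-robin
    return shards


def broker_query(shards, terms):
    """Central-broker style OR query: every partial result is merged in one place."""
    hits = set()
    for shard in shards:
        for term in terms:
            hits.update(shard.get(term, []))
    return sorted(hits)


# Hypothetical toy collection, for illustration only.
docs = {
    0: "parallel text retrieval",
    1: "inverted index",
    2: "parallel index servers",
    3: "text index",
}
doc_shards = partition_by_document(docs, 2)
term_shards = partition_by_term(docs, 2)
```

In this toy setup a term such as "index" appears on every document-based shard but on exactly one term-based shard, which hints at why the two schemes differ in disk accesses and communication volume.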

Bilkent University Institutional Repository

MongoDB Performance In The Cloud

Author: Matei Tudor
Publication venue: SJSU ScholarWorks
Publication date: 01/04/2013

Web applications are growing at a staggering rate every day. As web applications keep getting more complex, their data storage requirements tend to grow exponentially. Databases play an important role in the way web applications store their information. MongoDB is a document store database that does not have the strict schemas that RDBMSs require and can grow horizontally without performance degradation. MongoDB opens up possibilities for different storage scenarios and allows programmers to use the database as storage that fits their needs, not the other way around. Scaling MongoDB horizontally requires tens to hundreds of servers, making this kind of setup very difficult to afford on dedicated hardware. Moving the database into the cloud opens up the possibility of using low-cost virtual machine instances. There are many cloud services to choose from, and little information is available without testing performance on each one. This paper provides benchmarks on the performance of MongoDB in the cloud.

SJSU ScholarWorks

COSPO/CENDI Industry Day Conference

The conference's objective was to provide a forum where government information managers and industry information technology experts could have an open exchange and discuss their respective needs and compare them to the available, or soon to be available, solutions. Technical summaries and points of contact are provided for the following sessions: secure products, protocols, and encryption; information providers; electronic document management and publishing; information indexing, discovery, and retrieval (IIDR); automated language translators; IIDR - natural language capabilities; IIDR - advanced technologies; IIDR - distributed heterogeneous and large database support; and communications - speed, bandwidth, and wireless.

NASA Technical Reports Server

Improving an Open Source Geocoding Server

Author: Garcia Paje Victor
Publication venue: Lunds universitet/Institutionen för elektro- och informationsteknik
Publication date: 01/01/2015

A common problem in geocoding is that the postal addresses as requested by the user differ from the addresses as described in the database. The online, open source geocoder called Nominatim is one of the most used geocoders nowadays. However, this geocoder lacks the interactivity that most online geocoders already offer. The Nominatim geocoder provides no feedback to the user while typing addresses. Also, the geocoder cannot deal with misspellings introduced by the user in the requested address. This thesis extends the functionality of the Nominatim geocoder to provide fuzzy search and autocomplete features. In this work I propose a new index and search strategy for the OpenStreetMap reference dataset. Also, I extend the search algorithm to geocode new address types such as street intersections. Both the original Nominatim geocoder and the proposed solution are compared using metrics such as the precision of the results, match rate and keystrokes saved by the autocomplete feature. The test addresses used in this work are a subset selected among the Swedish addresses available in the OpenStreetMap dataset. The results show that the proposed geocoder performs better than the original Nominatim geocoder. In the proposed geocoder, users get address suggestions as they type, adding interactivity to the original geocoder. Also, the proposed geocoder is able to find the right address in the presence of errors in the user query with a match rate of 98%.

The demand for geospatial information has been increasing in recent years. More and more mobile applications and services require users to enter some information about where they are, or the address of the place they want to find, for example. The systems that convert postal addresses or place descriptions into coordinates are called geocoders. How good a geocoder is depends not only on the information it contains, but also on how easy it is for users to find the desired addresses. There are many well-known web sites that we use in our everyday life to find the location of an address. For example, sites like Google Maps, Bing Maps or Yahoo Maps are accessed by millions of users every day for such services. Among the main features of these geocoders are the ability to predict the address the user is writing in the search box, and sometimes even to correct misspellings introduced by the user. To make it more complicated, the predictions and error corrections these systems perform are done in real time. The owners of these address search engines usually impose restrictions on the number of addresses a user is allowed to search monthly, above which the user needs to pay a fee in order to keep using the system. This limit is usually high enough for the end user, but it might not be enough for software developers who want to use geospatial data in their products. There is a free alternative to the address search engines mentioned above called Nominatim. Nominatim is an open source project whose purpose is to search addresses in the OpenStreetMap dataset. OpenStreetMap is a collaborative project that tries to map places in the real world into coordinates. The main drawback of Nominatim is that its usability is not as good as that of its competitors. Nominatim is unable to find addresses that are not correctly spelled, nor does it predict what the user needs. For this address search engine to be among the most used, the prediction and error correction features need to be added. In this thesis work I extend the search algorithms of Nominatim to add the functionality mentioned above. The address search engine proposed in this thesis offers a free and open source alternative to users and systems that require access to geospatial data without restrictions.
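The autocomplete and fuzzy search features described above can be sketched in a few lines. This is only an illustration using Python's standard `difflib` module; it is not Nominatim's API or the index strategy proposed in the thesis, and the address list and function names are hypothetical.

```python
import difflib

# Hypothetical sample addresses, not taken from the thesis test set.
ADDRESSES = [
    "Storgatan 5, Lund",
    "Stortorget 2, Lund",
    "Kungsgatan 10, Lund",
    "Klostergatan 3, Lund",
]


def autocomplete(prefix, addresses, limit=5):
    """Suggest addresses that start with what the user has typed so far."""
    p = prefix.casefold()
    return [a for a in addresses if a.casefold().startswith(p)][:limit]


def fuzzy_search(query, addresses, limit=3, cutoff=0.6):
    """Return the addresses most similar to a possibly misspelled query."""
    lowered = {a.casefold(): a for a in addresses}
    matches = difflib.get_close_matches(
        query.casefold(), list(lowered), n=limit, cutoff=cutoff
    )
    return [lowered[m] for m in matches]
```

For example, `autocomplete("stor", ADDRESSES)` suggests both addresses beginning with "Stor", while `fuzzy_search("Kungsgatan 10, Lnud", ADDRESSES)` ranks "Kungsgatan 10, Lund" first despite the transposed letters. A linear scan like this would not scale to a full OpenStreetMap dataset, which is presumably why the thesis proposes a dedicated index.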

Development and Performance Evaluation of a Real-Time Web Search Engine

Author: Watters Burr S., IV
Publication venue: UNF Digital Commons
Publication date: 01/01/2004

As the World Wide Web continues to grow, the tools to retrieve the information must develop in terms of locating web pages, categorizing content, and retrieving quality pages. Web search engines have enhanced the online experience by making pages easier to find. Search engines have made a science of cataloging page content, but the data can age, becoming outdated and irrelevant. By searching pages in real time in a localized area of the web, information that is retrieved is guaranteed to be available at the time of the search. The real-time search engine's intriguing premise provides an overwhelming challenge. Since the web is searched in real time, the engine's execution will take longer than traditional search engines. The challenge is to determine what factors can enhance the performance of the real-time search engine. This research takes a look at three components: traversal methodologies for searching the web, utilizing concurrently executing spiders, and implementing a caching resource to reduce the execution time of the real-time search engine. These components represent some basic methodologies to improve performance. By determining which implementations provide the best response, a better and faster real-time search engine can become a useful searching tool for Internet users.

UNF Digital Commons

Functional requirements document for the Earth Observing System Data and Information System (EOSDIS) Scientific Computing Facilities (SCF) of the NASA/MSFC Earth Science and Applications Division, 1992

Author: Botts Michael E.; Parker John V.; Phillips Ron J.; Wright Patrick D.

Five scientists at MSFC/ESAD have EOS SCF investigator status. Each SCF has unique tasks which require the establishment of a computing facility dedicated to accomplishing those tasks. An SCF Working Group was established at ESAD with the charter of defining the computing requirements of the individual SCFs and recommending options for meeting these requirements. The primary goal of the working group was to determine which computing needs can be satisfied using either shared resources or separate but compatible resources, and which needs require unique individual resources. The requirements investigated included CPU-intensive vector and scalar processing, visualization, data storage, connectivity, and I/O peripherals. A review of computer industry directions and a market survey of computing hardware provided information regarding important industry standards and candidate computing platforms. It was determined that the total SCF computing requirements might be most effectively met using a hierarchy consisting of shared and individual resources. This hierarchy is composed of five major system types: (1) a supercomputer class vector processor; (2) a high-end scalar multiprocessor workstation; (3) a file server; (4) a few medium- to high-end visualization workstations; and (5) several low- to medium-range personal graphics workstations. Specific recommendations for meeting the needs of each of these types are presented.

NASA Technical Reports Server