Search CORE

1 research outputs found

A more efficient document retrieval method for TEXPROS

Author: Dong Yin
Publication venue: Digital Commons @ NJIT
Publication date: 31/01/2001
Field of study

Document processing is a critical element of office automation. Through document classification, extraction and filing, documents are automatically placed into a knowledge base according to certain rules. Document retrieval is a process to get a document back according to a user\u27s requirements and to show the results to the user. Hence, a good user-interface and an efficient retrieval algorithm become core parts of document retrieval. Unlike previous browsers that have been proposed for this purpose, this dissertation develops a new browser that has a user interface with more tools, and one that has a more efficient retrieval algorithm that can deal with a wide variety of retrieval situations. In this dissertation, from the view of an interface, the new browser provides more functions such as zoom in and zoom out , (i.e. automatic scaling of the portion of a graph that is of interest to a user), and help. These functions give users an easier way to view a large graph in one window and provide users with help during the retrieval process. The new browser also provides an algorithm that makes retrieval more efficient by using a reusable base. The Reusable Base is used to hold information that is most related to the user previous desires and the information stored in the Reusable Base is more easily used to form the OP-Net than that in the System Catalog. Hence, it eliminates the need to go to the System Catalog to find the results. This speeds up the retrieval significantly -at least two times faster than without the Reusable Base. Further, the new browser provides information about the folder organization and the document type hierarchy that is in addition to the OP-Net. If users know the type of documents they want, or which folder they are interested in, they can go to the particular document type or the particular folder directly

Digital Commons @ New Jersey Institute of Technology (NJIT)