Abstract
At the Lister Hill National Center for Biomedical Communications, an R&D division of the National Library of Medicine, we are engaged in research and development in content-based image retrieval (CBIR) for biomedical image databases. Toward the goal of developing a functional and significant CBIR capability, we have created a prototype system for image indexing and retrieval that operates on a collection of spine x-rays and associated health survey data. In this paper, we present the functionality of our prototype system, performance results, ongoing research, and outstanding technical issues.

1. Content-based image retrieval (CBIR)
Our work in CBIR is the latest phase of research and development into the use of technology for the dissemination of biomedical multimedia information; this work has previously resulted in the development of a biomedical multimedia database system, a digital atlas of the cervical and lumbar spine, and an Internet archive of digitized x-ray images.

Indexing: the computer-assisted data reduction of images into mathematical features. For the spine images, the features capture the shape of the vertebrae. Indexing consists of two steps: segmentation of the objects of interest (the vertebrae), and extraction of feature vectors (a data-reduced shape representation) from the raw segmentations. An implicit requirement for indexing is that the feature vectors be organized for efficient search and retrieval. We also propose to carry out, at indexing time, the classification of the shape data (raw segmentations or feature vectors) into categories of interest at a semantic level: namely, "normal" or "abnormal" for particular biomedical characteristics associated with osteoarthritis and degenerative disc disease, such as anterior osteophytes, disc space narrowing, subluxation, and spondylolisthesis.
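The indexing steps above (a segmented vertebra boundary reduced to a short feature vector, which is then assigned a semantic category) can be illustrated with a minimal sketch. It assumes that segmentation has already produced each vertebra outline as a closed polygon of (x, y) points. It substitutes Fourier descriptor magnitudes for the shape representation and a nearest-centroid rule for the "normal"/"abnormal" classifier; neither is claimed to be the method used in our prototype. All function names, parameter values (`n_points`, `n_harmonics`), and the toy exemplar shapes are hypothetical.

```python
import cmath
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical sketch: Fourier descriptors and nearest-centroid classification
# are illustrative stand-ins, not the prototype's actual indexing method.
Point = Tuple[float, float]


def signed_area(contour: Sequence[Point]) -> float:
    """Shoelace formula: positive for counterclockwise polygons."""
    n = len(contour)
    return 0.5 * sum(
        contour[i][0] * contour[(i + 1) % n][1] - contour[(i + 1) % n][0] * contour[i][1]
        for i in range(n)
    )


def resample_contour(contour: Sequence[Point], n: int) -> List[complex]:
    """Resample a closed polygon to n points equally spaced by arc length."""
    pts = [complex(x, y) for x, y in contour]
    segs = [(pts[i], pts[(i + 1) % len(pts)]) for i in range(len(pts))]
    lengths = [abs(b - a) for a, b in segs]
    step = sum(lengths) / n
    out: List[complex] = []
    seg, start = 0, 0.0  # current segment and arc length at its start
    for k in range(n):
        target = k * step
        # Advance to the segment that contains the target arc length.
        while seg < len(segs) - 1 and start + lengths[seg] < target:
            start += lengths[seg]
            seg += 1
        a, b = segs[seg]
        t = (target - start) / lengths[seg] if lengths[seg] > 0 else 0.0
        out.append(a + t * (b - a))
    return out


def shape_features(contour: Sequence[Point], n_points: int = 64,
                   n_harmonics: int = 4) -> List[float]:
    """Data-reduced shape vector built from Fourier descriptor magnitudes."""
    if signed_area(contour) < 0:  # enforce counterclockwise traversal
        contour = list(reversed(contour))
    z = resample_contour(contour, n_points)
    N = len(z)

    def coeff(k: int) -> complex:
        # Discrete Fourier coefficient c_k of the complex boundary signal.
        return sum(z[j] * cmath.exp(-2j * math.pi * k * j / N) for j in range(N)) / N

    # c_0 (the centroid) is dropped for translation invariance; taking
    # magnitudes removes rotation and start-point phase; dividing by |c_1|
    # removes scale.
    scale = abs(coeff(1))
    ks = list(range(2, n_harmonics + 1)) + [-k for k in range(1, n_harmonics + 1)]
    return [abs(coeff(k)) / scale for k in ks]


def classify(features: Sequence[float], labeled: Dict[str, List[List[float]]]) -> str:
    """Assign the label whose exemplar centroid is nearest (Euclidean)."""
    best_label, best_dist = "", math.inf
    for label, vectors in labeled.items():
        centroid = [sum(col) / len(vectors) for col in zip(*vectors)]
        d = math.dist(features, centroid)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def transform(contour: Sequence[Point], angle: float, s: float,
              dx: float, dy: float) -> List[Point]:
    """Synthetic similarity transform (rotate, scale, shift) for the demo."""
    c, si = math.cos(angle), math.sin(angle)
    return [(s * (c * x - si * y) + dx, s * (si * x + c * y) + dy) for x, y in contour]


# Toy exemplars (synthetic, not real vertebra data).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
spiked = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.6, 1.0), (0.5, 1.5), (0.4, 1.0), (0.0, 1.0)]
exemplars = {"normal": [shape_features(square)], "abnormal": [shape_features(spiked)]}

query = transform(square, math.radians(30), 3.0, 5.0, -2.0)
print(classify(shape_features(query), exemplars))  # prints "normal"
```

Because the descriptor ignores position, orientation, and size, a rotated, enlarged, and shifted copy of an exemplar maps to essentially the same fixed-length vector, so stored vectors reflect shape alone. A fixed-length vector is also the kind of object a search structure for retrieval can organize.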
Finally, we propose to store any text data that may be associated with our images as additional indexing information.

Retrieval: the user interaction to obtain desired images from the database. We break retrieval into four steps: user query formulation, extraction of a feature vector from the user query, query search, and similarity matching. At retrieval time, a feature vector q is derived from the user's query, and the database of feature vectors is navigated to locate feature vectors similar to q. The database must be organized efficiently to avoid searches that are prohibitively expensive in search time. For example, if the database is organized as a tree, an efficient organization will allow a search to quickly rule out nodes too distant from q and to localize the search to nodes computed to lie within an acceptable search radius of q, with respect to the similarity metric being used. A characteristic that we also desire for our retrieval system is th