Semantic search, a process aimed at delivering highly relevant search results
by comprehending the searcher's intent and the contextual meaning of terms
within a searchable dataspace, plays a pivotal role in information retrieval.
In this paper, we investigate the application of large language models to
enhance semantic search capabilities, specifically tailored for the domain of
Jupyter Notebooks. Our objective is to retrieve generated outputs, such as
figures or tables, associated functions and methods, and other pertinent
information.
We demonstrate a semantic search framework that achieves a comprehensive
semantic understanding of the entire notebook's contents, enabling it to
effectively handle various types of user queries. Key components of this
framework include:
1). A data preprocessor is designed to handle diverse types of cells within
Jupyter Notebooks, encompassing both markdown and code cells. 2). An innovative
methodology is devised to address token size limitations that arise with
code-type cells. We implement a finer-grained approach to data input,
transitioning from the cell level to the function level, effectively resolving
these issues