# Feature Design: Highlight Jump (Precise Sourcing)

## Problem Statement

Currently, when a user clicks a citation in the chat, they can view the source text in a drawer or open the PDF. However, the PDF opens at the first page (or simply opens the file) without pinpointing the location of the referenced information, so the user has to search for the content manually.

## Proposed Solution

Implement "Highlight Jump" functionality:

1. **Page Jump**: When opening a citation, the PDF viewer should immediately jump to the page containing the chunk.
2. **Text Highlighting**: The text segment used in the citation should be visually highlighted on the PDF page.

## Technical Implementation

### Frontend

#### 1. `PDFPreview.tsx`

- **Enable Text Layer**: Currently, `PDFPreview` renders only to a `<canvas>`. We must enable `pdf.js` **Text Layer** rendering on top of the canvas. This allows text selection and searching.
- **New Props**:
  - `initialPage`: May already exist; we need to verify that it works reliably.
  - `highlightText`: A string (the chunk content) to search for and highlight.
- **Highlight Logic**:
  - On page load, if `highlightText` is provided, search for this text in the Text Layer.
  - Apply a visual highlight (e.g., yellow background) to the matching DOM elements in the text layer.
  - Scroll the highlighted element into view.

#### 2. `SourcePreviewDrawer.tsx`

- Pass the `pageNumber` and `content` (as `highlightText`) to the `onOpenFile` callback.
- Update the "Open File" button to trigger this callback with the correct metadata.

#### 3. `ChatInterface.tsx` / `ChatView.tsx`

- Ensure the state that manages the open PDF preview receives the `pageNumber` and `highlightText` from the source.

### Backend

- **No changes required**: `RagSearchResult` already contains `pageNumber` (verified).

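
The highlight logic described for `PDFPreview.tsx` comes down to mapping a matched string back to the text-layer elements that contain it. The following is a minimal sketch of that mapping, assuming the text layer exposes one string per span in reading order. The names `SpanRange` and `findSpanRange` are hypothetical and do not come from the existing codebase.

```typescript
// Hypothetical helper: names and shapes are illustrative, not taken from the codebase.

/** Index range of text-layer spans covering a match (both ends inclusive). */
interface SpanRange {
  start: number;
  end: number;
}

/**
 * Given the text of each text-layer span in reading order, find the spans
 * that cover the first occurrence of `query`.
 * Returns null when the text is not found, so the caller can fall back to
 * Page Jump only.
 */
function findSpanRange(spanTexts: string[], query: string): SpanRange | null {
  if (query.length === 0) return null;

  // Concatenate the span texts and record where each span starts.
  const offsets: number[] = [];
  let joined = "";
  for (const text of spanTexts) {
    offsets.push(joined.length);
    joined += text;
  }

  const matchStart = joined.indexOf(query);
  if (matchStart === -1) return null;
  const matchEnd = matchStart + query.length; // exclusive

  let start = -1;
  let end = -1;
  for (let i = 0; i < spanTexts.length; i++) {
    const spanStart = offsets[i];
    const spanEnd = spanStart + spanTexts[i].length;
    // A span belongs to the highlight if it overlaps [matchStart, matchEnd).
    if (spanEnd > matchStart && spanStart < matchEnd) {
      if (start === -1) start = i;
      end = i;
    }
  }
  return start === -1 ? null : { start, end };
}
```

In the component, the returned indices would map to the span elements of the rendered text layer: adding a highlight CSS class to each element in the range and calling `scrollIntoView()` on the first one would cover the last two steps of the highlight logic.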
## Limitations

- **OCR Files**: If the file was indexed via OCR (images), `pdf.js` might not extract a text layer that exactly matches what Tika extracted, or there might be no text layer at all. In this case, we fall back to Page Jump only.
- **Text Mismatch**: If the chunk text differs slightly from the PDF text layer (due to cleaning/normalization during indexing), exact string matching might fail. We will try a substring or fuzzy match where possible; an exact match on the first ~50 characters is a good starting point.
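
One way to handle both limitations is a tiered lookup: normalize both strings, try the full chunk, then try only its first ~50 characters, and otherwise report that only Page Jump is possible. The sketch below illustrates this under stated assumptions. `normalize`, `locateChunk`, and `MatchResult` are hypothetical names, and whitespace collapsing plus lowercasing is only a guess at the kind of cleanup done during indexing.

```typescript
// Hypothetical sketch: names and the default prefix length are illustrative.

/** Collapse whitespace runs and lowercase, so minor cleanup differences do not block a match. */
function normalize(text: string): string {
  return text.replace(/\s+/g, " ").trim().toLowerCase();
}

/** Either a match position in the normalized page text, or a signal to only jump to the page. */
type MatchResult =
  | { kind: "highlight"; index: number; length: number }
  | { kind: "pageJumpOnly" };

function locateChunk(pageText: string, chunk: string, prefixLength = 50): MatchResult {
  const haystack = normalize(pageText);
  const needle = normalize(chunk);

  // No text layer (e.g. an OCR-indexed scan) or an empty chunk: nothing to highlight.
  if (haystack.length === 0 || needle.length === 0) {
    return { kind: "pageJumpOnly" };
  }

  // 1. Try the whole chunk.
  const full = haystack.indexOf(needle);
  if (full !== -1) {
    return { kind: "highlight", index: full, length: needle.length };
  }

  // 2. Fall back to the first `prefixLength` characters of the chunk.
  const prefix = needle.slice(0, prefixLength);
  const partial = haystack.indexOf(prefix);
  if (partial !== -1) {
    return { kind: "highlight", index: partial, length: prefix.length };
  }

  // 3. Nothing matched: the viewer should only jump to the page.
  return { kind: "pageJumpOnly" };
}
```

Note that `index` and `length` refer to the normalized text, so a real implementation would still need to map them back to positions in the original text layer. A fuzzy matcher could be added as a further tier before the final fallback.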