# Design: Cross-Document Comparison (Agentic Workflow)

## 1. Background & Problem

Users often need to compare multiple documents (e.g., "Compare the financial reports of Q1 and Q2" or "Differences between Product A and Product B specs"). Standard RAG retrieves chunks based on semantic similarity to the query. Even with "Multi-Query" expansion, standard RAG may:

1. Retrieve too many chunks from one document and miss the other.
2. Fail to align comparable attributes (e.g., comparing "revenue" in Doc A with "profit" in Doc B).
3. Produce a generic text answer instead of a structured comparison.

## 2. Solution: Agentic Comparison Workflow

We will implement a specialized workflow (or "light agent") that:

1. **Analyzes the Request**: Identifies the subjects to compare (e.g., "Q1 Report", "Q2 Report") and the dimensions (e.g., "Revenue", "Risks").
2. **Targeted Retrieval**:
   - Explicitly filters/searches for Doc A.
   - Explicitly filters/searches for Doc B.
3. **Structured Synthesis**: Generates the answer, optionally forcing a Markdown table format for clarity.

## 3. Technical Architecture

### 3.1 Backend (`ComparisonService` or extension to `RagService`)

- **Intent Detection**: Modify `ChatService` or `RagService` to detect comparison intent (via an LLM call or simple keyword heuristics).
- **Planning**: If comparison intent is detected:
  1. Identify target files: resolve file names/IDs from the query (e.g., "Q1" matches the file "2024_Q1_Report.pdf").
  2. Extract dimensions: decide what to compare (e.g., "summary", "key metrics").
  3. Execute:
     - Run a search on File A with the query "key metrics".
     - Run a search on File B with the same query.
     - Combine the results into a single context.
- **Prompting**: Use a prompt optimized for comparison (e.g., "Generate a comparison table...").

### 3.2 Frontend (`ChatInterface`)

- **UI Trigger**: (Optional) a dedicated "Compare" button, or just natural language.
- **Visuals**: Render the response as standard Markdown (which supports tables).
- **Source Attribution**: Ensure citations map back to the correct respective documents.

## 4. Implementation Steps

1. **Intent & Entity Extraction (Simple Version)**:
   - In `RagService`, add a step `detectComparisonIntent(query)`.
   - Return `subjects: string[]` (approximate filenames) and `dimensions: string`.
2. **Targeted Search**:
   - Use `elasticsearchService` to search *specifically* within the resolved file IDs (if we can map names to IDs).
   - Fall back to a broad search if file mapping fails.
3. **Comparison Prompt**:
   - Update `rag.service.ts` to use a comparison-specific prompt when comparison intent is detected.

## 5. Risks & Limitations

- **File Name Matching**: Mapping a user's "Q1" to "2024_Q1_Report_Final.pdf" is hard without fuzzy matching or LLM resolution.
  - *Mitigation*: Use a lightweight LLM call or fuzzy search over the file list to resolve IDs.
- **Latency**: Two searches plus entity resolution add latency.
  - *Mitigation*: Run the searches in parallel.

## 6. MVP Scope

- Automatically detect "Compare A and B" requests.
- Attempt to identify whether A and B refer to specific files in the selected knowledge base.
- If identified, restrict search scopes accordingly (or boost the matching files).
- Generate a table response.
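The intent/entity extraction step, together with the fuzzy file-name mitigation from the risks section, could be sketched as below. `ComparisonIntent`, `detectComparisonIntent`, `resolveFile`, and the regex patterns are illustrative assumptions, not existing identifiers in the codebase; a lightweight LLM call could replace the heuristics for harder phrasings.

```typescript
// Hypothetical shape of the extracted intent (names are placeholders).
interface ComparisonIntent {
  subjects: string[];   // approximate names of the things to compare
  dimensions: string;   // what to compare ("key metrics", "summary", ...)
}

// Keyword/regex heuristic for comparison intent; returns null if none found.
function detectComparisonIntent(query: string): ComparisonIntent | null {
  const patterns = [
    /compare (?:the )?(.+?) (?:and|with|to|vs\.?) (?:the )?(.+)/i,
    /differences? between (?:the )?(.+?) and (?:the )?(.+)/i,
  ];
  for (const p of patterns) {
    const m = query.match(p);
    if (m) {
      // Default dimension; a real implementation would extract this too.
      return { subjects: [m[1].trim(), m[2].trim()], dimensions: "summary" };
    }
  }
  return null;
}

// Fuzzy file-name resolution via token overlap: score each candidate by how
// many query tokens it contains, and return the best match (or null).
function resolveFile(subject: string, fileNames: string[]): string | null {
  const tokens = subject.toLowerCase().split(/\W+/).filter(Boolean);
  let best: string | null = null;
  let bestScore = 0;
  for (const name of fileNames) {
    const lower = name.toLowerCase();
    const score = tokens.filter((t) => lower.includes(t)).length;
    if (score > bestScore) {
      bestScore = score;
      best = name;
    }
  }
  return best;
}
```

Token overlap is deliberately crude; it covers "Q1" vs. "2024_Q1_Report_Final.pdf" but would need an LLM or edit-distance fallback for synonyms like "first quarter".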
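The targeted-search and prompting steps, including the parallel-search latency mitigation, might look like the following sketch. The `search` callback stands in for the real `elasticsearchService` integration, and `buildComparisonContext` and the prompt wording are assumptions for illustration.

```typescript
// Minimal chunk shape and a search callback standing in for the real
// elasticsearchService (which would filter by file ID internally).
type Chunk = { fileId: string; text: string };
type SearchFn = (query: string, fileId: string) => Promise<Chunk[]>;

// Illustrative comparison prompt; the real wording would live in rag.service.ts.
const COMPARISON_PROMPT = `You are comparing two documents.
Answer with a Markdown table: one row per dimension, one column per document.
Cite the source document for every cell.`;

// Runs both scoped searches concurrently (Promise.all) so total latency stays
// close to a single search, then assembles a labeled context for the LLM.
async function buildComparisonContext(
  dimensions: string,
  fileIds: [string, string],
  search: SearchFn,
): Promise<string> {
  const [chunksA, chunksB] = await Promise.all(
    fileIds.map((id) => search(dimensions, id)),
  );
  const label = (name: string, chunks: Chunk[]) =>
    `## Source: ${name}\n` + chunks.map((c) => c.text).join("\n");
  return [
    COMPARISON_PROMPT,
    label(fileIds[0], chunksA),
    label(fileIds[1], chunksB),
  ].join("\n\n");
}
```

Labeling each context block with its source document is what lets the synthesis step attribute citations to the correct file, as required by the frontend.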