shang-chunyu 3 weeks ago
parent
commit
1426e2cf34

+ 68 - 0
docs/design/feat-auto-title-generation.md

@@ -0,0 +1,68 @@
+# Feature Design: Automatic Title Generation (feat-auto-title-generation)
+
+## 1. Overview
+This feature automatically generates meaningful titles for uploaded documents and chat sessions using AI. It aims to replace generic filenames and "New Conversation" labels with content-aware titles, improving user experience and organization.
+
+## 2. Requirements
+
+### 2.1 Document Title Generation
+- **Trigger**: Automatically triggered after text extraction (Fast or Precise mode).
+- **Process**:
+    1. Extract a sample of the document content (first 2,000 - 3,000 characters).
+    2. Send the content to the default LLM with a specific generation prompt.
+    3. Update the `KnowledgeBase` record with the generated title.
+- **Rules**:
+    - The title should be concise (less than 50 characters).
+    - It should be in the user's preferred language (defaulting to the detected document language if possible).
+    - Output should be "raw" (no preamble like "The title is...").
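The rules above imply a post-processing step on the LLM output. A minimal sketch (the helper name and regexes are illustrative, not the actual service code):

```typescript
// Hypothetical helper: enforce the title rules after the LLM call.
// Strips common quoting/preamble artifacts and caps length at 50 chars.
function sanitizeTitle(raw: string, maxLen = 50): string {
  let title = raw.trim();
  // Models sometimes wrap the answer in quotes despite "raw output" prompts.
  title = title.replace(/^["'「『]+|["'」』]+$/g, "");
  // Drop a leading preamble such as "Title: ..." if it slips through.
  title = title.replace(/^(the title is|title)\s*[::]\s*/i, "").trim();
  if (title.length > maxLen) {
    title = title.slice(0, maxLen - 1).trimEnd() + "…";
  }
  return title;
}
```

Applying this after every generation keeps the stored title within the 50-character budget even when the model ignores the "raw output" instruction.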
+
+### 2.2 Chat Title Generation
+- **Trigger**: Triggered after the first user message and its corresponding assistant response are recorded.
+- **Process**:
+    1. Collect the initial message pair.
+    2. Send the pair to the default LLM with a generation prompt.
+    3. Update the `SearchHistory` record's `title` field.
+- **Rules**: Same as document titles.
+
+## 3. Technical Design
+
+### 3.1 Data Model Changes
+- **KnowledgeBase Entity**: Add a `title` field (nullable, optional). If empty, fallback to `originalName`.
+- **SearchHistory Entity**: No changes required (has `title`).
+
+### 3.2 Backend Implementation
+
+#### KnowledgeBaseService
+- Add `generateTitle(kbId: string)` method.
+- Hook into `processFile` after `updateStatus(kbId, FileStatus.EXTRACTED)`.
+
+#### ChatService / SearchHistoryService
+- Add logic to check whether the session title is still the default (usually a snippet of the first message) and, if so, trigger `generateTitle(historyId: string)` after the first assistant response.
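A sketch of that default-title check, assuming the default is either a generic label or a snippet of the first user message (both the helper name and the heuristics are illustrative):

```typescript
// Hypothetical check: a session still has its "default" title if the stored
// title is empty, a generic label, or just a prefix of the first user message.
function isDefaultTitle(title: string | null, firstUserMessage: string): boolean {
  if (!title || title.trim() === "") return true;
  const t = title.trim();
  if (t === "New Conversation") return true;
  // Default titles are typically the first-message snippet, possibly truncated.
  const normalized = firstUserMessage.trim();
  return normalized.startsWith(t.replace(/[….]+$/, ""));
}
```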
+
+#### Prompt Design
+- **Document Prompt**:
+  ```text
+  You are a document analyzer. Read the provided text and generate a concise, professional title (max 50 chars). 
+  Return ONLY the title.
+  Language: {userLanguage}
+  Text: {contentSample}
+  ```
+- **Chat Prompt**:
+  ```text
+  Based on the following conversation snippet, generate a short, descriptive title (max 50 chars) that summarizes the topic.
+  Return ONLY the title.
+  Language: {userLanguage}
+  Snippet:
+  User: {userMessage}
+  AI: {aiResponse}
+  ```
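Assuming the `{placeholder}` syntax above is filled by simple interpolation, a sketch of the template filler (hypothetical helper, not the actual prompt code):

```typescript
// Hypothetical template filler for the prompts above; {name} placeholders
// are replaced from a values map. Unknown placeholders are left untouched.
function fillPrompt(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (whole: string, key: string) =>
    key in values ? values[key] : whole
  );
}
```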
+
+## 4. Verification Plan
+
+### Automated Tests
+- Integration tests in `KnowledgeBaseService` to verify the title field is updated after processing.
+- Mock LLM responses to ensure the title update logic works.
+
+### Manual Verification
+- Upload various files (PDF, Word, TXT) and verify the displayed title in the knowledge base list.
+- Start a new chat, send a message, and check the sidebar for the updated session title.

+ 59 - 0
docs/design/feat-cross-doc-comparison.md

@@ -0,0 +1,59 @@
+# Design: Cross-Document Comparison (Agentic Workflow)
+
+## 1. Background & Problem
+Users often need to compare multiple documents (e.g., "Compare the financial reports of Q1 and Q2" or "Differences between Product A and Product B specs").
+Standard RAG retrieves chunks based on semantic similarity to the query. While "Multi-Query" helps, standard RAG might:
+1.  Retrieve too many chunks from one document and miss the other.
+2.  Fail to align comparable attributes (e.g., comparing "revenue" in Doc A with "profit" in Doc B).
+3.  Produce a generic text answer instead of a structured comparison.
+
+## 2. Solution: Agentic Comparison Workflow
+We will implement a specialized workflow (or "Light Agent") that:
+1.  **Analyzes the Request**: Identifies the subjects to compare (e.g., "Q1 Report", "Q2 Report") and the dimensions (e.g., "Revenue", "Risks").
+2.  **Targeted Retrieval**:
+    -   Explicitly filters/searches for Doc A.
+    -   Explicitly filters/searches for Doc B.
+3.  **Structured Synthesis**: Generates the answer, potentially forcing a Markdown Table format for clarity.
+
+## 3. Technical Architecture
+
+### 3.1 Backend (`ComparisonService` or extension to `RagService`)
+-   **Intent Detection**: Modify `ChatService` or `RagService` to detect comparison intent (via an LLM call or simple keyword heuristics).
+-   **Planning**: If comparison is detected:
+    1.  Identify Target Files: Resolve file names/IDs from the query (e.g., "Q1" -> matches file "2024_Q1_Report.pdf").
+    2.  Dimension Extraction: What to compare? (e.g., "summary", "key metrics").
+    3.  Execution:
+        -   Run Search on File A with query "key metrics".
+        -   Run Search on File B with query "key metrics".
+        -   Combine context.
+-   **Prompting**: Use a prompt optimized for comparison (e.g., "Generate a comparison table...").
+
+### 3.2 Frontend (`ChatInterface`)
+-   **UI Trigger**: (Optional) a dedicated "Compare" button, or simply natural language.
+-   **Visuals**: Render the response as standard markdown (which supports tables).
+-   **Source Attribution**: Ensure citations map back to the correct respective documents.
+
+## 4. Implementation Steps
+
+1.  **Intent & Entity Extraction (Simple Version)**:
+    -   In `RagService`, add a step `detectComparisonIntent(query)`.
+    -   Return `subjects: string[]` (approximate filenames) and `dimensions: string`.
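A heuristic-only sketch of `detectComparisonIntent` for the simple version (the regex and the placeholder `dimensions` value are illustrative; free-form queries would need the LLM fallback mentioned above):

```typescript
interface ComparisonIntent {
  isComparison: boolean;
  subjects: string[];   // approximate names, e.g. ["Q1 report", "Q2 report"]
  dimensions: string;   // what to compare; placeholder default for the sketch
}

// Hypothetical heuristic: catches the common "compare A and B" phrasing.
function detectComparisonIntent(query: string): ComparisonIntent {
  const m = query.match(/compare\s+(?:the\s+)?(.+?)\s+(?:and|with|vs\.?|versus)\s+(.+)/i);
  if (!m) return { isComparison: false, subjects: [], dimensions: "" };
  return { isComparison: true, subjects: [m[1].trim(), m[2].trim()], dimensions: "summary" };
}
```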
+    
+2.  **Targeted Search**:
+    -   Use `elasticsearchService` to search *specifically* within the resolved file IDs (if we can map names to IDs).
+    -   Fall back to broad search if file mapping fails.
+
+3.  **Comparison Prompt**:
+    -   Update `rag.service.ts` to use a `comparisonPrompt` if intent is detected.
+
+## 5. Risks & Limitations
+-   **File Name Matching**: Mapping user spoken "Q1" to "2024_Q1_Report_Final.pdf" is hard without fuzzy matching or LLM resolution.
+    -   *Mitigation*: Use a lightweight LLM call or fuzzy search on the file list to resolve IDs.
+-   **Latency**: Two searches + entity resolution might add latency.
+    -   *Mitigation*: Run searches in parallel.
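The fuzzy file-name resolution could start as a token-overlap score before reaching for an LLM call (the function name and scoring are illustrative):

```typescript
// Hypothetical lightweight resolver: score each filename by how many of the
// subject's tokens it contains, after lowercasing and splitting on
// non-alphanumeric characters. Returns null if nothing matches at all.
function resolveFile(subject: string, fileNames: string[]): string | null {
  const tokens = subject.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  let best: string | null = null;
  let bestScore = 0;
  for (const name of fileNames) {
    const haystack = name.toLowerCase();
    const score = tokens.filter(t => haystack.includes(t)).length;
    if (score > bestScore) { bestScore = score; best = name; }
  }
  return best;
}
```

A null result would trigger the broad-search fallback from the implementation steps above.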
+
+## 6. MVP Scope
+-   Automated detection of "Compare A and B".
+-   Attempt to identify if A and B refer to specific files in the selected knowledge base.
+-   If identified, restrict search scopes accordingly (or boost them).
+-   Generate a table response.

+ 37 - 0
docs/design/feat-highlight-jump.md

@@ -0,0 +1,37 @@
+# Feature Design: Highlight Jump (Precise Sourcing)
+
+## Problem Statement
+Currently, when a user clicks a citation in the chat, they can see the source text in a drawer or open the PDF. However, the PDF opens to the first page (or just the file) without pinpointing the exact location of the referenced information. This forces the user to manually search for the content.
+
+## Proposed Solution
+Implement "Highlight Jump" functionality:
+1.  **Page Jump**: When opening a citation, the PDF viewer should immediately jump to the specific page number containing the chunk.
+2.  **Text Highlighting**: The specific text segment used in the citation should be highlighted visually on the PDF page.
+
+## Technical Implementation
+
+### Frontend
+
+#### 1. `PDFPreview.tsx`
+-   **Enable Text Layer**: Currently, `PDFPreview` renders only to a `<canvas>`. We must enable `pdf.js` **Text Layer** rendering on top of the canvas. This allows text selection and searching.
+-   **New Props**:
+    -   `initialPage`: Already exists? Need to verify it works reliably.
+    -   `highlightText`: A string (the chunk content) to search for and highlight.
+-   **Highlight Logic**:
+    -   On page load, if `highlightText` is provided, search for this text in the Text Layer.
+    -   Apply a visual highlight (e.g., yellow background) to the matching DOM elements in the text layer.
+    -   Scroll the highlighted element into view.
+
+#### 2. `SourcePreviewDrawer.tsx`
+-   Pass the `pageNumber` and `content` (as `highlightText`) to the `onOpenFile` callback.
+-   Update the "Open File" button to trigger this with the correct metadata.
+
+#### 3. `ChatInterface.tsx` / `ChatView.tsx`
+-   Ensure the state that manages the open PDF preview receives the `pageNumber` and `highlightText` from the source.
+
+### Backend
+-   **No changes required** if `RagSearchResult` already contains `pageNumber`. (Verified: It does).
+
+## Limitations
+-   **OCR Files**: If the file was indexed via OCR (images), `pdf.js` might not extract a text layer that matches what Tika extracted, or might have no text layer at all. In that case, we fall back to Page Jump only.
+-   **Text Mismatch**: If the chunk text differs slightly from the PDF text layer (due to cleaning/normalization during indexing), exact string matching might fail. We will fall back to substring or fuzzy matching where possible; an exact match on the first ~50 characters is a good starting point.
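The ~50-character prefix match could look like this (helper name illustrative; mapping the returned index back to text-layer DOM offsets is a further step):

```typescript
// Hypothetical matcher: locate the chunk inside the page's text-layer string.
// Whitespace is collapsed on both sides so minor normalization differences
// introduced during indexing do not break the match. Returns -1 if not found.
function findHighlightStart(layerText: string, chunkText: string, prefixLen = 50): number {
  const norm = (s: string) => s.replace(/\s+/g, " ").trim();
  const needle = norm(chunkText).slice(0, prefixLen);
  if (!needle) return -1;
  return norm(layerText).indexOf(needle);
}
```

Note the index refers to the normalized string, so the caller would need to walk the text-layer spans with the same normalization to place the highlight.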

+ 52 - 0
docs/design/feat-query-expansion-hyde.md

@@ -0,0 +1,52 @@
+# Feature Design: Query Expansion & HyDE Integration
+
+This document outlines the design for improving search relevance in Lumina using Query Expansion (Multi-Query) and Hypothetical Document Embeddings (HyDE).
+
+## Problem Statement
+The current search implementation relies on the user's original query. Simple vector search can sometimes fail to match relevant documents due to:
+1.  **Keyword Mismatch**: The user might use different terminology than the document.
+2.  **Semantic Gap**: The query might be too brief to capture the full semantic context required for a good vector match.
+
+## Proposed Solution
+
+### 1. Query Expansion (Multi-Query)
+We will use an LLM to generate 3 unique variations of the user's query. This helps to:
+- Capture different facets of the user's intent.
+- Increase the probability of hitting relevant segments in the knowledge base.
+
+### 2. HyDE (Hypothetical Document Embeddings)
+We will use an LLM to generate a brief "hypothetical" answer to the user's query.
+- Instead of embedding the question, we embed the hypothetical answer.
+- This often results in better vector matches because we are comparing "answer-like" vectors with "document-like" segments.
+
+## Technical Implementation
+
+### Backend Changes
+
+#### `RagService` (server/src/rag/rag.service.ts)
+- **New Methods**:
+    - `expandQuery(query: string, userId: string): Promise<string[]>`: Generates 3 variations of the query.
+    - `generateHyDE(query: string, userId: string): Promise<string>`: Generates a hypothetical document.
+- **Update `searchKnowledge`**:
+    - Add `enableQueryExpansion` and `enableHyDE` parameters.
+    - Implement logic to handle multiple search requests (concurrently) and deduplicate results.
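A sketch of that merge/deduplication step across the multiple search requests (the `SearchHit` shape is illustrative, not the actual result type):

```typescript
interface SearchHit { chunkId: string; score: number; content: string; }

// Hypothetical merge step: union the hits from all expanded/HyDE queries,
// keeping the best score seen for each chunk, then sort by score descending.
function mergeResults(resultSets: SearchHit[][]): SearchHit[] {
  const byId = new Map<string, SearchHit>();
  for (const hits of resultSets) {
    for (const hit of hits) {
      const prev = byId.get(hit.chunkId);
      if (!prev || hit.score > prev.score) byId.set(hit.chunkId, hit);
    }
  }
  return [...byId.values()].sort((a, b) => b.score - a.score);
}
```

Keeping the maximum score per chunk is one simple policy; reciprocal rank fusion would be a natural alternative if score scales differ across queries.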
+
+#### `ChatService` (server/src/chat/chat.service.ts)
+- Pass the new search options from user settings or request parameters.
+
+### Frontend Changes
+
+#### `types.ts` (web/types.ts)
+- Update `AppSettings` to include `enableQueryExpansion` and `enableHyDE`.
+
+#### `SettingsDrawer.tsx`
+- Add UI toggles for these new search enhancement features.
+
+## Verification Plan
+
+### Backend Logs
+- Verify that LLM calls for expansion and HyDE are being made.
+- Log the generated queries and hypothetical documents for debugging.
+
+### Manual Verification
+- Compare search results with and without these features enabled for complex queries.

+ 14 - 0
web/components/views/SettingsView.tsx

@@ -42,6 +42,7 @@ export const SettingsView: React.FC<SettingsViewProps> = ({
         modelId: 'llama3',
         name: '',
         dimensions: 1536,
+        apiKey: '',
         maxInputTokens: 8191,
         maxBatchSize: 2048
     });
@@ -488,6 +489,19 @@ export const SettingsView: React.FC<SettingsViewProps> = ({
                         <input className="w-full text-sm border rounded-md px-3 py-2 font-mono" value={modelFormData.baseUrl} onChange={e => setModelFormData({ ...modelFormData, baseUrl: e.target.value })} disabled={isLoading} autoComplete="off" />
                     </div>
 
+                    <div>
+                        <label className="block text-xs font-medium text-slate-500 mb-1">{t('mmFormApiKey')}</label>
+                        <input
+                            type="password"
+                            className="w-full text-sm border rounded-md px-3 py-2 font-mono"
+                            value={modelFormData.apiKey || ''}
+                            onChange={e => setModelFormData({ ...modelFormData, apiKey: e.target.value })}
+                            disabled={isLoading}
+                            placeholder={t('mmFormApiKeyPlaceholder')}
+                            autoComplete="off"
+                        />
+                    </div>
+
                     {modelFormData.type === ModelType.EMBEDDING && (
                         <div className="grid grid-cols-2 gap-4">
                             <div>

+ 6 - 0
web/utils/translations.ts

@@ -114,6 +114,8 @@ export const translations = {
     apiKeyValidationFailed: "API Key 验证失败",
     keepOriginalKey: "留空保持原 API Key,输入新值则替换",
     leaveEmptyNoChange: "留空不修改",
+    mmFormApiKey: "API Key",
+    mmFormApiKeyPlaceholder: "请输入 API Key",
 
     // 更多组件缺失的翻译
     reconfigureFile: "重新配置文件",
@@ -714,6 +716,8 @@ export const translations = {
     apiKeyValidationFailed: "API Key validation failed",
     keepOriginalKey: "Leave empty to keep original API Key, input new value to replace",
     leaveEmptyNoChange: "Leave empty to keep unchanged",
+    mmFormApiKey: "API Key",
+    mmFormApiKeyPlaceholder: "Enter API Key",
 
     // More missing translations
     reconfigureFile: "Reconfigure File",
@@ -1256,6 +1260,8 @@ export const translations = {
     apiKeyValidationFailed: "API Key検証に失敗しました",
     keepOriginalKey: "空のままにすると元のAPI Keyを保持、新しい値を入力すると置換",
     leaveEmptyNoChange: "空のままで変更なし",
+    mmFormApiKey: "API Key",
+    mmFormApiKeyPlaceholder: "API Key を入力してください",
 
     // さらに缺失している翻訳
     reconfigureFile: "ファイルの再設定",