CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Simple Knowledge Base is a full-stack RAG (Retrieval-Augmented Generation) Q&A system built with React 19 and NestJS. The repository is a monorepo: documentation is written in Japanese and Chinese, while the code itself is in English.

Key Features:

Multi-model support (OpenAI-compatible APIs + Google Gemini native SDK)
Dual processing modes: Fast (Tika text-only) and High-precision (Vision pipeline)
User isolation with JWT authentication and per-user knowledge bases
Hybrid search (vector + keyword) with Elasticsearch
Multi-language interface (Japanese, Chinese, English)
Streaming responses via Server-Sent Events (SSE)

Development Setup

Prerequisites

Node.js 18+
Yarn
Docker & Docker Compose

Initial Setup

# Install dependencies
yarn install

# Start infrastructure services
docker-compose up -d elasticsearch tika libreoffice

# Configure environment
cp server/.env.sample server/.env
# Edit server/.env with API keys and configuration

Development Commands

# Start both frontend and backend in development mode
yarn dev

# Frontend only (port 13001)
cd web && yarn dev

# Backend only (port 3001)
cd server && yarn start:dev

# Run tests
cd server && yarn test
cd server && yarn test:e2e

# Lint and format
cd server && yarn lint
cd server && yarn format

Docker Services

Elasticsearch: 9200 (vector storage)
Apache Tika: 9998 (document text extraction)
LibreOffice Server: 8100 (document conversion)
Backend API: 3001
Frontend: 13001 (dev), 80/443 (production via nginx)

Architecture

Project Structure

simple-kb/
├── web/                    # React frontend (Vite)
│   ├── components/         # UI components (ChatInterface, ConfigPanel, etc.)
│   ├── contexts/          # React Context providers
│   ├── services/          # API client services
│   └── utils/             # Utility functions
├── server/                # NestJS backend
│   ├── src/
│   │   ├── ai/            # AI services (embedding, etc.)
│   │   ├── api/           # API module
│   │   ├── auth/          # JWT authentication
│   │   ├── chat/          # Chat/RAG module
│   │   ├── elasticsearch/ # Elasticsearch integration
│   │   ├── import-task/   # Import task management
│   │   ├── knowledge-base/# Knowledge base management
│   │   ├── libreoffice/   # LibreOffice integration
│   │   ├── model-config/  # Model configuration management
│   │   ├── vision/        # Vision model integration
│   │   └── vision-pipeline/# Vision pipeline orchestration
│   ├── data/              # SQLite database storage
│   ├── uploads/           # Uploaded files storage
│   └── temp/              # Temporary files
├── docs/                  # Comprehensive documentation (Japanese/Chinese)
├── nginx/                 # Nginx configuration
├── libreoffice-server/    # LibreOffice conversion service (Python/FastAPI)
└── docker-compose.yml     # Docker orchestration

Key Architectural Concepts

Dual Processing Modes:

Fast Mode: Apache Tika for text-only extraction (quick, no API cost)
High-Precision Mode: Vision Pipeline (LibreOffice → PDF → Images → Vision Model) for mixed image/text documents (slower, incurs API costs)
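
As a rough illustration of the trade-off between the two modes, the sketch below models each mode as an ordered list of stages plus a cost flag. The names ProcessingMode, ModeProfile, describeMode, and the stage labels are hypothetical and are not taken from the repository; only the stage order and the cost difference come from the description above.

```typescript
// Illustrative only: type, function, and stage names are hypothetical.
type ProcessingMode = "fast" | "high-precision";

interface ModeProfile {
  steps: string[];        // ordered stages a document passes through
  incursApiCost: boolean; // whether a vision model API is called
}

function describeMode(mode: ProcessingMode): ModeProfile {
  switch (mode) {
    case "fast":
      // Text-only extraction through Apache Tika.
      return { steps: ["tika-text-extraction"], incursApiCost: false };
    case "high-precision":
      // LibreOffice converts to PDF, pages become images, a vision model reads them.
      return {
        steps: ["libreoffice-to-pdf", "pdf-to-images", "vision-model"],
        incursApiCost: true,
      };
  }
}
```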

Multi-Model Support:

OpenAI-compatible APIs (OpenAI, DeepSeek, Claude, etc.)
Google Gemini native SDK
Configurable LLM, Embedding, and Rerank models

RAG System:

Hybrid search (vector + keyword) with Elasticsearch
Streaming responses via Server-Sent Events (SSE)
Source citation and similarity scoring
Chunk configuration (size, overlap)
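
To make the chunk size and overlap settings concrete, here is a minimal character-based chunking sketch. It is an assumption about how such settings typically behave, not the repository's implementation, which may split on tokens or sentences instead. The function name chunkText is hypothetical.

```typescript
// Minimal sketch: fixed-size character windows that share `overlap` characters
// with the previous window. Hypothetical helper, not the project's own code.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (size <= 0) throw new Error("size must be positive");
  if (overlap < 0 || overlap >= size) {
    throw new Error("overlap must be at least 0 and smaller than size");
  }
  const chunks: string[] = [];
  const step = size - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

For example, chunkText("abcdefghij", 4, 1) yields "abcd", "defg", "ghij": each chunk repeats the last character of the previous one, so text that straddles a boundary still appears whole in at least one chunk when it is no longer than the overlap.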

Code Standards

Language Requirements

Code comments must be in English
Log messages must be in English
Error messages and API response messages must be internationalized so the frontend can display them in every supported language
Interface supports Japanese, Chinese, and English
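
One way to satisfy the internationalization requirement is to return stable message codes and resolve them against a per-locale catalog. The sketch below assumes that approach; the catalog, the code FILE_TOO_LARGE, and the translate function are hypothetical, and the project may use a dedicated i18n library instead.

```typescript
// Hypothetical message catalog keyed by a stable code, one entry per locale.
type Locale = "ja" | "zh" | "en";

const messages: Record<string, Record<Locale, string>> = {
  FILE_TOO_LARGE: {
    en: "The file is too large.",
    ja: "ファイルが大きすぎます。",
    zh: "文件过大。",
  },
};

// Resolve a code for a locale; fall back to the code itself if it is unknown,
// so a missing translation is visible rather than silently empty.
function translate(code: string, locale: Locale): string {
  const entry = messages[code];
  return entry ? entry[locale] : code;
}
```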

Testing

Backend uses Jest for unit and e2e tests
Frontend currently has no test framework configured
Run unit tests: cd server && yarn test
Run e2e tests: cd server && yarn test:e2e

Code Quality

ESLint and Prettier configured for backend
Format code: cd server && yarn format
Lint code: cd server && yarn lint

Common Development Tasks

Adding a New API Endpoint

Create controller in appropriate module under server/src/
Add service methods with English comments
Update DTOs and validation
Add tests in *.spec.ts files

Adding a New Frontend Component

Create component in web/components/
Add TypeScript interfaces in web/types.ts
Use Tailwind CSS for styling
Connect to backend services in web/services/
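
Because chat responses are streamed via Server-Sent Events, a service in web/services/ that reads the stream by hand needs to split the incoming text into events. The sketch below shows one way to do that, assuming each event carries its payload in data: lines as the SSE format specifies. The names SseParseResult and parseSseBuffer are hypothetical, and the repository's actual client code may differ.

```typescript
// Hypothetical helper: splits buffered SSE text into complete event payloads.
interface SseParseResult {
  events: string[]; // payloads of complete events, in arrival order
  rest: string;     // trailing partial event to prepend to the next chunk
}

function parseSseBuffer(buffer: string): SseParseResult {
  // Events are separated by a blank line; normalize CRLF first.
  const frames = buffer.replace(/\r\n/g, "\n").split("\n\n");
  // The last piece is incomplete unless the buffer ended with a blank line.
  const rest = frames.pop() ?? "";
  const events: string[] = [];
  for (const frame of frames) {
    const dataLines = frame
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, "")); // drop "data:" and one optional space
    if (dataLines.length > 0) events.push(dataLines.join("\n"));
  }
  return { events, rest };
}
```

A caller would append each network chunk to the previous rest, call parseSseBuffer, handle the returned events, and keep the new rest for the next chunk.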

Debugging

Backend log messages are expected to be in English (see Language Requirements)
Check Elasticsearch: curl http://localhost:9200/_cat/indices
Check Tika: curl http://localhost:9998/tika
Check LibreOffice: curl http://localhost:8100/health

Environment Configuration

Key environment variables (server/.env):

OPENAI_API_KEY: OpenAI-compatible API key
GEMINI_API_KEY: Google Gemini API key
ELASTICSEARCH_HOST: Elasticsearch URL (default: http://localhost:9200)
TIKA_HOST: Apache Tika URL (default: http://localhost:9998)
LIBREOFFICE_URL: LibreOffice server URL (default: http://localhost:8100)
JWT_SECRET: JWT signing secret
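
The defaults listed above can be applied in one place when the server starts. The sketch below is a hypothetical loader, not the repository's configuration code: the ServerConfig shape and loadConfig name are invented for illustration, and treating JWT_SECRET as mandatory is an assumption.

```typescript
// Hypothetical config loader applying the documented defaults.
interface ServerConfig {
  openaiApiKey?: string;
  geminiApiKey?: string;
  elasticsearchHost: string;
  tikaHost: string;
  libreofficeUrl: string;
  jwtSecret: string;
}

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const jwtSecret = env["JWT_SECRET"];
  // Assumption: refuse to start without a signing secret.
  if (!jwtSecret) throw new Error("JWT_SECRET is required");
  return {
    openaiApiKey: env["OPENAI_API_KEY"],
    geminiApiKey: env["GEMINI_API_KEY"],
    elasticsearchHost: env["ELASTICSEARCH_HOST"] ?? "http://localhost:9200",
    tikaHost: env["TIKA_HOST"] ?? "http://localhost:9998",
    libreofficeUrl: env["LIBREOFFICE_URL"] ?? "http://localhost:8100",
    jwtSecret,
  };
}
```

In a Node.js process this would be called as loadConfig(process.env). Passing the environment in as an argument keeps the function easy to test.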

Deployment

Development

docker-compose up -d elasticsearch tika libreoffice
yarn dev

Production

docker-compose up -d  # Builds and starts all services

Ports in Production

Frontend: 80/443 (via nginx)
Backend API: 3001 (proxied through nginx)
Elasticsearch: 9200
Tika: 9998
LibreOffice: 8100

Troubleshooting

Common Issues

Elasticsearch not starting: Check memory limits in docker-compose.yml
File upload failures: Ensure the server/uploads/ and server/temp/ directories exist and are writable by the backend process
Vision pipeline errors: Verify LibreOffice server is running and accessible
API key errors: Check environment variables in server/.env

Database Management

SQLite database: server/data/metadata.db
Elasticsearch indices: Managed automatically by the application
To reset: Delete server/data/metadata.db and the Elasticsearch data volume

Documentation

README.md: Project overview in Japanese
docs/: Comprehensive documentation (mostly Japanese/Chinese)
DESIGN.md: System architecture and design
API.md: API reference
DEVELOPMENT_STANDARDS.md: Mandates English comments/logs and internationalized messages

When modifying code, always write comments and log messages in English, as the development standards require. Error and UI messages must be properly internationalized. The project has extensive existing documentation in Japanese and Chinese; refer to the docs/ directory for detailed technical information.
