# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Simple Knowledge Base is a full-stack RAG (Retrieval-Augmented Generation) Q&A system built with React 19 + NestJS. It's a monorepo with Japanese/Chinese documentation but English code.

**Key Features:**
- Multi-model support (OpenAI-compatible APIs + Google Gemini native SDK)
- Dual processing modes: Fast (Tika text-only) and High-precision (Vision pipeline)
- User isolation with JWT authentication and per-user knowledge bases
- Hybrid search (vector + keyword) with Elasticsearch
- Multi-language interface (Japanese, Chinese, English)
- Streaming responses via Server-Sent Events (SSE)

## Development Setup

### Prerequisites
- Node.js 18+
- Yarn
- Docker & Docker Compose

### Initial Setup
```bash
# Install dependencies
yarn install

# Start infrastructure services
docker-compose up -d elasticsearch tika libreoffice

# Configure environment
cp server/.env.sample server/.env
# Edit server/.env with API keys and configuration
```

### Development Commands
```bash
# Start both frontend and backend in development mode
yarn dev

# Frontend only (port 13001)
cd web && yarn dev

# Backend only (port 3001)
cd server && yarn start:dev

# Run tests
cd server && yarn test
cd server && yarn test:e2e

# Lint and format
cd server && yarn lint
cd server && yarn format
```

### Docker Services
- **Elasticsearch**: 9200 (vector storage)
- **Apache Tika**: 9998 (document text extraction)
- **LibreOffice Server**: 8100 (document conversion)
- **Backend API**: 3001
- **Frontend**: 13001 (dev), 80/443 (production via nginx)

## Architecture

### Project Structure
```
simple-kb/
├── web/                      # React frontend (Vite)
│   ├── components/           # UI components (ChatInterface, ConfigPanel, etc.)
│   ├── contexts/             # React Context providers
│   ├── services/             # API client services
│   └── utils/                # Utility functions
├── server/                   # NestJS backend
│   ├── src/
│   │   ├── ai/               # AI services (embedding, etc.)
│   │   ├── api/              # API module
│   │   ├── auth/             # JWT authentication
│   │   ├── chat/             # Chat/RAG module
│   │   ├── elasticsearch/    # Elasticsearch integration
│   │   ├── import-task/      # Import task management
│   │   ├── knowledge-base/   # Knowledge base management
│   │   ├── libreoffice/      # LibreOffice integration
│   │   ├── model-config/     # Model configuration management
│   │   ├── vision/           # Vision model integration
│   │   └── vision-pipeline/  # Vision pipeline orchestration
│   ├── data/                 # SQLite database storage
│   ├── uploads/              # Uploaded files storage
│   └── temp/                 # Temporary files
├── docs/                     # Comprehensive documentation (Japanese/Chinese)
├── nginx/                    # Nginx configuration
├── libreoffice-server/       # LibreOffice conversion service (Python/FastAPI)
└── docker-compose.yml        # Docker orchestration
```

### Key Architectural Concepts

**Dual Processing Modes:**
1. **Fast Mode**: Apache Tika for text-only extraction (quick, no API cost)
2. **High-Precision Mode**: Vision Pipeline (LibreOffice → PDF → Images → Vision Model) for mixed image/text documents (slower, incurs API costs)

**Multi-Model Support:**
- OpenAI-compatible APIs (OpenAI, DeepSeek, Claude, etc.)
- Google Gemini native SDK
- Configurable LLM, Embedding, and Rerank models

**RAG System:**
- Hybrid search (vector + keyword) with Elasticsearch
- Streaming responses via Server-Sent Events (SSE)
- Source citation and similarity scoring
- Chunk configuration (size, overlap)

## Code Standards

### Language Requirements
- **Code comments must be in English**
- **Log messages must be in English**
- **Error messages must support internationalization** to enable multi-language frontend interface
- **API response messages must support internationalization** to enable multi-language frontend interface
- Interface supports Japanese, Chinese, and English

### Testing
- Backend uses Jest for unit and e2e tests
- Frontend currently has no test framework configured
- Run tests: `cd server && yarn test` or `yarn test:e2e`

### Code Quality
- ESLint and Prettier configured for backend
- Format code: `cd server && yarn format`
- Lint code: `cd server && yarn lint`

## Common Development Tasks

### Adding a New API Endpoint
1. Create controller in appropriate module under `server/src/`
2. Add service methods with English comments
3. Update DTOs and validation
4. Add tests in `*.spec.ts` files

### Adding a New Frontend Component
1. Create component in `web/components/`
2. Add TypeScript interfaces in `web/types.ts`
3. Use Tailwind CSS for styling
4. Connect to backend services in `web/services/`

### Debugging
- Backend logs are in Chinese
- Check Elasticsearch: `curl http://localhost:9200/_cat/indices`
- Check Tika: `curl http://localhost:9998/tika`
- Check LibreOffice: `curl http://localhost:8100/health`

## Environment Configuration

Key environment variables (`server/.env`):
- `OPENAI_API_KEY`: OpenAI-compatible API key
- `GEMINI_API_KEY`: Google Gemini API key
- `ELASTICSEARCH_HOST`: Elasticsearch URL (default: http://localhost:9200)
- `TIKA_HOST`: Apache Tika URL (default: http://localhost:9998)
- `LIBREOFFICE_URL`: LibreOffice server URL (default: http://localhost:8100)
- `JWT_SECRET`: JWT signing secret

## Deployment

### Development
```bash
docker-compose up -d elasticsearch tika libreoffice
yarn dev
```

### Production
```bash
docker-compose up -d  # Builds and starts all services
```

### Ports in Production
- Frontend: 80/443 (via nginx)
- Backend API: 3001 (proxied through nginx)
- Elasticsearch: 9200
- Tika: 9998
- LibreOffice: 8100

## Troubleshooting

### Common Issues
1. **Elasticsearch not starting**: Check memory limits in docker-compose.yml
2. **File upload failures**: Ensure `uploads/` and `temp/` directories exist with proper permissions
3. **Vision pipeline errors**: Verify LibreOffice server is running and accessible
4. **API key errors**: Check environment variables in `server/.env`

### Database Management
- SQLite database: `server/data/metadata.db`
- Elasticsearch indices: Managed automatically by the application
- To reset: Delete `server/data/metadata.db` and Elasticsearch data volume

## Documentation

- **README.md**: Project overview in Japanese
- **docs/**: Comprehensive documentation (mostly Japanese/Chinese)
- **DESIGN.md**: System architecture and design
- **API.md**: API reference
- **DEVELOPMENT_STANDARDS.md**: Mandates English comments/logs and internationalized messages

When modifying code, always add English comments and logs as required by development standards. Error and UI messages must be properly internationalized. The project has extensive existing documentation in Japanese/Chinese - refer to `docs/` directory for detailed technical information.
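
As a reference for the variables listed under Environment Configuration, here is a minimal sketch of how they and their documented defaults could be resolved in one place. The variable names and default URLs come from that list. The `ServiceConfig` shape and the `loadServiceConfig` helper are hypothetical names used for illustration, and treating `JWT_SECRET` as required is an assumption, not necessarily how the backend reads its configuration.

```typescript
// Hypothetical config shape; field names are illustrative, not the backend's.
interface ServiceConfig {
  elasticsearchHost: string;
  tikaHost: string;
  libreofficeUrl: string;
  jwtSecret: string;
  openaiApiKey?: string;
  geminiApiKey?: string;
}

// Hypothetical helper: resolves settings from an environment-like record.
// Defaults mirror the ones documented for server/.env.
function loadServiceConfig(
  env: Record<string, string | undefined>,
): ServiceConfig {
  // Assumption: a signing secret is mandatory, so fail fast when it is absent.
  const jwtSecret = env["JWT_SECRET"];
  if (!jwtSecret) {
    throw new Error("JWT_SECRET is not set");
  }

  return {
    // Unset variables fall back to the documented local defaults.
    elasticsearchHost: env["ELASTICSEARCH_HOST"] ?? "http://localhost:9200",
    tikaHost: env["TIKA_HOST"] ?? "http://localhost:9998",
    libreofficeUrl: env["LIBREOFFICE_URL"] ?? "http://localhost:8100",
    jwtSecret,
    // API keys stay optional here because either provider may be unused.
    openaiApiKey: env["OPENAI_API_KEY"],
    geminiApiKey: env["GEMINI_API_KEY"],
  };
}
```

In a Node.js process this would typically be called as `loadServiceConfig(process.env)` once at startup, so a missing secret surfaces immediately instead of at the first login request.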
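
To make the "Chunk configuration (size, overlap)" setting from the RAG System notes concrete, the following sketch splits text into fixed-size windows where each chunk repeats the tail of the previous one. It is an illustration under stated assumptions: it counts characters, while the actual import pipeline may count tokens or split on sentence boundaries, and `chunkText` is a hypothetical name rather than a function in this repository.

```typescript
// Hypothetical helper: fixed-size character chunking with overlap.
// `size` is the maximum chunk length; `overlap` is how many trailing
// characters of one chunk are repeated at the start of the next.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  if (!Number.isInteger(overlap) || overlap < 0 || overlap >= size) {
    throw new RangeError("overlap must be an integer in [0, size)");
  }

  const chunks: string[] = [];
  const step = size - overlap; // distance between consecutive chunk starts

  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once a chunk reaches the end, so no trailing chunk is emitted
    // that contains only already-covered overlap text.
    if (start + size >= text.length) {
      break;
    }
  }
  return chunks;
}
```

A larger overlap keeps more context across chunk boundaries at the cost of more chunks to embed and store, which is the trade-off the size and overlap settings expose.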