# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Lumina is a full-stack RAG (Retrieval-Augmented Generation) Q&A system built with React 19 + NestJS. It is a monorepo with Japanese/Chinese documentation but English code.

**Key Features:**

- Multi-model support (OpenAI-compatible APIs + Google Gemini native SDK)
- Dual processing modes: Fast (Tika text-only) and High-precision (Vision pipeline)
- User isolation with JWT authentication and per-user knowledge bases
- Hybrid search (vector + keyword) with Elasticsearch
- Multi-language interface (Japanese, Chinese, English)
- Streaming responses via Server-Sent Events (SSE)

## Development Setup

### Prerequisites

- Node.js 18+
- Yarn
- Docker & Docker Compose

### Initial Setup

```bash
# Install dependencies
yarn install

# Start infrastructure services
docker-compose up -d elasticsearch tika libreoffice

# Configure environment
cp server/.env.sample server/.env
# Edit server/.env with API keys and configuration
```

### Development Commands

```bash
# Start both frontend and backend in development mode
yarn dev

# Frontend only (port 13001)
cd web && yarn dev

# Backend only (port 3001)
cd server && yarn start:dev

# Run tests
cd server && yarn test
cd server && yarn test:e2e

# Lint and format
cd server && yarn lint
cd server && yarn format
```

### Docker Services

- **Elasticsearch**: 9200 (vector storage)
- **Apache Tika**: 9998 (document text extraction)
- **LibreOffice Server**: 8100 (document conversion)
- **Backend API**: 3001
- **Frontend**: 13001 (dev), 80/443 (production via nginx)

## Architecture

### Project Structure

```
lumina/
├── web/                    # React frontend (Vite)
│   ├── components/         # UI components (ChatInterface, ConfigPanel, etc.)
│   ├── contexts/           # React Context providers
│   ├── services/           # API client services
│   └── utils/              # Utility functions
├── server/                 # NestJS backend
│   ├── src/
│   │   ├── ai/             # AI services (embedding, etc.)
│   │   ├── api/            # API module
│   │   ├── auth/           # JWT authentication
│   │   ├── chat/           # Chat/RAG module
│   │   ├── elasticsearch/  # Elasticsearch integration
│   │   ├── import-task/    # Import task management
│   │   ├── knowledge-base/ # Knowledge base management
│   │   ├── libreoffice/    # LibreOffice integration
│   │   ├── model-config/   # Model configuration management
│   │   ├── vision/         # Vision model integration
│   │   └── vision-pipeline/ # Vision pipeline orchestration
│   ├── data/               # SQLite database storage
│   ├── uploads/            # Uploaded files storage
│   └── temp/               # Temporary files
├── docs/                   # Comprehensive documentation (Japanese/Chinese)
├── nginx/                  # Nginx configuration
├── libreoffice-server/     # LibreOffice conversion service (Python/FastAPI)
└── docker-compose.yml      # Docker orchestration
```

### Key Architectural Concepts

**Dual Processing Modes:**

1. **Fast Mode**: Apache Tika for text-only extraction (quick, no API cost)
2. **High-Precision Mode**: Vision Pipeline (LibreOffice → PDF → Images → Vision Model) for mixed image/text documents (slower, incurs API costs)

**Multi-Model Support:**

- OpenAI-compatible APIs (OpenAI, DeepSeek, Claude, etc.)
- Google Gemini native SDK
- Configurable LLM, Embedding, and Rerank models

**RAG System:**

- Hybrid search (vector + keyword) with Elasticsearch
- Streaming responses via Server-Sent Events (SSE)
- Source citation and similarity scoring
- Chunk configuration (size, overlap)

## Development Process (Documentation-Driven)

Every feature development MUST follow this sequence:

1. **Design Phase**: Create a design document in `docs/design/feat-[feature-name].md` before writing any code. Outline the requirements, architecture, and UI/UX changes.
2. **Implementation Phase**: Develop the feature following the design and code standards.
3. **Verification Phase**: Create a test case document in `docs/testing/feat-[feature-name].md` documenting test results and edge cases.
For complex features, refer to `docs/FEAT_ANALYSIS.md` for the long-term roadmap and architectural guidance.

## Code Standards

### Naming Conventions

- **Project Name**: Lumina
- **Monorepo Packages**: Use the `lumina-` prefix for all sub-packages (e.g., `lumina-web`, `lumina-server`).

### Language Requirements

- **Code comments must be in Japanese.**
- **Log messages must be in Japanese.**
- **User interface strings must follow i18n standards** (JA, ZH, EN).
- **API error/response messages must support i18n** via translation keys.

### Testing

- Backend uses Jest for unit and e2e tests
- Frontend currently has no test framework configured
- Run tests: `cd server && yarn test` or `yarn test:e2e`

### Code Quality

- ESLint and Prettier are configured for the backend
- Format code: `cd server && yarn format`
- Lint code: `cd server && yarn lint`

## Common Development Tasks

### Adding a New API Endpoint

1. Create a design doc in `docs/design/`
2. Create a controller in the appropriate module under `server/src/`
3. Add service methods with Japanese comments
4. Update DTOs and validation
5. Add tests in `*.spec.ts` files
6. Create a test case doc in `docs/testing/`

### Adding a New Frontend Component

1. Create a design doc in `docs/design/`
2. Create the component in `web/components/`
3. Add TypeScript interfaces in `web/types.ts`
4. Use Tailwind CSS for styling
5. Connect to backend services in `web/services/`
6. Create a test case doc in `docs/testing/`

### Debugging

- Backend logs are in Chinese
- Check Elasticsearch: `curl http://localhost:9200/_cat/indices`
- Check Tika: `curl http://localhost:9998/tika`
- Check LibreOffice: `curl http://localhost:8100/health`

## Environment Configuration

Key environment variables (`server/.env`):

- `OPENAI_API_KEY`: OpenAI-compatible API key
- `GEMINI_API_KEY`: Google Gemini API key
- `ELASTICSEARCH_HOST`: Elasticsearch URL (default: http://localhost:9200)
- `TIKA_HOST`: Apache Tika URL (default: http://localhost:9998)
- `LIBREOFFICE_URL`: LibreOffice server URL (default: http://localhost:8100)
- `JWT_SECRET`: JWT signing secret

## Deployment

### Development

```bash
docker-compose up -d elasticsearch tika libreoffice
yarn dev
```

### Production

```bash
docker-compose up -d  # Builds and starts all services
```

### Ports in Production

- Frontend: 80/443 (via nginx)
- Backend API: 3001 (proxied through nginx)
- Elasticsearch: 9200
- Tika: 9998
- LibreOffice: 8100

## Troubleshooting

### Common Issues

1. **Elasticsearch not starting**: Check memory limits in docker-compose.yml
2. **File upload failures**: Ensure `uploads/` and `temp/` directories exist with proper permissions
3. **Vision pipeline errors**: Verify the LibreOffice server is running and accessible
4. **API key errors**: Check environment variables in `server/.env`

### Database Management

- SQLite database: `server/data/metadata.db`
- Elasticsearch indices: managed automatically by the application
- To reset: delete `server/data/metadata.db` and the Elasticsearch data volume

## Documentation

- **README.md**: Project overview in Japanese
- **docs/**: Comprehensive documentation (mostly Japanese/Chinese)
  - **DESIGN.md**: System architecture and design
  - **API.md**: API reference
  - **DEVELOPMENT_STANDARDS.md**: Mandates Japanese comments/logs

When modifying code, always add Japanese comments as required by development standards.
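The individual `curl` checks listed under Debugging can be combined into one helper. This is a hypothetical script, not part of the repo; the endpoints are the ones documented above, and each line prints `OK` or `NG` depending on whether the service responds:

```shell
#!/bin/sh
# 依存サービスのヘルスチェックをまとめて実行する(仮のヘルパー)
# コメントは開発規約に従い日本語で記述する
set -u

check() {
  # $1: サービス名, $2: ヘルスチェック用 URL
  if curl -fsS --max-time 5 "$2" > /dev/null 2>&1; then
    echo "OK $1 ($2)"
  else
    echo "NG $1 ($2)"
  fi
}

check "Elasticsearch" "http://localhost:9200/_cat/indices"
check "Tika"          "http://localhost:9998/tika"
check "LibreOffice"   "http://localhost:8100/health"
```

An `NG` line points at the matching Troubleshooting entry (e.g. LibreOffice down means Vision pipeline errors).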
The project has extensive existing documentation in Japanese and Chinese; refer to the `docs/` directory for detailed technical information.
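Several of the troubleshooting items above come down to missing configuration. A quick sanity check that every documented key appears in `server/.env` might look like the following (hypothetical helper, not part of the repo; the key names are taken from the Environment Configuration section):

```shell
#!/bin/sh
# server/.env に必須の環境変数キーが定義されているか確認する(仮のチェック)
check_env_keys() {
  # $1: チェック対象の .env ファイル
  missing=0
  for key in OPENAI_API_KEY GEMINI_API_KEY ELASTICSEARCH_HOST TIKA_HOST LIBREOFFICE_URL JWT_SECRET; do
    if ! grep -q "^${key}=" "$1" 2>/dev/null; then
      echo "MISSING: ${key}"
      missing=1
    fi
  done
  return "$missing"
}

# 不足キーがあれば一覧を表示する
check_env_keys "${1:-server/.env}" || echo "上記のキーを server/.env に追加してください"
```

This only verifies that the keys are present, not that their values are valid; API-key validity still has to be confirmed against the provider.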