
CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Simple Knowledge Base is a full-stack RAG (Retrieval-Augmented Generation) Q&A system built with React 19 + NestJS. The repository is a monorepo; its documentation is in Japanese and Chinese, while the code is in English.

Key Features:

  • Multi-model support (OpenAI-compatible APIs + Google Gemini native SDK)
  • Dual processing modes: Fast (Tika text-only) and High-precision (Vision pipeline)
  • User isolation with JWT authentication and per-user knowledge bases
  • Hybrid search (vector + keyword) with Elasticsearch
  • Multi-language interface (Japanese, Chinese, English)
  • Streaming responses via Server-Sent Events (SSE)
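As an illustration of what "hybrid search" means in practice, the sketch below merges a vector-ranked list and a keyword-ranked list with reciprocal rank fusion (RRF). This is an assumption made for illustration only: the repository may combine the two result sets differently (for example inside Elasticsearch itself), and the names `RankedHit` and `fuseByReciprocalRank` are hypothetical.

```typescript
// Hypothetical types and names; the repository's actual search code may differ.
interface RankedHit {
  id: string; // chunk identifier
}

// Reciprocal rank fusion: each result list contributes 1 / (k + rank)
// to a document's score, so documents ranked well in both lists rise to the top.
function fuseByReciprocalRank(
  vectorHits: RankedHit[],
  keywordHits: RankedHit[],
  k: number = 60, // smoothing constant; 60 is a commonly used default
): { id: string; score: number }[] {
  const scores: { [id: string]: number } = {};
  for (const hits of [vectorHits, keywordHits]) {
    hits.forEach((hit, index) => {
      const rank = index + 1; // ranks are 1-based
      scores[hit.id] = (scores[hit.id] || 0) + 1 / (k + rank);
    });
  }
  return Object.keys(scores)
    .map((id) => ({ id, score: scores[id] }))
    .sort((a, b) => b.score - a.score);
}
```

A chunk that appears in both lists accumulates two contributions, so it outranks chunks found by only one of the two searches.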

Development Setup

Prerequisites

  • Node.js 18+
  • Yarn
  • Docker & Docker Compose

Initial Setup

# Install dependencies
yarn install

# Start infrastructure services
docker-compose up -d elasticsearch tika libreoffice

# Configure environment
cp server/.env.sample server/.env
# Edit server/.env with API keys and configuration

Development Commands

# Start both frontend and backend in development mode
yarn dev

# Frontend only (port 13001)
cd web && yarn dev

# Backend only (port 3001)
cd server && yarn start:dev

# Run tests
cd server && yarn test
cd server && yarn test:e2e

# Lint and format
cd server && yarn lint
cd server && yarn format

Docker Services

  • Elasticsearch: 9200 (vector storage)
  • Apache Tika: 9998 (document text extraction)
  • LibreOffice Server: 8100 (document conversion)
  • Backend API: 3001
  • Frontend: 13001 (dev), 80/443 (production via nginx)

Architecture

Project Structure

simple-kb/
├── web/                    # React frontend (Vite)
│   ├── components/         # UI components (ChatInterface, ConfigPanel, etc.)
│   ├── contexts/          # React Context providers
│   ├── services/          # API client services
│   └── utils/             # Utility functions
├── server/                # NestJS backend
│   ├── src/
│   │   ├── ai/            # AI services (embedding, etc.)
│   │   ├── api/           # API module
│   │   ├── auth/          # JWT authentication
│   │   ├── chat/          # Chat/RAG module
│   │   ├── elasticsearch/ # Elasticsearch integration
│   │   ├── import-task/   # Import task management
│   │   ├── knowledge-base/# Knowledge base management
│   │   ├── libreoffice/   # LibreOffice integration
│   │   ├── model-config/  # Model configuration management
│   │   ├── vision/        # Vision model integration
│   │   └── vision-pipeline/# Vision pipeline orchestration
│   ├── data/              # SQLite database storage
│   ├── uploads/           # Uploaded files storage
│   └── temp/              # Temporary files
├── docs/                  # Comprehensive documentation (Japanese/Chinese)
├── nginx/                 # Nginx configuration
├── libreoffice-server/    # LibreOffice conversion service (Python/FastAPI)
└── docker-compose.yml     # Docker orchestration

Key Architectural Concepts

Dual Processing Modes:

  1. Fast Mode: Apache Tika for text-only extraction (quick, no API cost)
  2. High-Precision Mode: Vision Pipeline (LibreOffice → PDF → Images → Vision Model) for mixed image/text documents (slower, incurs API costs)

Multi-Model Support:

  • OpenAI-compatible APIs (OpenAI, DeepSeek, Claude, etc.)
  • Google Gemini native SDK
  • Configurable LLM, Embedding, and Rerank models

RAG System:

  • Hybrid search (vector + keyword) with Elasticsearch
  • Streaming responses via Server-Sent Events (SSE)
  • Source citation and similarity scoring
  • Chunk configuration (size, overlap)
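The chunk size and overlap settings can be pictured with a small sliding-window splitter. This is a minimal sketch assuming character-based chunks; the function name `chunkText` is hypothetical, and the actual implementation may split by tokens or sentences instead.

```typescript
// Hypothetical helper: splits text into windows of `size` characters,
// where each window repeats the last `overlap` characters of the previous one.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (size <= 0) {
    throw new Error("size must be positive");
  }
  if (overlap < 0 || overlap >= size) {
    throw new Error("overlap must be at least 0 and smaller than size");
  }
  const chunks: string[] = [];
  const step = size - overlap; // how far the window advances each iteration
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) {
      break; // this window already reached the end of the text
    }
  }
  return chunks;
}
```

With a size of 4 and an overlap of 1, the text "abcdefghij" is split into "abcd", "defg", and "ghij". Larger overlaps preserve more context across chunk boundaries at the cost of more chunks to embed and store.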

Code Standards

Language Requirements

  • Code comments must be in Japanese (updated from Chinese as per user requirement)
  • Log messages must be in Japanese
  • Error messages and API response messages must support internationalization so that the frontend can display them in any supported language
  • Interface supports Japanese, Chinese, and English
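One way to satisfy the internationalization requirement is to identify messages by a stable code and resolve the text per locale. The sketch below assumes this approach; the error codes, the `MESSAGES` catalog, and the `translate` function are hypothetical and do not reflect the repository's actual i18n mechanism.

```typescript
type Locale = "ja" | "zh" | "en";

// Hypothetical message catalog keyed by error code.
const MESSAGES: { [code: string]: Record<Locale, string> } = {
  FILE_TOO_LARGE: {
    ja: "ファイルサイズが上限を超えています",
    zh: "文件大小超出限制",
    en: "File size exceeds the limit",
  },
  KNOWLEDGE_BASE_NOT_FOUND: {
    ja: "ナレッジベースが見つかりません",
    zh: "未找到知识库",
    en: "Knowledge base not found",
  },
};

// Resolves a message code for the requested locale.
function translate(code: string, locale: Locale): string {
  const entry = MESSAGES[code];
  // Fall back to the raw code so a missing translation stays visible.
  return entry ? entry[locale] : code;
}
```

Returning a code alongside the translated text lets the frontend re-render the message when the user switches the interface language.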

Testing

  • Backend uses Jest for unit and e2e tests
  • Frontend currently has no test framework configured
  • Run unit tests with cd server && yarn test, and e2e tests with cd server && yarn test:e2e

Code Quality

  • ESLint and Prettier configured for backend
  • Format code: cd server && yarn format
  • Lint code: cd server && yarn lint

Common Development Tasks

Adding a New API Endpoint

  1. Create controller in appropriate module under server/src/
  2. Add service methods with Japanese comments
  3. Update DTOs and validation
  4. Add tests in *.spec.ts files

Adding a New Frontend Component

  1. Create component in web/components/
  2. Add TypeScript interfaces in web/types.ts
  3. Use Tailwind CSS for styling
  4. Connect to backend services in web/services/
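When a component in step 4 consumes the streaming chat responses, the service layer has to split the incoming Server-Sent Events text into complete events. The following is a minimal sketch of that parsing step, assuming LF line endings and `data:` fields only; the function name `parseSseBuffer` is hypothetical and the existing code in web/services/ may handle this differently.

```typescript
// Hypothetical helper: extracts complete SSE events from a text buffer and
// returns the trailing, not yet complete part so it can be prepended to the next chunk.
function parseSseBuffer(buffer: string): { events: string[]; rest: string } {
  const events: string[] = [];
  const blocks = buffer.split("\n\n"); // a blank line terminates each event
  const rest = blocks.pop() || ""; // the last piece may be an incomplete event
  for (const block of blocks) {
    const dataLines = block
      .split("\n")
      .filter((line) => line.indexOf("data:") === 0)
      .map((line) => line.slice(5).replace(/^ /, "")); // drop "data:" and one optional space
    if (dataLines.length > 0) {
      events.push(dataLines.join("\n")); // multiple data lines form one payload
    }
  }
  return { events, rest };
}
```

The caller would keep `rest` between network reads and pass `rest + nextChunk` on the next call, so an event split across two reads is still delivered whole.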

Debugging

  • Backend logs are in Japanese, as required by the Language Requirements; older log messages may still be in Chinese
  • Check Elasticsearch: curl http://localhost:9200/_cat/indices
  • Check Tika: curl http://localhost:9998/tika
  • Check LibreOffice: curl http://localhost:8100/health
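The three curl checks above can also be run together from a script. The sketch below assumes the default local URLs listed in this file; `HttpGet`, `SERVICE_URLS`, and `checkServices` are hypothetical names, and the HTTP client is injected so the function can be exercised without a network.

```typescript
// Minimal shape of the HTTP client needed here; the global fetch
// available in Node.js 18+ satisfies it.
type HttpGet = (url: string) => Promise<{ ok: boolean }>;

// Same endpoints as the curl commands above.
const SERVICE_URLS: { [name: string]: string } = {
  elasticsearch: "http://localhost:9200/_cat/indices",
  tika: "http://localhost:9998/tika",
  libreoffice: "http://localhost:8100/health",
};

// Returns, per service, whether it answered with a successful HTTP status.
async function checkServices(get: HttpGet): Promise<{ [name: string]: boolean }> {
  const result: { [name: string]: boolean } = {};
  for (const name of Object.keys(SERVICE_URLS)) {
    try {
      const response = await get(SERVICE_URLS[name]);
      result[name] = response.ok;
    } catch {
      result[name] = false; // connection refused, DNS failure, and similar errors
    }
  }
  return result;
}
```

Calling `checkServices(fetch)` on Node.js 18+ resolves to an object such as one marking a stopped service as false, which narrows down which container to inspect first.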

Environment Configuration

Key environment variables (server/.env):

  • OPENAI_API_KEY: OpenAI-compatible API key
  • GEMINI_API_KEY: Google Gemini API key
  • ELASTICSEARCH_HOST: Elasticsearch URL (default: http://localhost:9200)
  • TIKA_HOST: Apache Tika URL (default: http://localhost:9998)
  • LIBREOFFICE_URL: LibreOffice server URL (default: http://localhost:8100)
  • JWT_SECRET: JWT signing secret
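The variables and defaults above could be read into a typed object along these lines. This is a sketch under assumptions: `ServerConfig` and `loadConfig` are hypothetical names, and treating JWT_SECRET as mandatory is an illustrative choice rather than documented behavior of the server.

```typescript
// Hypothetical typed view of the variables listed above.
interface ServerConfig {
  openaiApiKey: string | undefined;
  geminiApiKey: string | undefined;
  elasticsearchHost: string;
  tikaHost: string;
  libreofficeUrl: string;
  jwtSecret: string;
}

// Builds the config from an environment map, applying the documented defaults.
function loadConfig(env: { [key: string]: string | undefined }): ServerConfig {
  const jwtSecret = env.JWT_SECRET;
  if (!jwtSecret) {
    // Assumption: refuse to start without a signing secret.
    throw new Error("JWT_SECRET is required");
  }
  return {
    openaiApiKey: env.OPENAI_API_KEY,
    geminiApiKey: env.GEMINI_API_KEY,
    elasticsearchHost: env.ELASTICSEARCH_HOST || "http://localhost:9200",
    tikaHost: env.TIKA_HOST || "http://localhost:9998",
    libreofficeUrl: env.LIBREOFFICE_URL || "http://localhost:8100",
    jwtSecret,
  };
}
```

In a Node.js process this would be called as `loadConfig(process.env)` once at startup, so a missing secret fails fast instead of surfacing later as an authentication error.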

Deployment

Development

docker-compose up -d elasticsearch tika libreoffice
yarn dev

Production

docker-compose up -d  # Starts all services, building any images that do not exist yet

Ports in Production

  • Frontend: 80/443 (via nginx)
  • Backend API: 3001 (proxied through nginx)
  • Elasticsearch: 9200
  • Tika: 9998
  • LibreOffice: 8100

Troubleshooting

Common Issues

  1. Elasticsearch not starting: Check memory limits in docker-compose.yml
  2. File upload failures: Ensure the server/uploads/ and server/temp/ directories exist and are writable by the backend process
  3. Vision pipeline errors: Verify LibreOffice server is running and accessible
  4. API key errors: Check environment variables in server/.env

Database Management

  • SQLite database: server/data/metadata.db
  • Elasticsearch indices: Managed automatically by the application
  • To reset: delete server/data/metadata.db and the Elasticsearch data volume

Documentation

  • README.md: Project overview in Japanese
  • docs/: Comprehensive documentation (mostly Japanese/Chinese)
  • DESIGN.md: System architecture and design
  • API.md: API reference
  • DEVELOPMENT_STANDARDS.md: Mandates Japanese comments/logs

When modifying code, always add Japanese comments as required by the development standards. The project has extensive documentation in Japanese and Chinese; refer to the docs/ directory for detailed technical information.