Simple Knowledge Base

A full-stack RAG (Retrieval-Augmented Generation) Q&A system built with React 19 and NestJS.

React 19 · NestJS · RAG System · TypeScript

Multi-Model Support

Works with OpenAI-compatible APIs (OpenAI, DeepSeek, Claude) and with Google Gemini through its native SDK. The LLM, Embedding, and Rerank models are each configurable.
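
One way to picture the configurable model roles is as a small typed record per role. The sketch below is illustrative only: the type names, field names, and helper functions are hypothetical and are not taken from the project's model-config module. The only names that come from this document are the two environment variables, and the project may store keys per user instead.

```typescript
// Hypothetical shapes for illustration; not the project's actual types.
type Provider = "openai-compatible" | "gemini";
type ModelRole = "llm" | "embedding" | "rerank";

interface ModelConfig {
  role: ModelRole;
  provider: Provider;
  model: string; // model identifier as the provider names it
  baseUrl?: string; // only meaningful for OpenAI-compatible endpoints
}

// Map a provider to the server/.env variable documented for its API key.
function apiKeyVariable(provider: Provider): "OPENAI_API_KEY" | "GEMINI_API_KEY" {
  return provider === "gemini" ? "GEMINI_API_KEY" : "OPENAI_API_KEY";
}

// Return the first configured model for a role, or undefined if none is set.
function modelFor(configs: ModelConfig[], role: ModelRole): ModelConfig | undefined {
  return configs.find((c) => c.role === role);
}

// Placeholder model names and URL, chosen only for the example.
const exampleConfigs: ModelConfig[] = [
  { role: "llm", provider: "openai-compatible", model: "example-chat-model", baseUrl: "https://llm.example.com/v1" },
  { role: "embedding", provider: "gemini", model: "example-embedding-model" },
];
```

With these placeholders, `modelFor(exampleConfigs, "rerank")` returns `undefined`, which a caller could treat as "reranking disabled".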

Dual Processing Modes

Fast Mode extracts text with Apache Tika. High-Precision Mode runs a Vision Pipeline for documents that mix images and text.

Hybrid Search

Combines vector and keyword search in Elasticsearch, with source citations, similarity scores, and configurable chunk size and overlap.
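
To make "chunk size" and "overlap" concrete, here is a minimal sketch of a sliding-window splitter. It assumes character-based chunks; the project's actual splitter may count tokens or respect sentence boundaries, and the function name `chunkText` is hypothetical.

```typescript
// Split text into windows of `chunkSize` characters where each window
// repeats the last `overlap` characters of the previous one.
// Character-based splitting is an assumption made for this sketch.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must satisfy 0 <= overlap < chunkSize");
  }
  const step = chunkSize - overlap; // how far the window advances each time
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}
```

For example, `chunkText("abcdefghij", 4, 1)` advances three characters at a time and yields `["abcd", "defg", "ghij"]`. A larger overlap keeps more shared context between neighbouring chunks at the cost of storing more vectors.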

User Isolation

JWT authentication with per-user knowledge bases. Each user has isolated data and configurations.

Streaming Responses

Real-time streaming via Server-Sent Events (SSE) for smooth, low-latency chat interactions.
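
For readers unfamiliar with the SSE wire format, the sketch below shows how one event is framed and how a client can split a received buffer back into events. The function names are hypothetical, and the parser is simplified: it assumes LF line endings and ignores the `id:` and `retry:` fields.

```typescript
interface SseEvent {
  event: string; // "message" when the block has no `event:` field
  data: string;
}

// Serialize one event in text/event-stream format.
function formatSseEvent(data: string, event?: string): string {
  const lines: string[] = [];
  if (event !== undefined) lines.push(`event: ${event}`);
  // Each line of a multi-line payload needs its own `data:` field.
  for (const line of data.split("\n")) lines.push(`data: ${line}`);
  return lines.join("\n") + "\n\n"; // a blank line terminates the event
}

// Extract complete events from a buffer and return the unconsumed remainder,
// so the caller can prepend it to the next network chunk.
function parseSseBuffer(buffer: string): { events: SseEvent[]; rest: string } {
  const blocks = buffer.split("\n\n");
  const rest = blocks.pop() ?? ""; // trailing partial event, if any
  const events: SseEvent[] = [];
  for (const block of blocks) {
    let event = "message";
    const data: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).replace(/^ /, ""));
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}
```

Returning `rest` matters for chat streaming: a network chunk can end in the middle of an event, and the incomplete tail has to be kept until the next chunk arrives.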

Multi-Language

Interface supports Japanese, Chinese, and English with full internationalization for error and API response messages.

1 Architecture Overview

Frontend Layer

  • React 19 + Vite: port 13001 (dev) / 80 (prod)

Backend Layer

  • NestJS API: port 3001
  • JWT Auth
  • Chat / RAG
  • Vision Pipeline

AI & Data Layer

  • OpenAI / Gemini: LLM + Embedding
  • Elasticsearch: port 9200
  • Apache Tika: port 9998
  • LibreOffice: port 8100
  • SQLite: metadata

Dual Processing Pipeline

Fast Mode (Tika)

Upload → Tika Extract → Embed → Store

Quick text extraction, no API cost
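
The extraction step of Fast Mode can be sketched as a single HTTP call. Tika Server returns plain text for a `PUT /tika` request sent with `Accept: text/plain`. Everything else here is an assumption for illustration: the function names, the injected `HttpClient` type, and the error handling are hypothetical and do not describe the project's own Tika client.

```typescript
// Minimal structural types so the sketch does not depend on DOM or Node typings.
interface HttpResponse {
  ok: boolean;
  status: number;
  text(): Promise<string>;
}
interface TikaRequestInit {
  method: string;
  headers: Record<string, string>;
  body: Uint8Array;
}
type HttpClient = (url: string, init: TikaRequestInit) => Promise<HttpResponse>;

// Build the extraction request for a document's raw bytes.
function buildTikaRequest(tikaHost: string, file: Uint8Array): { url: string; init: TikaRequestInit } {
  return {
    url: `${tikaHost.replace(/\/+$/, "")}/tika`, // tolerate a trailing slash in TIKA_HOST
    init: { method: "PUT", headers: { Accept: "text/plain" }, body: file },
  };
}

// Send the request through the injected client and return the extracted text.
async function extractText(http: HttpClient, tikaHost: string, file: Uint8Array): Promise<string> {
  const { url, init } = buildTikaRequest(tikaHost, file);
  const res = await http(url, init);
  if (!res.ok) throw new Error(`Tika extraction failed with HTTP ${res.status}`);
  return res.text();
}
```

Because the call goes to a local Tika container rather than a hosted model, this path incurs no API cost. The HTTP client is passed in so the sketch can be exercised without a running Tika instance; on Node.js 18+ a thin wrapper around the built-in `fetch` could serve as the client.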

High-Precision Mode (Vision)

Upload → LibreOffice → PDF to Image → Vision Model

Preserves layout, charts, and images

2 Project Structure

simple-kb/
├── web/ # React frontend (Vite)
│ ├── components/ # UI components (ChatInterface, ConfigPanel, etc.)
│ ├── contexts/ # React Context providers
│ ├── services/ # API client services
│ └── utils/ # Utility functions
├── server/ # NestJS backend
│ ├── src/
│ │ ├── ai/ # AI services (embedding, etc.)
│ │ ├── api/ # API module
│ │ ├── auth/ # JWT authentication
│ │ ├── chat/ # Chat / RAG module
│ │ ├── elasticsearch/ # Elasticsearch integration
│ │ ├── import-task/ # Import task management
│ │ ├── knowledge-base/ # Knowledge base management
│ │ ├── libreoffice/ # LibreOffice integration
│ │ ├── model-config/ # Model configuration management
│ │ ├── vision/ # Vision model integration
│ │ └── vision-pipeline/ # Vision pipeline orchestration
│ ├── data/ # SQLite database storage
│ ├── uploads/ # Uploaded files storage
│ └── temp/ # Temporary files
├── docs/ # Documentation (Japanese/Chinese)
├── nginx/ # Nginx configuration
├── libreoffice-server/ # LibreOffice conversion service (Python/FastAPI)
└── docker-compose.yml # Docker orchestration

3 Development Setup

Prerequisites

  • Node.js 18+
  • Yarn package manager
  • Docker & Docker Compose

Quick Start

# Install dependencies
yarn install

# Start infrastructure services
docker-compose up -d elasticsearch tika libreoffice

# Configure environment
cp server/.env.sample server/.env

# Start both frontend and backend
yarn dev

Development Commands

# Frontend only (port 13001)
cd web && yarn dev

# Backend only (port 3001)
cd server && yarn start:dev

# Run tests
cd server && yarn test
cd server && yarn test:e2e

# Lint and format
cd server && yarn lint
cd server && yarn format

4 Docker Services & Ports

Service              Port       Purpose
Elasticsearch        9200       Vector storage & hybrid search
Apache Tika          9998       Document text extraction
LibreOffice Server   8100       Document format conversion
Backend API          3001       NestJS REST API
Frontend (dev)       13001      Vite dev server
Frontend (prod)      80 / 443   Nginx reverse proxy

5 Environment Configuration

Key environment variables in server/.env

Variable             Default                 Description
OPENAI_API_KEY       (none)                  OpenAI-compatible API key
GEMINI_API_KEY       (none)                  Google Gemini API key
ELASTICSEARCH_HOST   http://localhost:9200   Elasticsearch URL
TIKA_HOST            http://localhost:9998   Apache Tika URL
LIBREOFFICE_URL      http://localhost:8100   LibreOffice server URL
JWT_SECRET           (none)                  JWT signing secret
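
As an illustration of how these variables and defaults fit together, the sketch below resolves them from an environment map. The `ServerConfig` shape and `loadConfig` function are hypothetical, not the project's configuration code. Which of the variables without a default are strictly required is not stated here, so the sketch only reports them as missing instead of failing.

```typescript
// Hypothetical config shape for illustration only.
interface ServerConfig {
  openaiApiKey: string | undefined;
  geminiApiKey: string | undefined;
  elasticsearchHost: string;
  tikaHost: string;
  libreofficeUrl: string;
  jwtSecret: string | undefined;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): { config: ServerConfig; missing: string[] } {
  // Treat unset and blank values the same way.
  const get = (name: string): string | undefined => {
    const value = env[name];
    return value !== undefined && value.trim() !== "" ? value : undefined;
  };
  const config: ServerConfig = {
    openaiApiKey: get("OPENAI_API_KEY"),
    geminiApiKey: get("GEMINI_API_KEY"),
    elasticsearchHost: get("ELASTICSEARCH_HOST") ?? "http://localhost:9200",
    tikaHost: get("TIKA_HOST") ?? "http://localhost:9998",
    libreofficeUrl: get("LIBREOFFICE_URL") ?? "http://localhost:8100",
    jwtSecret: get("JWT_SECRET"),
  };
  // Variables with no documented default: reported, not enforced.
  const missing = ["OPENAI_API_KEY", "GEMINI_API_KEY", "JWT_SECRET"].filter(
    (name) => get(name) === undefined,
  );
  return { config, missing };
}
```

In a Node.js process this could be called as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test with a plain object.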

6 Code Standards

Language Requirements

  • Code comments must be in English
  • Log messages must be in English
  • Error messages must support internationalization
  • API response messages must support i18n
  • Interface supports Japanese, Chinese, and English

Code Quality

  • Backend uses Jest for unit and e2e tests
  • ESLint and Prettier configured for backend
  • Format: cd server && yarn format
  • Lint: cd server && yarn lint

7 Common Development Tasks

Adding a New API Endpoint

  1. Create controller in appropriate module under server/src/
  2. Add service methods with English comments
  3. Update DTOs and validation
  4. Add tests in *.spec.ts files

Adding a New Frontend Component

  1. Create component in web/components/
  2. Add TypeScript interfaces in web/types.ts
  3. Use Tailwind CSS for styling
  4. Connect to backend services in web/services/

8 Deployment

Development

docker-compose up -d elasticsearch tika libreoffice
yarn dev

Production

# Build and start all services
docker-compose up -d

9 Troubleshooting

Elasticsearch not starting

Check memory limits in docker-compose.yml

File upload failures

Ensure the server/uploads/ and server/temp/ directories exist and are writable by the backend process

Vision pipeline errors

Verify LibreOffice server is running and accessible at port 8100

API key errors

Check environment variables in server/.env

Database reset

Delete server/data/metadata.db and remove the Elasticsearch data volume. This erases all stored metadata and indexed documents.

10 Debugging & Health Checks

# Check Elasticsearch
curl http://localhost:9200/_cat/indices

# Check Tika
curl http://localhost:9998/tika

# Check LibreOffice
curl http://localhost:8100/health
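
The same three checks can be scripted. The sketch below probes the endpoints used in the curl commands above and reports a status per service. The function names and the injected `Probe` type are hypothetical; the hosts are the documented defaults.

```typescript
interface ProbeResponse {
  ok: boolean;
  status: number;
}
// The HTTP call is injected so the sketch runs without network access.
type Probe = (url: string) => Promise<ProbeResponse>;

interface ServiceHosts {
  elasticsearch: string;
  tika: string;
  libreoffice: string;
}

const defaultHosts: ServiceHosts = {
  elasticsearch: "http://localhost:9200",
  tika: "http://localhost:9998",
  libreoffice: "http://localhost:8100",
};

// The same URLs as the curl commands, one per service.
function healthEndpoints(hosts: ServiceHosts): { name: string; url: string }[] {
  return [
    { name: "Elasticsearch", url: `${hosts.elasticsearch}/_cat/indices` },
    { name: "Apache Tika", url: `${hosts.tika}/tika` },
    { name: "LibreOffice", url: `${hosts.libreoffice}/health` },
  ];
}

// Probe each service in turn; a thrown error is reported as "unreachable".
async function checkServices(
  probe: Probe,
  hosts: ServiceHosts = defaultHosts,
): Promise<Record<string, string>> {
  const results: Record<string, string> = {};
  for (const { name, url } of healthEndpoints(hosts)) {
    try {
      const res = await probe(url);
      results[name] = res.ok ? "up" : `HTTP ${res.status}`;
    } catch {
      results[name] = "unreachable";
    }
  }
  return results;
}
```

On Node.js 18+, `checkServices((url) => fetch(url))` would run the checks against the local containers using the built-in `fetch`.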