Simple Knowledge Base - System Overview

Multi-Model Support

Supports OpenAI-compatible APIs (OpenAI, DeepSeek, Claude) and the native Google Gemini SDK, with configurable LLM, embedding, and rerank models.

Dual Processing Modes

Fast Mode extracts text with Apache Tika; High-Precision Mode runs the Vision Pipeline for documents that mix images and text.

Hybrid Search

Vector + keyword search with Elasticsearch, source citation, similarity scoring, and configurable chunk size & overlap.
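
To make the "chunk size & overlap" setting concrete, here is a minimal sketch of overlapping chunking. It assumes character-based windows; the function name chunkText and its parameters are illustrative and are not taken from the project, which may split on tokens or sentences instead.

```typescript
// Hypothetical helper: splits text into fixed-size character windows where
// each window repeats the last `overlap` characters of the previous one.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be in [0, chunkSize)");
  }
  const step = chunkSize - overlap; // how far each window advances
  const pieces: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    pieces.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return pieces;
}

const demoChunks = chunkText("abcdefghij", 4, 1);
console.log(demoChunks); // ["abcd", "defg", "ghij"]
```

A larger overlap keeps more shared context between neighboring chunks at the cost of storing and embedding more text.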

User Isolation

JWT authentication with per-user knowledge bases. Each user has isolated data and configurations.

Streaming Responses

Real-time streaming via Server-Sent Events (SSE) for smooth, low-latency chat interactions.
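
As an illustration of what an SSE stream looks like on the wire, the sketch below formats events by hand. The helper name formatSseEvent and the { delta } payload shape are assumptions made for this example; the backend's actual event schema is not described here.

```typescript
// Formats one Server-Sent Events frame. In the SSE wire format every payload
// line gets its own "data:" field and a blank line terminates the event.
function formatSseEvent(data: string, eventName?: string): string {
  const lines: string[] = [];
  if (eventName !== undefined) lines.push(`event: ${eventName}`);
  for (const line of data.split(/\r?\n/)) lines.push(`data: ${line}`);
  return lines.join("\n") + "\n\n";
}

// Hypothetical token stream standing in for incremental LLM output.
const demoTokens = ["Hello", ", ", "world"];
const sseBody = demoTokens
  .map((token) => formatSseEvent(JSON.stringify({ delta: token })))
  .join("");
console.log(sseBody);
```

Sending each token as its own event lets the client render text as soon as it arrives instead of waiting for the full answer.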

Multi-Language

Interface supports Japanese, Chinese, and English with full internationalization for error and API response messages.

1 Architecture Overview

Frontend Layer

React 19 + Vite: port 13001 (dev) / 80 (prod)

↕

Backend Layer

NestJS API: port 3001
Modules: JWT Auth, Chat / RAG, Vision Pipeline

↕

AI & Data Layer

OpenAI / Gemini: LLM + Embedding
Elasticsearch: port 9200
Apache Tika: port 9998
LibreOffice: port 8100
SQLite: metadata

Dual Processing Pipeline

Fast Mode (Tika)

Upload → Tika Extract → Embed → Store

Quick text extraction, no API cost

High-Precision Mode (Vision)

Upload → LibreOffice → PDF to Image → Vision Model

Preserves layout, charts, and images
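
The two flows above can be sketched as ordered step lists driven by one runner. This is only an illustration: the step identifiers, the StepHandler type, and runPipeline are hypothetical names, and the High-Precision list stops at the vision model because that is where the flow shown above ends.

```typescript
type ProcessingMode = "fast" | "high-precision";

// Step names mirror the two flows described above; the string identifiers
// themselves are illustrative, not taken from the codebase.
const PIPELINE_STEPS: Record<ProcessingMode, string[]> = {
  fast: ["upload", "tika-extract", "embed", "store"],
  "high-precision": ["upload", "libreoffice-convert", "pdf-to-image", "vision-model"],
};

// A step handler receives the previous step's output and returns the next one.
type StepHandler = (input: unknown) => Promise<unknown>;

// Runs the steps for `mode` in order, threading each result into the next step.
async function runPipeline(
  mode: ProcessingMode,
  handlers: Record<string, StepHandler>,
  initial: unknown,
): Promise<unknown> {
  let current = initial;
  for (const step of PIPELINE_STEPS[mode]) {
    const handler = handlers[step];
    if (!handler) throw new Error(`No handler registered for step "${step}"`);
    current = await handler(current);
  }
  return current;
}
```

Keeping the step order as data makes it easy to see at a glance that Fast Mode never calls a paid model API for extraction, while High-Precision Mode depends on LibreOffice and a vision model.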

2 Project Structure

simple-kb/
├── web/ # React frontend (Vite)
│ ├── components/ # UI components (ChatInterface, ConfigPanel, etc.)
│ ├── contexts/ # React Context providers
│ ├── services/ # API client services
│ └── utils/ # Utility functions
├── server/ # NestJS backend
│ ├── src/
│ │ ├── ai/ # AI services (embedding, etc.)
│ │ ├── api/ # API module
│ │ ├── auth/ # JWT authentication
│ │ ├── chat/ # Chat / RAG module
│ │ ├── elasticsearch/ # Elasticsearch integration
│ │ ├── import-task/ # Import task management
│ │ ├── knowledge-base/ # Knowledge base management
│ │ ├── libreoffice/ # LibreOffice integration
│ │ ├── model-config/ # Model configuration management
│ │ ├── vision/ # Vision model integration
│ │ └── vision-pipeline/ # Vision pipeline orchestration
│ ├── data/ # SQLite database storage
│ ├── uploads/ # Uploaded files storage
│ └── temp/ # Temporary files
├── docs/ # Documentation (Japanese/Chinese)
├── nginx/ # Nginx configuration
├── libreoffice-server/ # LibreOffice conversion service (Python/FastAPI)
└── docker-compose.yml # Docker orchestration

3 Development Setup

Prerequisites

Node.js 18+
Yarn package manager
Docker & Docker Compose

Quick Start

        # Install dependencies
        yarn install

        # Start infrastructure services
        docker-compose up -d elasticsearch tika libreoffice

        # Configure environment
        cp server/.env.sample server/.env

        # Start both frontend and backend
        yarn dev

Development Commands

        # Frontend only (port 13001)
        cd web && yarn dev

        # Backend only (port 3001)
        cd server && yarn start:dev

        # Run tests
        cd server && yarn test
        cd server && yarn test:e2e

        # Lint and format
        cd server && yarn lint
        cd server && yarn format

4 Docker Services & Ports

Service	Port	Purpose
Elasticsearch	9200	Vector storage & hybrid search
Apache Tika	9998	Document text extraction
LibreOffice Server	8100	Document format conversion
Backend API	3001	NestJS REST API
Frontend (dev)	13001	Vite dev server
Frontend (prod)	80 / 443	Nginx reverse proxy

5 Environment Configuration

Key environment variables in server/.env

Variable	Default	Description
OPENAI_API_KEY	—	OpenAI-compatible API key
GEMINI_API_KEY	—	Google Gemini API key
ELASTICSEARCH_HOST	http://localhost:9200	Elasticsearch URL
TIKA_HOST	http://localhost:9998	Apache Tika URL
LIBREOFFICE_URL	http://localhost:8100	LibreOffice server URL
JWT_SECRET	—	JWT signing secret
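
One way to consume these variables is a small typed loader that applies the defaults from the table. This is a sketch, not the project's configuration code: the ServerConfig interface and loadServerConfig are hypothetical names, and treating JWT_SECRET as mandatory is an assumption of this example.

```typescript
interface ServerConfig {
  openaiApiKey: string | undefined;
  geminiApiKey: string | undefined;
  elasticsearchHost: string;
  tikaHost: string;
  libreofficeUrl: string;
  jwtSecret: string;
}

// Builds a typed config from an env-like map, applying the defaults listed in
// the table. Requiring JWT_SECRET is an assumption of this sketch.
function loadServerConfig(env: Record<string, string | undefined>): ServerConfig {
  const jwtSecret = env["JWT_SECRET"];
  if (!jwtSecret) throw new Error("JWT_SECRET is required");
  return {
    openaiApiKey: env["OPENAI_API_KEY"],
    geminiApiKey: env["GEMINI_API_KEY"],
    elasticsearchHost: env["ELASTICSEARCH_HOST"] ?? "http://localhost:9200",
    tikaHost: env["TIKA_HOST"] ?? "http://localhost:9998",
    libreofficeUrl: env["LIBREOFFICE_URL"] ?? "http://localhost:8100",
    jwtSecret,
  };
}

// In a Node.js process, process.env could be passed here instead.
const devConfig = loadServerConfig({ JWT_SECRET: "dev-only-secret" });
console.log(devConfig.elasticsearchHost); // http://localhost:9200
```

Failing fast on a missing secret at startup is usually easier to diagnose than an authentication error on the first request.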

6 Code Standards

Language Requirements

Code comments must be in English
Log messages must be in English
Error messages must support internationalization
API response messages must support i18n
Interface supports Japanese, Chinese, and English
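
The i18n requirements above can be illustrated with a small message lookup that falls back to English. Everything in this sketch is hypothetical: the catalog keys, the translations, and the translate helper are placeholders, not the project's actual i18n mechanism.

```typescript
type Locale = "en" | "ja" | "zh";

// Hypothetical message catalog; keys and translations are placeholders.
const MESSAGES: Record<Locale, Record<string, string>> = {
  en: {
    FILE_TOO_LARGE: "File exceeds the {limit} MB limit.",
    UNAUTHORIZED: "Authentication required.",
  },
  ja: { FILE_TOO_LARGE: "ファイルが {limit} MB の上限を超えています。" },
  zh: { FILE_TOO_LARGE: "文件超过 {limit} MB 的限制。" },
};

// Looks up `key` in the requested locale, then English, then returns the key
// itself so that a missing translation never produces an empty message.
function translate(
  key: string,
  locale: Locale,
  params: Record<string, string | number> = {},
): string {
  const template = MESSAGES[locale][key] ?? MESSAGES.en[key] ?? key;
  return template.replace(/\{(\w+)\}/g, (match, name: string) =>
    name in params ? String(params[name]) : match,
  );
}

console.log(translate("FILE_TOO_LARGE", "ja", { limit: 10 }));
console.log(translate("UNAUTHORIZED", "ja")); // falls back to the English text
```

Returning stable message keys from the API and translating at the edge keeps logs in English while user-facing text follows the selected language.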

Code Quality

Backend uses Jest for unit and e2e tests
ESLint and Prettier configured for backend
Format: cd server && yarn format
Lint: cd server && yarn lint

7 Common Development Tasks

Adding a New API Endpoint

Create a controller in the appropriate module under server/src/
Add service methods with English comments
Update the DTOs and validation rules
Add tests in *.spec.ts files

Adding a New Frontend Component

Create the component in web/components/
Add its TypeScript interfaces to web/types.ts
Use Tailwind CSS for styling
Connect it to the backend through the services in web/services/

8 Deployment

Development

        docker-compose up -d elasticsearch tika libreoffice
        yarn dev

Production

        # Build and start all services
        docker-compose up -d

9 Troubleshooting

Elasticsearch not starting

Check memory limits in docker-compose.yml

File upload failures

Ensure the server/uploads/ and server/temp/ directories exist and are writable by the backend process

Vision pipeline errors

Verify that the LibreOffice server is running and reachable on port 8100

API key errors

Check environment variables in server/.env

Database reset

Delete server/data/metadata.db and remove the Elasticsearch data volume, so that the SQLite metadata and the indexed documents are cleared together

10 Debugging & Health Checks

        # Check Elasticsearch
        curl http://localhost:9200/_cat/indices

        # Check Tika
        curl http://localhost:9998/tika

        # Check LibreOffice
        curl http://localhost:8100/health
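
The same checks can be scripted. The sketch below is an assumption-laden example rather than project code: checkServices and the HttpGet type are hypothetical names, and the HTTP function is injected so the logic can be exercised without running services.

```typescript
// Minimal shape of the HTTP function this sketch needs; the global `fetch`
// in Node.js 18+ satisfies it.
type HttpGet = (url: string) => Promise<{ ok: boolean; status: number }>;

// Endpoints copied from the curl commands above.
const HEALTH_ENDPOINTS: Record<string, string> = {
  elasticsearch: "http://localhost:9200/_cat/indices",
  tika: "http://localhost:9998/tika",
  libreoffice: "http://localhost:8100/health",
};

// Probes each endpoint in turn and summarizes the outcome per service.
async function checkServices(httpGet: HttpGet): Promise<Record<string, string>> {
  const report: Record<string, string> = {};
  for (const service of Object.keys(HEALTH_ENDPOINTS)) {
    try {
      const res = await httpGet(HEALTH_ENDPOINTS[service]);
      report[service] = res.ok ? "up" : `unhealthy (HTTP ${res.status})`;
    } catch {
      report[service] = "unreachable"; // connection refused, DNS failure, etc.
    }
  }
  return report;
}

// Usage with a stub; against real services, pass the global fetch instead.
const stubGet: HttpGet = async () => ({ ok: true, status: 200 });
checkServices(stubGet).then((report) => console.log(report));
```

Distinguishing "unreachable" from "unhealthy" helps separate a container that is not running from one that is running but returning errors.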