Simple Knowledge Base - Backend Service (Server)

A RAG (Retrieval-Augmented Generation) knowledge base backend built with the NestJS framework. It provides file parsing, vector indexing, hybrid search, and multi-model management.

🌟 Key Features

Intelligent Document Processing: Integrated with Apache Tika to support text extraction from various formats including PDF, Word, Markdown, and TXT.
Efficient Vector Search: Uses Elasticsearch as the vector database. Supports hybrid search combining KNN vector search and full-text search.
Flexible RAG Engine: Built on LangChain. Supports customization of chunking rules and reranking.
Multi-Model Provider: Supports dynamic connection to OpenAI, Google Gemini, and locally deployed LLMs.
Secure Management: Built-in JWT authentication and user permission management.
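
As an illustration of the hybrid search listed above, here is a minimal sketch that builds an Elasticsearch 8.x search request body combining a top-level knn clause with a full-text match query. The field names content and embedding, the function name buildHybridSearchBody, and the k / num_candidates values are assumptions made for this example; the project's actual mapping and query may differ.

```typescript
// Hypothetical shape of a hybrid search request; field names are assumptions.
interface HybridSearchBody {
  size: number;
  query: { match: { [field: string]: { query: string } } };
  knn: {
    field: string;
    query_vector: number[];
    k: number;
    num_candidates: number;
  };
}

// Build a request body that runs a full-text match and a KNN vector search
// in the same request. When both clauses are present, Elasticsearch merges
// the two result sets and combines their scores.
function buildHybridSearchBody(
  text: string,
  queryVector: number[],
  topK = 5,
): HybridSearchBody {
  return {
    size: topK,
    query: { match: { content: { query: text } } },
    knn: {
      field: "embedding", // assumed name of the dense_vector field
      query_vector: queryVector,
      k: topK,
      // Candidates considered per shard; must be at least k.
      num_candidates: Math.max(topK * 10, 50),
    },
  };
}

const body = buildHybridSearchBody("how do I reset my password", [0.1, 0.2, 0.3]);
console.log(JSON.stringify(body, null, 2));
```

The resulting object could be passed as the body of a search request against the configured index. In a real query the vector would come from the same embedding model that was used at indexing time.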

🛠️ Tech Stack

Framework: NestJS (TypeScript)
Database: SQLite (TypeORM)
Search Engine: Elasticsearch 8.x/9.x
AI Framework: LangChain
Libraries: RxJS, Class-Validator

📋 Prerequisites

Before running the project, make sure the following tools are installed:

Node.js (v18 or higher recommended)
Yarn
Docker & Docker Compose (for infrastructure)

🚀 Quick Start

1. Start Infrastructure

Use the docker-compose.yml file in the project root (simple-kb/) to quickly start Elasticsearch and Tika.

# Run from project root directory
docker-compose up -d

After successful startup:

Elasticsearch: Listens on host port 19200 (mapped to the container's port 9200)
Tika: Listens on port 9998

2. Install Dependencies

Navigate to the server directory and install packages:

cd server
yarn install

3. Environment Configuration

The project is configured through environment variables. Make sure the settings are correct, especially the Elasticsearch address:

# Database path
DATABASE_PATH=server/data/metadata.db

# JWT secret
JWT_SECRET=your_secure_secret

# Elasticsearch settings (match docker-compose ports)
ELASTICSEARCH_HOST=http://localhost:19200
ELASTICSEARCH_INDEX=knowledge_base

# Tika settings
TIKA_HOST=http://localhost:9998

# File upload storage path
UPLOAD_FILE_PATH=./uploads
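
To make the role of these variables concrete, the following sketch reads them from an env-like object and falls back to the sample values shown above. The loadConfig helper, the AppConfig shape, the use of defaults, and the rule that JWT_SECRET is mandatory are all assumptions for illustration, not the project's actual configuration code.

```typescript
// Settings described above; defaults mirror the sample values in this README.
interface AppConfig {
  databasePath: string;
  jwtSecret: string;
  elasticsearchHost: string;
  elasticsearchIndex: string;
  tikaHost: string;
  uploadFilePath: string;
}

type Env = Record<string, string | undefined>;

// Hypothetical helper: read the variables from an env-like object, fall back
// to the sample defaults, and fail fast when the JWT secret is missing.
function loadConfig(env: Env): AppConfig {
  const jwtSecret = env["JWT_SECRET"];
  if (!jwtSecret) {
    throw new Error("JWT_SECRET must be set");
  }
  return {
    databasePath: env["DATABASE_PATH"] ?? "server/data/metadata.db",
    jwtSecret,
    elasticsearchHost: env["ELASTICSEARCH_HOST"] ?? "http://localhost:19200",
    elasticsearchIndex: env["ELASTICSEARCH_INDEX"] ?? "knowledge_base",
    tikaHost: env["TIKA_HOST"] ?? "http://localhost:9998",
    uploadFilePath: env["UPLOAD_FILE_PATH"] ?? "./uploads",
  };
}

// In a Node.js process this would typically be called as loadConfig(process.env).
const config = loadConfig({ JWT_SECRET: "your_secure_secret" });
console.log(config.elasticsearchHost);
```

Failing early on a missing secret is a design choice in this sketch: a misconfigured service stops at startup instead of issuing tokens signed with an empty key.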

4. Start Services

# Development mode (recommended, with hot reload)
yarn run start:dev

# Build and run in production mode
yarn build
yarn run start:prod

The backend service runs on http://localhost:13000 by default, with API prefix /api.

🧪 Testing

# Unit tests
yarn run test

# E2E tests
yarn run test:e2e

⚠️ Notes and Tips

Database Initialization:
- On first run, TypeORM automatically creates metadata.db (or the database file you configured) under the server/data/ directory.
- In development mode, with synchronize: true, table structures are auto-synced.
Elasticsearch Connection:
- If you encounter a Connection refused error, check if Docker containers are running properly (docker ps).
- On startup, the service checks whether an index named knowledge_base exists and creates it automatically if it does not.
Default Account:
- If you reset the database, either register a new user or refer to existing admin data. It's recommended to use the frontend registration feature to create the first user.
File Parsing:
- When you upload large files, parsing by Tika may take a few seconds. Wait for it to finish and check the processing status in the frontend.
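
The Database Initialization note above can be sketched as a small options builder. The type, database, and synchronize keys are real TypeORM data source options for the sqlite driver; the buildSqliteOptions helper and the rule of switching on a NODE_ENV-style value are assumptions for illustration and may not match how the project decides when to synchronize.

```typescript
// Subset of TypeORM data source options relevant to this README (sqlite driver).
interface SqliteOptions {
  type: "sqlite";
  database: string;
  synchronize: boolean;
}

// Hypothetical helper: enable schema auto-sync everywhere except production.
function buildSqliteOptions(databasePath: string, nodeEnv: string): SqliteOptions {
  return {
    type: "sqlite",
    database: databasePath, // the file is created on first connection if missing
    synchronize: nodeEnv !== "production",
  };
}

console.log(buildSqliteOptions("server/data/metadata.db", "development"));
```

Keeping synchronize off in production follows the TypeORM documentation's own warning, since auto-syncing the schema against live data can drop columns and lose data.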
