Simple Knowledge Base - Backend Service (Server)

A RAG (Retrieval-Augmented Generation) knowledge base backend built with the NestJS framework. It provides file parsing, vector indexing, hybrid search, and multi-model management.

🌟 Key Features

  • Intelligent Document Processing: Integrated with Apache Tika to support text extraction from various formats including PDF, Word, Markdown, and TXT.
  • Efficient Vector Search: Uses Elasticsearch as the vector database. Supports hybrid search combining KNN vector search and full-text search.
  • Flexible RAG Engine: Built on LangChain. Supports customization of chunking rules and reranking.
  • Multi-Model Provider: Supports dynamic connection to OpenAI, Google Gemini, and locally deployed LLMs.
  • Secure Management: Built-in JWT authentication and user permission management.
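
To make the hybrid search feature more concrete, here is a minimal sketch of how a request body combining full-text and KNN search could be built for the Elasticsearch 8.x search API. It is not taken from this project's source: the function name buildHybridSearchBody and the field names content and embedding are assumptions chosen for illustration, and the real index mapping may differ.

```typescript
// Sketch of a hybrid search request body for the Elasticsearch 8.x _search API.
// ASSUMPTION: the field names "content" and "embedding" are illustrative only.

interface HybridSearchBody {
  size: number;
  query: { match: Record<string, string> };
  knn: {
    field: string;
    query_vector: number[];
    k: number;
    num_candidates: number;
  };
}

function buildHybridSearchBody(
  queryText: string,
  queryVector: number[],
  topK: number = 5,
): HybridSearchBody {
  if (queryVector.length === 0) {
    throw new Error("queryVector must not be empty");
  }
  return {
    size: topK,
    // Full-text part: match on the (assumed) text field.
    query: { match: { content: queryText } },
    // Vector part: approximate kNN on the (assumed) dense_vector field.
    knn: {
      field: "embedding",
      query_vector: queryVector,
      k: topK,
      // num_candidates must be at least k; a larger pool trades speed for recall.
      num_candidates: Math.max(topK * 10, 50),
    },
  };
}
```

When a search request carries both query and knn, Elasticsearch combines the scores of the two parts, so a document can rank well by matching either the keywords or the embedding.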

🛠️ Tech Stack

  • Framework: NestJS (TypeScript)
  • Database: SQLite (TypeORM)
  • Search Engine: Elasticsearch 8.x/9.x
  • AI Framework: LangChain
  • Libraries: RxJS, Class-Validator

📋 Prerequisites

Before running the project, make sure the following are installed:

  • Node.js (v18 or higher recommended)
  • Yarn
  • Docker & Docker Compose (for infrastructure)

🚀 Quick Start

1. Start Infrastructure

Use the docker-compose.yml file in the project root (simple-kb/) to quickly start Elasticsearch and Tika.

# Run from project root directory
docker-compose up -d

After successful startup:

  • Elasticsearch: Available on host port 19200 (mapped to the container's port 9200)
  • Tika: Listens on port 9998

2. Install Dependencies

Navigate to the server directory and install packages:

cd server
yarn install

3. Environment Configuration

The project is configured through environment variables. Make sure the values are correct, especially the Elasticsearch address:

# Database path
DATABASE_PATH=server/data/metadata.db

# JWT secret
JWT_SECRET=your_secure_secret

# Elasticsearch settings (match docker-compose ports)
ELASTICSEARCH_HOST=http://localhost:19200
ELASTICSEARCH_INDEX=knowledge_base

# Tika settings
TIKA_HOST=http://localhost:9998

# File upload storage path
UPLOAD_FILE_PATH=./uploads
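
As an illustration of how these variables might be checked at startup, the following sketch reads them from a plain key/value object and fails fast when one is missing or malformed. The names AppConfig, requireVar, requireHttpUrl, and loadConfig are hypothetical and do not come from this project, which may load its configuration differently.

```typescript
// Sketch: validate the environment variables listed above.
// ASSUMPTION: all type and function names here are illustrative.

interface AppConfig {
  databasePath: string;
  jwtSecret: string;
  elasticsearchHost: string;
  elasticsearchIndex: string;
  tikaHost: string;
  uploadFilePath: string;
}

type Env = Record<string, string | undefined>;

// Return the variable's value, or throw if it is unset or blank.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error("Missing required environment variable: " + name);
  }
  return value;
}

// Same as requireVar, but also require an http:// or https:// prefix.
function requireHttpUrl(env: Env, name: string): string {
  const value = requireVar(env, name);
  if (!/^https?:\/\/\S+$/.test(value)) {
    throw new Error(name + " must be an http(s) URL, got: " + value);
  }
  return value;
}

function loadConfig(env: Env): AppConfig {
  return {
    databasePath: requireVar(env, "DATABASE_PATH"),
    jwtSecret: requireVar(env, "JWT_SECRET"),
    elasticsearchHost: requireHttpUrl(env, "ELASTICSEARCH_HOST"),
    elasticsearchIndex: requireVar(env, "ELASTICSEARCH_INDEX"),
    tikaHost: requireHttpUrl(env, "TIKA_HOST"),
    uploadFilePath: requireVar(env, "UPLOAD_FILE_PATH"),
  };
}
```

In a Node.js process this could be called as loadConfig(process.env), so a wrong Elasticsearch or Tika address surfaces as a clear error at startup rather than as a connection failure later.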

4. Start Services

# Development mode (recommended, with hot reload)
yarn run start:dev

# Build and run in production mode
yarn build
yarn run start:prod

The backend service runs on http://localhost:13000 by default, with API prefix /api.

🧪 Testing

# Unit tests
yarn run test

# E2E tests
yarn run test:e2e

⚠️ Notes and Tips

  1. Database Initialization:

    • On first run, TypeORM automatically creates metadata.db (or configured DB) under the server/data/ directory.
    • In development mode, with synchronize: true, table structures are auto-synced.
  2. Elasticsearch Connection:

    • If you encounter a Connection refused error, check that the Docker containers are running (docker ps).
    • On startup, the service checks for an index named knowledge_base and creates it if it does not exist.
  3. Default Account:

    • After resetting the database, either register a new user or reuse existing admin data. The recommended way to create the first user is the frontend registration feature.
  4. File Parsing:

    • Tika may take a few seconds to parse large uploads. Check the processing status in the frontend while you wait.
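
The check for the knowledge_base index described in note 2 can be sketched as below. This is an assumption about the general pattern, not the project's actual code: IndicesApi is a minimal hypothetical interface modeled on the indices.exists and indices.create methods of the official Elasticsearch JavaScript client (8.x), declared locally so the example has no dependencies, and ensureIndex is an illustrative name.

```typescript
// Sketch: create an index only when it does not exist yet.
// ASSUMPTION: IndicesApi mirrors just the two client calls this sketch needs.

interface IndicesApi {
  exists(params: { index: string }): Promise<boolean>;
  create(params: { index: string }): Promise<unknown>;
}

async function ensureIndex(
  indices: IndicesApi,
  index: string,
): Promise<"created" | "exists"> {
  // Detect first, so that restarting the service does not fail on an existing index.
  if (await indices.exists({ index })) {
    return "exists";
  }
  // A real service would normally pass mappings here (for example a
  // dense_vector field for embeddings); they are omitted in this sketch.
  await indices.create({ index });
  return "created";
}
```

With the real client, the indices property of a connected client instance would be passed as the first argument. Because the function only creates what is missing, it is safe to call on every startup.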