anhuiqiang 23432c6788 clean install bugfix 7 hours ago
..
scripts 558d96e5ec bugfix 2 weeks ago
src 23432c6788 clean install bugfix 7 hours ago
.dockerignore 50df79fabb feature 2 weeks ago
.env.sample 664e531d4c internationalization 1 week ago
.prettierrc 0702802317 init 3 weeks ago
Dockerfile 23432c6788 clean install bugfix 7 hours ago
README.md 664e531d4c internationalization 1 week ago
build_output.txt 57eed1b8e5 capability test 1 week ago
check_schema.js 005974129a bug fix 2 weeks ago
chi_sim.traineddata 0702802317 init 3 weeks ago
database.sqlite 558d96e5ec bugfix 2 weeks ago
debug_es.js 005974129a bug fix 2 weeks ago
eng.traineddata 0702802317 init 3 weeks ago
es_results.txt 50df79fabb feature 2 weeks ago
eslint.config.mjs 0702802317 init 3 weeks ago
jpn.traineddata 0702802317 init 3 weeks ago
metadata.db 50df79fabb feature 2 weeks ago
nest-cli.json 0702802317 init 3 weeks ago
package.json 9ec81b3f1b feishu plugin 1 week ago
pdf_to_images.py c83f9eda43 fix merge errors 1 week ago
schema_output.txt 50df79fabb feature 2 weeks ago
test_db.py 558d96e5ec bugfix 2 weeks ago
test_output.txt 72ba7c21ab Feishu bot integration, lint check and formatting 1 week ago
tsconfig.build.json 50df79fabb feature 2 weeks ago
tsconfig.build.tsbuildinfo 9ec81b3f1b feishu plugin 1 week ago
tsconfig.json 50df79fabb feature 2 weeks ago
tsconfig.tsbuildinfo e6b6d31452 fix merge error 1 week ago
yarn.lock 0702802317 init 3 weeks ago

README.md

Simple Knowledge Base - Backend Service (Server)

A RAG (Retrieval-Augmented Generation) knowledge base backend built with the NestJS framework. It provides file parsing, vector indexing, hybrid search, and multi-model management.

🌟 Key Features

  • Intelligent Document Processing: Integrates with Apache Tika to extract text from a range of formats, including PDF, Word, Markdown, and TXT.
  • Efficient Vector Search: Uses Elasticsearch as the vector database. Supports hybrid search combining KNN vector search and full-text search.
  • Flexible RAG Engine: Built on LangChain. Supports customization of chunking rules and reranking.
  • Multi-Model Provider: Supports dynamic connection to OpenAI, Google Gemini, and locally deployed LLMs.
  • Secure Management: Built-in JWT authentication and user permission management.
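
To make the hybrid search feature above more concrete, here is a minimal sketch of a request body that combines a full-text `match` query with a top-level `knn` clause, as supported by the Elasticsearch 8.x search API. The field names `content` and `embedding`, the function name, and the weighting scheme are assumptions for illustration; the project's actual index mapping and scoring may differ.

```typescript
// Hypothetical field names: adjust them to the real index mapping.
const TEXT_FIELD = "content";
const VECTOR_FIELD = "embedding";

interface HybridSearchBody {
  query: { match: Record<string, { query: string; boost: number }> };
  knn: {
    field: string;
    query_vector: number[];
    k: number;
    num_candidates: number;
    boost: number;
  };
  size: number;
}

// Builds a search body in which Elasticsearch scores the full-text match and
// the kNN match separately, then combines them using the given boosts.
function buildHybridSearchBody(
  queryText: string,
  queryVector: number[],
  topK = 5,
  vectorWeight = 0.7, // assumed split between vector and text relevance
): HybridSearchBody {
  if (!Number.isInteger(topK) || topK < 1) {
    throw new RangeError("topK must be a positive integer");
  }
  if (vectorWeight < 0 || vectorWeight > 1) {
    throw new RangeError("vectorWeight must be between 0 and 1");
  }
  return {
    query: {
      match: {
        [TEXT_FIELD]: { query: queryText, boost: 1 - vectorWeight },
      },
    },
    knn: {
      field: VECTOR_FIELD,
      query_vector: queryVector,
      k: topK,
      // Examine more candidates per shard than are returned, to improve recall.
      num_candidates: Math.min(Math.max(topK * 10, 50), 10000),
      boost: vectorWeight,
    },
    size: topK,
  };
}
```

The returned object can be sent as the body of a `_search` request against the configured index. Keeping the builder a pure function makes the weighting easy to unit test without a running cluster.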

🛠️ Tech Stack

  • Framework: NestJS (TypeScript)
  • Database: SQLite (TypeORM)
  • Search Engine: Elasticsearch 8.x/9.x
  • AI Framework: LangChain
  • Libraries: RxJS, Class-Validator

📋 Prerequisites

Before running the project, make sure the following are installed:

  • Node.js (v18 or higher recommended)
  • Yarn
  • Docker & Docker Compose (for infrastructure)

🚀 Quick Start

1. Start Infrastructure

Use the docker-compose.yml file in the project root (simple-kb/) to quickly start Elasticsearch and Tika.

# Run from project root directory
docker-compose up -d

After successful startup:

  • Elasticsearch: Listens on host port 19200 (mapped to port 9200 in the container)
  • Tika: Listens on port 9998
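
A quick way to confirm that both containers are reachable is a small script like the sketch below. It assumes the default ports listed above and that Tika Server answers `GET /tika` with its greeting message. The function and type names are hypothetical, and the HTTP client is passed in as a parameter so the logic can be exercised without a network.

```typescript
// Minimal shape of the HTTP call this sketch needs; the global fetch in
// Node.js 18+ satisfies it.
type FetchLike = (url: string) => Promise<{ status: number }>;

interface ServiceStatus {
  name: string;
  url: string;
  reachable: boolean;
  detail: string;
}

// Reports a service as reachable if any HTTP response comes back. A refused
// connection makes the fetch call reject, which is reported as unreachable.
async function checkService(
  name: string,
  url: string,
  fetchImpl: FetchLike,
): Promise<ServiceStatus> {
  try {
    const res = await fetchImpl(url);
    // A 401 from Elasticsearch still means the container is up; it only
    // indicates that security is enabled on the cluster.
    return { name, url, reachable: true, detail: `HTTP ${res.status}` };
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    return { name, url, reachable: false, detail };
  }
}

// Checks both infrastructure services on the ports used in this README.
async function checkInfrastructure(fetchImpl: FetchLike): Promise<ServiceStatus[]> {
  return Promise.all([
    checkService("Elasticsearch", "http://localhost:19200", fetchImpl),
    checkService("Tika", "http://localhost:9998/tika", fetchImpl),
  ]);
}
```

On Node.js 18 or later, the built-in client can be passed in directly, for example `checkInfrastructure(fetch).then(console.table)`.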

2. Install Dependencies

Navigate to the server directory and install packages:

cd server
yarn install

3. Environment Configuration

The project is configured through environment variables. Make sure the settings are correct, especially the Elasticsearch address:

# Database path
DATABASE_PATH=server/data/metadata.db

# JWT secret
JWT_SECRET=your_secure_secret

# Elasticsearch settings (match docker-compose ports)
ELASTICSEARCH_HOST=http://localhost:19200
ELASTICSEARCH_INDEX=knowledge_base

# Tika settings
TIKA_HOST=http://localhost:9998

# File upload storage path
UPLOAD_FILE_PATH=./uploads
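
As an illustration of how these variables might be read and validated at startup, the sketch below maps them onto a typed object and fails fast on missing or malformed values. This is not necessarily how the project loads its configuration; `ServerConfig`, `loadConfig`, and the helper functions are hypothetical names.

```typescript
interface ServerConfig {
  databasePath: string;
  jwtSecret: string;
  elasticsearchHost: string;
  elasticsearchIndex: string;
  tikaHost: string;
  uploadFilePath: string;
}

type Env = Record<string, string | undefined>;

// Returns the trimmed value of a variable, or throws if it is unset or empty.
function requireVar(env: Env, key: string): string {
  const value = env[key]?.trim();
  if (!value) {
    throw new Error(`Missing required environment variable: ${key}`);
  }
  return value;
}

// Same as requireVar, but also verifies that the value parses as a URL.
function requireUrl(env: Env, key: string): string {
  const value = requireVar(env, key);
  try {
    new URL(value);
  } catch {
    throw new Error(`${key} is not a valid URL: ${value}`);
  }
  return value;
}

// Reads the variables listed above into a typed configuration object.
function loadConfig(env: Env): ServerConfig {
  return {
    databasePath: requireVar(env, "DATABASE_PATH"),
    jwtSecret: requireVar(env, "JWT_SECRET"),
    elasticsearchHost: requireUrl(env, "ELASTICSEARCH_HOST"),
    elasticsearchIndex: requireVar(env, "ELASTICSEARCH_INDEX"),
    tikaHost: requireUrl(env, "TIKA_HOST"),
    uploadFilePath: requireVar(env, "UPLOAD_FILE_PATH"),
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`. A misconfigured address then surfaces as a clear startup error rather than as a later connection failure.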

4. Start Services

# Development mode (recommended, with hot reload)
yarn run start:dev

# Build and run in production mode
yarn build
yarn run start:prod

The backend service runs on http://localhost:13000 by default, with API prefix /api.

🧪 Testing

# Unit tests
yarn run test

# E2E tests
yarn run test:e2e

⚠️ Notes and Tips

  1. Database Initialization:

    • On first run, TypeORM automatically creates metadata.db (or the configured database file) under the server/data/ directory.
    • In development mode, with synchronize: true set, TypeORM automatically syncs the table structures with the entity definitions.
  2. Elasticsearch Connection:

    • If you encounter a Connection refused error, check that the Docker containers are running (docker ps).
    • On startup, the service checks for an index named knowledge_base and creates it if it does not exist.
  3. Default Account:

    • If you reset the database, either register a new user or reuse existing admin data. Creating the first user through the frontend registration feature is recommended.
  4. File Parsing:

    • When uploading large files, Tika may take a few seconds to parse them. Wait for parsing to finish and check the processing status in the frontend.
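
The index check mentioned in note 2 (Elasticsearch Connection) could look roughly like the sketch below. The `IndexClient` interface is a hypothetical minimal shape modeled on the `indices.exists` and `indices.create` calls of the official `@elastic/elasticsearch` v8 client, and `ensureIndex` is an illustrative name. The project's real startup code, including its index mappings, may differ.

```typescript
// Hypothetical minimal client shape, so the sketch stays self-contained.
interface IndexClient {
  indices: {
    exists(params: { index: string }): Promise<boolean>;
    create(params: { index: string }): Promise<unknown>;
  };
}

// Creates the index only when it is missing and reports which path was taken.
// A real index would also need mappings (for example a dense_vector field);
// they are omitted here.
async function ensureIndex(
  client: IndexClient,
  index: string,
): Promise<"created" | "exists"> {
  if (await client.indices.exists({ index })) {
    return "exists";
  }
  await client.indices.create({ index });
  return "created";
}
```

Calling `ensureIndex(client, "knowledge_base")` repeatedly is safe: after the first call creates the index, later calls return "exists" without issuing another create request.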