# Simple Knowledge Base - Backend Service (Server)

A RAG (Retrieval-Augmented Generation) knowledge base backend built with the [NestJS](https://nestjs.com/) framework. It provides file parsing, vector indexing, hybrid search, and multi-model management.

## 🌟 Key Features

- **Intelligent Document Processing**: Integrates with [Apache Tika](https://tika.apache.org/) to extract text from formats including PDF, Word, Markdown, and TXT.
- **Efficient Vector Search**: Uses [Elasticsearch](https://www.elastic.co/) as the vector database. Supports hybrid search that combines KNN vector search with full-text search.
- **Flexible RAG Engine**: Built on LangChain. Chunking rules and reranking are customizable.
- **Multi-Model Provider**: Connects dynamically to OpenAI, Google Gemini, and locally deployed LLMs.
- **Secure Management**: Built-in JWT authentication and user permission management.

## 🛠️ Tech Stack

- **Framework**: NestJS (TypeScript)
- **Database**: SQLite (TypeORM)
- **Search Engine**: Elasticsearch 8.x/9.x
- **AI Framework**: LangChain
- **Libraries**: RxJS, Class-Validator

## 📋 Prerequisites

Before running the project, make sure the following are installed:

- [Node.js](https://nodejs.org/) (v18 or higher recommended)
- [Yarn](https://yarnpkg.com/)
- [Docker](https://www.docker.com/) & Docker Compose (for infrastructure)

## 🚀 Quick Start

### 1. Start Infrastructure

Use the `docker-compose.yml` file in the project root (`simple-kb/`) to start Elasticsearch and Tika.

```bash
# Run from project root directory
docker-compose up -d
```

After successful startup:

- **Elasticsearch**: Listens on port `19200` (mapped to port 9200 in the container)
- **Tika**: Listens on port `9998`

### 2. Install Dependencies

Navigate to the `server` directory and install packages:

```bash
cd server
yarn install
```

### 3. Environment Configuration

The project uses environment variables for configuration.
Make sure the settings are correct (especially the Elasticsearch address):

```env
# Database path
DATABASE_PATH=server/data/metadata.db

# JWT secret
JWT_SECRET=your_secure_secret

# Elasticsearch settings (match docker-compose ports)
ELASTICSEARCH_HOST=http://localhost:19200
ELASTICSEARCH_INDEX=knowledge_base

# Tika settings
TIKA_HOST=http://localhost:9998

# File upload storage path
UPLOAD_FILE_PATH=./uploads
```

### 4. Start Services

```bash
# Development mode (recommended, with hot reload)
yarn run start:dev

# Build and run in production mode
yarn build
yarn run start:prod
```

The backend service runs on **http://localhost:13000** by default, with the API prefix `/api`.

## 🧪 Testing

```bash
# Unit tests
yarn run test

# E2E tests
yarn run test:e2e
```

## ⚠️ Notes and Tips

1. **Database Initialization**:
   - On first run, TypeORM automatically creates `metadata.db` (or the configured database file) under the `server/data/` directory.
   - In development mode, with `synchronize: true`, table structures are synced automatically.
2. **Elasticsearch Connection**:
   - If you encounter a `Connection refused` error, check that the Docker containers are running (`docker ps`).
   - On startup, the service checks for an index named `knowledge_base` and creates it if it does not exist.
3. **Default Account**:
   - If you reset the database, either register a new user or refer to existing admin data. Using the frontend registration feature to create the first user is recommended.
4. **File Parsing**:
   - When uploading large files, parsing by Tika may take a few seconds. Wait and check the processing status in the frontend.
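
Many startup problems, such as the `Connection refused` error mentioned above, come down to a wrong or missing value in the environment configuration from step 3. The following is a minimal sketch of a pre-start check, not code from this project: the `validateEnv` helper and its error messages are hypothetical, and only the variable names and the `your_secure_secret` placeholder are taken from the example configuration.

```typescript
// Hypothetical helper, not part of the project source.
// It only inspects values; it does not contact Elasticsearch or Tika.

type Env = Record<string, string | undefined>;

// Variable names taken from the example configuration in step 3.
const REQUIRED_KEYS = [
  "DATABASE_PATH",
  "JWT_SECRET",
  "ELASTICSEARCH_HOST",
  "ELASTICSEARCH_INDEX",
  "TIKA_HOST",
  "UPLOAD_FILE_PATH",
] as const;

// Variables that are expected to hold an http(s) URL.
const URL_KEYS = ["ELASTICSEARCH_HOST", "TIKA_HOST"] as const;

// Returns a list of human-readable problems; an empty list means
// every check passed.
function validateEnv(env: Env): string[] {
  const problems: string[] = [];

  // 1. Every variable must be present and non-blank.
  for (const key of REQUIRED_KEYS) {
    const value = (env[key] ?? "").trim();
    if (value === "") {
      problems.push(`${key} is missing or empty`);
    }
  }

  // 2. Host settings must include the scheme, e.g. http://localhost:19200.
  for (const key of URL_KEYS) {
    const value = (env[key] ?? "").trim();
    if (value !== "" && !/^https?:\/\/[^\s/]+/.test(value)) {
      problems.push(`${key} must be an http(s) URL, got "${value}"`);
    }
  }

  // 3. The placeholder secret from the example should be replaced.
  if (env["JWT_SECRET"] === "your_secure_secret") {
    problems.push("JWT_SECRET still has the placeholder value");
  }

  return problems;
}
```

In a Node.js process this could be called as `validateEnv(process.env)` before the application boots, logging each returned problem and exiting if the list is not empty. Returning a list rather than throwing on the first failure reports all misconfigured values in a single run.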