Jun 16, 2024 | By Duncan Gichimu | Chatbot | Python
Building a Chatbot - Python gRPC Service
In this guide, we'll create a gRPC service in Python that uses LlamaIndex to make requests to OpenAI. We'll then dockerize the service and deploy it to an AWS EC2 instance.
Step 1: Setup Python Environment
First, set up a Python environment. Using JetBrains' PyCharm IDE simplifies this process. Create a new project and install the required packages:
shell
python -m pip install grpcio grpcio-tools llama-index python-dotenv
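Before moving on, you can confirm the packages installed correctly (an optional check, not part of the original steps):
shell
python -c "import grpc; print(grpc.__version__)"
python -c "import llama_index.core; print('llama-index ok')"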
Step 2: Define gRPC Service
Create a protos directory in the root of your project and add a file named chat.proto:
plaintext
syntax = "proto3";
service Chat {
rpc answer_question(ChatRequest) returns (stream ChatResponse);
}
message ChatMessage {
string role = 1; // Role will be either USER or SYSTEM
string content = 2;
}
message ChatRequest {
repeated ChatMessage prompts = 1;
}
message ChatResponse {
string response = 1;
}
Generate the gRPC code from the proto file:
shell
python -m grpc_tools.protoc -I./protos --python_out=. --pyi_out=. --grpc_python_out=. ./protos/chat.proto
This will generate three files: chat_pb2.py (the message classes), chat_pb2.pyi (type stubs), and chat_pb2_grpc.py (the client stub and servicer base class).
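Once the server from Step 3 is running, you can exercise the generated stubs end to end with a small test client. This is a sketch, not part of the original guide; the file name client.py is an assumption, and it expects the server to be listening on localhost:15006 as configured below:
python
# client.py -- a minimal test client (sketch; not shown in the original guide)
import grpc

import chat_pb2
import chat_pb2_grpc


def main() -> None:
    with grpc.insecure_channel("localhost:15006") as channel:
        stub = chat_pb2_grpc.ChatStub(channel)
        request = chat_pb2.ChatRequest(
            prompts=[chat_pb2.ChatMessage(role="USER", content="Hello!")]
        )
        # answer_question is server-streaming, so iterate over the replies.
        for reply in stub.answer_question(request):
            print(reply.response, end="", flush=True)


if __name__ == "__main__":
    main()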
Step 3: Create gRPC Service
Create a .env file in the root directory. If you haven't already, initialize a git repository and add .env to your .gitignore:
plaintext
OPENAI_API_KEY=<Your OPENAI_API_KEY>
MODEL=gpt-3.5-turbo-0125
EMBEDDING_MODEL=text-embedding-3-small
TOP_K=3
SYSTEM_PROMPT=<Your SYSTEM_PROMPT>
Configure the Chat Engine settings. Note that settings.py below also reads optional variables (LLM_TEMPERATURE, LLM_MAX_TOKENS, EMBEDDING_DIM, CHUNK_SIZE, and CHUNK_OVERLAP) and falls back to defaults when they are unset:
python
# settings.py
import os
from typing import Dict
from llama_index.core.settings import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
def llm_config_from_env() -> Dict:
    from llama_index.core.constants import DEFAULT_TEMPERATURE

    model = os.getenv("MODEL")
    temperature = os.getenv("LLM_TEMPERATURE", DEFAULT_TEMPERATURE)
    max_tokens = os.getenv("LLM_MAX_TOKENS")
    config = {
        "model": model,
        "temperature": float(temperature),
        "max_tokens": int(max_tokens) if max_tokens is not None else None,
    }
    return config


def embedding_config_from_env() -> Dict:
    model = os.getenv("EMBEDDING_MODEL")
    dimension = os.getenv("EMBEDDING_DIM")
    config = {
        "model": model,
        "dimension": int(dimension) if dimension is not None else None,
    }
    return config


def init_settings():
    llm_configs = llm_config_from_env()
    embedding_configs = embedding_config_from_env()
    Settings.llm = OpenAI(**llm_configs)
    Settings.embed_model = OpenAIEmbedding(**embedding_configs)
    Settings.chunk_size = int(os.getenv("CHUNK_SIZE", "1024"))
    Settings.chunk_overlap = int(os.getenv("CHUNK_OVERLAP", "20"))
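To check that the configuration is picked up correctly, you can load the environment and inspect the resulting LLM settings. This is a hypothetical snippet, not part of the original guide:
python
# check_settings.py -- hypothetical sanity check
from dotenv import load_dotenv
from llama_index.core.settings import Settings

from settings import init_settings

load_dotenv()
init_settings()
# Prints the resolved model metadata, confirming MODEL from .env was applied.
print(Settings.llm.metadata)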
Create the server script:
python
# server.py
import asyncio
import logging
import grpc
from dotenv import load_dotenv
import chat_pb2_grpc
from chat_servicer import ChatServicer
from settings import init_settings
load_dotenv()
init_settings()
async def serve() -> None:
    server = grpc.aio.server()
    chat_pb2_grpc.add_ChatServicer_to_server(ChatServicer(), server)
    server.add_insecure_port("[::]:15006")
    await server.start()
    await server.wait_for_termination()


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    asyncio.run(serve())
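server.py imports ChatServicer from chat_servicer.py, which this guide hasn't shown. Below is a minimal sketch of what that servicer could look like, streaming tokens straight from the configured LLM. The use of astream_chat and the role mapping are assumptions on my part; a fuller version would likely run a LlamaIndex chat engine over an index instead:
python
# chat_servicer.py -- a minimal sketch (this implementation is an assumption;
# the original guide does not show this file's contents)
from llama_index.core.llms import ChatMessage
from llama_index.core.settings import Settings

import chat_pb2
import chat_pb2_grpc


class ChatServicer(chat_pb2_grpc.ChatServicer):
    async def answer_question(self, request, context):
        # Map incoming proto messages (role USER/SYSTEM) to LlamaIndex roles.
        messages = [
            ChatMessage(role=m.role.lower(), content=m.content)
            for m in request.prompts
        ]
        # Stream the model's reply token by token over the gRPC stream.
        stream = await Settings.llm.astream_chat(messages)
        async for chunk in stream:
            yield chat_pb2.ChatResponse(response=chunk.delta or "")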
Step 4: Dockerize the Application
Generate a requirements.txt file:
shell
pip freeze > requirements.txt
Create a Dockerfile:
plaintext
FROM python:3.10
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 15006
CMD ["python", "server.py"]