Jun 16, 2024 | By Duncan Gichimu | Chatbot | Python

Building a Chatbot - Python gRPC Service

In this guide, we'll create a gRPC service in Python that uses LlamaIndex to make requests to OpenAI. We'll then dockerize the service and deploy it to an AWS EC2 instance.

Step 1: Setup Python Environment

First, set up a Python environment. Using JetBrains' PyCharm IDE simplifies this process. Create a new project and install the required packages:

shell
python -m pip install grpcio grpcio-tools llama-index python-dotenv
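
If you're not using PyCharm, a plain virtual environment works just as well. A minimal sketch using the standard venv module (the .venv directory name is only an example):

shell
python -m venv .venv
source .venv/bin/activate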

Step 2: Define gRPC Service

Create a protos directory in the root of your project and add a file named chat.proto:

plaintext
syntax = "proto3";

service Chat {
    rpc answer_question(ChatRequest) returns (stream ChatResponse);
}

message ChatMessage {
    string role = 1;  // Role will be either USER or SYSTEM
    string content = 2;
}

message ChatRequest {
    repeated ChatMessage prompts = 1;
}

message ChatResponse {
    string response = 1;
}

Generate the gRPC code from the proto file:

shell
python -m grpc_tools.protoc -I./protos --python_out=. --pyi_out=. --grpc_python_out=. ./protos/chat.proto

This will generate:

  • chat_pb2.py
  • chat_pb2.pyi
  • chat_pb2_grpc.py
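
Here, chat_pb2.py contains the ChatMessage, ChatRequest, and ChatResponse message classes, chat_pb2.pyi adds type stubs for them, and chat_pb2_grpc.py provides the ChatServicer base class the server will implement, along with a ChatStub class for clients.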

Step 3: Create gRPC Service

Create a .env file in the root directory. If you haven't already, initialize a git repository and add .env to your .gitignore:

plaintext
OPENAI_API_KEY=<Your OPENAI_API_KEY>
MODEL=gpt-3.5-turbo-0125
EMBEDDING_MODEL=text-embedding-3-small
TOP_K=3
SYSTEM_PROMPT=<Your SYSTEM_PROMPT>

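The settings module below also reads optional LLM_TEMPERATURE, LLM_MAX_TOKENS, EMBEDDING_DIM, CHUNK_SIZE, and CHUNK_OVERLAP variables; each falls back to a sensible default when it isn't set.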

Configure Chat Engine settings:

python
# settings.py
import os
from typing import Dict
from llama_index.core.settings import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

def llm_config_from_env() -> Dict:
    from llama_index.core.constants import DEFAULT_TEMPERATURE

    # MODEL comes from .env; LLM_TEMPERATURE and LLM_MAX_TOKENS are optional overrides.
    model = os.getenv("MODEL")
    temperature = os.getenv("LLM_TEMPERATURE", DEFAULT_TEMPERATURE)
    max_tokens = os.getenv("LLM_MAX_TOKENS")

    config = {
        "model": model,
        "temperature": float(temperature),
        "max_tokens": int(max_tokens) if max_tokens is not None else None,
    }
    return config

def embedding_config_from_env() -> Dict:
    # EMBEDDING_MODEL comes from .env; EMBEDDING_DIM is optional.
    model = os.getenv("EMBEDDING_MODEL")
    dimension = os.getenv("EMBEDDING_DIM")

    config = {
        "model": model,
        # OpenAIEmbedding expects the "dimensions" keyword (as in the OpenAI embeddings API).
        "dimensions": int(dimension) if dimension is not None else None,
    }
    return config

def init_settings():
    # Populate the global LlamaIndex Settings used throughout the service.
    llm_configs = llm_config_from_env()
    embedding_configs = embedding_config_from_env()

    Settings.llm = OpenAI(**llm_configs)
    Settings.embed_model = OpenAIEmbedding(**embedding_configs)
    Settings.chunk_size = int(os.getenv("CHUNK_SIZE", "1024"))
    Settings.chunk_overlap = int(os.getenv("CHUNK_OVERLAP", "20"))

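The server script in the next step registers a ChatServicer imported from chat_servicer.py, which isn't shown in this section. The following is a minimal sketch of what that servicer might look like; the use of SimpleChatEngine is an assumption, and a retrieval-backed chat engine could be used instead.

python
# chat_servicer.py -- illustrative sketch, not necessarily the original implementation
import os

import grpc
from llama_index.core.chat_engine import SimpleChatEngine
from llama_index.core.llms import ChatMessage, MessageRole

import chat_pb2
import chat_pb2_grpc


class ChatServicer(chat_pb2_grpc.ChatServicer):
    """Implements the answer_question server-streaming RPC from chat.proto."""

    async def answer_question(self, request, context):
        if not request.prompts:
            await context.abort(grpc.StatusCode.INVALID_ARGUMENT, "prompts must not be empty")

        # Map the protobuf ChatMessages (role USER or SYSTEM) onto LlamaIndex chat messages.
        history = [
            ChatMessage(role=MessageRole(m.role.lower()), content=m.content)
            for m in request.prompts[:-1]
        ]
        latest = request.prompts[-1].content

        # SimpleChatEngine talks directly to the LLM configured by init_settings();
        # an index-backed chat engine could be swapped in here for retrieval.
        engine = SimpleChatEngine.from_defaults(system_prompt=os.getenv("SYSTEM_PROMPT"))

        # Stream tokens back to the client as they are generated.
        response = await engine.astream_chat(latest, chat_history=history)
        async for token in response.async_response_gen():
            yield chat_pb2.ChatResponse(response=token)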

Create the server script:

python
# server.py
import asyncio
import logging
import grpc
from dotenv import load_dotenv
import chat_pb2_grpc
from chat_servicer import ChatServicer
from settings import init_settings

# Load environment variables and configure LlamaIndex before serving requests.
load_dotenv()
init_settings()

async def serve() -> None:
    server = grpc.aio.server()
    chat_pb2_grpc.add_ChatServicer_to_server(ChatServicer(), server)
    server.add_insecure_port("[::]:15006")  # Same port the Dockerfile exposes below.
    await server.start()
    logging.info("gRPC server listening on port 15006")
    await server.wait_for_termination()

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    asyncio.run(serve())

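To try the service locally, run server.py and call it with a small test client. The sketch below uses the synchronous ChatStub generated earlier; the client.py file name and the prompt are just examples.

python
# client.py -- hypothetical test client
import grpc

import chat_pb2
import chat_pb2_grpc


def main() -> None:
    with grpc.insecure_channel("localhost:15006") as channel:
        stub = chat_pb2_grpc.ChatStub(channel)
        request = chat_pb2.ChatRequest(
            prompts=[chat_pb2.ChatMessage(role="USER", content="Hello, who are you?")]
        )
        # answer_question is a server-streaming RPC, so iterate over the responses.
        for chunk in stub.answer_question(request):
            print(chunk.response, end="", flush=True)
        print()


if __name__ == "__main__":
    main()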

Step 4: Dockerize the Application

Generate a requirements.txt file:

shell
pip freeze > requirements.txt

Create a Dockerfile:

plaintext
FROM python:3.10

WORKDIR /app

COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 15006

CMD ["python", "server.py"]
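
With the Dockerfile in place, you can build and run the image locally before deploying to EC2. The image name below is only an example; passing the .env file gives the container the same configuration used during development:

shell
docker build -t chatbot-grpc .
docker run --env-file .env -p 15006:15006 chatbot-grpc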