Setup

This section walks through the basic steps for setting up an environment for working with generative AI models: getting API credentials, installing the libraries you need, and configuring your development environment so it can talk to different AI service providers.

API Setup

To start prompting an AI model, you first need access to it. This usually means getting API credentials from a service provider such as OpenAI, Hugging Face, or others. Once you have your API key, you can use it to send requests to the model and get back the responses it generates.

Some API providers you can use:

OpenAI: https://openai.com/
Hugging Face: https://huggingface.co/
Anthropic: https://www.anthropic.com/
Google Gemini: https://ai.google.dev/gemini
OpenRouter: https://openrouter.ai/
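
To make "use your API key to send requests" more concrete, here is a minimal sketch of how a key is typically attached to an HTTP request. It assumes an OpenAI-style chat completions endpoint with Bearer-token authentication; other providers may use different URLs, headers, or payload shapes. `build_chat_request` is a hypothetical helper name, and the request is only constructed, not sent.

```python
import json
import urllib.request

# Assumption: OpenAI-style endpoint; other providers use their own URLs.
CHAT_URL = "https://api.openai.com/v1/chat/completions"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated chat request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The API key travels in the Authorization header.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request("your_api_key_here", "gpt-4.1-mini", "Hello")
print(request.get_method(), request.full_url)
```

In practice the SDKs described in the following sections build and send this kind of request for you.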

OpenAI SDK Setup

To set up the OpenAI SDK, follow these steps:

Install the OpenAI SDK and python-dotenv using pip:
```
pip install openai python-dotenv
```
Create a .env file in your project folder and add your OpenAI API key:
```
OPENAI_API_KEY=your_api_key_here
```

Load the API key in your Python script and set up the OpenAI client:

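```
from openai import OpenAI
from dotenv import load_dotenv

# Read variables from the .env file into the environment.
load_dotenv()

# With no arguments, the client reads OPENAI_API_KEY from the environment.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[
        {
            "role": "user",
            "content": "Explain Python in simple words"
        }
    ]
)

print(response.choices[0].message.content)
```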
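
If the key is missing from the environment, creating the client fails, and the cause is not always obvious at first glance. One option is to check for the variable yourself right after `load_dotenv()`. The sketch below uses only the standard library; `require_env` is a hypothetical helper name, and the placeholder value is set only so the example runs without a real key.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value


# Demo only: use a placeholder if no real key is present in the environment.
os.environ.setdefault("OPENAI_API_KEY", "your_api_key_here")
api_key = require_env("OPENAI_API_KEY")
print("Key loaded, length:", len(api_key))
```

The same check works for any other variable name used in this section, such as `GEMINI_API_KEY`.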

Gemini SDK Setup

To set up the Gemini SDK, follow these steps:

Install the Gemini SDK using pip:
```
pip install google-genai python-dotenv
```
Create a .env file in your project folder and add your Gemini API key:
```
GEMINI_API_KEY=your_api_key_here
```

Load the API key in your Python script and set up the Gemini client:

```
from google import genai
from dotenv import load_dotenv
import os

# Read variables from the .env file into the environment.
load_dotenv()

client = genai.Client(
    api_key=os.getenv("GEMINI_API_KEY")
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain FastAPI in simple words"
)

print(response.text)
```

OpenAI SDK for Any API provider

The OpenAI SDK is not limited to OpenAI’s own models. You can also use it with other providers that expose an OpenAI-compatible API. To do this, point the SDK at the provider's base URL and supply the credentials it requires.

Install the OpenAI SDK and python-dotenv using pip:
```
pip install openai python-dotenv
```

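Create a .env file in your project folder and add your API key and endpoint. The endpoint should be the provider's base URL (typically ending in /v1), not a full request path, because the SDK appends paths such as /chat/completions itself:

```
API_KEY=your_api_key_here
API_ENDPOINT=https://api.yourprovider.com/v1
```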

Load the API key and endpoint in your Python script and set up the OpenAI client:

```
from openai import OpenAI
from dotenv import load_dotenv
import os

# Read variables from the .env file into the environment.
load_dotenv()

client = OpenAI(
    api_key=os.getenv("API_KEY"),
    base_url=os.getenv("API_ENDPOINT")
)

response = client.chat.completions.create(
    model="your-model-name",
    messages=[
        {
            "role": "user",
            "content": "Explain the concept of machine learning in simple terms."
        }
    ]
)

print(response.choices[0].message.content)
```
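
The value passed as `base_url` is the API root, and the SDK adds the request path to it. The sketch below illustrates that relationship for chat completions. The `chat_completions_url` helper is hypothetical, and the two base URLs are assumptions based on the OpenAI-compatible endpoints those projects document; confirm them in the provider's documentation before relying on them.

```python
# Assumption: these base URLs match each provider's OpenAI-compatible API.
BASE_URLS = {
    "openrouter": "https://openrouter.ai/api/v1",
    "ollama-local": "http://localhost:11434/v1",
}


def chat_completions_url(base_url: str) -> str:
    """Show the URL a chat completion request is sent to for a given base URL."""
    return base_url.rstrip("/") + "/chat/completions"


for name, base_url in BASE_URLS.items():
    print(f"{name}: {chat_completions_url(base_url)}")
```

If requests fail with a "not found" error, a base URL that already includes a request path is a common cause worth checking.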

Local Model

You can also set up and run generative AI models locally on your own machine, instead of using an online API. This can be done using Docker, Hugging Face’s Transformers library, or other tools that let you run AI models on your own system. Running models locally gives you more control over your setup, and it can be useful in cases where you are working with sensitive data.

Docker Setup

To set up a generative AI model using Docker, follow these steps:

Install Docker on your machine if you don’t already have it. You can download it from the official Docker website: https://www.docker.com/get-started
Pull and run the Ollama Docker image. This gives you a local server that you can use to download and run AI models:

Docker Image: https://hub.docker.com/r/ollama/ollama
```
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```
Once the Docker container is running, the Ollama server listens on http://localhost:11434. After you have pulled a model, you can send requests to the API endpoints it provides to generate content.

Next, pull and run the model you want to use locally. In the commands below, replace <model-name> with a model from the library, such as llama3.2:

Model Library: https://ollama.com/library

```
# pull model
docker exec -it ollama ollama pull <model-name>
# docker exec -it ollama ollama pull llama3.2
# run model
docker exec -it ollama ollama run <model-name>
# docker exec -it ollama ollama run llama3.2
```
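
With a model pulled, you can also call the Ollama server over HTTP from Python. The following is a minimal sketch using only the standard library. It assumes the container from the previous step is listening on port 11434 and uses Ollama's `/api/generate` endpoint with streaming turned off; `build_payload` and `generate` are hypothetical helper names. The demo line only prints the request body, so it runs without a server.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str) -> dict:
    """Request body for a single, non-streamed completion."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        # With streaming off, the reply is one JSON object with a "response" field.
        return json.loads(reply.read())["response"]


# Show the request body without contacting the server.
print(json.dumps(build_payload("llama3.2", "Explain Docker in one sentence.")))
```

When the container is running and the model has been pulled, calling `generate("llama3.2", "Explain Docker in one sentence.")` should return the generated text as a string.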

You can also set up a UI for your local model using Open WebUI, which gives you an easy-to-use interface for talking to your local generative AI models. You can read more about it here:

Open WebUI: https://docs.openwebui.com/getting-started/quick-start/
Follow the Open WebUI instructions to connect it to your local model, then start generating content through the web interface.

You can also use FastAPI to build your own custom API endpoint for your local model. This makes it easier to plug the model into your own applications. You can read more about FastAPI here:

FastAPI: https://fastapi.tiangolo.com/

```
from fastapi import FastAPI
from pydantic import BaseModel
from ollama import Client

app = FastAPI()

MODEL_NAME = "llama3.2"

# Connect to the local Ollama server started with Docker.
client = Client(host="http://localhost:11434")


class GenerateRequest(BaseModel):
    prompt: str


# Declared with plain `def` because client.chat() is a blocking call;
# FastAPI runs non-async endpoints in a worker thread.
@app.post("/generate")
def generate_content(data: GenerateRequest):
    response = client.chat(
        model=MODEL_NAME,
        messages=[
            {
                "role": "user",
                "content": data.prompt,
            }
        ],
    )

    return {
        "response": response["message"]["content"]
    }
```

Hugging Face Transformers Setup

To set up a generative AI model using Hugging Face’s Transformers library, follow these steps:

Install the Hugging Face CLI using pip:
```
pip install -U "huggingface_hub" #global install
```
Log in to your Hugging Face account using the CLI:
```
hf auth login
hf whoami
```
You need a Hugging Face account to log in. You can create one for free at https://huggingface.co/join and get your API token from https://huggingface.co/settings/tokens.
Install the Transformers and torch libraries. The command below uses uv; with pip, run pip install transformers torch instead:
```
uv add transformers torch
```
Load a pre-trained model and tokenizer in your Python script:
```
from transformers import pipeline
import json

pipe = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

messages = [{"role": "user", "content": "What is FastAPI?"}]

response = pipe(messages)

print(json.dumps(response, indent=2))
```
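
The script above prints the full pipeline result. For chat-style input, recent Transformers versions return a list with one dictionary whose `generated_text` holds the whole conversation, with the model's reply as the last message. The sketch below pulls out just that reply. `last_reply` is a hypothetical helper, and `sample_output` is a hand-written stand-in for `response`, not real model output.

```python
def last_reply(pipeline_output: list) -> str:
    """Return the content of the final message in a chat-style pipeline result."""
    conversation = pipeline_output[0]["generated_text"]
    return conversation[-1]["content"]


# Hand-written stand-in for `response`; real model text will differ.
sample_output = [
    {
        "generated_text": [
            {"role": "user", "content": "What is FastAPI?"},
            {"role": "assistant", "content": "(model reply goes here)"},
        ]
    }
]

print(last_reply(sample_output))
```

In the script above, `print(last_reply(response))` would print only the assistant's answer, assuming the output has this shape.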
If the model is not already in your local cache, it will be downloaded from Hugging Face’s model hub and saved in a cache folder on your machine for later use. On Linux, this folder is ~/.cache/huggingface/hub, and on Windows, it is %USERPROFILE%\.cache\huggingface\hub.
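
To check where the cache lives on a given machine, the sketch below approximates the lookup using only the standard library. It assumes the documented behavior that the `HF_HUB_CACHE` environment variable overrides the hub cache location and that `HF_HOME` moves the parent folder. `hub_cache_dir` is a hypothetical helper, and the library's own logic may handle additional cases (for example `XDG_CACHE_HOME`) that this sketch ignores.

```python
import os
from pathlib import Path


def hub_cache_dir() -> Path:
    """Approximate the folder where downloaded models are cached."""
    explicit = os.environ.get("HF_HUB_CACHE")
    if explicit:
        # An explicit hub cache path takes priority.
        return Path(explicit).expanduser()
    hf_home = os.environ.get("HF_HOME")
    if hf_home:
        # Otherwise the hub cache sits inside the Hugging Face home folder.
        return Path(hf_home).expanduser() / "hub"
    # Default location described above.
    return Path.home() / ".cache" / "huggingface" / "hub"


print(hub_cache_dir())
```

Setting `HF_HOME` before running your script is one way to move the cache to a drive with more free space.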