Skip to content

LLMs & the Model Family

Role of LLMs in agents

The LLM is the brain of the agent. It:

  • Interprets the user's goal
  • Decides which tool to use (and with what arguments)
  • Synthesizes observations into a final answer

Model families

Family Examples Notes
Open-weight Llama 3, Mistral, Qwen Run locally or on HF Inference
Proprietary GPT-4o, Claude 3.5 API access only
Code-specialized DeepSeek-Coder, StarCoder Better for code tools

Special tokens

Chat models use special tokens to structure conversation turns. Example (Llama 3 format):

<|begin_of_text|>
<|start_header_id|>system<|end_header_id|>
You are a helpful assistant.<|eot_id|>
<|start_header_id|>user<|end_header_id|>
What is 2+2?<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>

Agents inject tool schemas and observations into these structured turns.

Accessing models via HF Hub

from huggingface_hub import InferenceClient

client = InferenceClient(model="meta-llama/Meta-Llama-3-8B-Instruct")
response = client.chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=256,
)
print(response.choices[0].message.content)

Notes

Add your own notes and experiments here.