What is an AI Agent? A Beginner's Guide

Learn what AI agents are, how they work, and how they're already being used in production. A beginner-friendly guide from an AI engineer building real agents in Bangladesh.

AI Engineer building agents and LLM-powered software systems. Weekly insights on AI, automation & what's actually worth your attention.

Introduction

You've probably heard "AI agent" everywhere lately.

OpenAI is building them. Google is building them. Every tech company is announcing some kind of "agentic" product. But if you've tried to Google what an AI agent actually is, you've probably hit a wall of jargon that explained nothing.

So let me break it down the way I wish someone had explained it to me when I started building.

By the end of this post, you'll know:

  • What an AI agent actually is

  • How it's different from a regular chatbot like ChatGPT

  • How the agent loop works under the hood

  • The 4 core components of every agent

  • Real-world examples — including ones I've built

Let's get into it.


Chatbot vs. AI Agent — The Real Difference

Most people's first experience with AI is something like ChatGPT. You type a message, it replies. You type again, it replies again.

That's a chatbot — a reactive system. It waits. It responds. That's it.

An AI agent is fundamentally different. Instead of just responding, it:

  • Receives a goal

  • Breaks it into steps

  • Decides what actions to take

  • Uses tools to execute those actions

  • Checks its own output

  • Keeps going until the goal is done

Think of it this way:

Chatbot = A smart person sitting at a desk waiting for your questions

AI Agent = A smart person with a task list, a laptop, internet access, and the ability to make decisions and take actions on their own


What is an AI Agent? (The Clean Definition)

An AI agent is a system that uses a large language model (LLM) as its brain to perceive its environment, reason about what to do, and take actions to achieve a goal — with minimal human input.

The key word is actions. An agent doesn't just generate text. It does things:

  • Searches the web

  • Reads and writes files

  • Calls APIs

  • Books appointments

  • Sends emails

  • Queries databases

  • Talks to other agents


The Agent Loop — How It Actually Works

Every AI agent runs on a simple repeating loop. It's called the Think → Act → Observe cycle.

┌─────────────────────────────────────────┐
│             THE AGENT LOOP              │
│                                         │
│  1. THINK                               │
│     LLM reads the goal + context        │
│     Decides what to do next             │
│              ↓                          │
│  2. ACT                                 │
│     Calls a tool or takes an action     │
│     e.g. search web, query DB, call API │
│              ↓                          │
│  3. OBSERVE                             │
│     Reads the result of the action      │
│     Updates its understanding           │
│              ↓                          │
│  4. REPEAT OR STOP                      │
│     Is the goal achieved?               │
│     No  → back to step 1                │
│     Yes → return the final answer       │
└─────────────────────────────────────────┘
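The same loop can be sketched in a few lines of Python. This is a minimal illustration, not a production pattern: `fake_llm` is a hard-coded stand-in for a real model call, and `lookup_capital` is a made-up tool, so both names are assumptions for the example only.

```python
from typing import Callable

# Hypothetical tool: a real agent would call a search API or a database here.
def lookup_capital(country: str) -> str:
    capitals = {"Bangladesh": "Dhaka", "Japan": "Tokyo"}
    return capitals.get(country, "unknown")

TOOLS: dict[str, Callable[[str], str]] = {"lookup_capital": lookup_capital}

def fake_llm(goal: str, observations: list[str]) -> dict:
    """Stand-in for the THINK step: decide the next action from goal + context."""
    country = goal.rstrip("?").split()[-1]
    if not observations:
        # Nothing observed yet, so request a tool call.
        return {"type": "tool", "name": "lookup_capital", "arg": country}
    # A result is available, so the goal is achieved.
    return {"type": "final", "answer": f"The capital of {country} is {observations[-1]}."}

def run_agent(goal: str, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        decision = fake_llm(goal, observations)   # 1. THINK
        if decision["type"] == "final":           # 4. STOP
            return decision["answer"]
        tool = TOOLS[decision["name"]]
        result = tool(decision["arg"])            # 2. ACT
        observations.append(result)               # 3. OBSERVE, then REPEAT
    return "Stopped: step limit reached."

print(run_agent("What is the capital of Japan?"))
```

The `max_steps` cap is a design choice worth keeping in any real loop: it stops the agent from cycling forever when the model never decides it is done.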

Let's make this concrete with a calling agent:

Goal: "A caller wants to book a physiotherapy appointment on Friday afternoon."

Here's the loop:

  1. Think — "I need to check what Friday afternoon slots are available"

  2. Act — Queries the appointment database → get_slots(day="Friday", time="afternoon")

  3. Observe — Gets back: 2:00 PM and 4:30 PM are open

  4. Think — "I should offer both options to the caller"

  5. Act — Speaks to the caller: "I have 2 PM and 4:30 PM available on Friday. Which works for you?"

  6. Observe — Caller says: "4:30 please"

  7. Think — "I need to confirm the booking and get their name"

  8. Act — Books the slot + sends SMS confirmation

  9. Done

A chatbot would have said: "I can't access real-time appointment data."

The agent went and got it. That's the difference.
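To show what "went and got it" could look like in code, here is a rough sketch of the booking flow. Only `get_slots(day, time)` comes from the walkthrough above. `book_slot`, `send_sms`, the in-memory data, and the caller name "Rahim" are all invented for illustration, and the decisions an LLM would normally make are hard-coded in `handle_call`.

```python
# Hypothetical in-memory data; a real agent would query an appointment database.
OPEN_SLOTS = {("Friday", "afternoon"): ["2:00 PM", "4:30 PM"]}
BOOKINGS: list[dict] = []
SENT_SMS: list[str] = []

def get_slots(day: str, time: str) -> list[str]:
    return list(OPEN_SLOTS.get((day, time), []))

def book_slot(day: str, time: str, slot: str, name: str) -> dict:
    slots = OPEN_SLOTS.get((day, time), [])
    if slot not in slots:
        raise ValueError(f"{slot} is not available")
    slots.remove(slot)  # the slot is no longer open
    booking = {"day": day, "slot": slot, "name": name}
    BOOKINGS.append(booking)
    return booking

def send_sms(name: str, message: str) -> None:
    # Stub: record the message instead of calling an SMS provider.
    SENT_SMS.append(f"To {name}: {message}")

def handle_call(caller_replies: list[str]) -> dict:
    replies = iter(caller_replies)                        # scripted caller answers
    slots = get_slots(day="Friday", time="afternoon")     # Act, then Observe
    print(f"Agent: I have {' and '.join(slots)} available on Friday. Which works for you?")
    choice = next(replies)                                # Observe the caller's pick
    print("Agent: Great, may I have your name?")
    name = next(replies)
    booking = book_slot("Friday", "afternoon", choice, name)   # Act
    send_sms(name, f"Confirmed: Friday at {choice}.")
    return booking

booking = handle_call(["4:30 PM", "Rahim"])
print(booking)
```

In a real agent the caller would say something like "4:30 please", and mapping that to the `"4:30 PM"` slot is exactly the kind of step the LLM handles.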


The 4 Core Components of Every AI Agent

Whether an agent is a simple email classifier or a multi-step voice agent, it always comes down to these 4 components:

1. 🧠 Brain (The LLM)

The large language model is the reasoning engine. It reads the situation, decides what to do next, and generates responses. GPT-4o, Claude Sonnet, Gemini, LLaMA — any of these can be the brain.

The model you choose matters more than you think. In a later post I'll share the real cost and latency numbers from production agents I've built — including which models we use for which tasks and exactly why.

2. 🛠️ Tools

Tools are what the agent uses to interact with the world. Without tools, an agent is just a chatbot. Tools are what give it hands.

Examples:

  • Web search

  • Database queries

  • File reader/writer

  • Calendar / booking system

  • Email sender

  • API caller

  • Code interpreter
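One common way to wire tools up is a registry that pairs each function with a plain-language description the model can read. The sketch below is an assumption about how that might look in plain Python; the `tool` decorator, `TOOL_REGISTRY`, and the two example tools are made up here and do not belong to any particular framework.

```python
from typing import Callable

# Maps tool name -> description + function. All names here are illustrative.
TOOL_REGISTRY: dict[str, dict] = {}

def tool(description: str):
    """Decorator that registers a function as a tool the agent may call."""
    def register(fn: Callable) -> Callable:
        TOOL_REGISTRY[fn.__name__] = {"description": description, "fn": fn}
        return fn
    return register

@tool("Add two numbers and return the sum.")
def add(a: float, b: float) -> float:
    return a + b

@tool("Count the words in a piece of text.")
def word_count(text: str) -> int:
    return len(text.split())

def describe_tools() -> str:
    # This text would be placed in the prompt so the model knows its options.
    return "\n".join(
        f"- {name}: {spec['description']}" for name, spec in TOOL_REGISTRY.items()
    )

def call_tool(name: str, **kwargs):
    # Return an error string instead of raising, so the agent can observe it.
    if name not in TOOL_REGISTRY:
        return f"Error: unknown tool '{name}'"
    return TOOL_REGISTRY[name]["fn"](**kwargs)

print(describe_tools())
print(call_tool("add", a=2, b=3))
```

Returning errors as observations, rather than crashing, gives the model a chance to pick a different tool on the next turn of the loop.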

3. 💾 Memory

Memory lets the agent remember things — within a conversation and across sessions.

Two types:

  • Short-term memory — what happened earlier in this conversation

  • Long-term memory — stored knowledge that the agent can retrieve later, usually in a vector database

In the calling agent example above, the agent remembers everything from the current call (short-term). We'll dedicate a full post to memory design; it's one of the trickiest parts to get right.
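Here is a small sketch of the two memory types side by side. It is deliberately simplified: the `AgentMemory` class is hypothetical, and `recall` uses naive word overlap as a stand-in for the similarity search a vector database would provide.

```python
class AgentMemory:
    def __init__(self) -> None:
        self.short_term: list[str] = []   # turns from the current conversation
        self.long_term: list[str] = []    # facts kept across sessions

    def add_turn(self, text: str) -> None:
        self.short_term.append(text)

    def remember(self, fact: str) -> None:
        self.long_term.append(fact)

    def end_session(self) -> None:
        # Short-term memory is discarded; long-term memory survives.
        self.short_term.clear()

    def recall(self, query: str, k: int = 1) -> list[str]:
        # Naive retrieval: rank stored facts by shared words with the query.
        # A real system would compare embeddings in a vector database instead.
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(fact.lower().split())), fact)
            for fact in self.long_term
        ]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [fact for score, fact in ranked[:k] if score > 0]

memory = AgentMemory()
memory.add_turn("Caller: I'd like Friday afternoon.")
memory.remember("caller prefers afternoon appointments")
memory.remember("clinic is closed on sunday")
memory.end_session()

print(memory.short_term)
print(memory.recall("which appointments does the caller prefer"))
```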

4. 📋 Planning

Planning is how the agent breaks a big goal into smaller steps. Some agents plan everything upfront. Others figure it out one step at a time as they go (this is called ReAct — Reason + Act).

Most production agents today use a hybrid: light upfront structure, flexible step-by-step execution.
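A hybrid like this can be sketched as a short upfront plan that gets adjusted after each observation. The step names and the `make_plan` and `run` functions below are invented for the example; in a real agent the LLM would produce the plan and decide on the adjustments.

```python
def make_plan(goal: str) -> list[str]:
    """Light upfront structure: stand-in for asking the LLM to outline steps."""
    return ["check_slots", "offer_options", "confirm_booking"]

def run(goal: str, available_slots: list[str]) -> list[str]:
    plan = make_plan(goal)
    executed: list[str] = []
    while plan:
        step = plan.pop(0)
        executed.append(step)
        # Flexible execution: react to what was observed and rewrite
        # the remaining steps when the original plan no longer fits.
        if step == "check_slots" and not available_slots:
            plan = ["suggest_other_day"]
    return executed

print(run("book Friday afternoon", ["2:00 PM", "4:30 PM"]))
print(run("book Friday afternoon", []))
```

The first call follows the original plan to the end. The second drops the remaining steps once it observes that no slots are open and switches to suggesting another day.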


Types of AI Agents You'll Build in This Series

ReAct Agents

The most common. Think → Act → Observe. Step by step. This is where we start.

Memory-Augmented Agents

ReAct agents with both short- and long-term memory. They can remember past interactions, personalise responses, and improve over time.

LangGraph Agents

Agents designed as graphs — with conditional paths, loops, and checkpoints. Far more control than a linear chain.
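To give a feel for the graph idea without any framework, here is a plain-Python sketch. It does not use the LangGraph API. The nodes, the routing function, and the "approve on the second attempt" rule are all assumptions made for the example, and checkpoints are left out to keep it short.

```python
# Each node is a function that takes the shared state and returns it updated.
def draft(state: dict) -> dict:
    state["attempts"] += 1
    state["text"] = f"draft v{state['attempts']}"
    return state

def review(state: dict) -> dict:
    # Hypothetical check: approve once there have been two attempts.
    state["approved"] = state["attempts"] >= 2
    return state

def route_after_review(state: dict) -> str:
    # Conditional path: finish if approved, otherwise loop back to drafting.
    return "END" if state["approved"] else "draft"

NODES = {"draft": draft, "review": review}
EDGES = {"draft": lambda state: "review", "review": route_after_review}

def run_graph(start: str, state: dict, max_steps: int = 10) -> dict:
    node = start
    path: list[str] = []
    for _ in range(max_steps):
        if node == "END":
            break
        path.append(node)
        state = NODES[node](state)
        node = EDGES[node](state)   # pick the next node from the updated state
    state["path"] = path
    return state

result = run_graph("draft", {"attempts": 0})
print(result["path"])
print(result["text"])
```

A linear chain could only go draft then review once. The conditional edge is what lets the graph loop back until the review passes.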

Multi-Agent Systems

Multiple specialised agents working as a team. One plans, one researches, one executes. Like a company, not a single employee.
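The "one plans, one researches, one executes" split can be sketched with three plain functions and a supervisor that passes work between them. Everything here is a stub: in a real system each function would wrap its own LLM call and tools, and the names `planner`, `researcher`, `executor`, and `supervisor` are assumptions for the example.

```python
def planner(goal: str) -> list[tuple[str, str]]:
    """Decide which worker handles which task (stub for an LLM planner)."""
    return [("researcher", goal), ("executor", goal)]

def researcher(task: str, context: list[str]) -> str:
    return f"notes about {task}"

def executor(task: str, context: list[str]) -> str:
    # Uses whatever earlier workers produced.
    return f"report on {task} based on {len(context)} note(s)"

WORKERS = {"researcher": researcher, "executor": executor}

def supervisor(goal: str) -> list[str]:
    context: list[str] = []
    for worker_name, task in planner(goal):
        result = WORKERS[worker_name](task, context)
        context.append(result)   # share each result with later workers
    return context

print(supervisor("AI agents"))
```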

Voice Agents

Agents with a full speech pipeline — they listen (STT), reason (LLM), and speak (TTS). We'll build one from scratch in Posts 14–15.
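The shape of that pipeline is easy to see with stubs. In the sketch below none of the three stages is real: `speech_to_text` just decodes bytes, `think` is a keyword rule instead of an LLM, and `text_to_speech` encodes text back to bytes. All function names are made up for illustration.

```python
def speech_to_text(audio: bytes) -> str:
    # Stub STT: pretend the audio bytes are already UTF-8 text.
    return audio.decode("utf-8")

def think(transcript: str) -> str:
    # Stub LLM: a keyword rule standing in for real reasoning.
    if "appointment" in transcript.lower():
        return "Sure, which day works for you?"
    return "Sorry, could you repeat that?"

def text_to_speech(text: str) -> bytes:
    # Stub TTS: return the reply as bytes instead of synthesized audio.
    return text.encode("utf-8")

def handle_turn(audio_in: bytes) -> bytes:
    """One conversational turn: listen (STT), reason (LLM), speak (TTS)."""
    transcript = speech_to_text(audio_in)
    reply = think(transcript)
    return text_to_speech(reply)

print(handle_turn(b"I need an appointment"))
```

Because each stage only depends on the output of the previous one, any of the stubs can be swapped for a real service without touching the others.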


Why This Matters Right Now

AI agents are not just a trend. They represent a fundamental shift in how software is built.

Instead of writing rigid code for every scenario, you define a goal and let the agent figure out the steps. This means:

  • Automating complex multi-step tasks that weren't automatable before

  • Building products solo that used to need a full team

  • Creating systems that adapt when things change, rather than break

Agents that would have required a 10-person team two years ago can now be built by one engineer with the right stack.

That's the opportunity. And that's exactly why I'm writing this series.


What's Coming in This Series (All 21 Posts)

Here's the full roadmap — from zero to production:

Module 1 — Foundations (Posts 1–4) Setting up your environment, LangChain basics, building a RAG system.

Module 2 — Memory & Tools (Posts 5–8) Short vs long-term memory, tool calling, MCP integration, a real personal AI assistant.

Module 3 — LangGraph Workflows (Posts 9–11) Graphs vs chains, StateGraph, human-in-the-loop patterns.

Module 4 — Multi-Agent Systems (Posts 12–13) Supervisor/worker patterns, building a research system with multiple agents.

Module 5 — Voice Agents (Posts 14–15) STT → LLM → TTS pipeline, building a real voice agent with Vapi & ElevenLabs.

Module 6 — Agentic Automation with n8n (Posts 16–17) No-code AI workflows, real automation systems.

Module 7 — Ship to Production (Posts 18–21) FastAPI, deployment, evals & observability, and the real cost of running agents in production.

Every post is hands-on. Real code. Real projects. Real numbers.


Next Up — Post 2

Setting Up Your AI Dev Environment

We'll install Python, configure VS Code, and set up API keys.


Key Takeaways

  • An AI agent uses an LLM to reason, plan, and take actions to achieve a goal

  • The core loop is Think → Act → Observe → Repeat

  • Every agent has 4 components: Brain, Tools, Memory, Planning

  • Agents can do things chatbots can't — book appointments, search the web, send emails, query databases

  • This series will take you from zero to a deployed production agent in 21 posts


Subscribe for More

New post every Monday. If you want the weekly highlights — AI news, tool picks, and automation tips — subscribe to the newsletter:

👉 Subscribe to AI Fieldnotes on Substack →

Got questions or stuck on something? Drop a comment — I read every one.


Khair Ahammed is an AI Engineer who builds AI agents, Voice Agents, LLM apps, and agentic software solutions.
