Valuable insights
1. OpenAI Agents SDK for AI Development: The OpenAI Agents SDK provides a flexible yet uniform framework for building AI agents, offering features like agents, handoffs, guardrails, and session history management for rapid development.
2. Interior Design Agent Demo: A demonstration showcases an AI agent that generates interior design images based on a floor plan image and specified design style, highlighting practical application of agent capabilities.
3. Project Workflow Overview: The project workflow includes input triggers (floor plan, design style), guardrails for input validation, an agent as the core logic, and tools for executing actions like image generation.
4. Python Project Setup: Setting up the Python project involves installing dependencies like the OpenAI Agents SDK, Pydantic, Asyncio, and Streamlit, and organizing code into agent, tools, and resource files.
5. Basic Agent Creation and Math Example: A foundational agent is created to answer math questions, demonstrating the basic structure of an agent: defining its name and instructions, then running it via a runner function.
6. API Key and Environment Variables: Securely managing API keys is crucial; they should be stored in environment variables (e.g., in a .env file) and loaded using libraries like python-dotenv to prevent exposure.
7. Enhancing Agents with Tools: Agents can be equipped with tools, such as the pre-built image generation tool from the OpenAI SDK, enabling them to perform complex tasks like creating visuals directly.
8. Displaying Generated Images: Generated images can be displayed by looping through tool call items, saving them to an output folder, and using functions to open or display them on a frontend.
9. Importance and Implementation of Guardrails: Guardrails validate agent inputs and outputs, protecting against malicious data or invalid use cases. Input guardrails check initial user input, while output guardrails monitor agent responses.
10. Creating Custom Tools for Agents: Custom tools are Python functions wrapped with OpenAI's function tool decorator, allowing agents to interact with external services or databases, like saving design data.
11. Structured Data Output with Pydantic: Pydantic models can define structured data formats for agent outputs, facilitating integration with databases or frontends, and ensuring data consistency for specific tasks.
12. Streamlit for User-Friendly Interfaces: Streamlit provides an easy way to build interactive frontends for Python projects, enabling a more user-friendly experience for interacting with AI agents without complex web development.
Intro: What We're Building
This video guides viewers through building their first AI agent using Python from scratch. The concepts learned will be applicable to developing various types of AI agents. The primary framework utilized is the OpenAI Agents SDK, chosen for its balance of flexibility and uniformity, making it efficient for quick agent development.
Why the OpenAI Agents SDK
The OpenAI Agents SDK is preferred for its robust feature set, which includes agents capable of using instructions and tools, handoffs for delegating tasks between agents, guardrails for input/output validation, and sessions for maintaining conversation history. For this project, the focus will be on agents and guardrails.
- Agents: LLMs equipped with instructions and tools.
- Handoffs: Allow agents to delegate tasks to other agents.
- Guardrails: Enable validation of agent inputs and outputs.
- Sessions: Automatically maintain conversation history across agent runs.
Demo: Interior Design Agent
A demonstration showcases an interior design agent designed to generate images of interior spaces based on a provided floor plan and a specific design style. This use case serves as a practical example, though the underlying principles apply broadly to any agent development project.
The demo involves uploading an example floor plan and selecting a design style, such as 'Tudor period design'. Upon running the agent, it processes the inputs and generates images representing the interior design concepts for the home, which are then presented as the final output.
Project Diagram & Workflow
The project's workflow begins with input triggers: a floor plan image and a text-based design style. These inputs first pass through an input guardrail, designed to verify the image is indeed a floor plan and protect the agent from unsuitable data. This ensures the agent receives relevant information.
Following the guardrail, the core agent acts as the 'brain', utilizing tools like image generation and a database saving function to execute the desired task. It processes the design style and floor plan to create interior design visualizations and save relevant data, ultimately producing confirmation or query responses.
Setting up the Python Project
The project setup involves creating a Python environment and installing necessary dependencies. Key libraries include the OpenAI Agents SDK for agent functionality, Pydantic for data modeling, Asyncio for asynchronous operations, and Streamlit for building user interfaces. These dependencies facilitate the development of sophisticated AI applications.
The project structure includes a `lib` folder containing `agent.py` for agent logic, `tools.py` for custom tool definitions, and a `files` folder for handling input/output files. A `main.py` file orchestrates the execution, primarily by running an asynchronous main function that contains the agent logic.
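A minimal sketch of how `main.py` might orchestrate the run, assuming the module layout described above and a `run_agent` helper defined in `lib/agent.py` (names are illustrative):

```python
# main.py -- minimal orchestration sketch (module and helper names are illustrative)
import asyncio

from lib.agent import run_agent  # assumed layout: lib/agent.py defines run_agent


async def main() -> None:
    # run_agent prints and returns the agent's answer (see the basic agent below)
    await run_agent("what is 2 + 2?")


if __name__ == "__main__":
    asyncio.run(main())
```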
Creating a Basic Agent
To understand agent creation, a basic agent named 'my_call_agent' is defined. Its core instruction is to answer math questions submitted by the user. This simple example illustrates how to instantiate an agent and provide it with instructions, serving as a fundamental building block for more complex agents.
Running this basic agent involves creating a `run_agent` function. This function utilizes a runner utility to execute the agent with user-provided input, such as a math problem like 'what is 2 + 2?'. The result is then printed and returned, demonstrating the agent's response mechanism.
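A minimal sketch of that basic agent and its `run_agent` wrapper, assuming the agent name used in the video; the exact instruction wording may differ:

```python
# lib/agent.py -- basic math agent built with the OpenAI Agents SDK
from agents import Agent, Runner

my_call_agent = Agent(
    name="my_call_agent",
    instructions="Answer the math question submitted by the user.",
)


async def run_agent(user_input: str) -> str:
    # Runner.run executes the agent asynchronously and returns a RunResult
    result = await Runner.run(my_call_agent, user_input)
    print(result.final_output)
    return result.final_output
```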
API Key Setup & Environment Variables
Connecting to OpenAI services requires an API key. This key can be generated from the OpenAI platform dashboard. It is crucial to manage API keys securely by storing them in environment variables, typically within a `.env` file, rather than directly in the code to prevent security risks.
To load these environment variables into the project, a library like `python-dotenv` is used. The OpenAI Agents SDK automatically accesses these variables, so no explicit configuration is needed within the agent code once the `.env` file is set up and loaded correctly. This ensures secure and seamless integration.
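A short sketch of that setup; once the variable is loaded, the Agents SDK picks up `OPENAI_API_KEY` from the environment without further configuration:

```python
# .env (never commit this file)
# OPENAI_API_KEY=<your key here>

# At the top of main.py (or lib/agent.py):
from dotenv import load_dotenv

load_dotenv()  # loads OPENAI_API_KEY so the Agents SDK can read it automatically
```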
First Working Agent (Math Example)
The initial math-solving agent is executed, demonstrating its ability to correctly calculate '2 + 2 = 4' and '2 - 2 = 0'. While this use case is simple and doesn't strictly require an agent, it effectively showcases the fundamental structure and process of creating agentic functionality within a Python project.
Swapping to Interior Design Agent
The project shifts focus to the interior design agent. The agent's role is defined as generating design images for rooms based on a user-submitted floor plan. The prompt provides detailed instructions, guiding the agent through steps like identifying rooms, determining dimensions, planning layouts, and generating images for each room.
Crucially, the prompt specifies guidelines for image generation, such as relevance to the floor plan, appropriate camera perspective, and avoiding the addition of non-existent features like doors or windows. It also limits the total number of generated images to a maximum of five to manage processing time.
A key instruction is to save interior design details to a database only after image generation and to store the entire floor plan's design in a single database entry. The agent is instructed to return the final output, including generated images, without including text links to them.
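A condensed sketch of how those instructions might be captured as a prompt constant; the wording here is abbreviated, and the full prompt in the video is more detailed:

```python
# lib/agent.py -- abbreviated version of the interior design instructions
INTERIOR_DESIGN_PROMPT = """
You generate interior design images for a home based on the submitted floor plan.
Steps: identify the rooms, determine their dimensions, plan a layout for each room,
then generate one image per room in the requested design style.
Guidelines: stay faithful to the floor plan, use a sensible camera perspective,
do not add doors or windows that are not on the plan, and generate at most 5 images.
After generating the images, save the design details for the entire floor plan as a
single database entry using the save tool. Return the final output with the generated
images, without including text links to them.
"""
```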
Adding Inputs (Floor Plan + Design Style)
To enable the interior design agent, two primary inputs are required: a design style (string) and a floor plan image. The floor plan image should be a path to the image file located in a dedicated 'resources' folder within the project. This setup allows the agent to access the necessary visual data.
The input image is read from the 'resources' folder using a helper function and then encoded as a Base64 string. This encoded image, along with the design style, is formatted into a JSON array structure to be sent as the agent's input. This ensures the agent receives both the visual and textual preferences.
The agent's `run_agent` function is updated to accept a formatted input, which is a JSON array containing messages. This message structure includes roles like 'system' or 'user', and the content array can hold multiple items, such as the Base64 encoded image and the design style text.
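A sketch of how the two inputs might be read, Base64-encoded, and packaged as a Responses-style message list; the file paths and helper names are illustrative:

```python
import base64


def encode_image(path: str) -> str:
    # Read the floor plan from the resources folder and Base64-encode it
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def build_input(design_style: str, floor_plan_path: str) -> list:
    image_b64 = encode_image(floor_plan_path)
    # One user message whose content holds both the design style text and the image
    return [
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": f"Design style: {design_style}"},
                {"type": "input_image", "image_url": f"data:image/png;base64,{image_b64}"},
            ],
        }
    ]


formatted_input = build_input("Tudor period design", "resources/floor_plan.png")
# result = await Runner.run(interior_design_agent, formatted_input)
```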
Using the Image Generation Tool
The OpenAI Agents SDK includes several pre-built tools, including a powerful image generation tool. This tool allows agents to leverage OpenAI's image generation models directly without requiring external API calls, simplifying the process of incorporating visual creation capabilities into the agent.
To configure the image generation tool, specific parameters are required: `type` (set to `image_generation`), `output_format` (e.g., `png`), `quality` (e.g., `low` for faster testing), and `size` (e.g., `1024x1024`). These settings dictate the characteristics of the generated images.
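Assuming the SDK's hosted `ImageGenerationTool` is used, the configuration might look like the sketch below; `INTERIOR_DESIGN_PROMPT` is the prompt constant described earlier:

```python
from agents import Agent, ImageGenerationTool

interior_design_agent = Agent(
    name="interior_design_agent",
    instructions=INTERIOR_DESIGN_PROMPT,  # the detailed prompt described earlier
    tools=[
        ImageGenerationTool(
            tool_config={
                "type": "image_generation",
                "output_format": "png",
                "quality": "low",      # low quality keeps test runs fast and cheap
                "size": "1024x1024",
            }
        )
    ],
)
```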
By integrating the image generation tool into the agent's configuration, the agent can now create images. However, simply running the agent will not display these images directly; the output will describe them. A method is needed to visualize the results of the image generation process.
Displaying Generated Images
To view the images generated by the agent, a loop iterates through the items returned in the run result. If an item is an image-generation tool call that returned image data, the image is decoded and saved to an 'output' folder, and its path is added to an `image_paths` array.
This approach also makes it easy to display the images on a frontend later. For the current demonstration, a helper function `open_file` opens each generated image directly. This requires creating the 'output' directory and importing the standard-library `base64` module to decode the image data.
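A sketch of that loop, assuming `result` is the `RunResult` from `Runner.run` and that image-generation tool calls expose their Base64 payload on `raw_item.result` (as in the SDK's image-generation example); `open_file` is an illustrative helper that opens the image with the OS viewer:

```python
import base64
import os

os.makedirs("output", exist_ok=True)
image_paths = []

for i, item in enumerate(result.new_items):
    # Only look at image-generation tool calls that actually returned image data
    if (
        item.type == "tool_call_item"
        and getattr(item.raw_item, "type", None) == "image_generation_call"
        and getattr(item.raw_item, "result", None)
    ):
        path = f"output/design_{i}.png"
        with open(path, "wb") as f:
            f.write(base64.b64decode(item.raw_item.result))
        image_paths.append(path)
        open_file(path)  # illustrative helper: open the saved image for viewing
```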
Upon running the code with image display functionality, the agent successfully generates images for various rooms like the living room, kitchen, office, bathroom, and garage. These images are saved in the output folder, although initially, they might not appear to open automatically, potentially due to display settings or previous runs.
Why Guardrails Matter
Guardrails are essential for production-ready AI agents. At this stage, two components are still missing from the project: a custom tool to save design information to a database, and input/output guardrails. Without guardrails, an agent might attempt to process nonsensical inputs, like an image of a donut, leading to wasted resources and invalid results.
Guardrails act as validation layers. Input guardrails check initial user input before it reaches the main agent, while output guardrails monitor the agent's final responses. This prevents agents from generating inappropriate content, like swear words in customer service scenarios, or processing fundamentally incorrect data.
Building the Guardrail Agent
To implement guardrails, a new agent is created specifically for validation. This guardrail agent will output a boolean indicating if the input is disallowed. It uses Pydantic's BaseModel for defining its output structure, including a boolean field `is_not_allowed`.
The guardrail agent's instructions are to verify if the submitted image is a valid floor plan and if the design preference input is relevant and appropriate. It also checks for offensive or NSFW content, ensuring the main agent only processes safe and pertinent requests.
The guardrail agent is then invoked from a function decorated with `input_guardrail` to mark its role. This function runs the guardrail agent on the incoming data and checks whether the `is_not_allowed` flag was set. If it was, the main agent never runs, saving processing power and avoiding erroneous outputs.
Additionally, a `reason` can be returned to explain why the input was disallowed, and a tripwire boolean (`tripwire_triggered`) indicates whether the guardrail was activated. This structured output makes it clear why an agent's execution was halted by a guardrail.
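A sketch of the guardrail pieces described above, using the SDK's `input_guardrail` decorator and `GuardrailFunctionOutput`; the field names and instruction wording follow the description above:

```python
from pydantic import BaseModel

from agents import Agent, GuardrailFunctionOutput, Runner, input_guardrail


class GuardrailCheck(BaseModel):
    is_not_allowed: bool
    reason: str


guardrail_agent = Agent(
    name="floor_plan_guardrail",
    instructions=(
        "Check that the submitted image is a valid floor plan and that the design "
        "preference is relevant and appropriate (no offensive or NSFW content)."
    ),
    output_type=GuardrailCheck,
)


@input_guardrail
async def floor_plan_guardrail(ctx, agent, user_input) -> GuardrailFunctionOutput:
    # Run the small validation agent on the raw input before the main agent sees it
    result = await Runner.run(guardrail_agent, user_input, context=ctx.context)
    check = result.final_output
    return GuardrailFunctionOutput(
        output_info=check,
        tripwire_triggered=check.is_not_allowed,  # halts the main agent when True
    )

# Attached to the main agent, e.g.:
# interior_design_agent = Agent(..., input_guardrails=[floor_plan_guardrail])
```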
Testing Guardrails with Invalid Input
Testing the guardrail involves submitting an invalid input, such as an image of a donut (`donut.jpg`) instead of a floor plan. The run is wrapped in a `try-except` block that catches the `InputGuardrailTripwireTriggered` error raised when the guardrail is activated, preventing the main agent from processing the invalid data.
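A sketch of that error handling, placed inside the async run helper and assuming the SDK's `InputGuardrailTripwireTriggered` exception:

```python
from agents.exceptions import InputGuardrailTripwireTriggered

try:
    # Inside the async run helper
    result = await Runner.run(interior_design_agent, formatted_input)
except InputGuardrailTripwireTriggered:
    # The guardrail flagged the input (e.g. the donut image), so the expensive
    # image-generation run never starts.
    print("Guardrail triggered: the submitted image is not a valid floor plan.")
```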
When the donut image is submitted, the guardrail agent correctly identifies it as not a floor plan. The guardrail is triggered, printing a message indicating its activation and stopping the execution of the more complex interior design generation process. This confirms the guardrail is functioning as intended.
Swapping back to the valid floor plan image allows the process to continue. The guardrail agent validates the input, finds it acceptable, and permits the main interior design agent to proceed with generating the images and saving the design details. The results appear successfully, demonstrating the guardrail's protective role.
Creating a Custom Tool (Save to Database)
Custom tools for agents are essentially any Python function that can be made executable by the agent. This allows agents to perform actions beyond simple text generation, such as saving data to a database or calling external APIs. For this project, a mock database function is created.
The custom tool function, `save_design_data_to_database`, is designed to accept structured data, specifically using a Pydantic model (`DesignDatabaseEntry`) for input. This ensures the agent sends data in a predictable format. For demonstration purposes, this data is saved to a text file, simulating a database entry.
To make this Python function usable as an agent tool, it must be decorated with `function_tool` from the OpenAI Agents SDK. This wrapper transforms the function into a tool that the agent can recognize and invoke. The tool is then imported and added to the agent's list of available tools.
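A sketch of the custom tool and its Pydantic input model; the `DesignDatabaseEntry` fields are illustrative, matching the kind of data described in the results section:

```python
from pydantic import BaseModel

from agents import function_tool


class DesignDatabaseEntry(BaseModel):
    design_style: str
    rooms: list[str]
    color_palette: list[str]
    furniture: list[str]


@function_tool
def save_design_data_to_database(entry: DesignDatabaseEntry) -> str:
    """Save the interior design details for the whole floor plan as one entry."""
    # Mock database: append the structured entry to a text file
    with open("design_output.txt", "a", encoding="utf-8") as f:
        f.write(entry.model_dump_json(indent=2) + "\n")
    return "Design data saved"

# Registered on the agent alongside the image generation tool, e.g.:
# interior_design_agent = Agent(..., tools=[image_tool, save_design_data_to_database])
```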
To ensure the agent uses this tool, the prompt is updated with an explicit instruction along the lines of 'You must use the save_design_data_to_database tool'. This helps guarantee the functionality is triggered. The tool is also set to report 'Design data saved' when called, and the design style is changed to '1920s art deco' for visual confirmation.
Final Results & Improvements
The agent successfully generates 1920s art deco interior design images based on the floor plan. It also creates a `design_output.txt` file, acting as a mock database entry. This entry contains detailed information about the rooms, design style, color palette, and furniture included in the designs.
A minor observation is that the `save_design_data_to_database` tool might be called multiple times, potentially leading to duplicate entries if the file is not overwritten. Refining the prompt could address this and ensure the tool is called only once per design set. Overall, the agent is highly useful, taking simple inputs to produce complex, actionable outputs.
While there might be minor inaccuracies in elements like window placement, these are typically addressable through prompt engineering. The core functionality of taking an input, executing a task via tools, and producing a structured output is successfully demonstrated, fulfilling the project's objectives.
Adding a Streamlit Frontend
For a more user-friendly interaction, a frontend interface was developed using Streamlit. This interface allows users to input their style preferences and select a file directly from their computer, providing a seamless way to interact with the AI agent without needing to handle file paths or complex command-line inputs.
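A minimal sketch of such a Streamlit frontend, assuming a variant of `run_agent` that accepts the style text and the uploaded image bytes and returns the saved image paths (names are illustrative):

```python
# frontend.py -- minimal Streamlit UI sketch (run with: streamlit run frontend.py)
import asyncio

import streamlit as st

from lib.agent import run_agent  # assumed variant accepting (style, image_bytes)

st.title("Interior Design Agent")

design_style = st.text_input("Design style", placeholder="e.g. 1920s art deco")
floor_plan = st.file_uploader("Floor plan image", type=["png", "jpg", "jpeg"])

if st.button("Generate designs") and design_style and floor_plan:
    with st.spinner("Generating interior designs..."):
        image_paths = asyncio.run(run_agent(design_style, floor_plan.read()))
    for path in image_paths:
        st.image(path)
```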
Wrapping up & GitHub Link
All the code demonstrated in this video, including the agent logic, custom tools, guardrails, and the Streamlit frontend, has been made publicly available on GitHub. Interested viewers can access the repository via the link provided in the video description to download, use, and adapt the code for their own projects.
Useful links
These links were generated based on the content of the video to help you deepen your knowledge about the topics discussed.