We Replaced a $500/Ad UGC Workflow With a $2 AI Workflow
Oct 30, 2025

Last year, brands spent $24 billion on influencer marketing and user-generated content. The process is a logistical nightmare: hiring creators, shipping products, and waiting weeks, sometimes months, to get a handful of videos back. It is a broken, inefficient system.
Imagine generating all of that content instantly.

You upload a single image of your product. An AI workflow then creates dozens of realistic, influencer-style videos ready for Instagram, TikTok, and Facebook. No cameras. No actors. No shipping. Just a constant stream of scroll-stopping ads, on demand.
This is not a future concept; it’s a system we have built. We used OpenAI’s Sora 2 video generation model, orchestrated with n8n, to create an automated ad campaign for a jewelry client and their product, the “Elder Ring.” Here’s how it works.

A System for Instant Ad Generation
The workflow begins when we upload a product image and name. For our jewelry client, we used a photo of their Elder Ring.
Once submitted, the workflow kicks off. After some initial data cleanup, the system uses the OpenAI Vision API to analyze the image. Its goal is to invent the ideal influencer persona to promote this specific product. For a minimalist wallet, the persona might be a tech-savvy professional. For the Elder Ring, the AI generated something different: a 28-year-old artisan storyteller living in a cozy urban loft, drawn to symbols of endurance and ancient myths.
This persona provides the creative direction. The system then uses this profile, along with the product image, to generate multiple unique video scripts. Finally, it prepares the initial frame and sends everything to the Sora 2 API to generate the videos.
The output is a series of short, authentic-feeling ads. One generated script read:
“This just arrived. It’s the Elder Ring… the craftsmanship is so intricate. There’s literally zero bulk. I mean, look how elegant that is. Yeah, this is epic.”
Another focused on a different angle:
“Honestly, this is the Elder Ring, and it just feels timeless on your finger. I can wear it every day… it’s just legendary.”
This process isn’t limited to jewelry. We ran the same workflow for a shampoo. The AI understood the context, generating a script that emphasized a key benefit for hair products: “It’s like not heavy at all, which is my main thing.”
The system adapts the persona, script, and tone to match the product, creating relevant, targeted content every time.
Deconstructing the Workflow
This system is more complex than a simple image generator: video involves more inputs and a brand-new model. The process breaks down into four key stages.
- Image Analysis and Persona Creation. The system receives the product photo and uses the OpenAI Vision API to analyze it. It doesn’t just see a ring; it understands the aesthetic. Based on this analysis, it generates a detailed profile of the ideal influencer to promote it.
- UGC Script Generation. With the persona defined, the workflow uses Gemini 2.5 Pro to write multiple UGC video scripts. These scripts take the product photo and the new persona into account, ensuring the tone and language are a perfect match.
- First Frame Generation. The Sora 2 API can use a reference image, but it must serve as the very first frame of the video. To ensure product consistency — correct colors, shape, and details — we generate this first frame using the product photo. We use Nano Banana to resize the image to the required vertical video format, preserving intricate details like engravings or gem facets.
- Video Generation and Delivery. The system sends the initial frame and the script to the Sora 2 API. Since we generate multiple scripts, the workflow loops over each one, creating a new video for each script and uploading the final files to Google Drive.
From Image to Persona
The automation begins with a simple form trigger where we upload the product photo and enter the product name. The file is converted into a base64 string, a format that gives us more flexibility when working with different APIs like Gemini and Nano Banana.
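As a rough illustration outside n8n, the encoding step amounts to a couple of lines of TypeScript (the filename here is just a placeholder):

```typescript
import { readFileSync } from "node:fs";

// Read the uploaded product photo and encode it as base64 so it can be
// embedded directly in JSON payloads for the Gemini and Nano Banana APIs.
const imageBase64 = readFileSync("elder-ring.jpg").toString("base64");

// Many vision APIs expect a data URL rather than a bare base64 string.
const dataUrl = `data:image/jpeg;base64,${imageBase64}`;
```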

The core of the creative process happens in the OpenAI Vision API call. The prompt we use instructs the AI to act as an expert casting director and consumer psychologist.
Its sole task is to analyze the product in the image and generate a single, highly detailed profile of the ideal person to promote it in a UGC ad. The deliverable is a rich character profile that makes the person feel real, believable, and perfectly suited to be a trusted advocate for the product.
For the Elder Ring, the AI created Elara: a 28-year-old female fantasy novelist living in a vintage-inspired studio in Portland, with a passion for ancient myths. This level of detail provides a strong foundation for writing authentic and compelling scripts.
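Here's a minimal TypeScript sketch of that call, assuming gpt-4o as the vision model and paraphrasing the casting-director prompt; the exact model and wording in the workflow may differ:

```typescript
// `dataUrl` is the base64-encoded product photo from the step above.
const response = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content:
          "You are an expert casting director and consumer psychologist. " +
          "Analyze the product in the image and generate one highly detailed " +
          "profile of the ideal person to promote it in a UGC ad.",
      },
      {
        role: "user",
        content: [
          { type: "text", text: "Product name: Elder Ring" },
          { type: "image_url", image_url: { url: dataUrl } },
        ],
      },
    ],
  }),
});

// The persona profile arrives as plain text in the first choice.
const persona = (await response.json()).choices[0].message.content;
```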
Crafting Authentic Scripts with Gemini
Next, we use Gemini 2.5 Pro to write the video scripts. The master prompt is engineered to produce content that feels completely natural.

The instructions are specific: “You are an expert at creating authentic UGC video scripts that look like someone just grabbed their iPhone and hit record. Shaky hands, natural, zero production value. No text overlays, no polish, just real.”
We provide the AI with the creator profile we just generated and the product name. We also provide anti-prompting cases, telling the model what to avoid: perfect framing, stable surfaces, or on-screen graphics. The goal is to create something that doesn’t feel like a manufactured ad.
The output is three distinct, natural-sounding scripts, each with a different angle — an analytical first impression, a casual recommendation, and so on. These are then extracted and structured into a JSON array, making them easy to process in the next steps of the workflow.
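A hedged sketch of this step, assuming Gemini's generateContent REST endpoint and its JSON-output option (responseMimeType); the master prompt is heavily condensed here:

```typescript
// `persona` is the profile from the Vision step; `imageBase64` is the
// encoded product photo from earlier.
const geminiRes = await fetch(
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent" +
    `?key=${process.env.GEMINI_API_KEY}`,
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      contents: [
        {
          parts: [
            {
              // Condensed version of the master prompt described above.
              text:
                "You are an expert at creating authentic UGC video scripts " +
                "that look like someone just grabbed their iPhone and hit " +
                "record. No polish, no text overlays.\n" +
                `Creator profile: ${persona}\nProduct: Elder Ring\n` +
                "Avoid: perfect framing, stable surfaces, on-screen graphics.\n" +
                "Return a JSON array of exactly three script strings.",
            },
            { inline_data: { mime_type: "image/jpeg", data: imageBase64 } },
          ],
        },
      ],
      // Forces a machine-parseable JSON response.
      generationConfig: { responseMimeType: "application/json" },
    }),
  }
);

const scripts: string[] = JSON.parse(
  (await geminiRes.json()).candidates[0].content.parts[0].text
);
```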
Preparing the First Frame for Sora 2
Working with the Sora 2 API has a specific technical requirement: if you provide a reference image, it must have the exact same dimensions as the final video output. Our product photo of the Elder Ring was square, but we needed a vertical video at 720x1280 pixels.

To solve this, we use Nano Banana as an intelligent image editor. We provide it with three inputs (see the sketch after this list):
- Our product photo.
- A blank template image with the correct 720x1280 dimensions.
- A text prompt instructing it to adapt the product photo into the aspect ratio of the template image without distorting or stretching any elements.
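Here's what that call could look like against Gemini's generateContent endpoint, assuming gemini-2.5-flash-image as Nano Banana's model ID (the exact ID and request shape in the workflow may differ):

```typescript
import { readFileSync } from "node:fs";

// The blank 720x1280 template described above, encoded like the photo.
const templateBase64 = readFileSync("blank-720x1280.png").toString("base64");

const frameRes = await fetch(
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent" +
    `?key=${process.env.GEMINI_API_KEY}`,
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      contents: [
        {
          parts: [
            {
              text:
                "Adapt the first image (the product photo) into the aspect " +
                "ratio of the second image (a blank 720x1280 template) " +
                "without distorting or stretching any elements.",
            },
            { inline_data: { mime_type: "image/jpeg", data: imageBase64 } },
            { inline_data: { mime_type: "image/png", data: templateBase64 } },
          ],
        },
      ],
    }),
  }
);

// The edited frame comes back as base64 in one of the response parts.
const frameParts = (await frameRes.json()).candidates[0].content.parts;
const firstFrameBase64: string = frameParts.find(
  (p: any) => p.inlineData
).inlineData.data;
```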
Setting up this API call can be tricky. For those who aren’t as technical, we have the full instructions in our Corporate Automation Library (CAL). It includes the n8n code and the exact steps to get this running. Click Here to gain access to CAL. We add 2–4 new high-ROI corporate automations weekly.
After Nano Banana smartly scales the image, we run it through a final resize node to be 100% certain the dimensions are perfect. This prevents errors and ensures the video generation process runs smoothly.
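In n8n this is typically an image-editing node; as an illustration of the same safety net in code, here's a sketch using the sharp library (our choice for the example, not necessarily what the workflow uses):

```typescript
import sharp from "sharp";

// Belt-and-braces: force the frame to exactly 720x1280 so a slightly-off
// Nano Banana output can't make the Sora 2 job fail.
const firstFrame = await sharp(Buffer.from(firstFrameBase64, "base64"))
  .resize(720, 1280, { fit: "cover" })
  .png()
  .toBuffer();
```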
Generating and Finalizing the Videos
With the script and the first frame ready, we make the API call to Sora 2. The request is sent as form-data, not JSON, because we need to include the binary data of our reference image.
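A sketch of that request, assuming the field names from OpenAI's video API (model, prompt, size, seconds, input_reference):

```typescript
// `script` is one of the three generated scripts; `firstFrame` is the
// resized PNG buffer from the previous step.
const form = new FormData();
form.append("model", "sora-2");
form.append("prompt", script);
form.append("size", "720x1280");
form.append("seconds", "12");
form.append(
  "input_reference",
  new Blob([firstFrame], { type: "image/png" }),
  "first-frame.png"
);

const createRes = await fetch("https://api.openai.com/v1/videos", {
  method: "POST",
  // No Content-Type header: fetch sets the multipart boundary itself.
  headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
  body: form,
});

// The API returns a job, not a video; we only get an ID to poll.
const { id: videoId } = await createRes.json();
```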

Video generation is a compute-heavy process and doesn’t happen instantly. The API returns a video ID, not the video file itself. We must continuously check the status of the job until it’s complete. This practice is called polling. Our workflow waits 15 seconds, checks the status, and if the video isn’t ready, it waits another 15 seconds and checks again.
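The polling loop could look like this, assuming GET /v1/videos/{id} for job status and /v1/videos/{id}/content for the finished MP4:

```typescript
// Poll every 15 seconds until the job completes, then download the MP4.
async function waitForVideo(videoId: string): Promise<ArrayBuffer> {
  const headers = { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` };

  while (true) {
    await new Promise((resolve) => setTimeout(resolve, 15_000));
    const job = await (
      await fetch(`https://api.openai.com/v1/videos/${videoId}`, { headers })
    ).json();

    if (job.status === "completed") break;
    if (job.status === "failed") throw new Error("Video generation failed");
  }

  // Final call downloads the binary MP4 content.
  const content = await fetch(
    `https://api.openai.com/v1/videos/${videoId}/content`,
    { headers }
  );
  return content.arrayBuffer();
}

const mp4 = await waitForVideo(videoId); // ready to upload to Google Drive
```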
Once the status changes to “completed,” we make a final API call to download the MP4 file. The last step in the loop is to upload the finished video to a designated Google Drive folder, ready for review. The entire process then repeats for the next script until all videos are generated.
Current Limitations and Considerations
As with any new technology, there are limitations to be aware of.
- No Human Faces: OpenAI currently disallows human faces in these videos, even AI-generated faces of people who don't exist. This is a safety precaution, so the videos must focus on the product, hand movements, and voiceover storytelling.
- Copyright: You cannot generate videos with real public figures, copyrighted characters, or copyrighted music. All requests of this nature will be rejected.
- Pricing: OpenAI offers two models. The standard Sora 2 model is priced at $0.10 per second of video; the higher-quality Sora 2 Pro model is $0.30 per second. For the 12-second ads in this workflow, that works out to $1.20 per video with the standard model (or $3.60 with Pro).
Moving Beyond Level 1 Ad Creation
This system is more than just a tool for creating videos. It represents a shift from a chaotic, manual process to a systematic, automated engine for ad creation. It’s the difference between being stuck at Level 1 of AI maturity — dabbling with disconnected tools — and moving to Level 2 and beyond, where you build scalable systems that deliver real ROI.
By automating the entire creative pipeline, you can test dozens of ad variations, discover what resonates with your audience, and dramatically improve your return on ad spend.
If your ad creation process is stuck in Level 1, relying on slow, manual, and expensive methods, it’s time to build a system.
Ritesh Kanjee | Automations Architect & Founder Augmented AI (121K Subscribers | 58K LinkedIn Followers)

From 80-Hour Weeks to 4-Hour Workflows
Get my Corporate Automation Starter Pack and discover how I automated my way from burnout to freedom. Includes the AI maturity audit + ready-to-deploy n8n workflows that save hours every day.
We hate SPAM. We will never sell your information, for any reason.
