How I Scraped 50,000 Leads for Just $50

Aug 29, 2025

Oh my god, I just scraped over 50,000 leads in under a minute. And the cost? Less than $1.20 per thousand leads. That’s insane. So how did I pull this off? Let me walk you through the system I built — no fluff, just the real technical stuff.

First, you need to know who you’re targeting. In my last article, I talked about lead research and figuring out your ideal customer profile (ICP). But even AI can mess this up. My AI initially pulled automation companies instead of companies that need automation. Totally different audiences. So I had to refine the inputs.


Here’s what I settled on:

  • Keywords like digital marketing agencies, SaaS startups, consulting coaches, e-learning, info products, e-commerce, and automation. (I actually trimmed off some of the more niche terms like Zapier and lead gen content automations because they weren’t relevant.)
  • Locations set to United States, United Kingdom, Canada, Australia, and South Africa.
  • Company size filtered by employee count ranges.

These inputs form the backbone of the system, defining the scope of your lead search.

So where did I scrape leads from?

Now, Apollo.io, the B2B database I'm scraping through Apify, doesn't just accept random queries. It uses search URLs with very specific parameters. So I built a GPT agent whose only job is to take those structured inputs and convert them into a precise Apollo search URL for Apify's scraper. No guesswork, no invented parameters: just a clean key-value mapping.

The task is simple but critical: take the key-value data from the form submission and map it to URL parameters. Supported fields include person location (mapped to Apollo's person-location parameters), company size (mapped to an organization employee range), and keywords. The system prompt enforces one strict rule: never add or infer any other fields or parameters.

The URL template looks something like this:

https://app.apollo.io/#/search/results/person?field1=value1&field2=value2...

This URL then feeds into the next node, which is an HTTP node that calls Apify’s Apollo scraper API.
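Under the hood, this step is just string construction. Here's a minimal Python sketch of the same mapping; the parameter names (`personLocations[]`, `organizationNumEmployeesRanges[]`, `qKeywords`) are illustrative assumptions, so inspect a real Apollo search URL in your browser for the actual keys:

```python
from urllib.parse import urlencode

BASE = "https://app.apollo.io/#/search/results/person"

def build_search_url(form: dict) -> str:
    """Map form fields 1:1 to URL query parameters -- nothing added, nothing inferred."""
    params = []
    for location in form.get("locations", []):
        # Hypothetical key name -- check a real Apollo search URL for the actual one.
        params.append(("personLocations[]", location))
    for size in form.get("company_sizes", []):
        params.append(("organizationNumEmployeesRanges[]", size))
    if form.get("keywords"):
        params.append(("qKeywords", " ".join(form["keywords"])))
    return f"{BASE}?{urlencode(params)}"

url = build_search_url({
    "locations": ["United States", "United Kingdom"],
    "company_sizes": ["11,50"],
    "keywords": ["digital marketing agencies"],
})
print(url)
```

The key design point carries over to the GPT agent's system prompt: every supported field has exactly one destination parameter, and unsupported fields are dropped rather than guessed at.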

Here’s the thing: Apify access is limited to paid plans — free users can’t use it. I’m on the $39/month plan. The minimum batch size you can request is 500 leads, so I set the system to pull 500 leads per search. If I can convert just two sales, I break even on the subscription cost. Three sales and I’m making a profit. On a monthly basis, this makes a lot of sense. And scraping 50,000 leads? Even a small conversion percentage could mean serious revenue.

The API call includes flags like:

  • `get_personal_emails=true`
  • `get_work_emails=true`
  • `total_records=500`

The URL generated by the GPT agent plugs into this API call, and Apify returns a JSON payload with lead details.
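Outside n8n, the equivalent HTTP call looks roughly like this. Apify's `run-sync-get-dataset-items` endpoint is real, but the actor ID and the input field names below are placeholders; check your scraper's store page for its exact input schema:

```python
import json
import urllib.request

APIFY_TOKEN = "YOUR_APIFY_TOKEN"
ACTOR = "example~apollo-scraper"  # hypothetical actor ID -- use your scraper's real one

def build_run_input(search_url: str, total: int = 500) -> dict:
    """Mirror the flags from the article: both email types on, 500 records per batch."""
    return {
        "url": search_url,
        "get_personal_emails": True,
        "get_work_emails": True,
        "total_records": total,
    }

def run_scraper(search_url: str) -> list:
    """POST the run input; the endpoint waits for the run and returns the leads as JSON."""
    endpoint = (
        f"https://api.apify.com/v2/acts/{ACTOR}/"
        f"run-sync-get-dataset-items?token={APIFY_TOKEN}"
    )
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(build_run_input(search_url)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

In n8n this is just an HTTP node with the same JSON body, where the `url` field is filled from the previous GPT-agent node's output.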

But raw JSON isn’t sales-ready. So I built another GPT agent — a data processing engine — that extracts key fields like full name, email, LinkedIn URL, job title, company name, and website. It then synthesizes a concise 2–3 sentence professional summary tailored for my sales team. I also add context about my company, Augmented AI, and our product — the Corporate Automation Library.
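Before the summary step, the extraction itself is plain field mapping. A sketch, with key names assumed from typical Apollo-style output (inspect one raw item from your own run and adjust); it also applies the top-10 cap I use during testing:

```python
def extract_lead(raw: dict) -> dict:
    """Pull the sales-relevant fields out of one raw scraper item.
    Key names are assumptions -- adjust to match your actual payload."""
    org = raw.get("organization") or {}
    return {
        "full_name": raw.get("name", ""),
        "email": raw.get("email", ""),
        "linkedin_url": raw.get("linkedin_url", ""),
        "job_title": raw.get("title", ""),
        "company_name": org.get("name", ""),
        "website": org.get("website_url", ""),
    }

def process_batch(items: list, limit: int = 10) -> list:
    """During testing, only process the first few leads to keep model costs down."""
    return [extract_lead(item) for item in items[:limit]]
```

The GPT agent then takes each extracted record, plus the fixed company context, and writes the 2–3 sentence summary; only that generation step needs a model call, so the cheap deterministic extraction above keeps token spend low.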

How to capture the processed lead data

All this processed data flows into a Supabase database I set up. Why Supabase? It’s developer-friendly, easy to integrate, and lets me view raw data alongside AI-generated summaries. It becomes the single source of truth for all lead data. From there, I can build dashboards, trigger outreach automations, or export to Google Sheets if needed.
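A minimal sketch of that last hop, assuming column names of my own choosing (the `lead_scraping` table name comes from later in this article); the supabase-py calls are left as comments so the sketch runs without credentials:

```python
# Real supabase-py usage would look like:
#
#   from supabase import create_client
#   supabase = create_client(SUPABASE_URL, SUPABASE_SERVICE_KEY)
#   supabase.table("lead_scraping").insert(rows).execute()

def to_row(lead: dict, summary: str) -> dict:
    """Combine the extracted fields with the AI-generated summary into one table row,
    so raw data and the summary sit side by side in Supabase."""
    return {**lead, "ai_summary": summary}

rows = [
    to_row(
        {"full_name": "Jane Doe", "email": "jane@example.com"},
        "Jane runs a 20-person digital marketing agency and is a strong fit for automation.",
    )
]
```

Keeping the summary in the same row as the raw fields is what makes the side-by-side view trivial: one table, one dashboard query, no joins.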

For testing, I limit output to the top 10 leads to save costs and speed up iterations. I’ve tried different GPT models: OpenAI’s GPT-4.1 mini is a bit flaky, while Gemini Flash is cheaper and more resilient — my go-to for now. The model choice affects cost, speed, and output quality, so testing what fits your workflow is key.

How I'll level up this automation

Right now, this is a Level 1 automation — it scrapes and structures leads based on a single ICP. The plan is to evolve it into a Level 2 system that runs multiple ICP variations automatically, tests which segments yield the best ROI, integrates with outreach tools for personalized campaigns, and uses AI to prioritize leads based on engagement signals.

This iterative approach will turn lead research from a manual chore into a smart, revenue-generating engine.

Along the way, I’ve created a new Supabase table called “lead_scraping,” which acts as the central database for all collected data. I can view the raw tables and AI summaries side by side, making it easy to manage and optimize.

Eventually, I want to automate lead research fully — building a system that cycles through hundreds of search terms and ICP variations, returning thousands of leads and identifying which profiles convert best. These experiments will shape the future of my automation workflows.

If you want to follow this journey and get access to the exact n8n workflows, system prompts, and tools I use, then…

Want the Exact Workflow?

We’ve documented everything inside the Corporate Automation Library Pro — our private vault of tested automations.

Inside, you’ll find:

  • The full “One-Click Newsletter” workflow
  • 2–3 new systems added weekly (content, lead gen, sales, retention)
  • Frameworks proven in real businesses


šŸ‘‰ Explore Corporate Automation Library Pro

From 80-Hour Weeks to 4-Hour Workflows

Get my Corporate Automation Starter Pack and discover how I automated my way from burnout to freedom. Includes the AI maturity audit + ready-to-deploy n8n workflows that save hours every day.

We hate SPAM. We will never sell your information, for any reason.