
Most AI advice in 2026 is still written from the outside looking in — conference talks, Twitter threads, vendor blog posts. Useful, but not the same as operating an AI product with paying customers, a live Meta Ads account, and a cost dashboard you're afraid to open on Mondays.
We've been running Plantory.ai — DX Heroes' in-house AI-native SaaS — long enough that the demos stop mattering and the production truths start showing. Here are seven.
Full context: the Plantory case study, the founder story, and the architectural playbook.
Every "wow, the model is so smart" moment we've had in Plantory traces back to better grounding, not better prompts.
The Gemini garden advisor doesn't feel useful because of prompt engineering. It feels useful because every call ships with the garden's canvas state, climate zone, soil type, sun exposure, and plant inventory. Strip that context away, and the same model gives generic forum-grade answers.
The practical rule: before you tune a prompt, audit what structured context the model is receiving. Nine times out of ten that's where the leverage is.
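To make that concrete, here's a minimal sketch of what "shipping the context" can look like. The GardenContext shape and the callGemini() helper are illustrative stand-ins, not Plantory's actual code; the point is that the structured garden state travels with every call.

```typescript
interface GardenContext {
  canvasState: unknown;      // current layout of beds and plants on the canvas
  climateZone: string;       // e.g. "USDA 7b"
  soilType: string;          // e.g. "loamy, slightly acidic"
  sunExposure: string;       // e.g. "6 h direct sun, afternoon shade"
  plantInventory: string[];  // what's already planted
}

async function adviseOnGarden(question: string, garden: GardenContext): Promise<string> {
  // The model sees the user's question *and* the structured state of their garden.
  // Strip the context block and the same model falls back to generic forum answers.
  const prompt = [
    "You are a garden advisor. Ground every answer in the garden data below.",
    `Garden data:\n${JSON.stringify(garden, null, 2)}`,
    `Question: ${question}`,
  ].join("\n\n");

  return callGemini(prompt);
}

// Stand-in for a thin wrapper around the Gemini API.
async function callGemini(prompt: string): Promise<string> {
  throw new Error("wire up your model client here");
}
```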
We deploy programmatically to Meta Ads and Google Ads. AI writes the creative, generates variants, sets targeting. That's a real productivity unlock.
What's not autonomous is the budget decision. Scaling a flight, reallocating spend across campaigns, killing a dog — those still get a human in the loop. Not because the AI can't technically do them, but because the cost of being wrong is high and the cost of human review is low.
Rule of thumb: automate the production; keep humans on the capital allocation.
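As a sketch of that rule of thumb (the action types and helpers below are illustrative, not our real ad pipeline): creative generation gets applied automatically, while anything that moves money queues for a human.

```typescript
type AdAction =
  | { kind: "publish_creative"; campaignId: string; creative: string }
  | { kind: "scale_budget"; campaignId: string; newDailyBudget: number }
  | { kind: "kill_campaign"; campaignId: string };

async function handleAction(action: AdAction): Promise<void> {
  switch (action.kind) {
    case "publish_creative":
      // Cheap to be wrong, easy to revert: ship it automatically.
      await applyToAdsApi(action);
      break;
    case "scale_budget":
    case "kill_campaign":
      // Expensive to be wrong, cheap to review: a human signs off first.
      await queueForHumanReview(action);
      break;
  }
}

// Stand-ins for the actual Ads API client and review queue.
async function applyToAdsApi(action: AdAction): Promise<void> {}
async function queueForHumanReview(action: AdAction): Promise<void> {}
```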
It's tempting to route everything through the biggest, smartest model. It's also the fastest way to blow up your unit economics.
In Plantory, the cheap, fast models carry most of the call volume; the expensive ones only handle the tasks where output quality genuinely demands them.
The pattern: start cheap, escalate only when the output quality demands it. Review the routing quarterly.
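In code, the routing can be as simple as a lookup table plus one escalation path. The task names, tiers, and quality check below are made up for illustration; the structure is the point.

```typescript
type ModelTier = "small" | "large";

// Default tier per task; only tasks that have earned it get the expensive model.
const DEFAULT_TIER: Record<string, ModelTier> = {
  classify_note: "small",
  summarize_activity: "small",
  garden_advice: "large",
};

async function runTask(task: string, input: string): Promise<string> {
  const tier = DEFAULT_TIER[task] ?? "small";
  let output = await callModel(tier, input);

  // Escalate once if the cheap output fails a basic quality check.
  if (tier === "small" && !passesQualityCheck(output)) {
    output = await callModel("large", input);
  }
  return output;
}

// Stand-ins: replace with real model clients and a real check.
function passesQualityCheck(output: string): boolean {
  return output.trim().length > 0;
}
async function callModel(tier: ModelTier, input: string): Promise<string> {
  return "";
}
```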
For the first few months, we leaned on "does this feel right?" to evaluate changes. That worked until a model update silently shifted behavior: we didn't notice for a week, and users noticed first.
Now every AI endpoint has a tiny eval set — 20 to 50 inputs with expected shapes of outputs. They run on every deploy. They're not fancy. They catch the dumb stuff fast.
Small evals on day one beat a perfect eval system that you build on day ninety.
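For a sense of scale, the whole harness fits in one file. The cases and the endpoint below are illustrative, but this is roughly the shape: a handful of inputs, checks against the expected shape of the output, and a deploy that fails if any of them regress.

```typescript
interface EvalCase {
  input: string;
  check: (output: string) => boolean; // shape check, not exact match
}

// Illustrative cases; the real sets are 20 to 50 per endpoint.
const plantIdEvals: EvalCase[] = [
  { input: "photo_ref: basil_01", check: (o) => /basil/i.test(o) },
  { input: "photo_ref: unknown_weed_03", check: (o) => o.length > 0 && !/error/i.test(o) },
];

async function runEvals(
  cases: EvalCase[],
  endpoint: (input: string) => Promise<string>,
): Promise<void> {
  const failures: string[] = [];
  for (const c of cases) {
    const output = await endpoint(c.input);
    if (!c.check(output)) failures.push(c.input);
  }
  if (failures.length > 0) {
    // Fail the deploy. The point is catching dumb regressions fast, not being fancy.
    throw new Error(`Eval failures: ${failures.join(", ")}`);
  }
}
```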
Our Satori + Resvg + Gemini pipeline produces every social post image, every SEO hero, every article cover across eight locales. At the start it felt like a nice-to-have. Now it's the difference between shipping one marketing asset per locale per week and shipping dozens.
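The rendering half of that pipeline is small enough to sketch. This isn't our production code: the layout, font path, and dimensions are illustrative, and the headline string stands in for Gemini-generated copy. Satori turns an element tree into SVG, Resvg rasterizes it to a PNG.

```typescript
import { readFile } from "node:fs/promises";
import satori from "satori";
import { Resvg } from "@resvg/resvg-js";

// Renders a 1200x630 hero image from a headline (in our case, copy written by Gemini).
async function renderHeroImage(headline: string): Promise<Buffer> {
  const fontData = await readFile("./assets/Inter-Regular.ttf"); // hypothetical path

  // Element object written without JSX for brevity; in app code this is usually JSX.
  const svg = await satori(
    {
      type: "div",
      props: {
        style: {
          display: "flex",
          width: "100%",
          height: "100%",
          alignItems: "center",
          justifyContent: "center",
          background: "#0b3d2e",
          color: "#ffffff",
          fontSize: 64,
        },
        children: headline,
      },
    },
    {
      width: 1200,
      height: 630,
      fonts: [{ name: "Inter", data: fontData, weight: 400, style: "normal" }],
    },
  );

  return new Resvg(svg).render().asPng();
}
```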
But here's the honest part: automation exposes workflow debt. Once you can generate assets cheaply, you immediately need a content calendar, a review gate, and a publishing pipeline — otherwise you produce volume and no coherence.
Build the human workflow before you ramp the generation. Not after.
The single biggest productivity unlock on the build side wasn't a clever prompt — it was building our own Claude Code plugin marketplace.
The plantory plugin ships 20+ skills: /plantory:spec-plan, /plantory:board-work, /plantory:blog-article, /plantory:paid-performance-review, /plantory:social-media-posting, and more. Each packages a workflow we used to do ad-hoc.
Why does this work? Because prompts are volatile — small wording changes produce different outputs, people forget the shape, new team members can't reproduce what the old team did. Skills make the workflow reviewable, versioned, and shareable. It's the same reason we write functions instead of pasting the same code in five places.
If you're running AI coding at scale, stop polishing prompts. Start shipping skills.
The hardest and most boring lesson.
Internal demos, test accounts, friendly beta users — they all let you tell yourself the product works. Real Stripe customers across eight countries in different languages with different expectations and different devices do not let you do that.
Every hard truth on this list came from production contact with paying users. The ad pipeline worked beautifully until it didn't. The advisor was "great" until a German user asked about a plant we hadn't localized the recommendations for. The eval set looked good until a silent model drift started degrading plant ID.
The only AI system that's real is one with paying users on it. Everything else is a rehearsal.
Plantory.ai isn't a client project. It's our own AI testbed, live in production, absorbing hits so we can hand clients a playbook that's been battle-tested rather than theorized.
If you want the story of why we built it: Why We Built Plantory. If you want the architecture: The AI-Native SaaS Playbook. If you want the polished case study: Plantory.ai — the case study.
If you want a team that's done this and wants to help you do it: talk to us.