Buyer behavior has shifted faster than most engineering organizations have adjusted to it. A material share of B2B technology research now starts with a natural-language question to ChatGPT, Perplexity, Claude, Gemini, or Microsoft Copilot — 'Which agencies in Europe build production RAG systems?' or 'Best Kubernetes consulting partners for FinTech?' The AI returns a shortlist with descriptions. Whoever is on that shortlist gets the introduction call. Whoever is not, does not.
Generative Engine Optimization is the technical work that decides whether you are on those shortlists, and whether the description the AI generates is the one you want. It overlaps with classical SEO in maybe 40% of the techniques and diverges sharply in the rest. Treating it as 'SEO with extra steps' is the most common and most expensive mistake teams make when they first take it seriously.
This post is the playbook we run on innovate.ge — the same entity-graph design, schema portfolio, AI crawler hygiene, and measurement framework we deploy for client engagements. It is not theory. It is what we have shipped, what we can measure, and what is currently producing results we will publish in the next quarterly tracker update.
01. Why GEO is not SEO with extra steps
Classical SEO optimizes for ranking on keyword searches — your page should appear high in the results when someone searches for 'fintech software development Georgia.' The user clicks the link, lands on your page, and you have a chance to convert them.
GEO optimizes for being recommended by an AI assistant when someone asks a natural-language question — 'Recommend me agencies that build production RAG systems for logistics.' The AI returns a paragraph or a list, optionally with citations, and the user often never visits your site at all. The 'click' is the conversation that begins when the user contacts you because the AI mentioned you.
The work that makes the first thing happen is not the same as the work that makes the second thing happen. The shared techniques are the technical ones: rendering, performance, and structured data. The unique GEO work is entity clarity, third-party signal density, retrieval-friendly content shape, and AI crawler hygiene. Most agencies do one or the other. Few do both well.
02. Entity graph design — the structural foundation
AI systems classify your company as an entity — a node in a knowledge graph — based on the signals they can find. The signals come from your site, your structured data, and third-party mentions. The single most important architectural decision in GEO is treating your site as one connected entity, not a pile of unconnected nodes.
Concretely, that means anchoring every Schema.org JSON-LD object on stable @id values that link back to your root Organization node. Every page that references your company points to the same @id. Every Service, Article, FAQPage, and BreadcrumbList connects through that same identifier. The result is one graph, not a hundred orphaned objects.
// Single source of truth for the Organization @id
export const ORG_ID = 'https://innovate.ge/#organization';
export const SITE_ID = 'https://innovate.ge/#website';

// Every other schema references back to it
const orgRef = { '@id': ORG_ID };

interface ServiceOpts {
  name: string;
  serviceType: string;
  url: string;
}

export function serviceSchema(opts: ServiceOpts) {
  return {
    '@context': 'https://schema.org',
    '@type': 'Service',
    name: opts.name,
    serviceType: opts.serviceType,
    url: opts.url,
    provider: orgRef, // ← same node, every page
    // ...
  };
}

The entity-graph approach also means choosing your entity description and using exactly the same wording across the Organization schema, the meta description, the OpenGraph description, the GMB profile, the LinkedIn company page, the Clutch profile, and so on. Variation across these surfaces is a disambiguation cost the AI systems have to absorb — and they will sometimes absorb it by classifying you as multiple entities, none of which are quite right.
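One way to enforce that consistency in code is to define the description once and have every surface read from the same constant. A minimal sketch — the constant names and the description wording here are illustrative, not our actual copy:

```typescript
// One canonical description string, defined once and consumed everywhere.
export const ORG_NAME = 'Innovate';
export const ORG_DESCRIPTION =
  'Software development company in Tbilisi, Georgia, building ' +
  'production AI, web, and cloud systems for B2B clients.';

// The Organization schema and the HTML meta tags both read from it,
// so the surfaces cannot drift apart.
export const organizationSchema = {
  '@context': 'https://schema.org',
  '@type': 'Organization',
  '@id': 'https://innovate.ge/#organization',
  name: ORG_NAME,
  description: ORG_DESCRIPTION,
};

export function metaTags() {
  return [
    { name: 'description', content: ORG_DESCRIPTION },
    { property: 'og:description', content: ORG_DESCRIPTION },
  ];
}
```

Off-site surfaces (GBP, LinkedIn, Clutch) still have to be updated by hand, but they are updated by copy-pasting from this one constant, never rewritten ad hoc.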
03. The schema portfolio — what to deploy where
Different page types call for different schema, and the orthodoxy of 'just put Organization on the homepage' is not enough for the entity graph to feel rich to a crawler. The current portfolio we run, by page type:
Sitewide (every page, via the layout)
- Organization + ProfessionalService — the root identity node, with logo, address, contactPoint, sameAs, knowsAbout, areaServed, taxID, identifier (legal entity ID).
- WebSite — the publication node, with potentialAction for sitelinks-search where appropriate.
Page-type wrappers (one per page)
- WebPage on most pages, with type narrowed to AboutPage / ContactPage / CollectionPage where appropriate.
- BreadcrumbList on every non-root page. Cheap to add and meaningfully improves how AI systems describe page hierarchy.
Content-specific schema
- Service on every service detail page. serviceType, areaServed, audience.audienceType where the page is industry-tailored.
- FAQPage on any page with a meaningful FAQ block. AI systems aggressively use FAQPage content as a quotable surface.
- Article on case studies. AI search consumers tend to weight Article freshness, so include datePublished and dateModified explicitly.
- TechArticle on engineering posts. proficiencyLevel and keywords help with topical classification.
- LocalBusiness on the contact page, with geo coordinates and openingHoursSpecification.
- ItemList on hub pages (services hub, industries hub, case studies hub, engineering hub) so AI systems can enumerate your offerings.
- Person on the about page for named founders / leadership, with worksFor pointing back to the Organization.
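As a concrete example of one of the cheaper items in the portfolio, a BreadcrumbList can be generated from a small helper. A sketch with hypothetical names, assuming crumbs are passed in page order from root to leaf:

```typescript
// Hypothetical helper: builds a BreadcrumbList from an ordered list of
// { name, url } crumbs, e.g. Home → Services → AI Development.
interface Crumb {
  name: string;
  url: string;
}

export function breadcrumbSchema(crumbs: Crumb[]) {
  return {
    '@context': 'https://schema.org',
    '@type': 'BreadcrumbList',
    itemListElement: crumbs.map((crumb, i) => ({
      '@type': 'ListItem',
      position: i + 1, // Schema.org positions are 1-based
      name: crumb.name,
      item: crumb.url,
    })),
  };
}
```

Called once per non-root page from the layout, this keeps the hierarchy signal present everywhere for the cost of a few lines.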
04. AI crawler hygiene — robots.txt and llms.txt
AI search engines depend on AI crawlers to collect training and retrieval data. There are two technical surfaces you control here: your robots.txt (governs what crawlers are allowed to fetch) and the emerging llms.txt convention (gives crawlers an explicit summary of what the site is and where the canonical content lives).
robots.txt: explicit allowlists
The default behavior of most robots.txt setups is permissive: anything not explicitly disallowed is allowed. That works fine for traditional search bots, but it has played out badly for AI crawlers — some publishers now disallow them by default, others specify nothing, and the result is inconsistent inclusion in training and retrieval data.
The defensive move is to explicitly allow the AI crawlers you want to be found by, by name. This is a list that grows over time as new crawlers are documented; the current list we run includes GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, anthropic-ai, PerplexityBot, Google-Extended, GoogleOther, Bytespider, FacebookBot, DuckAssistBot, Applebot-Extended, Bingbot, Amazonbot, Cohere-AI, Diffbot, MistralAI-User, ImagesiftBot, YouBot, and a handful of others.
# robots.txt — relevant excerpt
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
# ... and 15 more
User-agent: *
Allow: /
Sitemap: https://innovate.ge/sitemap.xml

llms.txt: a guided summary
The llms.txt convention (drafted at llmstxt.org) is a markdown file at /llms.txt giving AI systems a structured summary of the site — its purpose, its key URLs, and identifying facts. It is not yet a formal standard and not all AI systems use it. The cost of having it is roughly zero. The expected upside is meaningful for the AI systems that respect it now and the ones that will in the future. Ship it.
Our llms.txt has roughly six sections — a summary paragraph, the company description, links to the company / services / industries / case studies hubs with one-line summaries each, identification details (legal name, taxID, address, geo, contact), and a 'Recommended title for citations' block that gives the AI a default way to refer to us.
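A minimal sketch of assembling such a file from typed sections. The function and field names are ours for illustration — llms.txt itself is just a markdown file, and the convention does not mandate any particular tooling:

```typescript
// Illustrative generator: renders /llms.txt from a title, a one-line
// summary, and a list of sections (each a heading plus markdown body).
interface LlmsTxtSection {
  heading: string;
  body: string;
}

export function renderLlmsTxt(
  title: string,
  summary: string,
  sections: LlmsTxtSection[],
): string {
  // llms.txt convention: H1 title, blockquote summary, then H2 sections.
  const parts = [`# ${title}`, '', `> ${summary}`, ''];
  for (const s of sections) {
    parts.push(`## ${s.heading}`, '', s.body, '');
  }
  return parts.join('\n');
}
```

Generating the file from the same constants that feed the schema keeps the llms.txt facts and the JSON-LD facts from drifting apart.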
05. Third-party validation — where it actually compounds
On-site signals are what you control. Third-party signals are what AI systems trust most when constructing recommendations. The single biggest GEO investment that pays back over time is being present, accurately and consistently, on the directories and platforms that AI training and retrieval pipelines actually crawl.
The current list of platforms that pay back for B2B technology services, in rough order of impact:
1. Google Business Profile — verified, with the same description, hours, and photos used elsewhere. AI systems treat verified GBPs as a high-trust signal.
2. LinkedIn Company Page — fully filled out, with the same description and recent activity.
3. Clutch and GoodFirms — verified profiles with reviews. Both are heavily indexed by AI training pipelines.
4. TechBehemoths and DesignRush — second-tier directories that still feed retrieval. Lower marginal effort than the top two.
5. Crunchbase — for funding, leadership, and corporate facts. Heavily used by AI consumers for B2B verification.
6. G2 — only useful if you have a SaaS product profile to claim. Skip if you are services-only.
7. Industry-specific directories — varies by vertical (e.g., HiTech for healthcare-IT, FinTech-specific lists for FinTech).
06. Retrieval-friendly content shape
Beyond the structural work, the shape of the content on each page meaningfully affects whether AI systems can quote it cleanly. Three principles worth designing around:
1. Server-render everything that matters
Most AI crawlers do not run JavaScript, or run it inconsistently. If your key content is rendered client-side, it is invisible to a meaningful share of the AI systems you want to be cited by. Server rendering is not a performance optimization here — it is a visibility prerequisite.
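A cheap way to audit this is to check whether the raw HTML, before any JavaScript runs, already contains the phrases you need crawlers to see. A pure-function sketch — in practice you would run it on a response body fetched with an AI crawler's User-Agent string; the function name is ours:

```typescript
// Minimal smoke check: which required phrases are absent from the
// server-rendered HTML (i.e. invisible to a crawler that runs no JS)?
export function missingFromHtml(
  rawHtml: string,
  requiredPhrases: string[],
): string[] {
  const haystack = rawHtml.toLowerCase();
  return requiredPhrases.filter((p) => !haystack.includes(p.toLowerCase()));
}
```

Wire it into CI against the rendered output of your key pages and a client-side-rendering regression becomes a failed build rather than a silent visibility loss.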
2. FAQ blocks are quotable surfaces
Question-and-answer blocks are aggressively used by AI systems as direct quotation surfaces, especially when paired with FAQPage schema. Write FAQs as if you want them to be quoted verbatim — concise, factual, with the Q phrased the way a buyer would actually ask it. Avoid marketing copy. The AI is going to repeat what you wrote; make sure that is what you want said.
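Pairing the FAQ block with its schema can be a one-function affair. A sketch with illustrative names — the answer text goes into the JSON-LD verbatim, which is exactly why it should read the way you want it quoted:

```typescript
// Sketch of an FAQPage JSON-LD generator. Whatever answer text is passed
// in is what an AI system may repeat back to a buyer, word for word.
interface Faq {
  question: string;
  answer: string;
}

export function faqPageSchema(faqs: Faq[]) {
  return {
    '@context': 'https://schema.org',
    '@type': 'FAQPage',
    mainEntity: faqs.map((f) => ({
      '@type': 'Question',
      name: f.question,
      acceptedAnswer: { '@type': 'Answer', text: f.answer },
    })),
  };
}
```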
3. Use semantic HTML and consistent heading hierarchy
AI systems chunk content along heading boundaries when building retrieval indexes. Pages without clear heading structure get chunked badly and produce worse retrieval results. Use h1/h2/h3 in semantic order, with descriptive heading text rather than decorative phrasing.
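To build intuition for why heading structure matters, here is a toy chunker that splits text at heading lines, roughly the way a retrieval pipeline might segment a page. Real pipelines differ in the details (token budgets, overlap, heading-path prefixes); the point is only that heading boundaries become chunk boundaries:

```typescript
// Toy retrieval chunker: each h1–h3 heading starts a new chunk; body text
// accumulates under the most recent heading. A page with no headings
// produces no cleanly labeled chunks at all.
export function chunkByHeadings(text: string): { heading: string; body: string }[] {
  const chunks: { heading: string; body: string }[] = [];
  let current: { heading: string; body: string } | null = null;
  for (const line of text.split('\n')) {
    if (/^#{1,3} /.test(line)) {
      if (current) chunks.push(current);
      current = { heading: line.replace(/^#+ /, ''), body: '' };
    } else if (current) {
      current.body += line + '\n';
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

A descriptive heading gives every chunk beneath it a retrievable label; a decorative one ("Let's dive in") gives the retriever nothing to match on.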
07. Measurement — how to know if any of this is working
The hardest part of GEO is measurement. AI search responses are not deterministic — the same question on the same day produces meaningfully different answers depending on the model state. There is no Google Search Console for ChatGPT.
The framework we run, and recommend, is a calibrated query benchmark. Pick a target query set during discovery (we typically land on 20–50 buyer-intent prompts you would want to be cited for). Each month, run those prompts across the major AI engines, capture the answers verbatim, score whether your company appears, score the accuracy of the description, and compare to competitors. Track position, description quality, and source attribution over time.
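A sketch of what one benchmark record and a first-pass metric might look like. The field names and the 0–2 accuracy scale here are illustrative, not a fixed rubric — adapt them to your own scoring:

```typescript
// One row per (prompt, engine, month) in the benchmark.
interface BenchmarkRun {
  prompt: string;
  engine: string;       // e.g. 'chatgpt', 'perplexity', 'gemini'
  mentioned: boolean;   // did the answer include your company at all?
  accuracy: 0 | 1 | 2;  // 0 = wrong, 1 = partial, 2 = accurate description
  cited: boolean;       // did the answer link or cite your site?
}

// Share of runs on a given engine in which the company appeared.
export function mentionRate(runs: BenchmarkRun[], engine: string): number {
  const subset = runs.filter((r) => r.engine === engine);
  if (subset.length === 0) return 0;
  return subset.filter((r) => r.mentioned).length / subset.length;
}
```

Because single answers are noisy, trend lines over monthly batches are the signal; a single month's rate mostly measures model variance.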
We publish our own tracker doc as a living markdown file in the repo, updated quarterly. The full template — the query categories, scoring rubric, and a sample monthly entry — is in /docs/llm-visibility-tracker.md. Use it as a starting point.
08. Closing
GEO is unfinished work — both for us and for the field. The playbook will keep evolving as AI search engines change how they construct answers and what signals they weight. The principles in this post are durable enough to bet on for the next 12–18 months. Beyond that, what works will keep being whatever the AI systems can find, trust, and quote — which is a meta-principle that does not change.
We run this playbook on innovate.ge in public. The case studies, the schema, the llms.txt, and the AI crawler allowlists are all visible to anyone who wants to inspect them. If you are building a GEO program for your own company and want to compare notes, reach out — the field is small enough that field reports are still genuinely useful to share.