If you are a business owner and your last serious thought about SEO was "we need to rank on Google," you are already a year behind. The top of the funnel for most businesses is shifting fast. ChatGPT search, Perplexity, Google's AI Overviews, and Bing Copilot are answering questions before users ever reach a list of blue links. When they cite a source in their answer, they send a small but very high-intent flow of visitors. When they do not, you are invisible.

This post covers what I do for clients in 2026 to make sites citable by AI search systems — what works, what is theatre, and what the new llms.txt file actually does.

The 2026 reality

Three things have happened simultaneously over the last 18 months:

  1. ChatGPT shipped real-time web search to all paid users and rolled it out free for many queries.
  2. Perplexity grew from a curiosity to a meaningful research tool, particularly for technical and B2B queries.
  3. Google's AI Overviews now appear above the organic results on most informational searches, summarising answers and citing sources directly in the result page.

The combined effect is that for any informational query — "what is the best react framework for ecommerce," "how do I file fiskalizacija in Croatia," "is Squarespace good for booking" — a meaningful percentage of users get an answer without clicking through to any individual site. The site that gets cited in that answer keeps a thin slice of the click-through. Every other site loses visibility entirely.

Traditional SEO is not dead. The first page of Google still drives most click-through traffic. But "be on the first page" is no longer enough. You also need to be the source the AI quotes.

The four surfaces that matter

Different AI search systems have different crawler behaviours, different ranking signals, and different result formats. The four to think about:

ChatGPT search. Crawled by OpenAI's OAI-SearchBot (the search-specific crawler) and indexed alongside Bing data. Shows three to five citations per answer.

Perplexity. Crawled by PerplexityBot. Heavy reliance on schema markup and clearly structured passages. Shows five to ten citations per answer, prominently above the synthesised text.

Google AI Overviews. Built on Gemini, drawing from Google's existing index. No new crawler — your normal Googlebot crawl is the input. The optimisation here is mostly traditional SEO with a stronger focus on passage-level clarity.

Bing Copilot. Crawled by Bingbot. Microsoft's surface for AI-powered search, increasingly relevant as ChatGPT runs on similar infrastructure.

Each surface has slightly different mechanics, but the underlying optimisation is more similar than different.

GEO versus traditional SEO

GEO — Generative Engine Optimization — is the umbrella term for tactics aimed at AI search systems specifically. Most of GEO is good SEO done thoughtfully, plus a few new things specific to how language models pick what to quote.

Where traditional SEO and GEO overlap:

  • Fast, accessible, mobile-friendly sites.
  • Clear semantic HTML and proper heading hierarchy.
  • High-quality content with named entities and dated facts.
  • Strong author and publisher signals (E-E-A-T).
  • Clean, valid schema markup.

Where GEO adds to traditional SEO:

  • Passage-level citability. AI systems pull individual paragraphs, not whole pages.
  • llms.txt and AI-specific robots directives.
  • Brand mentions across the open web (more on this below).
  • Author entity continuity — same person across the web with the same details.
  • Direct answers in the first 100 words of the page.

What llms.txt is, and what it actually does

llms.txt is a file at the root of your domain (tonibarisic.com/llms.txt) that gives AI systems a clean, plain-text summary of your site. It was proposed in late 2024 as a convention for making sites legible to language models without forcing them to scrape and re-render the full HTML.

What it contains, in practice:

  • A short intro about the site, its owner, and its scope.
  • A list of the most important pages with one-line summaries.
  • Optionally, a list of canonical URLs for blog posts or product pages.
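A minimal file following the proposed llms.txt convention (an H1 title, a blockquote summary, then sections of links) looks like this — the business name, URLs, and summaries here are purely illustrative:

```markdown
# Example Studio

> A one-person web development studio building fast, SEO-ready sites for small businesses.

## Pages

- [Services](https://example.com/services): What we build and typical project scope
- [About](https://example.com/about): Who runs the studio, with qualifications and location

## Blog

- [What llms.txt Does](https://example.com/blog/llms-txt): How AI crawlers read site summaries
```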

The honest situation as of 2026: llms.txt is not yet officially honoured by any major AI provider as a ranking input. It is a hint, a courtesy, and a low-cost insurance policy. Some smaller crawlers and a number of independent research tools do read it. The cost to maintain one is essentially zero — it can be auto-generated from your sitemap. The upside if it becomes standard is meaningful. So I ship one on every site I build.

This site has one at /llms.txt — auto-regenerated from the blog index on every build, listing each post with its description and canonical URL. That is the pattern I recommend.
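The auto-generation step is simple enough to sketch. A minimal version in Python, assuming a standard sitemap.xml plus a hypothetical dictionary of per-URL descriptions you maintain yourself (the exact inputs will differ per build setup):

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def generate_llms_txt(sitemap_xml: str, descriptions: dict[str, str],
                      site_name: str, summary: str) -> str:
    """Build an llms.txt body from sitemap XML and per-URL descriptions."""
    root = ET.fromstring(sitemap_xml)
    lines = [f"# {site_name}", "", f"> {summary}", "", "## Pages", ""]
    for url_el in root.findall(f"{SITEMAP_NS}url"):
        loc = url_el.findtext(f"{SITEMAP_NS}loc", "").strip()
        desc = descriptions.get(loc, "")
        # Strip the trailing ": " when a page has no description yet.
        lines.append(f"- [{loc}]({loc}): {desc}".rstrip(": "))
    return "\n".join(lines) + "\n"
```

Wire this into your build step so the file regenerates on every deploy and never drifts out of date.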

Robots directives for AI crawlers

This is one place GEO actually requires explicit decisions. By default, you want most AI crawlers reading your content. A few you might want to limit. The crawlers that matter, with my default stance for client sites:

  • GPTBot (OpenAI's training crawler) — allow if you want your content used to train future ChatGPT models.
  • OAI-SearchBot (OpenAI's search crawler) — almost always allow. This is what feeds ChatGPT search citations.
  • ClaudeBot (Anthropic) — allow.
  • PerplexityBot — allow.
  • Google-Extended (Gemini training) — allow if you want to be in Gemini's training set; does not affect normal Google search.
  • Bingbot — allow (this is just Bing).
  • CCBot (Common Crawl) — allow unless you have specific reasons not to; many AI tools use Common Crawl data.

A typical robots.txt for a business that wants AI search visibility:

User-agent: *
Allow: /
Disallow: /admin/

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml

The mistake I keep seeing is sites that disallow all unknown crawlers and then forget the AI bots are unknown. They quietly disappear from AI citations and never figure out why.
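That failure mode is easy to catch mechanically. A small sketch using Python's standard robotparser to check a robots.txt body against each AI user agent (the bot list matches the one above; feed it your live file's contents):

```python
from urllib import robotparser

# AI crawler user agents worth verifying, per the list above.
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def check_ai_access(robots_txt: str, url: str = "/") -> dict[str, bool]:
    """Return {bot_name: allowed} for each AI crawler against a robots.txt body."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {bot: rp.can_fetch(bot, url) for bot in AI_BOTS}
```

Run it once after any robots.txt change; a `False` next to OAI-SearchBot or PerplexityBot is exactly the silent disappearance described above.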

Schema markup that AI systems actually use

AI systems lean heavily on structured data because it is the cleanest signal of what a page is and who wrote it. The schema types that move the needle:

  • BlogPosting or Article on every blog post, with headline, datePublished, dateModified, author (linked entity), wordCount, and articleSection.
  • Person for the author, with name, email, jobTitle, knowsAbout, and sameAs linking to LinkedIn, Twitter, GitHub. This is where E-E-A-T lives.
  • Organization or LocalBusiness for the business itself, including address, geo, openingHours, and sameAs for verified social profiles.
  • FAQPage where you have actual Q&A content. Do not fake it for SEO — Google has been demoting FAQ schema added to pages that are not really FAQ pages.
  • Product with aggregateRating and offers for e-commerce.

The crucial detail: every entity should be properly linked with @id so the schema graph connects. The BlogPosting references its Person author via @id; the Person references the Organization they work for. AI systems use these graphs to understand authority.
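A sketch of what that connected graph looks like in JSON-LD — the names, URLs, and `@id` values are placeholders, but the linking pattern is the point:

```json
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Organization",
      "@id": "https://example.com/#org",
      "name": "Example Studio",
      "sameAs": ["https://www.linkedin.com/company/example"]
    },
    {
      "@type": "Person",
      "@id": "https://example.com/#author",
      "name": "Jane Doe",
      "jobTitle": "Web Developer",
      "worksFor": { "@id": "https://example.com/#org" },
      "sameAs": ["https://github.com/janedoe"]
    },
    {
      "@type": "BlogPosting",
      "@id": "https://example.com/blog/post#article",
      "headline": "Example Post",
      "datePublished": "2026-01-15",
      "author": { "@id": "https://example.com/#author" }
    }
  ]
}
```

Note that the BlogPosting's author and the Person's worksFor are bare `@id` references, not duplicated objects — that is what makes it one graph instead of three disconnected blobs.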

You can validate schema with Google's Rich Results Test and Schema.org's validator. I run both before every blog post launch.

Writing for citation

The most under-discussed GEO tactic is writing. Language models pick passages to quote based on a few patterns:

  • A single sentence that directly answers a question. "The EU Accessibility Act became enforceable on June 28, 2025." Models love clean factual statements with dates and named entities.
  • Numbered lists with concrete items. AI systems pull these whole into answers.
  • Comparative claims with both sides named. "Wix scores 30–55 on Lighthouse mobile; a custom Next.js site scores 95+." Both subjects, the metric, the numbers.
  • Definition sentences. "Fiskalizacija is the Croatian system for real-time fiscal receipt verification, required on all online sales to Croatian buyers since 2013." A model can drop this whole into an answer with attribution.

The opposite — long meandering paragraphs of opinion without named entities, dates, or numbers — does not get cited. Neither does content that buries the answer 800 words in.

The pattern I follow: state the answer in the first 100 words of the page; expand and qualify in the body. Use specific numbers, dates, names. Avoid hedging language unless the hedge is the actual answer.

E-E-A-T for service businesses

E-E-A-T — Experience, Expertise, Authoritativeness, Trustworthiness — was always a Google concept and is now an AI search concept too. For a service business or freelancer, the practical translation:

  • Real author name on every piece of content. Not "Marketing Team," not "Editor."
  • Author bio with specific qualifications and a photo.
  • Linked author entity in schema, with sameAs connections to LinkedIn, GitHub, professional directories.
  • About page that names the actual people with their roles and locations.
  • Real address and contact info in LocalBusiness schema.

For a one-person shop like mine, this is straightforward. For a small agency, it requires a decision: pick the authors, name them, build their entities. AI systems are increasingly good at noticing when a "Persona Marketing" byline is the only thing tying a thousand articles together.

Brand mentions versus backlinks

Traditional SEO is built on backlinks — other sites linking to yours. AI search systems care about backlinks too, but they also care about unlinked brand mentions. If your business is mentioned in news articles, blog posts, podcast transcripts, and Reddit threads — even without a hyperlink — AI systems aggregate those signals as evidence of real-world relevance.

The practical implication: PR, podcast appearances, conference talks, Reddit answers, and unlinked partner mentions all matter. They mattered before, but they matter more now. Encouraging customers and partners to mention your business by full name, even casually, builds the same kind of authority an AI system uses to decide whose answer to quote.

Monitoring AI citations

The visibility tooling for AI search is still primitive. The methods I use as of 2026:

  • Manual queries weekly. Run your top informational queries through ChatGPT search, Perplexity, and Google AI Overviews. Note which sources get cited.
  • Brand-mention alerts (Google Alerts, free) for your name and your business name to catch new mentions.
  • Server logs — filter for OAI-SearchBot, PerplexityBot, ClaudeBot, etc. Confirm they are crawling, count their hits.
  • Schema-aware monitoring tools like Schema App or paid AI-search monitors are emerging but not yet must-haves.

For most small businesses, weekly manual checks are enough. The signal you are looking for: a query in your area being answered with your site cited at least sometimes.
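The server-log check is a few lines of Python. A minimal sketch that counts hits per AI crawler by substring match on each raw access-log line (your log path and format are assumptions; adapt the bot tuple to whatever you allow):

```python
from collections import Counter

# User-agent substrings for the AI crawlers discussed above.
AI_CRAWLERS = ("OAI-SearchBot", "PerplexityBot", "ClaudeBot", "GPTBot", "Bingbot")


def count_ai_crawler_hits(log_lines) -> Counter:
    """Count hits per AI crawler by substring match on each access-log line."""
    hits = Counter()
    for line in log_lines:
        for bot in AI_CRAWLERS:
            if bot in line:
                hits[bot] += 1
                break  # one crawler per request line
    return hits
```

Zero hits from OAI-SearchBot over a month is a stronger diagnostic than any citation check — it means the content was never even fetched.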

Where to start tomorrow

If you want to act on this post in the next week, here is the order:

  1. Verify your robots.txt allows the AI crawlers above.
  2. Generate or audit your llms.txt. Auto-generate it from your sitemap if possible.
  3. Validate your BlogPosting and Person schema on every post.
  4. Rewrite the first 100 words of your top three pages to directly answer the question that brings traffic to them.
  5. Set up weekly manual citation checks on your top five queries.

That is one to two days of work. The compound effect over six months is meaningful.

Work with me

If you want a clean GEO and SEO baseline on your business site, I run audit-and-fix engagements specifically for this. Email info@tonibarisic.com or use the contact form. The complementary reads are my realistic SEO checklist, Why Every Business Website Needs Proper SEO From Day One, and What Makes a Website Fast — together they cover the full traditional + AI search foundation.