Crawl budget is one of the most misunderstood ideas in technical SEO.

You will often hear it discussed as a critical ranking factor. In reality, most websites never need to think about it. Google is very good at crawling normal-sized sites without any help.

The problem appears when a site grows. For example, large ecommerce websites generate thousands, sometimes millions, of URLs. Product pages, category pages, filters, pagination, internal search pages and parameter variations all create new crawl paths.

When crawling is inefficient, search engines spend time on the wrong URLs. Important product and category pages may get crawled late, new products can take longer to appear in search, and stock changes or price updates may take longer to be reflected in the SERPs.

This article explains what crawl budget actually means, when it genuinely matters and how to optimise it on large ecommerce websites.

What is “crawling” in SEO?
What crawl budget actually means
Why search engines limit crawling
How crawl budget problems show up
How ecommerce sites waste crawl budget
How to optimise crawl budget

What is “crawling” in SEO?

Crawling is how search engines discover your website. Search engines send automated bots, often called crawlers or spiders, to scan your pages. These bots start with a list of known web addresses and follow links across your site to find additional pages. As they move through the site, they record what each page contains and how the pages connect.

This process allows search engines to understand your site structure and the content on each page. Once a page is discovered through crawling, it can then be indexed and considered for ranking in search results.

If search engines cannot crawl your site, they cannot see your pages. In practical terms, that means the content will not appear in search results, which limits your ability to attract organic traffic and revenue.

What crawl budget actually means

Crawl budget is simply the number of URLs a search engine will crawl on your site within a given period of time. Google describes it as the number of URLs it can crawl and wants to crawl.

In practical terms, it is the amount of attention Googlebot gives your site. Every crawl request costs time and resources, both for Google and for your servers. Because the web is effectively infinite, search engines have to prioritise where they spend that time.

Two factors determine your crawl budget:

Crawl capacity: This is how much crawling your website can technically handle. If your server is fast and stable, Google can crawl more URLs. If your site responds slowly or returns errors, Google reduces its crawl rate.

Crawl demand: This is how much Google wants to crawl your pages. Pages that are popular, frequently updated, or important within your site tend to be crawled more often.

It is also important to understand that crawling is not the same as indexing. Google may crawl a URL, analyse it, and then decide not to include it in the search index. This happens often on ecommerce sites where filters, duplicate URLs or low value pages create large volumes of similar content.

Finally, crawl budget is not just about HTML pages. Search engines also crawl other files, including JavaScript, CSS and PDFs. Every request counts towards the overall crawl activity on your site.

Why search engines limit crawling

Search engines cannot crawl the entire web continuously. The number of URLs online is effectively unlimited, and new ones appear every second. Because of this, search engines have to prioritise where they spend their crawling resources.

There is also a second constraint. Crawlers must avoid overwhelming websites with requests. If Googlebot aggressively crawled every page it discovered, many sites would struggle to handle the traffic.

Crawl limits exist to balance three things:

The search engine’s own computing resources
The capacity of the website’s server
The importance and freshness of the content

In simple terms, Google tries to crawl as much useful content as possible without causing problems for the site.

This is why server performance plays a role. If your site responds quickly and consistently, Google can crawl more URLs safely. If the server slows down, returns errors, or times out, the crawler backs off.

But server health is only one side of the equation. Search engines also make decisions about which pages deserve attention. Pages that are popular, frequently updated, or heavily linked tend to be crawled more often. Pages that rarely change or provide little value may be crawled less frequently.

This is where large ecommerce sites can run into problems. When a site generates huge numbers of low value URLs, crawlers spend time exploring those instead of the pages that actually matter. Over time, that reduces the efficiency of the whole crawling process. The goal of crawl budget optimisation, for ecommerce SEO in particular, is to make it easier for search engines to focus on the pages that deserve attention.

How crawl budget problems show up in real projects

Crawl budget issues rarely appear as a single obvious error. Instead, they show up as patterns. Pages take longer to be discovered, indexing slows down, and search engines spend time crawling the wrong parts of the site.

You will usually see the first signals in Google Search Console, server logs or indexing behaviour. Common signs include:

Pages stuck in “Discovered, currently not indexed”. Google knows the URL exists but has not crawled it yet. On large ecommerce sites this often happens when the crawler’s attention is spread across too many URLs.

New pages taking a long time to be indexed. New product or category pages should normally be crawled quickly. If indexing consistently takes weeks instead of days, crawlers may be busy exploring less important URLs.

Slow reflection of updates in search results. Changes to stock status, pricing or content take longer to appear in search. This usually means important pages are not being revisited frequently enough.

Crawlers focusing on the wrong parts of the site. Server logs often show Googlebot repeatedly hitting parameter URLs, filter combinations or internal search pages rather than core product or category pages.

Large numbers of discovered URLs compared to indexed pages. When the site generates huge numbers of crawlable URLs, search engines struggle to prioritise which ones deserve indexing.

None of these signals prove a crawl budget problem on their own. However, when several appear together, they usually point to the same underlying issue: search engines are spending time crawling URLs that should never have existed in the first place.
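One way to check where crawlers are actually spending time is to group Googlebot requests in your access logs by URL type. The sketch below is a minimal example that assumes logs in the common combined format. The classification rules (`/search` for internal search, any query string for parameter URLs) are illustrative assumptions, not a standard, so adjust them to your own URL structure. It also identifies Googlebot by user-agent string only, which can be spoofed, so treat the numbers as indicative.

```python
import re
from collections import Counter

# Pulls the request path and user agent out of a combined-format log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def classify(path: str) -> str:
    """Bucket a URL path using illustrative rules (assumptions for this sketch)."""
    if path.startswith("/search"):
        return "internal_search"
    if "?" in path:
        return "parameter"
    return "clean"


def googlebot_hits_by_type(lines):
    """Count requests per URL bucket for lines whose user agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            counts[classify(match.group("path"))] += 1
    return counts


GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_log = [
    f'66.249.66.1 - - [10/Jan/2025:10:00:00 +0000] "GET /shoes?colour=black&size=10 HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Jan/2025:10:00:05 +0000] "GET /search?q=red+shoes HTTP/1.1" 200 4096 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Jan/2025:10:00:09 +0000] "GET /shoes/trail-runner HTTP/1.1" 200 8192 "-" "{GOOGLEBOT}"',
    '203.0.113.7 - - [10/Jan/2025:10:00:12 +0000] "GET /shoes HTTP/1.1" 200 8192 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

hits = googlebot_hits_by_type(sample_log)
for bucket, count in hits.most_common():
    print(bucket, count)
```

If the parameter and internal search buckets account for a large share of Googlebot requests over several weeks, that supports the pattern described above.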

The biggest ways ecommerce sites waste crawl budget

Large ecommerce sites rarely run into crawl problems because Google cannot crawl them. They run into problems because the site generates far more URLs than it actually needs.

Many of these URLs exist for technical or navigational reasons, but from a crawler’s perspective they still look like pages worth exploring. Over time, this spreads crawl activity across large areas of the site that add little or no search value.

Faceted navigation and parameter URLs

Filters are one of the biggest drivers of URL growth on ecommerce sites. Filtering by size, colour, brand, price or sort order can create thousands of combinations. Each combination often generates a new crawlable URL. For example:

/shoes?colour=black
/shoes?colour=black&size=10
/shoes?colour=black&size=10&sort=price-desc

These pages usually show very similar products and rarely provide unique search value. However, crawlers will still attempt to explore them.
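To see how quickly this multiplies, it helps to count the combinations. The sketch below assumes a hypothetical category with four filters, where each filter is either unset or set to exactly one value. The option counts are made up for illustration.

```python
from math import prod

# Hypothetical number of values per filter on one category page.
filter_options = {"colour": 12, "size": 15, "brand": 40, "sort": 4}


def count_filter_urls(options: dict) -> int:
    """Count filtered URL variants when each filter is unset or set to one value."""
    # Each filter contributes (n + 1) states: its n values plus "not applied".
    # Subtract 1 to exclude the unfiltered category page itself.
    return prod(n + 1 for n in options.values()) - 1


print(count_filter_urls(filter_options))
```

With these numbers, a single category produces 42,639 filtered URLs (13 x 16 x 41 x 5, minus the unfiltered page). Multi-select filters or parameters that can appear in any order would push the figure higher still.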

Duplicate and near-duplicate pages

Ecommerce platforms frequently generate multiple URLs that show the same or very similar content. Some common causes include:

URL parameters
Different sorting options
HTTP and HTTPS versions
Trailing slash variations
Multiple category paths leading to the same product

When these variations exist, crawlers treat them as separate URLs. This spreads crawl attention across multiple versions of the same page.
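A quick way to find these variations in a crawl export is to reduce every URL to a single preferred form and group the ones that collapse together. The sketch below is one possible approach. It assumes HTTPS, lowercase hostnames and no trailing slash are the preferred versions, and the list of ignorable parameters is hypothetical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters assumed not to change the page content (hypothetical list).
IGNORED_PARAMS = {"sort", "utm_source", "utm_medium", "utm_campaign"}


def normalise(url: str) -> str:
    """Reduce a URL to one preferred form under the assumed site rules."""
    parts = urlsplit(url)
    # Keep only meaningful parameters, sorted so their order does not matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key not in IGNORED_PARAMS
    )
    # Drop the trailing slash, but keep "/" for the homepage.
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(("https", parts.netloc.lower(), path, urlencode(query), ""))


def group_duplicates(urls):
    """Return {preferred URL: [variants]} for every group with more than one variant."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalise(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


crawled = [
    "http://Example.com/shoes/",
    "https://example.com/shoes?sort=price-desc",
    "https://example.com/shoes",
    "https://example.com/boots",
]
duplicates = group_duplicates(crawled)
for preferred, variants in duplicates.items():
    print(preferred, "<-", variants)
```

Each group is a candidate for a redirect or canonical tag pointing at the preferred version.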

Internal search result pages

Internal search pages often generate large numbers of URLs. For example:

/search?q=running+shoes
/search?q=red+shoes
/search?q=waterproof+jacket

These pages rarely offer unique value in organic search. However, if they are crawlable, search engines will still attempt to explore them. On large websites, this can create an almost unlimited number of crawl paths.
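If you decide to stop internal search pages being crawled, it is worth testing the rule before deploying it. The sketch below uses Python's standard `urllib.robotparser` with a hypothetical robots.txt that blocks the `/search` path. Note that this parser uses simple prefix matching, so it is only suitable for checking plain path rules like this one.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def can_crawl(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "Googlebot") -> bool:
    """Check whether the given user agent may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for url in (
    "https://www.example.com/search?q=red+shoes",
    "https://www.example.com/shoes",
    # Prefix matching also catches paths that merely start with "/search".
    "https://www.example.com/searchlight-lamps",
):
    print(url, "crawlable" if can_crawl(url) else "blocked")
```

The last URL shows why a rule should be checked against real paths first: a broad prefix can block pages you want crawled.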

Redirect chains and broken links

Redirects are sometimes necessary, especially after product migrations or site changes. Problems appear when redirects stack on top of each other or when internal links point to outdated URLs.

Each redirect adds extra crawl requests. Broken links create dead ends that crawlers still attempt to fetch. Individually these issues are small. At scale they waste significant crawl activity.
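As an illustration, the sketch below follows a redirect map to measure how long each chain is and then flattens it so every old URL points straight at its final destination. The URLs and the `redirects` mapping are hypothetical. In practice the map would come from your server configuration or a crawl export.

```python
def resolve_chain(url: str, redirects: dict, max_hops: int = 10) -> list:
    """Follow a redirect map from url and return every URL visited, in order."""
    chain = [url]
    while chain[-1] in redirects and len(chain) <= max_hops:
        target = redirects[chain[-1]]
        chain.append(target)
        if target in chain[:-1]:
            # Redirect loop: stop instead of following it again.
            break
    return chain


def flatten(redirects: dict) -> dict:
    """Point every source URL directly at the end of its chain."""
    return {source: resolve_chain(source, redirects)[-1] for source in redirects}


# Hypothetical redirects that have stacked up over several site changes.
redirects = {
    "/old-trainers": "/trainers-2023",
    "/trainers-2023": "/trainers",
    "/trainers": "/shoes/trainers",
}

chain = resolve_chain("/old-trainers", redirects)
print(" -> ".join(chain), f"({len(chain) - 1} hops)")
print(flatten(redirects))
```

Here a crawler following `/old-trainers` makes three extra requests before reaching the live page. After flattening, each old URL needs only one.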

Dead product pages and soft 404s

Ecommerce catalogues change constantly. Products go out of stock or are permanently removed.

If those pages remain live with thin content or empty listings, search engines may continue to crawl them. These are often called soft 404 pages because they behave like missing pages but still return a 200 (OK) response.

Returning the correct status codes for discontinued products helps prevent this ongoing crawl waste.

Weak internal linking

Internal links guide crawlers through your site. When important pages have few internal links, crawlers may discover them slowly or revisit them less frequently. At the same time, heavily linked filter or navigation pages may receive disproportionate attention. A clear and consistent internal linking structure helps crawlers focus on the pages that actually matter.

Most crawl budget problems on ecommerce sites come down to one underlying issue: the site simply creates too many URLs. The next step is to reduce unnecessary URLs and guide crawlers towards the pages that generate traffic and revenue.

How to optimise crawl budget for large ecommerce sites

In practice, optimising crawl budget means controlling which URLs exist, which ones can be crawled, and how crawlers move through the site. The most effective improvements usually fall into a few areas.

Control faceted navigation

Filters often generate the largest number of unnecessary URLs on ecommerce sites.

Not every filter combination deserves to be crawlable or indexed. In most cases, these pages add little search value and simply multiply the number of URLs crawlers attempt to explore. Common approaches include:

Preventing certain parameter URLs from being crawled using robots.txt
Limiting which filter combinations generate crawlable pages
Ensuring internal links do not expose unnecessary filter URLs
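Before adding parameter rules to robots.txt, it helps to test which URLs they would catch. The sketch below is a simplified matcher for Disallow patterns that use `*` (any sequence of characters) and a trailing `$` (end of URL), the wildcards Google documents for robots.txt. It is an approximation for testing only: it ignores Allow rules and rule precedence, and the two patterns shown are hypothetical.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern with * and a trailing $ into a regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape the literal pieces and let each * match any run of characters.
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_disallowed(path_and_query: str, disallow_rules: list) -> bool:
    """Return True if any Disallow pattern matches from the start of the path."""
    # re.match anchors at the start, like robots.txt matching from the path root.
    return any(rule_to_regex(rule).match(path_and_query) for rule in disallow_rules)


# Hypothetical rules: block any URL whose query string contains sort= or size=.
rules = ["/*?*sort=", "/*?*size="]

for path in ("/shoes", "/shoes?colour=black", "/shoes?colour=black&size=10"):
    print(path, "blocked" if is_disallowed(path, rules) else "crawlable")
```

In this setup the single-colour page stays crawlable while the deeper size combination is blocked, which is one way of limiting which filter combinations crawlers can reach.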

Clean up duplicate URLs

Many ecommerce platforms generate multiple URLs that show the same page. Reducing these duplicates helps concentrate crawl activity on the canonical version of each page. Typical fixes include:

Redirecting duplicate URL versions
Standardising URL formats
Using canonical tags correctly
Ensuring internal links always point to the preferred URL

This reduces crawl fragmentation and helps search engines prioritise the correct pages.

Keep XML sitemaps accurate

XML sitemaps guide crawlers towards the URLs that actually matter. They should only contain pages that are:

Indexable
Canonical
Returning a 200 (OK) status code

Large ecommerce sites often benefit from splitting sitemaps by page type, such as:

Product pages
Category pages
Brand pages

This makes crawl patterns easier to monitor and helps search engines prioritise key content.
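The eligibility rules above can be applied automatically when the sitemap is generated. The sketch below is a minimal example, assuming each page record carries its status code, canonical URL and noindex flag. The `Page` structure and the sample URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Page:
    url: str
    status: int      # HTTP status code returned by the page
    canonical: str   # URL declared in the page's canonical tag
    noindex: bool    # True if the page carries a noindex directive


def sitemap_eligible(page: Page) -> bool:
    """Keep only pages that return 200, are indexable and are self-canonical."""
    return page.status == 200 and not page.noindex and page.canonical == page.url


def build_sitemap(pages) -> str:
    """Build a sitemap XML string containing only the eligible pages."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if sitemap_eligible(page):
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = page.url
    return ET.tostring(urlset, encoding="unicode")


pages = [
    Page("https://example.com/shoes", 200, "https://example.com/shoes", False),
    Page("https://example.com/shoes?sort=price-desc", 200, "https://example.com/shoes", False),
    Page("https://example.com/old-product", 410, "https://example.com/old-product", False),
    Page("https://example.com/basket", 200, "https://example.com/basket", True),
]
product_sitemap = build_sitemap(pages)
print(product_sitemap)
```

Running the same function once per page type gives you the separate product, category and brand sitemaps described above.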

Fix crawl waste from errors and redirects

Broken links and unnecessary redirects create extra crawl requests. Over time these small inefficiencies add up, particularly on large catalogues. Priority fixes include:

Updating internal links to point directly to final URLs
Removing links to discontinued pages
Avoiding long redirect chains

These changes reduce wasted crawl activity and improve overall site performance.

Return the correct status codes

When products are permanently removed, the page should return the correct HTTP status code. Returning 404 (Not Found) or 410 (Gone) signals to search engines that the URL should eventually be dropped from crawling.

Leaving these pages live with thin or empty content causes crawlers to keep revisiting them unnecessarily.
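In application code, this can be as simple as mapping the product's state to a status code. The sketch below is one possible mapping, not a rule. The state names are hypothetical, and keeping temporarily out-of-stock products on 200 is an assumption that suits products expected to return.

```python
from enum import Enum
from http import HTTPStatus


class ProductState(Enum):
    ACTIVE = "active"
    OUT_OF_STOCK = "out_of_stock"    # temporarily unavailable, expected back
    DISCONTINUED = "discontinued"    # permanently removed from the catalogue


def status_for(state: ProductState) -> HTTPStatus:
    """Choose the HTTP status code a product page should return."""
    if state is ProductState.DISCONTINUED:
        # 410 (Gone) tells crawlers the removal is permanent; 404 also works.
        return HTTPStatus.GONE
    # Active and temporarily out-of-stock pages stay live (assumption).
    return HTTPStatus.OK


for state in ProductState:
    print(state.value, int(status_for(state)))
```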

Improve internal linking

Internal links help search engines understand which pages matter most. Important product and category pages should be easy to reach through clear navigation and contextual links.

Strong internal linking helps crawlers:

Discover important pages faster
Revisit key pages more frequently
Spend less time exploring low value URLs
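One way to measure how easy a page is to reach is its click depth: the minimum number of links a crawler must follow from the homepage. The sketch below computes it with a breadth-first search over an internal link graph. The graph here is a small hypothetical example. In practice it would come from a site crawl export.

```python
from collections import deque


def click_depths(links: dict, start: str = "/") -> dict:
    """Return the minimum number of clicks from start to each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                # First visit in a breadth-first search is the shortest path.
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link graph: page -> pages it links to.
links = {
    "/": ["/shoes", "/sale"],
    "/shoes": ["/shoes/trail-runner"],
    "/sale": [],
    "/shoes/trail-runner": [],
    "/shoes/forgotten-boot": [],  # exists, but nothing links to it
}

depths = click_depths(links)
orphans = sorted(set(links) - set(depths))
print(depths)
print("Not reachable from the homepage:", orphans)
```

Important pages that sit many clicks deep, or that do not appear in the result at all, are the ones to prioritise when adding internal links.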

Maintain good site performance

Server performance still plays a role in crawl efficiency. Fast response times allow crawlers to fetch more pages in the same amount of time, while slow pages or frequent server errors cause crawlers to back off. In short, page speed optimisation helps ensure that technical limits do not restrict crawling.

Crawl budget only becomes a real concern when a site generates more URLs than search engines can crawl efficiently. This is common on large ecommerce sites where filters, parameters and duplicate paths multiply the number of crawlable pages. The goal is not to increase crawling. It is to reduce unnecessary URLs so search engines focus on the pages that actually drive traffic and revenue.

What is Crawl Budget and Does it Even Matter?
