GMC Error "image_link_broken": Why Google Can't Fetch Your Images and How to Fix It

You open Merchant Center Diagnostics and a wall of red is staring back at you:

Issue: Invalid image [image link]
Attribute: image link [image_link]

In the Content API view, the same problem surfaces under a different name:

itemLevelIssues:
  code: "image_link_broken"
  servability: "disapproved"
  detail: "Ensure the image is accessible and uses an accepted image format (JPEG, PNG, GIF)."

The URL works fine in your browser. It works fine for your customers. You can right-click and "Open image in new tab" on every single one. So why is Google's image fetcher choking?

image_link_broken is not the same as image_link_internal_error (covered in a separate post), even though the two get conflated constantly. image_link_internal_error means Google started the fetch and something went sideways inside their pipeline. image_link_broken is more definitive: Google made the HTTP request and the response was not a usable image. It's the harder of the two to bluff your way past, because Google is telling you exactly what it saw, and what it saw was wrong.

This post covers what Google checks when fetching image_link, the five most common reasons the fetch fails, and how to verify each fix before resubmitting.

TL;DR

image_link_broken means Google made the HTTP request to your image_link URL and the response was either an error code (404 / 403 / 5xx), HTML, or a body that did not decode as an image.
The most common cause is a CDN or WAF that blocks the Googlebot-Image/1.0 user agent. Browsers work; bots don't.
Second most common: image URLs that return HTTP 200 but serve an HTML error page when the underlying object is missing, so the Content-Type is text/html instead of image/*.
Also common: expired signed URLs (Cloudinary, S3 presigned, imgix) that were valid when the feed was generated and dead by the time Google's fetch queue got to them.
Verify fixes with curl -A "Googlebot-Image/1.0", not from a browser. Don't trust browser results.

What image_link_broken actually means

Google's product data specification requires every product to include exactly one image_link URL pointing at a usable image. When Google's pipeline reads your feed, it queues the URL for fetching. The fetch worker downloads the body, inspects the Content-Type header, attempts to decode the bytes, runs Google's content-safety scan, and stores the result.

If any step before "stores the result" fails in a way that points clearly at the URL itself — a 404, a 403, a body that doesn't decode as an image — the issue is logged as image_link_broken and the product is disapproved.
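The decision described above can be sketched as a small shell function. This is only an illustration of the logic as this post describes it, not Google's actual implementation; the function name classify_fetch and its output labels are made up for the example, and it does not cover the decode step (a body that claims to be an image but isn't).

```shell
# Hypothetical helper (not a Google tool): classify a fetch result.
# $1 = HTTP status code, $2 = Content-Type header value.
classify_fetch() {
  status=$1
  ctype=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  case $status in
    2??) ;;   # success codes fall through to the content-type check
    *)   echo "broken: status $status"; return 0 ;;
  esac
  case $ctype in
    image/*) echo "ok" ;;
    *)       echo "broken: content-type ${ctype:-missing}" ;;
  esac
}

# Usage against a live URL (%{http_code} and %{content_type} are curl
# --write-out variables; the unquoted $(...) is split into two arguments):
#   classify_fetch $(curl -s -o /dev/null -A "Googlebot-Image/1.0" \
#     -w '%{http_code} %{content_type}' "https://cdn.example.com/products/sku-123.jpg")
```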

Google's Content API severity mapping doc lists image_link_broken as an itemLevelIssue with servability disapproved. That means: not throttled, not pending, fully disapproved until Google re-fetches and the new response is valid. Resubmitting the same broken URL does nothing. You have to change something about the response Google receives.

The verbatim message a merchant sees varies slightly by Diagnostics UI version. The most common form in the new Merchant Center is "Invalid image [image link]" with the explanatory line "Ensure the image is accessible and uses an accepted image format." In the legacy Diagnostics UI you'll see "Image link [image_link]: There was a problem when fetching this image."

Why Google flags it

image_link_broken covers a specific failure mode: Google made the request and got back something that wasn't a downloadable image. Five categories of cause account for nearly all of them, ranked by frequency.

1. The CDN, WAF, or origin server blocks Google's user agent. Most common, especially for sites running Cloudflare with default bot-protection rules.
2. The URL returns HTTP 200 but the body is HTML, not an image. Common when the object behind a CDN is missing and the CDN substitutes a custom error page that still returns 200.
3. The image is reachable but the format isn't supported. HEIC from iPhone uploads, animated WebP, or a TIFF in a color profile Google's fetcher can't decode.
4. A signed or expiring URL is dead by the time Google fetches it. Especially Cloudinary signed URLs with short TTLs, S3 presigned URLs, and imgix transformation URLs on free tiers.
5. The URL is genuinely 404. A product was deleted from your CMS, the image was moved, or the path is misspelled in the feed.

I'll walk through each one in order.

Fixes ranked by frequency

Fix 1: Verify Google's image fetcher is allowed past your CDN

Open a terminal and run the exact request Google's fetcher runs:

curl -I -A "Googlebot-Image/1.0" -L "https://cdn.example.com/products/sku-123.jpg"

What you want back: a single HTTP/2 200, a Content-Type: image/jpeg (or image/png, image/webp, image/gif), and a Content-Length greater than a few KB.

What you often see when this fix applies:

HTTP/2 403 Forbidden — your WAF blocked the request based on user agent
HTTP/2 429 Too Many Requests — your CDN is rate-limiting bots
HTTP/2 200 followed by a body that's an HTML challenge page (Cloudflare's "Just a moment...")
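If you're checking more than a handful of URLs, reading header dumps by eye gets old. Here's a rough sketch of a filter that reads curl -sI -L output and reports the final response in one line. The function name summarize_headers and its messages are my own invention, and the mapping of codes to causes is just the three cases listed above.

```shell
# Hypothetical helper: read `curl -sI -L` output on stdin and summarize
# the last response in the redirect chain.
summarize_headers() {
  tr -d '\r' | awk '
    /^HTTP\// { status = $2; ctype = "" }   # new hop: reset state
    tolower($1) == "content-type:" {
      ctype = tolower($2)
      sub(/;.*$/, "", ctype)                # drop "; charset=..." suffix
    }
    END {
      if (status == 403)      print "blocked (403): check WAF / bot rules"
      else if (status == 429) print "rate-limited (429): check CDN bot throttling"
      else if (status == 200 && ctype ~ /^image\//) print "ok: " ctype
      else if (status == 200) print "200 but not an image: " ctype
      else print "other status: " status
    }'
}

# Usage:
#   curl -sI -L -A "Googlebot-Image/1.0" "https://cdn.example.com/products/sku-123.jpg" \
#     | summarize_headers
```

A challenge page served with a 200 shows up as "200 but not an image: text/html". Keep in mind this only looks at headers, so it can't catch a body that lies about its type.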

The fix depends on your CDN.

Cloudflare: under Security > Bots, carve out an exception for verified Google crawlers. Setting "Definitely automated" to "Allow" also works, but it lets every automated client through, not just Google. Cloudflare's "Verified Bots" list is on by default but doesn't always include Googlebot-Image unless you explicitly enable it.

AWS CloudFront + WAF: check your AWS-AWSManagedRulesBotControlRuleSet configuration. Its Targeted inspection level can block image bots even if you've allowlisted the main Googlebot.

Custom Nginx / Apache: check your user-agent blocking rules (conditions on $http_user_agent in Nginx, or on %{HTTP_USER_AGENT} in Apache rewrite rules). A common pattern is to block anything containing "bot" or "crawl", which catches Googlebot-Image along with the scrapers you were trying to block.
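You can see the over-broad pattern problem without touching the server. This small sketch runs a blocklist regex against a user agent string the same way a case-insensitive server rule would. The helper name ua_is_blocked and the 'bot|crawl' pattern are examples; substitute whatever regex your own config actually uses.

```shell
# Hypothetical helper: does a case-insensitive blocklist regex ($1)
# match a user agent string ($2)?
ua_is_blocked() {
  printf '%s\n' "$2" | grep -qiE "$1"
}

for ua in "Googlebot-Image/1.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"; do
  if ua_is_blocked 'bot|crawl' "$ua"; then
    echo "BLOCKED  $ua"
  else
    echo "allowed  $ua"
  fi
done
# prints:
#   BLOCKED  Googlebot-Image/1.0
#   allowed  Mozilla/5.0 (Windows NT 10.0; Win64; x64)
```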

Shopify private CDN apps: some hotlink-protection apps shipped via the Shopify App Store default to blocking non-browser user agents. Check the app settings and either allowlist Googlebot-Image/1.0 or disable the protection for the /cdn/ paths.

Google publishes the verifiable IP ranges for its crawlers. If user-agent allowlisting feels risky, allowlist by IP CIDR instead.
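If you go the IP route, the check itself is plain bit arithmetic. Below is a minimal IPv4-only sketch; ip_to_int and ip_in_cidr are hypothetical helper names, and 192.0.2.0/24 is a documentation placeholder, not a Google range. Take the real CIDRs from Google's published list.

```shell
# Hypothetical helpers, IPv4 only.
# Convert a dotted quad to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address on "." into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if address $1 falls inside CIDR block $2 (e.g. 192.0.2.0/24).
ip_in_cidr() {
  ip_int=$(ip_to_int "$1")
  net_int=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  [ $(( $ip_int & $mask )) -eq $(( $net_int & $mask )) ]
}

# Usage (placeholder range, not a real Google CIDR):
#   ip_in_cidr 192.0.2.55 192.0.2.0/24 && echo "inside allowlisted range"
```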

Fix 2: Confirm the response body is actually an image

Some configurations return HTTP 200 with an HTML error page in the body when the underlying object is missing. From a browser, you see a broken-image icon. From Google's fetcher, the response is a 200 OK with Content-Type: text/html and a body that doesn't decode as an image. Google's pipeline logs it as image_link_broken.

Check it:

curl -sI -A "Googlebot-Image/1.0" "https://cdn.example.com/products/sku-might-be-missing.jpg" \
  | grep -iE "^(HTTP/|content-type:|content-length:)"

If Content-Type is text/html, the URL is broken regardless of the status code. Common culprits:

An S3 bucket configured to serve index.html from the root on 404, instead of returning a real 404
A CloudFront custom error response that swallows 404s and substitutes a custom page with a 200 status
An Nginx error_page 404 =200 /not-found.html; directive that serves HTML with a 200 status on missing objects (without =200, Nginx keeps the 404 status but still sends an HTML body)
A Shopify store with a custom 404 theme template that returns 200 on image paths

The fix is on the origin or the CDN: configure 404s to return actual 404 status codes with the correct content type, not HTML masquerading as a success. While you're cleaning up the infrastructure, scan your feed for these URLs by checking Content-Type on every image link in a script.
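A feed-wide scan along those lines might look like the sketch below. It reads one URL per line and prints only the ones that don't come back as a 200 with an image/* Content-Type. The function name scan_image_urls is hypothetical, the 20-second timeout is an arbitrary choice, and it deliberately uses GET rather than HEAD because some servers answer the two differently.

```shell
# Hypothetical helper: read image URLs on stdin (one per line) and print
# "<status> <content-type><TAB><url>" for every response that is not a
# 200 with an image/* Content-Type.
scan_image_urls() {
  while IFS= read -r url; do
    [ -n "$url" ] || continue
    # %{http_code} and %{content_type} are curl --write-out variables.
    result=$(curl -s -o /dev/null --max-time 20 -A "Googlebot-Image/1.0" \
      -w '%{http_code} %{content_type}' "$url" | tr '[:upper:]' '[:lower:]')
    case $result in
      "200 image/"*) ;;                                  # healthy, stay quiet
      *) printf '%s\t%s\n' "$result" "$url" ;;
    esac
  done
}

# Usage, assuming the feed is a CSV with an image_link column:
#   xsv select image_link feed.csv | tail -n +2 | scan_image_urls
```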

Fix 3: Stop submitting unsupported image formats

Google's image requirements accept JPEG, PNG, non-animated GIF, BMP, TIFF, and WebP. Anything else fails the decode step and surfaces as image_link_broken.

The format issues I see most often:

HEIC — iPhones save photos as HEIC by default. If a merchant uploads phone photos directly to their store, the CDN may serve them as HEIC even though the file extension is .jpg. Google's fetcher reads the body, not the extension.
Animated WebP or animated GIF — Google accepts static images. Animated frames either fail entirely or are clipped to the first frame. For apparel and electronics, the clipped result often fails the dimension or aspect check.
TIFF in unsupported color profiles — CMYK TIFFs and 16-bit-per-channel TIFFs decode in Photoshop but not in Google's fetcher.
SVG — not supported, period. Vector logos can't be product images.

Check what's actually being served:

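# Show status and Content-Type for each redirect hop (HTTP/2 header names are
# lowercase, so match case-insensitively), then sniff the real format from the body
curl -sIL -A "Googlebot-Image/1.0" "https://your-image-url" \
  | grep -iE "^(HTTP/|content-type:)" && \
curl -sL -A "Googlebot-Image/1.0" "https://your-image-url" | file -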

The file - command reads the first bytes of the body and tells you the real format, regardless of what the headers or filename claim.
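To turn that into a pass/fail check, you can compare the sniffed MIME type against the accepted list from this section. A minimal sketch, assuming your file build supports -b --mime-type; the helper name is_accepted_format is made up, and image/x-ms-bmp is included because some older file versions report BMP that way. It can't tell animated from static images or spot a CMYK TIFF, so treat a pass as necessary, not sufficient.

```shell
# Hypothetical helper: succeed if the MIME type in $1 is one of the
# formats this section lists as accepted.
is_accepted_format() {
  case $1 in
    image/jpeg|image/png|image/gif|image/bmp|image/x-ms-bmp|image/tiff|image/webp)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Usage:
#   mime=$(curl -sL -A "Googlebot-Image/1.0" "https://your-image-url" | file -b --mime-type -)
#   is_accepted_format "$mime" || echo "unsupported format: $mime"
```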

Fix 4: Replace expiring signed URLs with stable canonical URLs

Google fetches images on a delayed queue. Your signed URL might be valid when the feed is generated and expired six hours later when the fetch actually happens.

The most common offenders:

Cloudinary signed URLs with short ?expires= parameters. If the TTL is under 24 hours and your feed regenerates every six hours, every fetch is a race.
S3 presigned URLs default to a short TTL (one hour in the AWS CLI and boto3). Useful for human-facing downloads, useless for image feeds.
imgix and other on-the-fly transformation services on free or low tiers can rate-limit per IP, and when limits hit, the response is a 403 instead of the image.

The fix: use stable, unsigned, long-lived URLs in the feed. If the platform doesn't support truly stable URLs, push the TTL to 7 days or as long as the platform allows (S3 presigned URLs cannot exceed 7 days), and regenerate the feed often enough that no URL is ever close to expiry at fetch time.
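The "never close to expiry" condition is simple arithmetic: the TTL has to outlast the feed regeneration interval plus however long Google might take to fetch. Here's a sketch for S3-style presigned URLs, which carry their TTL in the X-Amz-Expires query parameter. The helper names are hypothetical, and the worst-case fetch delay is an assumption you have to pick yourself, since Google doesn't publish one.

```shell
# Hypothetical helper: print the X-Amz-Expires value (seconds) from an
# S3 presigned URL, or nothing if the parameter is absent.
presigned_ttl() {
  printf '%s\n' "$1" | sed -n 's/.*[?&]X-Amz-Expires=\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical rule of thumb: succeed if TTL ($1) exceeds feed interval
# ($2) plus assumed worst-case fetch delay ($3), all in seconds.
ttl_is_safe() {
  [ "$1" -gt $(( $2 + $3 )) ]
}

# Usage: feed regenerates every 6 h, assume Google may take up to 24 h.
#   ttl=$(presigned_ttl "$url")
#   ttl_is_safe "$ttl" 21600 86400 || echo "TTL of ${ttl}s is too short"
```

With a 15-minute TTL (900 seconds) and a six-hour feed cycle, the check fails before the fetch delay even enters the picture.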

For Shopify merchants: use the canonical https://cdn.shopify.com/s/files/... URL, not a custom-domain CDN that proxies through your store. Shopify's image CDN is well-behaved with Googlebot-Image and doesn't sign or expire its URLs.

Fix 5: Find and remove the genuinely 404'd URLs

Last on the list because it's the least frequent cause, though it's also the easiest to verify. Run a HEAD request across every image URL in your feed:

# Pull image_link column, send 10 parallel HEAD requests
xsv select image_link feed.csv | tail -n +2 | \
  xargs -I{} -P 10 curl -sI -A "Googlebot-Image/1.0" -o /dev/null -w "%{http_code} {}\n" {}

Any line starting with 403, 404, a 5xx code, or 000 (curl got no HTTP response at all) is a confirmed broken URL. Patch the feed source so those URLs are removed, replaced, or never generated. Common origins of genuine 404s:

A product was deleted from the store but the feed cache still references the old image
A bulk image rename script changed the path but missed the feed-generation step
The image-hosting CDN was migrated and the old URLs were never updated
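On a large feed the status sweep prints thousands of healthy 200 lines. A small filter, sketched here with a made-up name (broken_only), keeps just the failures from "<status> <url>" output like the command above produces. It treats 000 as a failure too, since that is what curl reports when it gets no HTTP response.

```shell
# Hypothetical helper: read "<status> <url>" lines on stdin and keep only
# 403, 404, 5xx, and 000 (no HTTP response) results.
broken_only() {
  awk '$1 == "000" || $1 ~ /^(403|404|5[0-9][0-9])$/ { print }'
}

printf '%s\n' \
  "200 https://cdn.example.com/a.jpg" \
  "404 https://cdn.example.com/b.jpg" \
  "503 https://cdn.example.com/c.jpg" \
  "301 https://cdn.example.com/d.jpg" | broken_only
# prints:
#   404 https://cdn.example.com/b.jpg
#   503 https://cdn.example.com/c.jpg
```

Redirects (301/302) pass through unflagged here; add -L to the curl call if you want the final status instead.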

How to verify the fix

After making changes, validate before resubmitting.

1. Test the URL the way Google sees it. Forget your browser. Use curl -A "Googlebot-Image/1.0" from a machine outside your network. If you have the budget, run the check from a US-based VPS — some merchants accidentally geo-block US IPs.

2. Use Merchant Center's "Test image" feature. In the product detail view, the "Test image" button runs the same fetch Google's main pipeline runs and surfaces the underlying error code, not just the cleaned-up image_link_broken label. The detailed message will say something like "fetch failed: 403" or "decoded format unsupported" — much more actionable than the top-line diagnostic.

3. Force a re-fetch by changing the URL. Pushing the same feed with no changes will not trigger a re-fetch on Google's side. Either change the image_link value on the affected products (even by adding ?v=2) or wait for the automatic recrawl, which runs every 1 to 7 days depending on your account's history.

4. Watch the disapproval count trend. After your fix, monitor the image_link_broken count in Diagnostics over the next 3 to 5 days. A real fix shows a clear downward slope. If the count is flat or goes up, the root cause is different from what you patched.
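For step 3, the ?v=2 trick needs a little care when the URL already has a query string. A minimal sketch, with a hypothetical helper name (bump_url); apply it only to the products that were actually disapproved, and be aware that adding a parameter to a signed URL may invalidate its signature.

```shell
# Hypothetical helper: append a v=<version> parameter to URL $1, using
# "&" if the URL already has a query string and "?" otherwise.
bump_url() {
  url=$1
  version=$2
  case $url in
    *\?*) printf '%s&v=%s\n' "$url" "$version" ;;
    *)    printf '%s?v=%s\n' "$url" "$version" ;;
  esac
}

bump_url "https://cdn.example.com/products/sku-123.jpg" 2
# prints: https://cdn.example.com/products/sku-123.jpg?v=2
bump_url "https://cdn.example.com/products/sku-123.jpg?width=800" 2
# prints: https://cdn.example.com/products/sku-123.jpg?width=800&v=2
```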

Re-review typically takes 24 to 72 hours after the next successful fetch. Don't repeatedly resubmit; each submission resets a soft clock on every product in the feed, not just the broken ones.

What not to do

Don't add cache-busting query parameters to every image URL. It triggers a re-fetch, sure, but it also marks every product in the feed as updated, which delays re-review on items that were already healthy.
Don't migrate image hosts on a hunch. Diagnose the actual response Google sees before moving infrastructure. Migrating from Shopify CDN to a custom S3 setup mid-incident usually makes things worse.
Don't only test from your office network. Some CDNs and WAFs whitelist office IPs and block everything else. The browser test passes; Google's fetch fails. Always test from a clean network or a cloud VPS.
Don't assume "it worked yesterday" means the URL is permanently stable. Signed-URL expirations, IP-based throttling, and bot rules can all flip the state of a URL without any code change on your side.

How SnowPipe handles this

SnowPipe syncs Shopify, WooCommerce, and BigCommerce catalogs to Google Merchant Center via the Merchant API v1. Two design choices in SnowPipe directly address the most common image_link_broken triggers:

Pre-flight Googlebot-Image fetch checks. Before any product reaches the Merchant API, SnowPipe issues a HEAD request against every image_link URL using the Googlebot-Image/1.0 user agent from outside the merchant's network. URLs that return 403, 404, non-image content types, or HTML bodies are flagged in the Products tab before submission. The bot-blocking and HTML-error-page cases get caught before they burn a Merchant Center review cycle.

Stable URL preference for Shopify sources. When syncing from Shopify, SnowPipe defaults to the canonical https://cdn.shopify.com/s/files/... URL, not the custom-domain CDN URL. This trades some transformation flexibility for predictable Googlebot fetch success and avoids the signed-URL expiration trap entirely.

You can see the per-product image validation result in the Products tab inside any GMC connection, with deep links back to the source store admin if the fix needs to happen at the Shopify or WooCommerce level.

Summary

image_link_broken is rarely Google's fault. The HTTP response Google receives is genuinely not a usable image — bot blocked, body is HTML, format is wrong, signed URL has expired, or the file is gone. Browser tests will lie to you, because browsers send different headers, get treated differently by CDNs, and silently follow redirects. Verify with curl -A "Googlebot-Image/1.0" from outside your network, fix the actual response Google is seeing, then wait 24 to 72 hours for re-review. Anything that loops you back into "resubmit and hope" is wasted time.

Tired of fighting Google Merchant Center image errors one product at a time?

Try SnowPipe free — connect your store and get pre-flight image validation plus accurate Google/Facebook syncs in minutes. Or, book a 15-min demo and I'll walk you through your specific setup.