While digging into the Google Search Console data for
pigweed.dev this month I discovered a peculiar problem:
we have a bunch of pages that are labeled as
Discovered - currently not indexed. This post documents my attempts to fix the problem.
Disclaimer / disclosure: although I work for Google, you will never get any “insider knowledge about the internal workings of Google Search” from me. First, I obviously wouldn’t be able to share that kind of stuff if I knew it. Second, I don’t know any of it. Your guess is as good as mine. I work outside of the Search org, and Search maintains a strict firewall between itself and Googlers outside the org.
OK, let’s dig into the problem. At the bottom of my Pages tab there’s a row in the Why pages aren’t indexed table saying that 56 pages are Discovered - currently not indexed.
Apparently I’ve also got a bunch of pages with misconfigured canonical tags, but that’s perhaps a post for another day.
If I click that Discovered - currently not indexed row, it takes me to a breakdown of the problem:
The LEARN MORE link points to Discovered - currently not indexed, which doesn’t say much:
The page was found by Google, but not crawled yet. Typically, Google wanted to crawl the URL but this was expected to overload the site; therefore Google rescheduled the crawl. This is why the last crawl date is empty on the report.
The plot thickens, as they say. (Why, by the way? Is it a soup metaphor?) It doesn’t sound like I’ve misconfigured anything. It sounds like I just need to get Google Search to re-crawl the pages.
(Back in October I added sphinx-sitemap to start auto-generating pigweed.dev’s sitemap. Maybe Google Search discovered a bunch of new URLs through the sitemap and decided that it would overload the site if it tried to index them all?)
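For context, wiring up sphinx-sitemap is only a few lines in a Sphinx project’s conf.py. The snippet below is a minimal sketch, not pigweed.dev’s actual configuration (which I haven’t reproduced here); the base URL and URL scheme values are just illustrative:

```python
# conf.py — minimal sphinx-sitemap setup (a sketch; the real pigweed.dev
# config may differ).

extensions = [
    "sphinx_sitemap",
]

# sphinx-sitemap builds each <loc> entry in sitemap.xml from this base URL.
html_baseurl = "https://pigweed.dev/"

# Optional: drop the default language/version prefixes from generated URLs.
sitemap_url_scheme = "{link}"
```

Once the extension is enabled, `sphinx-build` emits a sitemap.xml alongside the HTML output, which is how Google Search can suddenly discover a large batch of URLs at once.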
How do I fix this? Previously there was a VALIDATE FIX button. I clicked it on 20 Jan 2024, but given that it’s now 29 Jan 2024 and nothing has changed, I’ll have to assume that the button doesn’t actually do anything. What else can I do?
When I scroll down this page a bit I see a list of the offending URLs:
Clicking one of the table’s rows shows an INSPECT URL option:
On the URL’s details page I see a REQUEST INDEXING button:
After clicking REQUEST INDEXING, a modal pops up telling me that it’ll take a minute or two to kick off the manual indexing request, and then it confirms that the request has been added to the manual request queue.
It seems that there’s a quota of 12 requests per day. That’s fine for me; with only 56 unindexed pages I’ll be able to get them all done in 5 days. When I first tested this approach last week I had 60 unindexed pages, and the number is now down to 56, so it seems to be working.
I’ll update this post if I find better ways to fix this problem.
Update 1

Ahrefs has a post on the topic. For pigweed.dev I suspect that the issue is related to how we’ve configured redirects.
Update 2 (30 Jan 2024)
I checked back on one of the pages that got manually indexed yesterday. The details page now confirms that Google Search could not crawl the page because of a redirect issue. The LEARN MORE link points me to Redirect error.
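Redirect errors like this generally boil down to a few patterns: redirect chains that are too long, redirect loops, or redirects with broken Location targets. As a rough way to reason about what a crawler might be seeing, here’s a sketch that classifies an observed chain of redirect hops. The function name and the hop limit are my own assumptions; Google doesn’t publish its exact rules:

```python
from urllib.parse import urljoin

# Assumed limit on redirect hops before a crawler gives up; Google does not
# document an exact number, so this is illustrative only.
MAX_HOPS = 5


def classify_redirect_chain(start_url, hops):
    """Classify a redirect chain.

    hops: list of (status_code, location_header) tuples observed while
    following the URL, e.g. [(301, "/new-path"), (301, "/newer-path")].
    """
    seen = {start_url}
    current = start_url
    for status, location in hops:
        if status not in (301, 302, 303, 307, 308):
            return "not a redirect"
        if not location:
            return "redirect with empty Location"
        # Location may be relative, so resolve it against the current URL.
        current = urljoin(current, location)
        if current in seen:
            return "redirect loop"
        seen.add(current)
    if len(hops) > MAX_HOPS:
        return "chain too long"
    return "ok"
```

In practice you’d collect the hops by requesting each unindexed URL with automatic redirect-following disabled and recording the Location header at every step, then look for the loop and chain-length cases above.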