Log #004: The Easter Egg Conundrum: tackling vague SEO problems without clear solutions
Hey SEO pros,
It's been a while since my last update. I've been on a break since January and landed a new gig as an SEO specialist, woohoo! Now I'm ready to dive back into the SEO world!
So, to recap what was happening last time with Golden SEO Protocol: I distinctly remember quite a few archive pages having the status 'Discovered - currently not indexed.' As it turns out, months later this is still the case:
Have you ever been inspired by a doormat?
In my previous log, I added an HTML sitemap to my website and, yes, I introduced a doormat feature (indeed, I was inspired by a doormat...)
Behold...
The element you see above the footer is also known as a doormat. Essentially, it's an extra footer (or menu, whatever you want to call it) to facilitate access to important pages. I got this idea from bol.com, and upon closer inspection, I discovered that it was called a doormat:
Funny, just adding a few links, slapping some CSS on it, and voilà, my own responsive menu!
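If you're curious what that boils down to in practice, here's a minimal sketch of the idea: a block of links grouped under a couple of headings, rendered above the footer. The section names and URLs are just placeholders (not my real site structure), and the little Python script below only prints the HTML; how you actually build and style it is up to you.

```python
# A minimal sketch of a "doormat": a block of plain links above the footer
# that points visitors (and crawlers) to the most important pages.
# The section names and URLs below are placeholders, not my real site structure.

DOORMAT_LINKS = {
    "Popular pages": ["/blog/", "/about/", "/contact/"],
    "Archives": ["/archive/2023/", "/archive/2024/"],
}

def render_doormat(links: dict[str, list[str]]) -> str:
    """Render the doormat as a simple HTML nav element."""
    sections = []
    for heading, urls in links.items():
        items = "".join(f'<li><a href="{url}">{url}</a></li>' for url in urls)
        sections.append(f"<section><h2>{heading}</h2><ul>{items}</ul></section>")
    return f'<nav class="doormat">{"".join(sections)}</nav>'

print(render_doormat(DOORMAT_LINKS))
```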
But aside from that, I've found out, thanks to the SEO community on Mastodon, that accessibility of the pages isn't the issue. With the status 'Discovered - currently not indexed', the problem isn't that Google can't reach the pages (they have, after all, been discovered), but that they aren't being crawled and indexed. Additionally, these pages were found through the XML sitemap; more on this in my previous log.
What does 'Discovered - currently not indexed' mean?
'Discovered - currently not indexed' means that Google has found the page, but it has not been crawled yet. Typically, this status indicates that Google intended to crawl the URL, but doing so was expected to overload the site; therefore, Google rescheduled the crawl.
Found by Google, the page was, yes;
But problems it may have, in the queue it now waits, rescheduled the crawl has been, hmm.
Know we do, about Google, not guarantee it does, that indexed everything on your website will be.
Mysteries many there are… yes, hmmm.
Sorry
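Yoda aside: you don't have to check this status one URL at a time in Search Console. The Search Console API has a URL Inspection endpoint, and a rough sketch like the one below can pull the coverage state for a batch of URLs. Everything in it is a placeholder for illustration (the property name, the URLs, the credentials file), and you'd need the API enabled and OAuth set up for your own property first.

```python
# Rough sketch: pull the coverage state ("Discovered - currently not indexed",
# "Crawled - currently not indexed", "Submitted and indexed", ...) for a list
# of URLs via the Search Console URL Inspection API.
# Assumes OAuth credentials with the webmasters.readonly scope are already
# set up; the property name, URLs and credentials file are placeholders.

from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

SITE = "sc-domain:example.com"  # placeholder property
URLS = [
    "https://example.com/archive/2023/",
    "https://example.com/archive/2024/",
]

creds = Credentials.from_authorized_user_file("credentials.json")
service = build("searchconsole", "v1", credentials=creds)

for url in URLS:
    body = {"inspectionUrl": url, "siteUrl": SITE}
    result = service.urlInspection().index().inspect(body=body).execute()
    status = result["inspectionResult"]["indexStatusResult"]
    print(url, "->", status.get("coverageState"))
```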
Crawl budget issues?
Crawl budget issues might indeed be at play here. Essentially, pages end up in the 'Discovered' state when Google hits its crawl limit for the site. If they remain there for a long time, it could be due to low crawl demand, meaning Google isn't rushing to perform the rescheduled crawl. Googlebot avoids overloading the site by not making a request when it's near the site's crawl limit, and Google sets that limit based on the server's estimated capacity to handle requests, derived from past crawl activity and taking other hostnames on the same server into account.
Still following? Essentially, Google takes a moment to think and considers many factors, which might unexpectedly affect the crawl rate limit.
Googlebot aims to be considerate of the web's health. Its main task is crawling, but it ensures this doesn't harm user experience on the site. This is managed through the "crawl rate limit," controlling how much Googlebot can fetch from a site without overloading it. This limit affects the number of parallel connections Googlebot uses and the wait time between fetches. The crawl rate adjusts based on site response times and errors; if the site is quick and error-free, the limit increases, allowing more extensive crawling. Conversely, if the site is slow or errors occur, the limit decreases.
Unfortunately, you can't usually increase Google's maximum crawl rate limit, though there are rare exceptions. Most websites, whether large, medium, or small, can encounter crawl budget issues. One strategy is to reduce unnecessary HTTP requests to your server, essentially excluding pages that waste Google's crawl effort. However, this isn't an issue with my website.
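If you do want to check whether something is quietly eating your crawl budget, the server's access logs are the most honest source. Here's the kind of quick tally I have in mind; it's only a sketch that assumes a standard combined log format, the log path is a placeholder, and matching on the user agent string alone is crude (anyone can claim to be Googlebot, so a reverse-DNS check would be more reliable).

```python
# Quick sketch: tally Googlebot requests per URL path from an access log,
# to spot paths that eat crawl budget without deserving it.
# Assumes a common/combined log format; the log path is a placeholder.

import re
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"  # placeholder

# Matches the request line, e.g. "GET /some/path HTTP/1.1"
request_re = re.compile(r'"[A-Z]+ (\S+) HTTP/[\d.]+"')

hits = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        if "Googlebot" not in line:  # crude user agent check
            continue
        match = request_re.search(line)
        if match:
            hits[match.group(1)] += 1

for path, count in hits.most_common(20):
    print(f"{count:6d}  {path}")
```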
So, what's the issue then? Indexing problems?
Indexing also hinges on the page's content and metadata. Common indexing issues can include:
- Low-quality content;
- Robots meta tags that disallow indexing;
- Website design that complicates indexing.
After all this, we still don’t have an answer as to why our pages aren’t being indexed. Isn't that just wonderful! No sarcasm intended.
What do I know about my pages?
Since I can't find any technical indexing issues, I'm focusing on content. I must admit that some of the content was AI-generated. While there are no direct penalties for using AI, and it's perfectly fine to use as long as it adds value for the visitor, I can still take a closer look at my pages. Also, not that this has much to do with content, but these pages don't have any metadata either. The absence of meta tags doesn't mean pages can't be indexed; they primarily serve to give search engines and users information about a page's content.
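To be thorough about the technical side, a small script can confirm that nothing in the page markup or response headers is quietly blocking indexing, and flag the missing metadata while it's at it. Again, just a sketch: the URLs are placeholders and it only looks at the basics.

```python
# Sketch: for a handful of pages, check the signals that commonly block or
# weaken indexing: a noindex robots meta tag, an X-Robots-Tag header, and
# missing <title> / meta description. The URLs are placeholders.

import requests
from bs4 import BeautifulSoup

PAGES = [
    "https://example.com/archive/2023/",
    "https://example.com/archive/2024/",
]

for url in PAGES:
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.text, "html.parser")

    robots_meta = soup.find("meta", attrs={"name": "robots"})
    robots_header = response.headers.get("X-Robots-Tag", "")
    title = soup.find("title")
    description = soup.find("meta", attrs={"name": "description"})

    print(url)
    print("  meta robots :", robots_meta.get("content", "") if robots_meta else "none")
    print("  X-Robots-Tag:", robots_header or "none")
    print("  title       :", title.get_text(strip=True) if title else "MISSING")
    print("  description :", "present" if description else "MISSING")
```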
Times have changed…
Well, with developments like ChatGPT, I believe Google is busier than ever. The internet is rapidly filling up with more websites and pages (creating them has become easier and more accessible thanks to AI). So, I can imagine that many websites are finding themselves at the back of the queue.
From this entire narrative, I think page quality is a critical factor. In the coming weeks, I'll focus more on this so I can present a solid case for you.
I suggest not eating too many eggs over the Easter weekend.
TO BE CONTINUED...