Can You Now Trust Google To Crawl Ajax Sites?



Web designers and engineers love Ajax for building Single Page Applications (SPA) with popular frameworks like Angular and React. Pure Ajax implementations can provide a smooth, interactive web application that performs more like a dedicated desktop application.

With an SPA, generally, the HTML content is not loaded into the browser on the initial fetch of the web page. Instead, JavaScript uses Ajax to communicate dynamically with the web server, then builds the HTML to render the page and interact with the user. (There is a technique called “Server-Side Rendering” where the JavaScript is actually executed on the server and the page request is returned with the rendered HTML. However, this approach is not yet supported by all SPA frameworks and adds complexity to development.)

One of the issues with SPA Ajax sites has been SEO. Google has actually been crawling some JavaScript content for a while. In fact, a recent series of tests confirmed Google’s ability to crawl links, metadata and content inserted via JavaScript. However, websites using pure SPA Ajax frameworks have historically experienced challenges with SEO.

Back in 2009, Google came up with a solution to make Ajax crawlable. That method uses either “escaped fragment” URLs (ugly URLs) or, more recently, clean URLs with a <meta name="fragment" content="!"> tag on the page.

The escaped fragment URL or meta fragment tag instructs Google to go out and get a pre-rendered version of the page, one that has executed all the JavaScript and has the full HTML that Google can parse and index. In this method, the spider is served totally different page source code than a browser receives (HTML vs. JavaScript).
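To make the URL mapping concrete, here is a minimal Python sketch of how a crawler following the 2009 scheme would turn a “pretty” URL into the one it actually requests. The function name, the has_meta_fragment flag and the example.com URLs are my own illustrations, and the set of escaped characters reflects my reading of the specification, not Google’s implementation.

```python
import string
from urllib.parse import quote, urlsplit, urlunsplit

# Assumption: within the fragment value the scheme percent-encodes
# '#', '%', '&', '+', spaces/control characters and non-ASCII bytes,
# and leaves all other printable ASCII punctuation untouched.
_MUST_ESCAPE = "#%&+"
_UNESCAPED = "".join(c for c in string.punctuation if c not in _MUST_ESCAPE)


def to_escaped_fragment_url(url: str, has_meta_fragment: bool = False) -> str:
    """Return the URL a crawler would request for `url` under the 2009 scheme.

    `has_meta_fragment` (a hypothetical flag) says whether the page carries
    the meta fragment tag; such pages have no '#!' in their URL.
    """
    parts = urlsplit(url)
    if parts.fragment.startswith("!"):
        # '#!state' becomes '_escaped_fragment_=state' in the query string.
        value = quote(parts.fragment[1:], safe=_UNESCAPED)
    elif has_meta_fragment:
        # Clean URL with the meta tag: the parameter is sent with no value.
        value = ""
    else:
        # Not part of the scheme: the URL is requested as-is.
        return url

    param = "_escaped_fragment_=" + value
    query = parts.query + "&" + param if parts.query else param
    # The fragment itself is dropped; it never reaches the server anyway.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
```

With this sketch, http://example.com/#!key=value maps to http://example.com/?_escaped_fragment_=key=value, which is the “ugly” URL the server is expected to answer with pre-rendered HTML.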

With the word out that Google crawls JavaScript, many sites have decided to let Google crawl their SPA Ajax sites. In general, that has not been very successful. In the past year, I have consulted for a couple of websites with an Ajax Angular implementation. Google had some success, and about 30 percent of the pages in Google’s cache were fully rendered. The other 70 percent were blank.

A popular food site switched to Angular, believing that Google could crawl it. They lost about 70 percent of their organic traffic and are still recovering from that debacle. Ultimately, both sites went to pre-rendering HTML snapshots, the recommended Ajax crawling solution at the time.

And then, on Oct 14, Google said this:

We are no longer recommending the AJAX crawling proposal we made back in 2009.

Note that they are still supporting their old proposal. (There have been some articles announcing that they are no longer supporting it, but that is not true — they are simply no longer recommending that approach.)

In deprecating the old recommendation, they seemed to be saying they can now crawl Ajax.

Then, just a week after the announcement, a client with a newly launched site asked me to check it out. This was an Angular site, again an SPA Ajax implementation.

Upon examining Google’s index and cache, we saw some partially indexed pages without all the content getting crawled. I reiterated my earlier recommendation of using HTML snapshots or progressive enhancement.

This site was built with Angular, which does not yet support server-side rendering (again, where the server renders the page and serves up the finished HTML document). Progressive enhancement would therefore be difficult to support, and HTML snapshots are still the best solution for them.

The client replied, “But why? Everything I read tells me Google can crawl Ajax.”

Can they? Let’s take a deeper look at the new recommendation in regard to Ajax.

Google’s New Ajax Recommendations

In explaining why they are deprecating the old recommendation, they say (emphasis mine):

We are generally able to render and understand your web pages like modern browsers.

Many people might be quick to conclude that they can now crawl Ajax without a problem. But look at the language: “generally able”? Would you bet your business revenue on the knowledge that Google is “generally able” to understand your page?

Could it be I am just picking on semantics? Let’s examine the announcement further. Later in their announcement, they state in regard to Ajax:

Since the assumptions for our 2009 proposal are no longer valid, we recommend following the principles of progressive enhancement.

They don’t spell it out in their announcement, but by recommending progressive enhancement (which loads some HTML for browsers that don’t support JavaScript), they appear to be implicitly saying, “Don’t count on us crawling your JavaScript.” Why recommend this method if indeed Google can consistently crawl SPA Ajax sites?
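To illustrate what progressive enhancement means in practice, here is a rough Python sketch of a server-side template that puts the content and plain links into the initial HTML and only then layers a script on top. The function name, the /enhance.js path and the sample data are hypothetical; a real site would use its framework’s own rendering.

```python
from html import escape


def render_page(title, body_text, links):
    """Return complete HTML that is readable without running any JavaScript.

    `links` is a list of (href, label) pairs rendered as ordinary anchors,
    so a crawler can follow them even if it never executes the script.
    """
    link_items = "\n".join(
        f'      <li><a href="{escape(href, quote=True)}">{escape(label)}</a></li>'
        for href, label in links
    )
    return f"""<!DOCTYPE html>
<html>
  <head><title>{escape(title)}</title></head>
  <body>
    <h1>{escape(title)}</h1>
    <p>{escape(body_text)}</p>
    <ul>
{link_items}
    </ul>
    <!-- Optional layer: a script can upgrade the links to Ajax loads. -->
    <script src="/enhance.js" defer></script>
  </body>
</html>"""
```

The point of the design is that the script is an extra, not a requirement: if it never runs, the heading, text and links are still in the page source.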

I worried that I was perhaps overanalyzing Google’s words, but then…

John Mueller Confirms Google Still Has Trouble With Ajax 

On October 27 (less than two weeks after the Google announcement), John Mueller, on his Webmaster Central Hangout, confirmed that Google indeed still has problems with Ajax.

You can view the exchange at around 1:08:00 into the video, where there was a question relating to a specific Angular implementation:

Google still has trouble with rendering, and they expect to get better over time. John recommends some actions to help debug the issues.

Ultimately, he recommended using HTML snapshots until Google gets better at Ajax (yes, the method that was just officially deprecated).

So, What To Do? 

  • Progressive enhancement. Server-side rendering would be required for progressive enhancement, and it is not yet supported by Angular. However, the upcoming Angular 2.0 will support server-side rendering. React does, in fact, support server-side rendering today.

    This is, however, more work than simply creating HTML snapshots. You need to make sure you render any required links so Google can crawl and index additional content that is loaded into the page.

    Nevertheless, for sites using an Ajax framework, this would be my recommended approach. (And, of course, it is Google’s recommended approach.)

  • Pre-rendering HTML snapshots. Again, don’t be confused if you have heard or read that Google no longer supports this method. They will continue to support it for the foreseeable future. They are just no longer recommending it.

    This method works; however, writing the code to pre-render and serve up the snapshots is not trivial. The good news is, there are several vendors out there such as prerender.io who will do the work for you at a relatively low cost. That is probably the simplest approach.

    This method is not ideal. Serving different source code to crawlers vs. browsers (HTML vs. JavaScript) can be problematic. It can be considered a cloaking technique, and it is not necessarily obvious what the bots are getting served. It’s important to monitor Google’s cache to make sure that they are not getting served the wrong page.

    Nevertheless, if you use a platform that does not support server-side rendering, then this may be your only solution.
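For the snapshot route, the server has to recognize a crawler request and answer it with the pre-rendered HTML instead of the JavaScript shell. Here is a minimal Python sketch of that decision, assuming requests arrive in the 2009 scheme’s _escaped_fragment_ form. The function names, the APP_SHELL constant and the snapshots dictionary are my own placeholders, not part of any vendor’s product.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs

# Hypothetical JavaScript-only shell that regular browsers receive.
APP_SHELL = (
    "<html><body><div id='app'></div>"
    "<script src='/app.js'></script></body></html>"
)


def crawler_fragment(query_string: str) -> Optional[str]:
    """Return the decoded fragment state if this is a crawler request.

    Returns None when the query string has no _escaped_fragment_ parameter,
    and '' when the parameter is present but empty (the meta tag case).
    """
    params = parse_qs(query_string, keep_blank_values=True)
    values = params.get("_escaped_fragment_")
    return values[0] if values else None


def respond(query_string: str, snapshots: Dict[str, str]) -> str:
    """Pick the body to serve: a pre-rendered snapshot or the app shell."""
    fragment = crawler_fragment(query_string)
    if fragment is None:
        return APP_SHELL  # ordinary browser request
    # Falling back to the shell hands the crawler an empty page, so a real
    # setup would want to log or alert on a missing snapshot here.
    return snapshots.get(fragment, APP_SHELL)
```

Because the two audiences get different source code, it helps to keep the branch this small and this visible, so it is easy to confirm what the bots are actually being served.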

Better Safe Than Sorry

Even if I had seen evidence that Google was consistently crawling Ajax sites, I would still be wary. It takes far more resources and much more time to fully render a page than to simply serve up HTML.

What will happen to sites with hundreds of thousands or millions of pages? How will it impact crawl budget? Will the crawl rate remain consistent?

Before recommending this approach, I’d rather wait and see strong evidence that Google can and does consistently crawl large, pure Ajax Single Page Applications with no negative impact on crawl rate, indexing and rankings. Please do share your own experiences.
