Google won’t comment on a potentially major leak of its search algorithm documentation

Google’s search algorithm is perhaps the most consequential system on the Internet, dictating which sites live and die and what content on the web looks like. But exactly how Google ranks websites has long been a mystery, pieced together by journalists, researchers, and search engine optimization practitioners.

Now, an explosive leak that purports to show thousands of pages of internal documents appears to offer an unprecedented look under the hood of how Search works, and it suggests that Google hasn’t been completely honest about it for years. So far, Google has not responded to multiple requests for comment on the legitimacy of the documents.

Rand Fishkin, who has worked in SEO for more than a decade, says a source shared 2,500 pages of documents with him in the hope that reporting on the leak would counter the “lies” Google employees had shared about how the search algorithm works. The documents describe Google’s search API and break down what information is available to employees, Fishkin said.

The details shared by Fishkin are dense and technical, and likely more legible to developers and SEO experts than to the layperson. The contents of the leak also don’t necessarily prove that Google uses the specific data and signals it mentions for search rankings. Instead, the leak outlines what data Google collects from web pages, sites, and searchers and offers indirect hints to SEO experts about what Google appears to care about, as SEO expert Mike King wrote in his overview of the documents.

The leaked documents cover topics such as what kind of data Google collects and uses, which sites Google puts forward for sensitive topics like elections, how Google handles small websites, and more. Some information in the documents appears to conflict with public statements made by Google representatives, according to Fishkin and King.

“‘Lied’ is harsh, but it’s the only correct word to use here,” King writes. “While I don’t necessarily blame Google’s public representatives for protecting their proprietary information, I do take issue with their efforts to actively discredit people in the marketing, technology, and journalism worlds who have made reproducible discoveries.”

Google has not responded to The Verge’s requests for comment regarding the documents, including a direct request to refute their legitimacy. Fishkin told The Verge in an email that the company has not disputed the veracity of the leak, but that an employee asked him to change some language in the post about how an event was characterized.

Google’s secretive search algorithm has spawned an entire industry of marketers who closely follow and implement Google’s public guidelines for millions of businesses around the world. The ubiquitous, often annoying tactics have led to the common narrative that Google’s search results are getting worse, crowded with junk that website operators feel they must produce to make their sites visible. In response to The Verge’s previous coverage of SEO-driven tactics, Google representatives often fall back on a familiar defense: that’s not what the Google guidelines say.

But some details in the leaked documents cast doubt on the accuracy of Google’s public statements about how Search works.

An example cited by Fishkin and King is whether Google Chrome data is used in rankings at all. Google representatives have repeatedly stated that the company does not use Chrome data to rank pages, but Chrome is specifically mentioned in sections about how websites appear in Search. In the screenshot below, which I took as an example, the links that appear under the main URL of vogue.com may have been partially created using Chrome data, according to the documents.

Chrome is mentioned in a section about how additional links are created.
Image: Google

Another question raised is what role, if any, E-E-A-T plays in ranking. E-E-A-T stands for experience, expertise, authoritativeness, and trustworthiness, a Google metric used to evaluate the quality of results. Google representatives have previously said that E-E-A-T is not a ranking factor. Fishkin notes that he hasn’t found much in the documents that mentions E-E-A-T by name.

However, King has detailed how Google appears to collect author information from a page and has a field indicating whether an entity on the page is the author. Some of the documents shared by King state that the field is “developed and tailored primarily for news articles… but is also used for other content (e.g., scientific articles).” While this doesn’t confirm that bylines are an explicit ranking metric, it does show that Google is at least tracking this attribute. Google representatives have previously insisted that author bylines are something website owners should add for readers, not for Google, because they have no impact on rankings.

While the documents aren’t exactly a smoking gun, they provide an in-depth, unfiltered look at a closely guarded black box system. The US government’s antitrust case against Google – which centers on Search – has also led to internal documentation becoming public, providing further insight into how the company’s flagship product works.

Google’s general caginess about how Search works has led to websites looking the same as SEO marketers try to outsmart Google based on the hints the company provides. Fishkin also calls out the publications that credulously present Google’s public claims as truth, without much further analysis.

“Historically, some of the loudest voices and most prolific publishers in the search industry have been happy to repeat Google’s public statements uncritically. They write headlines like ‘Google says XYZ is true,’ instead of ‘Google claims XYZ; Evidence suggests otherwise,’” Fishkin wrote. “Please do better. If this leak and the DOJ process can bring about just one change, I hope this is it.”
