Marking up paywalled content with structured data
The rise of “paywalled content” has significantly changed the way users access online information. Websites and online platforms increasingly place their most valuable content behind a paywall, requiring users to pay to read certain articles, research reports, or other resources. This raised a question for publishers: how can search engines still access and index this content without making it freely available to everyone?
What is structured data?
Structured data refers to any form of data that is organized so that machines can read it easily. In the context of Web development, structured data is often used to help search engines better understand the content of a Web page. This is done by labeling specific parts of the content with standardized tags that indicate what that content represents, such as an article’s title, author, and publication date. For more on this, check out my article on structured data.
What markup is used for paywalled content?
To mark content that is behind a paywall, you can use the following JSON-LD structured data, based on the schema.org vocabulary:
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "Example of an Article Title",
  "datePublished": "2024-02-08",
  "isAccessibleForFree": false,
  "hasPart": {
    "@type": "WebPageElement",
    "isAccessibleForFree": false,
    "cssSelector": ".paywallContent"
  }
}
This markup declares that the article is not freely accessible and that the part of the page matching the CSS selector .paywallContent is behind the paywall. Adapt it to the structure and content of your own website. This is a general example, so consult Google’s official documentation and the schema.org guidelines for detailed implementation instructions.
Paywalled content usually consists of news articles. That is why the example uses NewsArticle as the @type, together with the isAccessibleForFree property.
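On a large site you would generate this markup from a page template rather than write it by hand. The following is a minimal Python sketch, assuming a template that can call Python. It builds the same object as above and serializes it into the script tag that JSON-LD normally lives in. The function names build_paywall_jsonld and render_script_tag are hypothetical. The .paywallContent default simply mirrors the example above.

```python
import json


def build_paywall_jsonld(headline: str, date_published: str,
                         css_selector: str = ".paywallContent") -> dict:
    """Build NewsArticle JSON-LD that marks the element(s) matched by
    css_selector as paywalled (not accessible for free)."""
    return {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "datePublished": date_published,
        "isAccessibleForFree": False,  # serialized as JSON false
        "hasPart": {
            "@type": "WebPageElement",
            "isAccessibleForFree": False,
            "cssSelector": css_selector,
        },
    }


def render_script_tag(data: dict) -> str:
    """Wrap the JSON-LD in the script tag a page template would emit."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # "</" can only occur inside JSON strings; "<\/" is an equivalent JSON
    # escape that stops a headline containing "</script>" from closing the tag.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    markup = build_paywall_jsonld("Example of an Article Title", "2024-02-08")
    print(render_script_tag(markup))
```

Escaping in render_script_tag keeps the payload valid JSON. Any JSON parser, including Google’s, reads "<\/" back as "</".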
Paywalled content and its impact on SEO at a glance
I recently read an article on SERoundtable about this topic, so I’ll use it to go into more detail. The article discusses Google’s position that its structured data approach for paywalled and subscription-based content is not “leaky”. This comes despite criticism from the community that the system still offers ways to get at content behind paywalls.
Google’s solution, introduced in 2017, lets publishers show the full content to Google’s crawler so it can be understood and indexed, without making that content accessible to the public. Google’s Danny Sullivan stressed that the method is safe as long as publishers follow Google’s guidelines. These include verifying that requests really come from Googlebot’s IP addresses and blocking cached copies where necessary.
Critics argue that anyone could gain unauthorized access by changing their user agent to Googlebot. Sullivan counters that verifying the requesting IP address blocks such attempts.
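Google’s documented way to verify Googlebot ignores the user agent entirely. It does a reverse DNS lookup on the requesting IP address and checks that the hostname belongs to a Google crawler domain. It then does a forward DNS lookup to confirm that the hostname resolves back to the same IP. Below is a minimal Python sketch of that check. It assumes the accepted hostname suffixes are googlebot.com and google.com, so re-check Google’s current crawler verification page. verify_googlebot is a hypothetical helper name. The two DNS lookups are passed in as parameters so the logic can be tested without network access.

```python
import ipaddress
import socket
from typing import Callable, List

# Assumption: hostname suffixes for Google's crawlers. Re-check Google's
# current crawler verification documentation before relying on this list.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse_dns(ip: str) -> str:
    """PTR lookup: IP address -> hostname (raises OSError on failure)."""
    return socket.gethostbyaddr(ip)[0]


def _forward_dns(hostname: str) -> List[str]:
    """A/AAAA lookup: hostname -> every IPv4 and IPv6 address it resolves to."""
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]


def verify_googlebot(
    ip: str,
    reverse_dns: Callable[[str], str] = _reverse_dns,
    forward_dns: Callable[[str], List[str]] = _forward_dns,
) -> bool:
    """Return True only if ip reverse-resolves to a Google crawler hostname
    that forward-resolves back to the same ip. The User-Agent header is not
    consulted at all, because any client can fake it."""
    try:
        target = ipaddress.ip_address(ip)
    except ValueError:
        return False  # not an IP address
    try:
        hostname = reverse_dns(ip).lower().rstrip(".")
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    if not hostname.endswith(GOOGLE_CRAWLER_SUFFIXES):
        return False  # e.g. a scraper that only spoofs the Googlebot user agent
    try:
        forward_ips = forward_dns(hostname)
    except OSError:
        return False
    # Compare parsed addresses so different IPv6 spellings still match.
    return any(ipaddress.ip_address(addr) == target for addr in forward_ips)
```

Two DNS lookups per request are slow. In practice you would cache the result per IP address for a while.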
In my opinion, this is a fairly watertight way of protecting content. I wouldn’t put my biggest secrets behind it. For content that is otherwise available through a ten-euro-a-month subscription, though, it is a good method.
For more information, read the full article on SERoundtable at this link: Google Says Its Paywalled & Subscription Structured Data Method Is Not Leaky.
Scaling up implementation
This form of structured data is often applied to a large section of a Web site. Implementing structured data at that scale, especially for paywalled content, requires a systematic approach that includes:
- Analyze content structure: Understand how content is organized (e.g., articles, blogs, product pages) and identify which parts are paywalled.
- Template update: Update templates to automatically insert structured data on relevant pages, using properties such as isAccessibleForFree.
- CMS modifications: Adapt the Content Management System (CMS) to support structured data fields so that editors can add the necessary markup.
- Automation: Use scripts or software tools to add structured data to existing pages in bulk, if possible.
- Validation and testing: Use Google’s Rich Results Test or the Schema Markup Validator (the successor to the retired Structured Data Testing Tool) to verify the implementation.
- Monitoring and maintenance: Monitor performance and adjust as needed, especially after updates to the website or changes in structured data guidelines.
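For the automation and validation steps, a bulk audit script can catch common mistakes across many pages before you spot-check them in the Rich Results Test. The following minimal Python sketch uses only the standard library and does three things:
- extracts the JSON-LD blocks from a page’s HTML;
- checks that some item sets isAccessibleForFree to false;
- checks that each simple class selector in hasPart, such as .paywallContent, matches an element on the page.

The names PageScanner and audit_paywalled_page are hypothetical. The sketch assumes the page is known to be paywalled. It ignores @graph wrappers and complex CSS selectors.

```python
import json
import re
from html.parser import HTMLParser
from typing import List, Set


class PageScanner(HTMLParser):
    """Collects JSON-LD script bodies and every CSS class used on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.jsonld_blocks: List[str] = []
        self.classes: Set[str] = set()
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        self.classes.update((attr_map.get("class") or "").split())
        script_type = (attr_map.get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self.jsonld_blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.jsonld_blocks[-1] += data


def _is_false(value) -> bool:
    """Accept JSON false as well as the older "False" string form."""
    return value is False or (isinstance(value, str) and value.lower() == "false")


def audit_paywalled_page(html: str) -> List[str]:
    """Return the problems found in a paywalled page ([] means none found)."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()

    problems: List[str] = []
    items = []
    for raw in scanner.jsonld_blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"invalid JSON-LD: {exc.msg}")
            continue
        items.extend(data if isinstance(data, list) else [data])

    paywalled = [i for i in items
                 if isinstance(i, dict) and _is_false(i.get("isAccessibleForFree"))]
    if not paywalled:
        problems.append("no item with isAccessibleForFree set to false")

    for item in paywalled:
        parts = item.get("hasPart") or []
        parts = parts if isinstance(parts, list) else [parts]
        if not parts:
            problems.append("paywalled item has no hasPart element")
        for part in parts:
            selector = part.get("cssSelector", "") if isinstance(part, dict) else ""
            if not selector:
                problems.append("hasPart entry without a cssSelector")
                continue
            # Only simple class selectors like ".paywallContent" are checked.
            match = re.fullmatch(r"\.([\w-]+)", selector)
            if match and match.group(1) not in scanner.classes:
                problems.append(f"no element on the page has class {match.group(1)!r}")
    return problems
```

A check like the missing-class test is worth running after every template change. A renamed CSS class silently breaks the paywall markup without any visible change on the page.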
A structured and automated approach is essential for scalability and consistency across a large Web site.
Summary
For content you want to protect, I would follow the approach Google itself outlines: serve the full content only to requests from the IP addresses Google publishes for its crawlers. In principle, it is then entirely possible to rank with paywalled content without giving it away.
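Google also publishes the Googlebot IP ranges as a JSON file, which makes an allowlist check cheaper than DNS lookups. Below is a minimal Python sketch. It assumes the file contains a "prefixes" list of {"ipv4Prefix": ...} and {"ipv6Prefix": ...} entries, so verify this against the current file. The sample data uses reserved documentation ranges, not Google’s real ones. The function names are hypothetical.

```python
import ipaddress
import json
from typing import List, Union

IPNetwork = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Placeholder data in the assumed file format; these are documentation
# ranges (RFC 5737 / RFC 3849), NOT Google's actual crawler ranges.
SAMPLE_RANGES = """{"creationTime": "example", "prefixes": [
  {"ipv4Prefix": "192.0.2.0/24"},
  {"ipv6Prefix": "2001:db8::/32"}
]}"""


def load_crawler_networks(ranges_json: str) -> List[IPNetwork]:
    """Parse a ranges document with a "prefixes" list of IPv4/IPv6 CIDRs."""
    data = json.loads(ranges_json)
    networks: List[IPNetwork] = []
    for entry in data.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def ip_in_crawler_ranges(ip: str, networks: List[IPNetwork]) -> bool:
    """True if ip falls inside any of the allowed crawler networks."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    # `in` simply returns False when the IP versions differ (v4 vs v6).
    return any(addr in net for net in networks)
```

Google updates this list, so a real deployment would re-download it regularly instead of hard-coding the ranges.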
One important caveat: this kind of content has at least one major ranking disadvantage. Expect a sky-high bounce rate and a lot of pogo-sticking. Almost no one will pay to read your content when plenty of other websites offer similar content for free. Good luck!