So! You’ve been reading up on digital sustainability and now you want to know just how much CO2 your website emits. But you’ve run into a problem…
Tools like websitecarbon.com, ecograder.com and digitalbeacon.co are great, but they only give you the result of one URL at a time. This is helpful for getting a baseline reading of what is no doubt the most popular webpage of any website – the home page – but it doesn’t account for just how many pages there are on your website, let alone who is looking at them, be they human or robot.
In this article (well, more of an essay!) I will outline how you can use the Screaming Frog SEO Spider and its various configuration settings to conduct highly custom, comprehensive crawls of any website and calculate its full digital carbon emissions – something that has challenged specialists in this field for a long time.
Requirements
- You will need a paid licence for the Screaming Frog SEO Spider to use all the features mentioned in this article.
- You will need to be running version 20.1 or above.
Who are you, and what do you know about Digital Sustainability?
My name is Ben and I am an actor turned web professional. During the pandemic, I taught myself to code and then found myself working as a Project Manager at Wholegrain Digital – one of the pioneers of digital sustainability and the creators of websitecarbon.com. Over my time there, I learnt a lot about digital sustainability – but the process of auditing an entire website was very cumbersome. Now I’m excited to share what I’ve learnt and what I’ve observed in the Screaming Frog SEO Spider.
Let us begin…
What do we need?
Total Bytes Transferred
Calculating the digital carbon emissions of a webpage is done by retrieving the bytes of the fully loaded page (often through a tool like Google PageSpeed Insights) and then making a calculation with that number. You can read more about the calculation on the Sustainable Web Design website here and in the CO2.js documentation, but that’s broadly how it all works.
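If you want to see the maths in action, here’s a minimal sketch using The Green Web Foundation’s CO2.js package. The byte count, the green hosting flag and the logging are placeholder values for illustration – this isn’t part of the Screaming Frog workflow itself:

```javascript
// Minimal sketch: estimating emissions from a byte count with CO2.js.
// The 1,500,000 bytes (~1.5 MB) figure is a placeholder.
import { co2 } from "@tgwf/co2";

const swd = new co2({ model: "swd" }); // Sustainable Web Design model
const pageBytes = 1_500_000;           // total bytes transferred for one page load
const greenHosting = false;            // covered in the next section

// perByte() returns an estimate in grams of CO2 for that transfer
const grams = swd.perByte(pageBytes, greenHosting);
console.log(`${grams.toFixed(3)} g CO2 for one load of this page`);
```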
Is it on Green Hosting?
Then we need to know if the website is on Green Hosting – that is, whether the server that powers it runs on renewable energy, or whether the company that manages it at least offsets the carbon emissions of the power it uses. You can find this out using websitecarbon.com or the Green Web Check tool on The Green Web Foundation’s website. This makes a difference to the calculation I mentioned earlier.
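You can also check this programmatically. CO2.js ships a hosting module that queries The Green Web Foundation’s green hosting dataset – here’s a rough sketch, with example.com as a placeholder domain:

```javascript
// Rough sketch: checking a domain against The Green Web Foundation's dataset.
// "example.com" is a placeholder – swap in the domain you are auditing.
import { hosting } from "@tgwf/co2";

hosting.check("example.com").then((isGreen) => {
  console.log(isGreen ? "Verified green hosting" : "No green hosting record found");
});
```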
How many views?
We also need to know how many views each page has received over a given time period. Once we have the total bytes transferred of a single webpage, it makes sense to multiply that number by the number of times that URL was viewed over a given time period. This way, we account for each time that data was loaded. Of course this doesn’t take repeat views and caching into account – but I prefer to take a “worst case scenario” approach when carrying out this process. This way, we overestimate rather than attempt to be 100% accurate. We won’t be far off, and this is always going to be an estimation rather than a fully accurate model.
You can get the views information from any tracking service. For this article, we’ll assume you use, and have access to, Google Analytics, but this is applicable to any tracking service, as long as you can get the total views of each webpage over a given time period.
Objective: To account for the full page weight (that is, the total bytes transferred) of every webpage and its assets, when fully loaded, over a given time period.
Why use Screaming Frog SEO for this?
Screaming Frog is, to my knowledge and at this time, the only website crawling software that includes the CO2.js package by The Green Web Foundation. The software is able to retrieve the total transferred bytes of every URL returned in the crawl and, from this, it calculates the CO2 emissions using the CO2.js package.
More importantly, it is also capable of executing custom JavaScript on each page – allowing you to simulate user interaction with the webpage. This is fundamentally crucial to loading a webpage in full and thereby accounting for all of its assets – not just what’s loaded on the initial page load. Personally, I don’t know of any other performance tool that can do this at this point in time. You can read more about this problem, aptly termed the 1mm gap, here.
Finally, it also offers an integration with Google Analytics 4, which many consider to be the industry-standard analytics platform. By utilising this, you can cross-reference views or hits with every URL’s carbon emissions score.
How to configure the audit
Now we know the basics, we can get started. To begin, you’ll need to set up the crawl according to your specific use case. You will need to make some interesting and considered choices.
You will need to decide if you will…
Account for the weight of cookies and tracking pixels
Tracking cookies have a carbon cost. They are just more code – code with a file size that needs to be loaded onto the user’s device. With Screaming Frog, you have the option to count or discount these in your crawl.
The case for counting them in your crawl is that, arguably, most users will opt in to cookies, which will impact the overall size of the data transferred per view – i.e. this is the “worst case scenario”. The case against is that if users don’t opt in, or decline, then the cookies won’t (and shouldn’t) be loaded and therefore shouldn’t be counted.
You can toggle this by going to “Configuration > Crawl Config > Extraction > Cookies”.
Use the Green Hosting Calculation
This is a fairly straightforward one. If your website runs on Green Hosting, turn it on! If not, turn it off. You can do this under “Configuration > Crawl Config > Spider > Advanced”.
This feature also has an interesting use case: you can see the carbon improvements that could be made if you switched to a green host. For example, if you’re not on a green host and are trying to persuade your boss or your client to move to one, you could run the crawl once with the setting turned off and then again with it turned on. The scores will improve across the board because the CO2 calculation allows a green web hosting variable to be set to true or false, and it favours true – supporting your argument to change to a green provider.
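As a rough illustration of that variable (using CO2.js directly, with placeholder figures rather than Screaming Frog’s own output), the same byte count produces a lower estimate when the green hosting flag is true:

```javascript
// Placeholder figures – the point is the difference the boolean makes.
import { co2 } from "@tgwf/co2";

const swd = new co2({ model: "swd" });
const pageBytes = 1_500_000;

console.log("Not green hosted:", swd.perByte(pageBytes, false).toFixed(3), "g CO2");
console.log("Green hosted:    ", swd.perByte(pageBytes, true).toFixed(3), "g CO2");
```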
IMPORTANT: Scroll the page or simulate user interaction
This is the most important, and perhaps the most impressive, feature that Screaming Frog can offer. Until now, it was pretty much impossible to run a website carbon test that could iterate through a website’s pages at scale and load the full assets of each page to make the most accurate measurement.
This is because most performance testing platforms simulate a bot visiting the site and initiating a page load, rather than a real human who may move their mouse, scroll the page or make another interaction before, during or after the page loads.
This human interaction loads additional assets which are not counted in these performance tests, simply because they are never initiated. For example, a large image or video embed that is lazy loaded (i.e. doesn’t load until it’s actually visible) three full scrolls down the page won’t be requested on the initial page load. Therefore, when measuring this page, the weight of that asset will not be counted in the total bytes transferred.
This produces a false reading and is why you may occasionally find a website that looks very carbon heavy (e.g. full of images, slow to load, maybe uses a video header) but scores A’s across the board on website carbon measurement tools. Again, this problem is detailed in this article about the 1mm gap.
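To make the lazy loading mechanism concrete, here’s a minimal sketch of the kind of logic many sites use (others rely on the loading="lazy" attribute or a library, but the effect on page weight is the same): the image bytes are only requested once the placeholder scrolls into the viewport.

```javascript
// Minimal lazy loading sketch: the browser only downloads an image's bytes
// once its placeholder enters the viewport, so a bot that never scrolls
// never triggers these requests.
const lazyImages = document.querySelectorAll("img[data-src]");

const observer = new IntersectionObserver((entries) => {
  entries.forEach((entry) => {
    if (!entry.isIntersecting) return;
    entry.target.src = entry.target.dataset.src; // the network request happens here
    observer.unobserve(entry.target);
  });
});

lazyImages.forEach((img) => observer.observe(img));
```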
But now, with Screaming Frog, you can run custom JavaScript snippets on a page-by-page basis in your crawl – including the ability to scroll the page and therefore load all those additional assets! This is very exciting.
I have confirmed with the team at Screaming Frog, and verified myself, that if you enable this feature, the total bytes transferred readings become larger – and it is from this reading that the CO2 calculation is made.
I’d encourage you to enable this setting in your crawl. This will allow the software to scroll each page it encounters a set number of times (which you can configure) to ensure you have accounted for as much of each page’s content as possible.
You can do this by:
- firstly, going to “Configuration > Crawl Config > Rendering” and changing the option to use JavaScript rendering.
- then going to “Configuration > Crawl Config > Custom > Custom JavaScript > Scroll Page”.
Further to this, you can configure the number of times you wish to scroll by opening the snippet and changing the JavaScript constant “numberOfTimesToScroll” from 5 to whatever you wish.
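For context, the logic involved looks roughly like the sketch below. To be clear, this is my own illustrative version rather than Screaming Frog’s bundled “Scroll Page” snippet, and the delay value is an assumption – but it shows the idea: scroll one viewport at a time and pause so lazy-loaded assets have time to request and download.

```javascript
// Illustrative sketch only – not the bundled "Scroll Page" snippet.
const numberOfTimesToScroll = 5;    // mirrors the constant mentioned above
const delayBetweenScrollsMs = 1000; // assumption: ~1s gives lazy loaders time to fire

async function scrollPage() {
  for (let i = 0; i < numberOfTimesToScroll; i++) {
    window.scrollBy(0, window.innerHeight); // scroll one viewport height down
    await new Promise((resolve) => setTimeout(resolve, delayBetweenScrollsMs));
  }
}

scrollPage();
```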
Finally, if you’re savvy with JavaScript, or know someone who is, you could also conduct a crawl that measures the impact of other kinds of user interaction. For example, you could configure it to click play on every video it encounters and watch for 20 seconds before moving on, or simulate clicking through a checkout path on an e-commerce site.
Crawl the website as a bot, or emulate a mobile, desktop or custom device
In the previous section, I recommended using the custom JavaScript snippets feature. This is only available if you change the rendering settings to enable JavaScript rendering. Assuming you have this enabled, you can further customise your website carbon audit and crawl settings by selecting which device to emulate, i.e. which device characteristics and screen widths will be used to load each page of your crawl.
When you enable JavaScript rendering, the default setting is to use the “Googlebot Mobile: Smartphone” preset. You can read about how this works in Screaming Frog’s documentation here, but you are free to choose another device from the drop-down list.
In theory, choosing different devices should result in different values for the total bytes transferred metric, because the viewport (i.e. the screen size) of the device directly impacts both the number and size of the assets loaded onto the page.
For example, an image on a mobile device may be rendered at 250px by 250px and weigh 180 KB, while on a desktop device that same image may be rendered at 600px by 600px and weigh perhaps 320 KB due to the larger dimensions. This is a very simplistic example, but the principle remains the same.
However, in practice, I have noticed some inconsistencies in how this currently works in the software. In general, I’m still finding that the default Googlebot Mobile: Smartphone preset returns the most accurate results – that is, the total transferred bytes reported for a given webpage is closest to the full page weight I can verify through my Chrome browser’s developer tools.
Therefore, at this point in time, my recommendation is to stick with the default settings but feel free to experiment with this.
That said, if you notice that the majority of your audience use desktops or tablets to view your website, it may be more appropriate to configure the crawl with that in mind.
You can alter all these settings in “Configuration > Crawl Config > Rendering”.
Crawl the website as the Screaming Frog SEO Spider User-Agent or emulate a different browser
Screaming Frog’s default settings use a User-Agent of “Screaming Frog SEO Spider”. This means that when it crawls your website, it does so as a bot and identifies itself to the web server as the “Screaming Frog SEO Spider” bot.
However, you can change this by using one of the many available preset User-Agents.
This is useful because different browsers will render web pages differently and you may find variations in the total bytes transferred metric depending on which browser or bot you choose to emulate.
Screaming Frog recommended I use Chrome, as some websites vary greatly depending on the User-Agent, but to get the most realistic reading, I’d actually recommend choosing whichever User-Agent makes up the majority of your website traffic. You can get this information from your Google Analytics account by looking at which browsers most of your users are on.
You can do this by going to “Configuration > Crawl Config > User-Agent”.
Respect or ignore robots.txt
The robots.txt file informs bots – both good and bad ones – which parts of the website they are allowed to access. There are sections of a site that are just meant for bots and others that really aren’t. For example, in WordPress, you will often find that bots are not allowed access to the WordPress REST API for security reasons, but they’ll also be directed to the site’s sitemap so that search engines like Google can index your content. You can also use the robots.txt file to block particular bots, e.g. to opt out of being crawled by OpenAI’s crawlers.
The decision to make here is whether you feel these mostly hidden parts of your website are valid URLs to account for in your website carbon audit – or not.
On the one hand, they do absolutely exist. They are still public-facing pages that can be found and accessed by anyone on the web, so they still have a carbon implication – albeit a small one, as it’s usually only text-based content. For example, here’s my sitemap.
On the other hand, usually the only people who are even aware that such pages exist are developers and robots. It may be more helpful to your use case to only account for publicly facing URLs. It’s also usually the case that these sorts of URLs are not counted in tracking platforms like Google Analytics, so cross-referencing the hits from visitors (or bots) becomes a lot harder in the final stages of this process.
In Screaming Frog, you are able to respect or ignore robots.txt. If you ignore it, all pages will be crawled, regardless of whether the crawler is meant to have access to those pages or not.
You can do this by going to “Configuration > Crawl Config > robots.txt”.
Configure the Google Analytics integration (optional)
You can optionally configure the Google Analytics integration natively supported by Screaming Frog.
If you enable this, you can later cross-reference the number of views with the “CO2 (mg)” metric, which is returned in the crawl on a per-URL basis. This will allow you to account for the total CO2 emitted by every page of your website in the given time period!
I will cover how to do this part later in the article.
To enable the Google Analytics integration, follow the usual steps to connect your Google Analytics 4 account. Then navigate to the next tab and choose the time period you are interested in. For a first crawl, I’d recommend just one month to check it’s working properly, but you can select any period you have data for in your GA4 account.
Go to “Configuration > Crawl Config > API Access > Google Analytics 4 > Account Information & Date Range”.
The “Views” metric will automatically be collected in the crawl if using the GA4 integration.
Again, you could get super custom with this by filtering down your audience segments and events in your crawl settings. You could, for example, isolate the data from GA4 so that you account only for the carbon emissions emitted by users in a certain audience, e.g. purchasers, or perhaps only those who have triggered certain events, e.g. watched a video.
Run the audit
Now that you have configured everything to your use case, it’s time to run the crawl. This is easy – just enter your URL and hit start!
Let it run until it completes.
Exporting the data
Export the entire crawl’s results as a CSV. Ensure you have all the data and metrics you are interested in and then hit the “Export” button.
For the purposes of website carbon auditing, it would be pertinent to include the following:
- Address (i.e. the URL)
- Content Type (optional)
- Size
- Transferred
- Total Transferred
- CO2 (mg)
- Carbon Rating
- Response Time (optional)
- Cookies
- GA4 Views (optional)
Since we are mostly interested in the carbon emissions of each webpage (rather than each asset, e.g. an image or PDF), you could filter the data in Screaming Frog before you export it to only include URLs with a Content Type of HTML. But equally, you can do this in a spreadsheet – and if you truly want to account for everything, each of your asset URLs (all your images, PDFs, SVGs, etc.) will also have a carbon emission and carbon rating attached to it.
Analysing the results
Congratulations! Now you have all the data! You’ll quickly be able to see which pages are the most problematic by sorting the CO2 (mg) column in descending order, or by sorting the Carbon Rating column if you find that more helpful.
The last step is to sum up the CO2 (mg) emissions of every URL in your data set. And this is now super easy – just sum the column and there you have it!
If you’re using the GA4 integration, this is still easy but has one extra step. You need to add a new column to the right in your spreadsheet and add this formula:
=[cell containing CO2 (mg)]*[cell containing GA4 Views] – for example, =G2*K2 if CO2 (mg) happens to sit in column G and GA4 Views in column K (adjust to match your own export).
Apply this to every row of your data set and then sum those values.
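If you’d rather do this step outside of a spreadsheet, here’s a rough Node.js sketch that sums CO2 × views from the exported CSV. The filename and column headers are assumptions based on the export described above – check them against your own file – and the naive split() won’t handle quoted commas, so use a proper CSV parser for anything serious.

```javascript
// Rough sketch: totalling emissions from the exported crawl CSV.
// Assumed filename and column headers – adjust to your own export.
import fs from "node:fs";

const [header, ...rows] = fs
  .readFileSync("screaming_frog_export.csv", "utf8") // hypothetical filename
  .trim()
  .split("\n")
  .map((line) => line.split(",")); // naive parse: breaks on quoted commas

const co2Col = header.indexOf("CO2 (mg)");
const viewsCol = header.indexOf("GA4 Views");

const totalMg = rows.reduce((sum, row) => {
  const co2PerView = parseFloat(row[co2Col]) || 0; // mg per single page view
  const views = parseFloat(row[viewsCol]) || 0;    // views over the chosen period
  return sum + co2PerView * views;
}, 0);

console.log(`Estimated total: ${(totalMg / 1000).toFixed(2)} g of CO2 over the period`);
```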
You did it! Now you know:
- How much CO2 (mg) every page of your website emits every time one person visits it.
- How much CO2 (mg) every page of your website has emitted over a given date range (if you’re using the GA4 integration).
There is also the opportunity to visualise this data in platforms like Google’s Looker Studio.
I’ve recently developed my own template which can dynamically calculate the carbon emissions of every webpage in a site across the date range selected in the corresponding GA4 account. This means, for example, that I can see the emissions generated by the whole site in any given day, week, month or year, retrospectively see the impact that marketing campaigns have had, and perhaps forecast the emissions that my future marketing efforts to drive traffic to the site may have.
I use this in my own client audits and I’m pleased to offer this as a free resource so that others can use it for their own audits. Feel free to repurpose and rebrand it to support your own service offerings too.
To use it, you’ll need to run your own crawl as outlined above and then pull in the GA4 data from the corresponding account. Next, you’ll need to create a Blended Data Set by linking the “Address” field of the Screaming Frog crawl with the “Full Page URL” in the corresponding GA4 data set.
Once you have configured this, the dashboard should reload with your data and you can create any number of interesting graphs, tables and charts to explore your site’s carbon emissions and strategise a plan to reduce them.
Disclaimer & Additional Notes
At the time of writing, this information is accurate to the best of my knowledge and my personal experience using the Screaming Frog SEO Spider. However, I’m still experimenting with the tool, and there will likely be different results and anomalies in every crawl due to the sheer number of configuration options and the intricacies of different types of websites.
I encourage you to report anything unusual you notice to the Screaming Frog support team, whom I have found most helpful in discussing and designing this process.
Nevertheless, I am so impressed and excited at the possibilities this now presents. Never before have web professionals been able to measure digital carbon emissions at such scale, with such accuracy.
If you managed to get to the end of this article – congratulations and thank you!
If you’d like to discuss any of this with me, please do get in touch.
Acknowledgements
A big shout out to the teams at Wholegrain Digital for the inspiration and Screaming Frog SEO for these new features.
This is not a sponsored post, and is purely just me sharing my knowledge in the hope that others can use it to #green-the-web!