I Miss Referer
What is Referer?
The HTTP Referer [sic] header has a couple of uses.
- For top-level navigation it contains the previous URL the user was on.
- For in-page requests it contains the URL of the page that the request originated from.
See the spec for more info.
What is Happening to it?
Most modern browsers have changed the default Referrer-Policy (spelled correctly) to strict-origin-when-cross-origin. This means that for cross-origin scenarios only the referring origin will be sent. For example, if you follow a link on https://source.example/123 to https://destination.example, the browser will send Referer: https://source.example.
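To make the trimming concrete, here is a rough Python sketch of what strict-origin-when-cross-origin does to the header value. The function name is my own, and the edge cases real browsers handle (HTTPS-to-HTTP downgrades, non-default ports, and so on) are simplified away:

```python
from urllib.parse import urlsplit

def referer_for(from_url: str, to_url: str) -> str:
    """Approximate the Referer value under strict-origin-when-cross-origin.

    Same-origin requests get the full URL (minus any fragment);
    cross-origin requests get only the referring origin.
    """
    src, dst = urlsplit(from_url), urlsplit(to_url)
    if (src.scheme, src.netloc) == (dst.scheme, dst.netloc):
        query = f"?{src.query}" if src.query else ""
        return f"{src.scheme}://{src.netloc}{src.path}{query}"
    # Cross-origin: origin only (real browsers serialize this with a trailing "/").
    return f"{src.scheme}://{src.netloc}"
```

With the example above, referer_for("https://source.example/123", "https://destination.example") yields just https://source.example.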
This is a good change, made for very legitimate reasons. However, as the owner of a small personal website, I will miss the full utility of this header.
What are the Advantages of Sending a Full Referer?
The main advantage is that as a website operator I can see which websites are sending traffic my way. Especially as my website content often contains personal opinions, it is always interesting to see what people have to say. It also satisfies my curiosity about the number of people that come from a particular site.
Why Did the Default Need to be Changed?
From my point of view, the primary reason to change the default is to avoid leaking sensitive URLs. Although putting sensitive information in URLs should be avoided, it is commonly done. For example, Google Docs puts the document ID in the URL (ex: https://docs.google.com/document/d/1ScJODuif5gNW_khf_LCWZG7OKZD7AtUHjzRcxAhieNw/edit). They also offer an access option “Anyone with the link” which allows anyone who knows the URL to access the document. This scheme is reasonably private because Google Docs’ URLs are securely random. Unless you give someone the URL it is effectively impossible to guess it. However, it would be bad if following a link in a document or a comment shared the document URL with the linked site! Google was aware of this issue and doesn’t send Referers from https://docs.google.com, but it is easy to imagine a website forgetting to set an appropriate Referrer-Policy and being bitten by the default behaviour. This is one example where the web is insecure by default!
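A site doesn't have to rely on the browser default; it can opt out explicitly. As a minimal sketch using Python's standard http.server (the page content is hypothetical), this is roughly what setting a strict policy looks like on the server side:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class NoReferrerHandler(BaseHTTPRequestHandler):
    """Serves a page that tells browsers to never send a Referer from it."""

    def do_GET(self):
        body = b"<p>Links followed from this page carry no Referer.</p>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        # The key line: suppress the Referer on all outgoing requests.
        self.send_header("Referrer-Policy", "no-referrer")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To run it locally:
# HTTPServer(("127.0.0.1", 8000), NoReferrerHandler).serve_forever()
```

The same header can carry any of the other policy values, so the one line is all it takes to tighten (or relax) a site's behaviour.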
The Referer header can also be used for tracking. Personally I think this is less important, but as browsers are starting to crack down on tracking surfaces, closing the last leaks will become more and more important.
If you didn’t read it the first time, I’ll again link Mozilla’s list of problems with Referer.
Are There Any Alternatives?
Linkbacks
Linkback is a general term for protocols that notify a site when it has been linked to. My site supports both Pingbacks and Webmentions. In fact, this is where the comments and mentions below the post come from! If you link to one of my posts and notify my site, a link back to your site should eventually appear below the post (subject to moderation).
The major downside to these approaches is that they require both the linking site and the linked site to support a specific protocol, and require the linking site to actively send notifications. Additionally, Linkbacks provide no volume information; however, paired with the origin information from the Referer header, volume can be tracked. (You won’t be able to tell different pages on the same origin apart though.)
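For a sense of how small the notification itself is, here is a sketch of building a Webmention request in Python. The URLs are made up for illustration, and discovering the endpoint (via the target site's HTTP Link header or HTML) is left out:

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_webmention(endpoint: str, source: str, target: str) -> Request:
    """A Webmention is just a form-encoded POST saying 'source links to target'."""
    data = urlencode({"source": source, "target": target}).encode()
    return Request(
        endpoint,
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

# Hypothetical URLs; urllib.request.urlopen(mention) would actually send it.
mention = build_webmention(
    "https://linked.example/webmention",  # endpoint advertised by the linked site
    "https://yoursite.example/my-post",   # your page that contains the link
    "https://linked.example/their-post",  # the page you linked to
)
```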
Web Crawlers
The next option for finding linking sites is asking a web crawler. While most search engines don’t provide a feature for finding which sites link to a specific site, they do usually have this information from crawling the web. For example, Google, Bing and Yandex will provide you with reports on incoming links for no charge. There are other crawlers, mostly aimed at SEO, that can give you some of this information as well.
It’s a shame that we need to rely on these massive companies to do us favours, but it isn’t feasible for the average person to crawl the web. Furthermore web crawling is always incomplete and has a significant update delay. Like linkbacks this also doesn’t provide volume information.
Possible Future Standards???
I wonder if there could be a way to help independent websites link the web together while preserving user privacy. I had a couple of ideas but none of them seem satisfactory.
- Hash the full referrer (maybe with a salt?) and pass it in a Referrer-Hash header. This would allow operators to separate the traffic from known referrers. This solves the security issue but provides only minimal privacy benefits, and it still requires some crawling to find the exact referrer URL if it is unknown.
- Aggregate end-of-day reports. If the user could pass the referral information to a trusted third party, that party could forward it to the linked website. Possibly something like Tor could be used instead of a trusted party. This separates the link information from the user, which has nice privacy benefits. However, the trusted third party would need to have access to at least the domain name. (The other information could possibly be encrypted with the website’s TLS certificate.) This also provides some volume information, which is nice.
  - This would still have to be opt-in or hashed to avoid leaking sensitive URLs.
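The first idea can be sketched quickly. Everything here is hypothetical (Referrer-Hash is not a real header), and it assumes a salt shared between browser and site, since a per-request salt would defeat matching:

```python
import hashlib

def referrer_hash(referrer_url: str, salt: str = "") -> str:
    """Digest of the full referring URL, standing in for the URL itself."""
    return hashlib.sha256((salt + referrer_url).encode()).hexdigest()

# Operator side: pre-hash the referrer URLs you already know about...
known = {referrer_hash(url): url for url in ["https://source.example/123"]}

# ...then match incoming Referrer-Hash values against them. An unknown
# hash tells you nothing without crawling for candidate URLs.
incoming = referrer_hash("https://source.example/123")  # what a browser might send
print(known.get(incoming, "unknown referrer"))
```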
I’m not sure if this is really possible to accomplish. Ideally it would be done in browsers so that it works for every site, but it is very difficult to send this information from a browser without enabling tracking. So I guess the best options right now are:
- Support sending Pingbacks and Webmentions. You probably also want to support receiving them, but you should be careful about showing them on your site automatically as they are a spam vector.
- If your site doesn’t contain sensitive values in its URLs, consider relaxing your Referrer-Policy. However, I suspect that browsers will start disabling the more relaxed policies soon to prevent tracking.
Let me know if you have other ideas.