State of IPFS Websites

Posted on 2020-10-30

I’ve released a number of websites via IPFS and would like to share what I have learned. In this post I will be discussing static websites specifically. Using IPFS to distribute and share user content is also a very interesting use case, but it has completely different pros and cons from the ones I will discuss here.

The Basics

Putting content onto IPFS is very simple.

$ mkdir root
$ echo '<!DOCTYPE html><title>My Site</title><p>Hello from IFPS!' >root/index.html
$ ipfs add -r root
added QmNkKyrVybqzCaE8LTKdVcd4F7AN5tszMCeXBQ6B4hsg4o root/index.html
added QmSLTqrJz5bwxRb2KkaXajYB2CJesemqE1aitNKTAJqCvv root
$ xdg-open https://ipfs.io/ipfs/QmSLTqrJz5bwxRb2KkaXajYB2CJesemqE1aitNKTAJqCvv/

Tada! You have a website. And if you add a DNS TXT record for _dnslink.mysite.example with the value dnslink=/ipfs/QmSLTqrJz5bwxRb2KkaXajYB2CJesemqE1aitNKTAJqCvv, people can look up your website with a human-readable name such as ipns://mysite.example (or https://ipfs.io/ipns/mysite.example/ through a gateway).
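A DNSLink record is an ordinary TXT record whose value is an IPFS path. The following is a minimal Python sketch of building and reading such a record. The helper names are hypothetical. A real resolver also performs the DNS lookup itself and follows /ipns/ targets recursively. You can inspect a live record with something like dig +short TXT _dnslink.mysite.example.

```python
from __future__ import annotations

DNSLINK_PREFIX = "dnslink="


def dnslink_record(domain: str, cid: str) -> tuple[str, str]:
    """Return the (record name, TXT value) pair that points `domain` at `cid`."""
    return f"_dnslink.{domain}", f"{DNSLINK_PREFIX}/ipfs/{cid}"


def parse_dnslink(txt_values: list[str]) -> str | None:
    """Return the content path from the first DNSLink-looking TXT value, if any.

    Simplification: a real resolver would also follow /ipns/ targets.
    """
    for value in txt_values:
        if value.startswith(DNSLINK_PREFIX):
            path = value[len(DNSLINK_PREFIX):]
            if path.startswith(("/ipfs/", "/ipns/")):
                return path
    return None  # no usable DNSLink among the TXT values


name, value = dnslink_record("mysite.example", "QmSLTqrJz5bwxRb2KkaXajYB2CJesemqE1aitNKTAJqCvv")
print(name, "TXT", value)
# Unrelated TXT values (SPF etc.) on the same name are simply skipped.
print(parse_dnslink(["v=spf1 -all", value]))
```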

The Good

Archive-ability

The biggest reason that I use IPFS for static websites is that it allows others to easily archive or help host the website. Anyone can run ipfs pin add $(ipfs resolve /ipns/mysite.example) and they will keep their own copy for as long as they like. Moreover, if I stop hosting the site for any reason, everyone will automatically be served from their copy. Nobody needs to worry that the pinner has modified the content, because the content is verified against its hash.
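Why can't a pinner tamper with the content? Because the address is derived from the bytes, any client can check any copy it receives. The sketch below is a deliberately simplified model: it uses a plain SHA-256 hex digest as the "address". Real CIDs are computed over a UnixFS DAG and carry multihash and CID prefixes, so these values will not match what ipfs add prints. The provider dictionaries are hypothetical stand-ins for peers that have pinned the site.

```python
from __future__ import annotations

import hashlib


def toy_address(content: bytes) -> str:
    # Simplification: real IPFS CIDs are NOT the bare SHA-256 of the file.
    return hashlib.sha256(content).hexdigest()


def fetch_verified(address: str, providers: list[dict[str, bytes]]) -> bytes:
    """Return the first copy of `address` whose hash matches the address."""
    for store in providers:
        data = store.get(address)
        if data is not None and toy_address(data) == address:
            return data
        # Missing or modified copies are skipped; try the next provider.
    raise LookupError(f"no valid copy of {address}")


page = b"<!DOCTYPE html><title>My Site</title><p>Hello!"
addr = toy_address(page)
tampered = {addr: b"<p>Totally the original, honest"}
archiver = {addr: page}
print(fetch_verified(addr, [tampered, archiver]) == page)
```

The tampered copy is rejected without any trust in who served it; the honest archiver's copy is accepted.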

The only real concern is that if the site goes down suddenly, no one may remember the hash of the last version. However, if anyone has a previous version’s hash pinned, they can at least continue to use that. I imagine that if IPFS becomes popular, browsers will learn to store the hashes in their history, and maybe even provide an option to pin every page that you visit for a period of time.

For example, consider web archives today. https://archive.is and https://archive.org are both popular; however, if one of them goes down, its content is lost. You may be able to find the page, or a similar version, on the other. But you have no way of “pinning” the content so that your links will always stay alive and reference the exact same content (with cryptographic verification). In an IPFS world the archives could simply pull offline webpages from each other. This means the availability of dead websites could actually increase over time instead of slowly fading.

Free CDN

There are many public gateways that will serve your IPFS website for free. This means you have little to worry about if you get slashdotted, as the gateway you use will cache the files, keeping the load on the origin server low. Furthermore, users who access the site over IPFS directly will help serve the files as well.

Some of these gateways have weak or unspecified Terms of Service; however, a number of them provide a good enough guarantee for my sites. Of course, if you are running a valuable service you can run your own gateway or prepare fallbacks between gateways.
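Falling back between gateways is simple because the request path names the content rather than the server. The sketch below tries each gateway in order. The gateway list is only an example (ipfs.io and dweb.link are public gateways), and fetch is injectable so it can be swapped out or tested. For /ipfs/ paths every gateway should return the same bytes. For /ipns/ paths, gateways may have cached different versions of the name.

```python
from __future__ import annotations

import urllib.request
from typing import Callable, Sequence

# Example list, in order of preference.
GATEWAYS = ("https://ipfs.io", "https://dweb.link")


def http_get(url: str) -> bytes:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def fetch_with_fallback(content_path: str,
                        gateways: Sequence[str] = GATEWAYS,
                        fetch: Callable[[str], bytes] = http_get) -> bytes:
    """Fetch an /ipfs/... or /ipns/... path from the first gateway that answers."""
    errors = []
    for gateway in gateways:
        url = gateway.rstrip("/") + content_path
        try:
            return fetch(url)
        except OSError as exc:  # urllib's URLError and HTTPError subclass OSError
            errors.append(f"{url}: {exc}")
    raise OSError("all gateways failed: " + "; ".join(errors))
```

In real use you would call fetch_with_fallback("/ipfs/<cid>/index.html") and let the default http_get do the network requests.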

This isn’t an IPFS-exclusive benefit, because many companies offer generous free CDN allowances, but it is still nice to have. If the native IPFS protocol takes off, IPFS’s peer-to-peer sharing could also make this a global benefit that doesn’t rely on third parties.

Easy Testing

It is very easy to distribute test versions of the site, as you can just link to them by hash instead of by domain name. For example, if I considered our example very exciting, I could share it as https://bafybeib3mj3aaov7cl2zkidrdh3yqmicewtt76goohlkavi37q7vucgtom.ipfs.dweb.link.
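The subdomain form needs a base32 CIDv1, because DNS names are case-insensitive and the Qm-prefixed base58 CIDv0 printed by ipfs add is not. The ipfs CLI can do this conversion with ipfs cid base32. The sketch below does the same conversion in self-contained Python. It assumes the input is a standard sha2-256 CIDv0, and the helper names are made up for this example.

```python
from __future__ import annotations

import base64

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text: str) -> bytes:
    """Decode base58btc (the alphabet used by CIDv0)."""
    num = 0
    for ch in text:
        num = num * 58 + B58_ALPHABET.index(ch)  # ValueError on bad characters
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    leading_zeros = len(text) - len(text.lstrip("1"))  # each leading '1' is a 0x00 byte
    return b"\x00" * leading_zeros + body


def cidv0_to_v1(cid_v0: str) -> str:
    """Convert a Qm-prefixed CIDv0 into a lowercase base32 CIDv1."""
    multihash = b58decode(cid_v0)
    # A CIDv0 is a bare sha2-256 multihash: 0x12 (sha2-256), 0x20 (32-byte digest).
    if len(multihash) != 34 or multihash[:2] != b"\x12\x20":
        raise ValueError(f"not a sha2-256 CIDv0: {cid_v0}")
    cid_bytes = bytes([0x01, 0x70]) + multihash  # CID version 1, dag-pb codec
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded  # multibase prefix "b": lowercase base32, no padding


def subdomain_url(cid_v0: str, gateway_host: str = "dweb.link") -> str:
    """Build a subdomain-gateway URL such as https://<cidv1>.ipfs.dweb.link."""
    return f"https://{cidv0_to_v1(cid_v0)}.ipfs.{gateway_host}"
```

Any sha2-256 CIDv0 converts to a 59-character string starting with bafybei, the same shape as the dweb.link address above.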

Many projects have “review apps”, where patches submitted to their CI system create an isolated deployment that you can link to. However, the fact that anyone who clones my repo can generate and share these without needing any infrastructure or permission from me is very cool.

The Bad

Gateway Woes

The IPFS gateway concept is incredibly cool! Gateways are a fairly elegant bridge between the supposed IPFS future and the ubiquitous HTTP of today. However, they are not perfect.

No Header Control

IPFS gateways of today are built around the UnixFS data model. As the name suggests, this models a UNIX filesystem, not HTTP requests. This inevitably leads to some mismatches between the two abstractions, the most notable of which is the lack of headers. Some headers make sense to leave to the gateway: since everything in IPFS is immutable, Cache-Control: immutable should be applied to everything. Others relate more to the protocol than to the content, for example Date, Range and Via. However, some headers would be incredibly useful to control, such as Content-Security-Policy, Link, Location and, most notably, Content-Type. Right now IPFS gateways determine Content-Type by sniffing the file extension or the content. Not only is this ugly, it also means that whether your static website works depends on the gateway (and the version of the gateway software) it is accessed through.
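Headers that follow from the address alone are easy for a gateway to get right. The sketch below shows such a policy. It assumes a one-year max-age for immutable /ipfs/ content and a hypothetical 60-second TTL for mutable /ipns/ names; it is not any particular gateway's actual configuration. Note what is missing: nothing in the request path says which Content-Security-Policy, Link or Location a site wants.

```python
from __future__ import annotations

ONE_YEAR = 365 * 24 * 60 * 60  # seconds


def cache_control(request_path: str, ipns_ttl: int = 60) -> str:
    """Cache-Control a gateway could derive purely from the request path."""
    if request_path.startswith("/ipfs/"):
        # A CID names exact bytes, so this response can never change.
        return f"public, max-age={ONE_YEAR}, immutable"
    if request_path.startswith("/ipns/"):
        # IPNS and DNSLink names are mutable pointers, so cache only briefly.
        # The 60 s default is an assumption, not a standard value.
        return f"public, max-age={ipns_ttl}"
    raise ValueError(f"not an /ipfs/ or /ipns/ path: {request_path}")


print(cache_control("/ipfs/QmSLTqrJz5bwxRb2KkaXajYB2CJesemqE1aitNKTAJqCvv/index.html"))
print(cache_control("/ipns/mysite.example/index.html"))
```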

It would be interesting to try to design a more HTTP-focused data model. Even if IPFS can replace many use cases of the HTTP protocol, the model of the web is unlikely to change any time soon, so many of these HTTP features will continue to be useful.

There is a proposal for setting headers on top of UnixFS which would resolve this difficulty.

For now, the best option is to ensure that all of your assets are inside a directory. For example, instead of using /ipfs/QmcniBv7UQ4gGPQQW2BwbD4ZZHzN3o3tPuNLZCbBchd1zh, wrap the file in a directory (ipfs add -w does this) and use /ipfs/Qmdk6MqBXBY3AA4DgmJXyJF3iCNyZVYkAMmzA5VV7mbjVh/v.mp4. For binary file types with magic numbers this is less of a concern, but for text-based formats, providing a well-known file extension will avoid flaky content sniffing.
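To see why the wrapping directory helps, here is a simplified model of gateway content-type detection. It uses the file extension if there is one, otherwise sniffs a few magic numbers, and otherwise guesses text or binary. Real gateways use their own detection code, which differs between implementations and versions. This sketch only mirrors the general approach, and its magic-number list is illustrative rather than complete.

```python
from __future__ import annotations

import mimetypes
import posixpath

_MIME = mimetypes.MimeTypes()  # extension table from Python's mimetypes module


def sniff(body: bytes) -> str:
    """Guess a type from the first bytes, as a gateway must for bare CIDs."""
    if body[4:8] == b"ftyp":  # ISO base media file (MP4 and relatives)
        return "video/mp4"
    if body.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if body.lstrip().lower().startswith((b"<!doctype html", b"<html")):
        return "text/html"
    try:
        body.decode("utf-8")
        return "text/plain"  # CSS, JS, SVG... all look like plain text here
    except UnicodeDecodeError:
        return "application/octet-stream"


def guess_content_type(path: str, body: bytes) -> str:
    name = posixpath.basename(path.rstrip("/"))
    guessed, _encoding = _MIME.guess_type(name)  # (None, None) for a bare CID
    return guessed or sniff(body)
```

With this model, a stylesheet served as a bare CID comes back as text/plain, which browsers may refuse to apply, while the same bytes under style.css get text/css. An MP4 is recognised either way thanks to its magic number.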

IPNS Consistency

IPFS websites, like all websites, are a distributed system. When a user requests /ipns/mysite.example/index.html and then discovers that they need /ipns/mysite.example/script.js, there is no guarantee that the IPNS record hasn’t changed in the meantime. Furthermore, most gateways aggressively cache IPNS, so this issue can be more dramatic than with your average static-site CDN. The standard solution of putting a hash in asset filenames can leave users who have loaded an old copy of the homepage unable to load the files it requires, because the new version no longer contains them. The solution is to avoid relative links for assets (like JavaScript and CSS), against the suggestion of every IPFS tutorial, and instead use absolute links to the content. For more information see my other article You Don’t Want Atomic Deploys.
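The race is easy to reproduce in a toy model. In the sketch below, a page loaded from version A asks for its hashed script after the name has moved to version B. The relative /ipns/ link breaks, while the absolute /ipfs/ link still resolves. The CIDs are made-up labels, and the model assumes every published version stays available, which in reality requires that someone keeps it pinned.

```python
from __future__ import annotations


class ToyNamespace:
    """Toy model of one IPNS name over immutable /ipfs/ content."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.content: dict[str, dict[str, str]] = {}  # cid -> {path: body}
        self.pointer: str | None = None               # current IPNS target

    def publish(self, cid: str, files: dict[str, str]) -> None:
        self.content[cid] = files  # old versions remain addressable by CID
        self.pointer = cid         # the IPNS name moves to the new version

    def get(self, url: str) -> str | None:
        namespace, root, path = url.lstrip("/").split("/", 2)
        if namespace == "ipns":
            if root != self.name:
                return None
            root = self.pointer or ""  # resolved per request, not per page load
        elif namespace != "ipfs":
            return None
        return self.content.get(root, {}).get(path)


site = ToyNamespace("mysite.example")
site.publish("cidA", {"index.html": '<script src="app.1111.js">', "app.1111.js": "v1"})
page = site.get("/ipns/mysite.example/index.html")  # user loads version A
site.publish("cidB", {"index.html": '<script src="app.2222.js">', "app.2222.js": "v2"})

print(site.get("/ipns/mysite.example/app.1111.js"))  # relative link: None (404)
print(site.get("/ipfs/cidA/app.1111.js"))            # absolute link: v1
```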

If you are serving your site at https://gateway.example/ipns/mysite.example, this isn’t too big of a concern, as you can just use a root-relative link such as /ipfs/{hash}. However, if you are serving the site at a custom domain, you will need to hardcode the gateway host into your links. In the future the ipfs: URL scheme will hopefully be widely supported, letting each user use their preferred gateway (quite possibly a local one), but currently it has near-zero adoption. Fortunately, some IPFS browser integrations automatically detect popular gateways and rewrite requests to the gateway of the user’s choice. This can also cause issues, though, if the user’s preferred gateway sends different headers than the one that you tested on.
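Here is a rough sketch of what such a rewrite does, plus the equivalent native ipfs:// URL. The set of recognised gateways is an assumption for illustration. The sketch uses http://127.0.0.1:8080 as the preferred gateway because it is the usual default address of a local go-ipfs node. Real browser integrations have their own rules, for example for subdomain gateways, which this sketch ignores.

```python
from __future__ import annotations

from urllib.parse import urlsplit, urlunsplit

# Hypothetical list of path-style gateways that the rewriter recognises.
KNOWN_GATEWAYS = {"ipfs.io", "dweb.link", "gateway.example"}
CONTENT_PREFIXES = ("/ipfs/", "/ipns/")


def to_preferred_gateway(url: str, preferred: str = "http://127.0.0.1:8080") -> str:
    """Move a known-gateway /ipfs/ or /ipns/ URL onto the user's gateway."""
    parts = urlsplit(url)
    if parts.hostname not in KNOWN_GATEWAYS or not parts.path.startswith(CONTENT_PREFIXES):
        return url  # not something we know how to rewrite
    target = urlsplit(preferred)
    return urlunsplit((target.scheme, target.netloc, parts.path, parts.query, parts.fragment))


def to_native_url(url: str) -> str | None:
    """https://<gateway>/ipfs/<cid>/<path> -> ipfs://<cid>/<path>, else None.

    Query strings and fragments are dropped for brevity.
    """
    path = urlsplit(url).path
    for prefix in CONTENT_PREFIXES:
        if path.startswith(prefix):
            scheme = prefix.strip("/")  # "ipfs" or "ipns"
            return f"{scheme}://{path[len(prefix):]}"
    return None
```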

Most build tools don’t make this easy, as you have to hash each file to calculate its URL. You also want to make sure that you still pin the assets. I do this by keeping the files in an /ipfs/ directory in the web root, even though they will never be accessed via this path. I wrote an assetgraph transform that makes this fairly simple.
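This is not the assetgraph transform itself. As a rough illustration of the same idea, here is a plain Python sketch. It hashes each asset as a wrapped directory, rewrites references in the HTML to an absolute gateway URL, and copies the asset under ipfs/<cid>/ in the web root so it is pinned along with the site. The gateway host and function names are hypothetical. The default hasher assumes a local ipfs CLI, where -n (--only-hash), -Q (--quieter) and -w (--wrap-with-directory) are ipfs add flags and the printed hash is that of the wrapping directory. The HTML rewrite is deliberately naive and only handles exact quoted references.

```python
from __future__ import annotations

import re
import shutil
import subprocess
from pathlib import Path
from typing import Callable

GATEWAY = "https://gateway.example"  # hypothetical hardcoded gateway


def ipfs_wrapped_cid(path: Path) -> str:
    """CID of `path` wrapped in a directory, computed without storing anything."""
    result = subprocess.run(["ipfs", "add", "-n", "-Q", "-w", str(path)],
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()


def absolutize_assets(webroot: Path, html_name: str, assets: list[str],
                      hasher: Callable[[Path], str] = ipfs_wrapped_cid) -> dict[str, str]:
    """Point quoted references to `assets` in one HTML file at /ipfs/ URLs."""
    html_path = webroot / html_name
    html = html_path.read_text(encoding="utf-8")
    urls: dict[str, str] = {}
    for name in assets:
        src = webroot / name
        cid = hasher(src)
        url = f"{GATEWAY}/ipfs/{cid}/{src.name}"
        urls[name] = url
        # Keep a copy under ipfs/<cid>/ so pinning the site root pins the asset too.
        dest = webroot / "ipfs" / cid / src.name
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)
        # Naive rewrite: only exact, quoted occurrences like src="app.js".
        pattern = r"([\"'])" + re.escape(name) + r"\1"
        html = re.sub(pattern, lambda m, url=url: m.group(1) + url + m.group(1), html)
    html_path.write_text(html, encoding="utf-8")
    return urls
```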