Decentralized Applications via the Web Push API

Posted on 2022-11-02

Web Push API

Push notifications are often used for evil, but I think we can all agree that they can be invaluable when used correctly.

I’ve been a fan of the Web Push API for a long time. It provides a way to send push notifications to users that is browser-independent (despite Google’s efforts to require a registered gcm_sender_id). Not only that, but it is end-to-end encrypted!

The “backend API” is standardized too and is a generic protocol that can be used for non-web applications. I think a lot of Google Play Services FCM alternatives should look into this instead of their home-rolled solutions that are usually not half as well thought-out.

In fact if you have a public IP (or a trusted proxy) you can even run an HTTP Web Push backend locally. So your device could be its own Push Service.

FilePush

Recently a thought came through my mind. While the Web Push API is designed to send push messages from server-to-client, why can’t we send them client-to-client? This would be a way to have interactive, communicating web apps without a backend service. Did someone say Web3? (Err, maybe it is best to avoid that term.)

With the technology in my head I just needed an app to build, a classic solution looking for a problem. Luckily I had just the problem!

Sending files between devices can be a real pain. You can send it to yourself via email, but that is slow, and you need to clean up afterwards to avoid wasting tons of your email space. AirDrop is quite nice for transfers between Apple hardware, but I want something hardware independent and cross-platform. I often end up using services such as Snapdrop, ShareDrop and FilePizza to transfer files device-to-device. However, none of these were a perfect fit for my need. The first two do discovery by public IP, so you can only share if you are on the same network. They also require opening the website on both devices, which is a minor inconvenience. FilePizza lacks end-to-end encryption and has a slow “Processing” step before transferring the file. (It is turning the file into a Torrent which is great for sharing to many people, but that is a different problem than the one I am targeting here.)

I figured that device-to-device file transfer would be an easy proof of concept. The idea is simple.

1. Out-of-band contact setup via URL sharing (link or QR code).
2. Select contact.
3. Select file.
4. Receiver gets a push notification.
5. Click notification, receive file.

The contact setup makes it a bit more work than the other solutions for the first transfer, but my target use case is repeated transfers between my own devices. So one-time contact setup pays off quickly.

I’ve thrown this together and it is live at filepush.kevincox.ca. It seems to work fairly well. The code is available as well.

Web Push

The first step is the Web Push API. This is used for signalling a WebRTC transfer (where the actual transfer happens).

Web Push requires a bit of setup. The receiver generates a PushSubscription which is then sent to the sender. The sender then uses this information to send push messages. It looks something like this:

endpoint: https://push.example/secretsubscriptionid
keys:
  p256dh: secrete2ekey
  auth: secretauthkey
expirationTime: null

endpoint is simply the place to send the notification; the push service should route messages sent to this endpoint to the user’s browser. This is generally a server operated by your browser vendor. keys.p256dh is the receiver’s public key, which is what provides end-to-end confidentiality for push messages. keys.auth is a shared authentication secret, which ensures that the push service can’t inject messages. If the subscription has a set expiry it is provided in expirationTime. Firefox and Chromium both appear to have indefinite subscriptions, so I have been ignoring this value for now.

This subscription is simply embedded in the “invite” URL. Once the other device gets this info it can send a push message with its own information to set up bidirectional communication.
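As a rough sketch of what embedding the subscription in an invite URL could look like: the functions below put the subscription JSON into the URL fragment and read it back. The names encodeInvite, decodeInvite and InviteSubscription, the app.example base URL and the choice of the fragment are all assumptions made for illustration, not necessarily how FilePush does it.

```typescript
// Hypothetical shape of the data carried in an invite. The fields mirror the
// subscription example shown earlier.
interface InviteSubscription {
	endpoint: string;
	keys: { p256dh: string; auth: string };
	expirationTime: number | null;
}

// Put the subscription into the URL fragment (an assumption of this sketch).
// Fragments are not sent to the web server that hosts the app.
function encodeInvite(baseUrl: string, sub: InviteSubscription): string {
	const url = new URL(baseUrl);
	url.hash = encodeURIComponent(JSON.stringify(sub));
	return url.toString();
}

// Read the subscription back out of an invite URL, rejecting anything that
// does not look like a push subscription.
function decodeInvite(inviteUrl: string): InviteSubscription {
	const fragment = new URL(inviteUrl).hash.slice(1); // drop the leading "#"
	const parsed = JSON.parse(decodeURIComponent(fragment));
	if (
		typeof parsed.endpoint !== "string" ||
		typeof parsed.keys?.p256dh !== "string" ||
		typeof parsed.keys?.auth !== "string"
	) {
		throw new Error("invite does not contain a push subscription");
	}
	return parsed;
}
```

In a browser the input would typically come from PushSubscription.toJSON(), and the resulting URL can be shared as a link or rendered as a QR code.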

Sender Authentication

There is one last wrinkle. Push service operators want to be able to identify senders to resolve issues. For Web Push this is done via Voluntary Application Server Identification (VAPID): “the server” generates a keypair and uses it to sign its pushes. Presumably if your server was buggy or abusive they could attempt to contact you (via the email address in the VAPID token) or block your key. Then in order to resume notifications you would need to generate a new key and recreate all of your subscriptions.

This creates a problem for our plan since we aren’t supposed to share this key publicly, but the whole app is public. At first, I thought I could generate a keypair per sender, but this doesn’t work because the Push API only allows a single subscription per app on each device (strictly, per service worker registration), so all senders need to use the same VAPID key for a particular receiver. This effectively means that the key needs to be shared between devices.

For now, I have just hardcoded a keypair into the code. Maybe in the future each device could generate a keypair to be used for incoming notifications, then share that keypair with senders as part of the setup. However, I’m slightly worried that hundreds of keypairs for a single web origin would set off flags for popular Push Services anyways, so maybe it is best to stick with one for now. I don’t think there is much potential for abuse anyways since the subscription is linked to the application’s origin, so all abusers could do is alert themselves.

Client-Side Push

The next step was sending a push notification. This was relatively simple.

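// POST the message straight to the receiver's push endpoint.
fetch(subscription.endpoint, {
	method: "POST",
	headers: {
		// Ask the push service to deliver promptly.
		urgency: "high",
		authorization: `WebPush ${vapidToken}`,
		// Seconds the push service may hold an undelivered message.
		ttl: "300",
	},
	body: body.cyphertext,
});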

The vapidToken is simply a JWT signed with our “server key” as discussed earlier.
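To make the token a little more concrete, here is a minimal sketch of the part of a VAPID JWT that gets signed, following RFC 8292: an ES256 header plus aud, exp and sub claims. The function names, the 12 hour lifetime and the contact address are assumptions for illustration. The signing step itself is left out.

```typescript
// Base64url without padding, as used for JWT segments.
function base64UrlFromText(text: string): string {
	const bytes = new TextEncoder().encode(text);
	let binary = "";
	for (let i = 0; i < bytes.length; i++) {
		binary += String.fromCharCode(bytes[i]);
	}
	return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Build "header.claims", the input to the ES256 signature.
// nowSeconds is the current Unix time in seconds.
function vapidSigningInput(endpoint: string, contact: string, nowSeconds: number): string {
	const header = { typ: "JWT", alg: "ES256" };
	const claims = {
		aud: new URL(endpoint).origin, // the push service's origin, not the app's
		exp: nowSeconds + 12 * 60 * 60, // assumed lifetime; RFC 8292 allows at most 24 hours
		sub: contact, // e.g. a mailto: address the operator can reach
	};
	return `${base64UrlFromText(JSON.stringify(header))}.${base64UrlFromText(JSON.stringify(claims))}`;
}
```

The finished token is this string, a dot, and the base64url-encoded ECDSA P-256 signature over it. In a browser, crypto.subtle.sign() with { name: "ECDSA", hash: "SHA-256" } produces the signature in the raw form that JWTs expect.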

CORS

But oh no, our request is rejected! This is due to a legacy kludge called Cross-Origin Resource Sharing (CORS). This is a protection system that servers have to opt out of: it blocks certain cross-origin requests (basically anything that wasn’t already possible when the system was introduced) unless the server explicitly approves them. Since our request has headers that aren’t on the CORS safelist, we need the push service to approve it. Unfortunately neither Firefox’s nor Chromium’s Push Service does so.

But all is not lost. CORS was primarily to avoid issues with automatically-sent information such as cookies. Since our request doesn’t require cookies, CORS is both useless and easy to bypass. We just need a CORS proxy. This is simply a service that authorizes cross-origin requests by doing the CORS dance but otherwise just proxies requests (hopefully stripping cookies). As long as cookies are not sent, the CORS protection is no longer needed.

However, in addition to CORS the Firefox push service also blocks requests with an Origin header. Most publicly available CORS proxies leave this header in place, so I wrote my own simple proxy that strips this header as well. That was enough to get push working on Firefox and Chromium. I don’t see any reason why other browsers wouldn’t work as well.
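The header handling of such a proxy can be sketched in a few lines. This is not the proxy’s actual code: HeaderMap, upstreamHeaders, corsResponseHeaders and the exact list of allowed headers are assumptions for illustration, and the HTTP server around them is omitted.

```typescript
type HeaderMap = Record<string, string>;

// Request headers the proxy refuses to pass on to the push service.
const STRIPPED_REQUEST_HEADERS = ["origin", "cookie"];

// Copy the incoming request headers (lower-casing the names), minus the
// stripped ones.
function upstreamHeaders(incoming: HeaderMap): HeaderMap {
	const out: HeaderMap = {};
	for (const key of Object.keys(incoming)) {
		const name = key.toLowerCase();
		if (STRIPPED_REQUEST_HEADERS.indexOf(name) === -1) {
			out[name] = incoming[key];
		}
	}
	return out;
}

// Headers added to every proxy response (including the preflight OPTIONS
// response) so that the browser lets the page make the request.
function corsResponseHeaders(): HeaderMap {
	return {
		"access-control-allow-origin": "*",
		"access-control-allow-methods": "POST, OPTIONS",
		"access-control-allow-headers": "authorization, content-encoding, ttl, urgency",
	};
}
```

The proxy would answer OPTIONS requests with only the CORS headers, and for POST requests forward the body unchanged with the filtered headers.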

Unfortunately this proxy does add some server-side infrastructure and reduces our decentralization. However, it is incredibly simple and easy to replicate. It would be easy enough to provide an option for users to use a proxy of their choice to bring back full decentralization. Of course, it would be ideal if the browsers’ Push Services were simply CORS enabled. Maybe I’ll file some bugs with the browsers.

Encryption

You may have noticed that the request above already carried a body, and I skipped over where it came from. A push message can also be sent with no body at all. That is the simple form of push notification that is just a ping. It isn’t clear that there is much value there since all you can tell the user is “Something happened!”. I guess the assumption is that your app will sync with the server to get the details before showing a notification to the user.

But we don’t have a server to sync with, so let’s add a body. Google has some recommended libraries for this; however, everything I found only supported Node.js. So I had to roll my own crypto!

After a lot of careful code and spec reading I have come up with what is, as far as I can tell, the first browser-based implementation of Web Push encryption. As scary as it is to roll your own crypto, I am fairly confident it works correctly: the authenticated encryption would fail if I had messed up the core protocol, and I tried to avoid problems with edge cases. All of the hard decisions were made by the experts that designed the protocol, so I feel safe using it to transfer sensitive files. Of course review and feedback would be much appreciated.
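The full protocol (ECDH, HKDF and AES-GCM, as specified in RFC 8291) is too long to show here, but the framing around the ciphertext is simple enough to sketch. The snippet below assumes the aes128gcm content encoding from RFC 8188 and a record size of 4096. The function names are made up for illustration and this is not the app’s actual code.

```typescript
// Assumed record size for this sketch.
const RECORD_SIZE = 4096;

// Header that precedes the ciphertext in an aes128gcm body:
// salt (16 bytes) | record size (4 bytes, big-endian) | key id length (1 byte) | key id.
// For Web Push the key id is the sender's ephemeral P-256 public key in
// uncompressed form (65 bytes).
function encodingHeader(salt: Uint8Array, senderPublicKey: Uint8Array): Uint8Array {
	if (salt.length !== 16) throw new Error("salt must be 16 bytes");
	if (senderPublicKey.length !== 65) throw new Error("expected an uncompressed P-256 point");
	const header = new Uint8Array(16 + 4 + 1 + senderPublicKey.length);
	header.set(salt, 0);
	new DataView(header.buffer).setUint32(16, RECORD_SIZE); // big-endian by default
	header[20] = senderPublicKey.length;
	header.set(senderPublicKey, 21);
	return header;
}

// The plaintext of the final (here: only) record ends with a 0x02 delimiter
// before it is encrypted.
function padPlaintext(message: Uint8Array): Uint8Array {
	const padded = new Uint8Array(message.length + 1);
	padded.set(message, 0);
	padded[message.length] = 0x02;
	return padded;
}
```

The request body is then this header followed by the AES-128-GCM encryption of the padded plaintext, with the key and nonce derived from the ECDH shared secret, keys.auth and the salt.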

WebRTC

The Push API only supports message payloads of up to 4078 bytes. While we can send “unlimited” messages, I suspect that the Push Service operators wouldn’t be happy if we tried to transfer gigabytes of files. I also doubt that I would be writing home about the performance.

Luckily we have WebRTC. WebRTC is an easy-to-use API for end-to-end encrypted peer-to-peer connections, either direct or optionally relayed. It supports low latency and, most importantly for us, high bandwidth.

Signalling

WebRTC is a high-level API that manages most of the work for us. The biggest missing piece is connection setup. This is often called “signalling” in WebRTC, although we only need to do the initial signalling; once connected the browser will take care of the rest. Even this is mostly done for us: we just need to shuffle an “offer” from the initiator to the receiver and then shuffle an “answer” back. If all goes well that is enough for the two to establish a connection.

Since we already have Web Push set up it is trivial to coordinate the session setup. It is a bit tricky to juggle the possibilities of the receiver having 0, 1 or multiple tabs open but with a bit of logic we can handle those cases and set up connections.
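A sketch of what the signalling messages could look like when carried as push payloads. The SignalMessage shape, the session field (imagined here as a way to match an answer to its offer when several tabs are open) and the function names are hypothetical. The size check uses the payload limit mentioned earlier.

```typescript
// Hypothetical message format; the real app may use different fields.
type SignalMessage =
	| { kind: "offer"; sdp: string; session: string }
	| { kind: "answer"; sdp: string; session: string };

// Payload limit quoted earlier in the post.
const MAX_PUSH_PAYLOAD = 4078;

// Serialize a signalling message, refusing anything too big for one push.
function encodeSignal(message: SignalMessage): Uint8Array {
	const bytes = new TextEncoder().encode(JSON.stringify(message));
	if (bytes.length > MAX_PUSH_PAYLOAD) {
		throw new Error(`signal is ${bytes.length} bytes, which does not fit in one push message`);
	}
	return bytes;
}

// Parse a decrypted push payload back into a signalling message.
function decodeSignal(bytes: Uint8Array): SignalMessage {
	const parsed = JSON.parse(new TextDecoder().decode(bytes));
	if (parsed.kind !== "offer" && parsed.kind !== "answer") {
		throw new Error("unknown signal kind");
	}
	if (typeof parsed.sdp !== "string" || typeof parsed.session !== "string") {
		throw new Error("malformed signal");
	}
	return parsed;
}
```

On the initiating side the sdp would come from the local description of an RTCPeerConnection after createOffer() and setLocalDescription(), and the receiver would reply with the result of createAnswer() in the same way.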

Transfer

Next up is file transfer. This is fairly easy, just shove the bits into a pipe. WebRTC’s RTCDataChannel even differentiates between “strings” and “bytes” so we can abuse that to send our control data as JSON strings and file data as bytes to avoid needing to have extra framing or tagging there.
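The receiving side of that trick could be sketched roughly as follows, assuming the channel’s binaryType is set to "arraybuffer" and that each file is announced by one JSON string carrying its name and size. FileReceiver and the message shape are hypothetical, not the app’s actual protocol.

```typescript
// Hypothetical control message announcing the next file.
interface FileAnnouncement {
	name: string;
	size: number;
}

interface ReceivedFile {
	name: string;
	chunks: ArrayBuffer[];
}

class FileReceiver {
	private current: FileAnnouncement | null = null;
	private chunks: ArrayBuffer[] = [];
	private received = 0;

	// Pass event.data from the channel's "message" event.
	// Strings are control data, ArrayBuffers are file data.
	// Returns the file once all announced bytes have arrived, else null.
	handle(data: string | ArrayBuffer): ReceivedFile | null {
		if (typeof data === "string") {
			const announcement: FileAnnouncement = JSON.parse(data);
			this.current = announcement;
			this.chunks = [];
			this.received = 0;
		} else {
			if (this.current === null) {
				throw new Error("file data arrived before its announcement");
			}
			this.chunks.push(data);
			this.received += data.byteLength;
		}
		return this.finishIfComplete();
	}

	private finishIfComplete(): ReceivedFile | null {
		if (this.current === null || this.received < this.current.size) {
			return null;
		}
		const file = { name: this.current.name, chunks: this.chunks };
		this.current = null;
		this.chunks = [];
		return file;
	}
}
```

Because the type of event.data already separates the two kinds of message, no length prefixes or tag bytes are needed inside the data itself.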

The one complication is buffering. RTCDataChannel.send() is very unhelpful. The docs contain great quotes like:

NetworkError DOMException

Thrown when the specified data would need to be buffered, and there isn’t room for it in the buffer. In this scenario, the underlying transport is immediately closed.

How big is the buffer? I can’t see any way to find out. But if you fill it you immediately have your connection closed. That sucks.

To “solve” this issue I used the incredibly scientific approach of playing Numberwang with parameters until it worked. I ended up sending ¼ MiB chunks and waiting until the currently buffered amount was ¼ MiB or less before sending the next one. It seems like the buffer size is about 1 MiB. Not satisfying, but it works in Firefox and Chromium. I haven’t yet bothered writing code to re-establish the connection with a smaller chunk size and resume after hitting the limit. Maybe the API will improve at some point.
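One way to express that waiting is with RTCDataChannel’s bufferedAmountLowThreshold property and bufferedamountlow event. The sketch below is an illustration rather than the app’s code: BufferedChannel is a hypothetical interface listing only the channel members it touches, and the caller is assumed to have already split the file into chunks of at most ¼ MiB.

```typescript
const CHUNK_SIZE = 256 * 1024; // ¼ MiB

// The subset of RTCDataChannel this sketch relies on.
interface BufferedChannel {
	readonly bufferedAmount: number;
	bufferedAmountLowThreshold: number;
	onbufferedamountlow: ((event: any) => void) | null;
	send(data: ArrayBuffer): void;
}

// Send the chunks in order, pausing whenever more than CHUNK_SIZE bytes are
// still queued in the channel.
async function sendChunks(channel: BufferedChannel, chunks: ArrayBuffer[]): Promise<void> {
	// Ask for a bufferedamountlow event once the queue drains to CHUNK_SIZE or less.
	channel.bufferedAmountLowThreshold = CHUNK_SIZE;
	for (const chunk of chunks) {
		if (channel.bufferedAmount > CHUNK_SIZE) {
			await new Promise<void>((resolve) => {
				channel.onbufferedamountlow = () => {
					channel.onbufferedamountlow = null;
					resolve();
				};
			});
		}
		channel.send(chunk);
	}
}
```

With these numbers at most about ½ MiB is ever queued, which stays under the roughly 1 MiB limit observed above. The true limit is still unknown, so this only makes the failure unlikely rather than impossible.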

The performance is decent, but not fantastic. However, I’m not sure that I can do much better with this many layers of abstraction between the network and the filesystem. Unless you are transferring very high quality movies you are unlikely to notice the slowness.

Download

Another complication is downloading the file. It surprised me that the web doesn’t have any proper API for starting a download from JavaScript. The standard solution is:

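// A single hidden link, reused for every download.
const LINK = document.createElement("a");
LINK.style.display = "none";
document.body.append(LINK);

function saveFile(name: string, blob: Blob) {
	// Point the link at a temporary URL for the blob.
	const url = URL.createObjectURL(blob);
	LINK.setAttribute("href", url);
	// The download attribute sets the suggested file name.
	LINK.setAttribute("download", name);
	// A synthetic click makes the browser start the download.
	LINK.click();
	// Release the temporary URL so the blob can be freed.
	URL.revokeObjectURL(url);
}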

Yup, create a link and fake a click. It feels wrong but it works.

However, there is a major issue with this: it doesn’t appear possible to start a streaming download. Ideally we could start streaming to disk as soon as the transfer starts (like downloading a regular file over HTTP), but there isn’t an API for that. I’ve found that you can wrap a ReadableStream in a Response and then convert that to a Blob using Response.blob(). However, Response.blob() doesn’t resolve until the entire response has been received. This seems to be required since Blob.size needs to be known to create the Blob, and a ReadableStream doesn’t know its size until the producer closes the controller. Response.blob() seems to buffer the stream to disk, and then a second copy is made when the file is “downloaded” out of the browser. This means that the user effectively needs to wait for two copies of the file. It isn’t ideal, but at least arbitrarily large files can be copied, and it is much cleaner (and probably much more efficient) than my previous approach, which cached chunks to disk using IndexedDB.

Conclusion

That’s basically it. I think it is pretty cool that we can have cross-platform, decentralized device-to-device communication without a server (excluding the proxy).

The app itself could use a little polish, but the transfer works well. I’ve proved out the concept and made myself a useful tool.