YouTube's Wonky WebSub

I was implementing WebSub for FeedMail and was disappointed to find that YouTube’s feeds don’t support it. While disappointing, this was not much of a surprise: YouTube’s feeds are well known to be very basic and under-loved.

But then I stumbled across https://developers.google.com/youtube/v3/guides/push_notifications. Wait a second! PubSubHubbub support? PubSubHubbub was the original name for WebSub, so it sounded like it should work, but why didn’t FeedMail detect support?

Not Advertised

You are supposed to advertise support for WebSub by returning a hub reference with the feed. This can be either a <link> element in the feed document or a Link: header. YouTube does neither. No wonder FeedMail thought that YouTube didn’t support it!
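For reference, the header half of that discovery step can be sketched roughly as follows. This is a simplified, std-only parser for illustration, not the hyperx-based code FeedMail uses; the function name `links_with_rel` and the example URLs in the usage note are made up.

```rust
/// Returns the targets of all links in a `Link` header value whose `rel`
/// parameter includes `wanted` (for example "hub" or "self").
///
/// Simplification: entries are split on commas, so a target URL that
/// itself contains a comma would be mis-parsed.
fn links_with_rel(header: &str, wanted: &str) -> Vec<String> {
    let mut found = Vec::new();
    for entry in header.split(',') {
        let mut parts = entry.split(';');
        // The first part is the link target, written as <url>.
        let target = parts.next().unwrap_or("").trim();
        let url = match target.strip_prefix('<').and_then(|t| t.strip_suffix('>')) {
            Some(u) => u,
            None => continue,
        };
        // The remaining parts are parameters such as rel="hub".
        for param in parts {
            if let Some((name, value)) = param.split_once('=') {
                if !name.trim().eq_ignore_ascii_case("rel") {
                    continue;
                }
                // A rel value may list several space-separated relation types.
                let value = value.trim().trim_matches('"');
                if value.split_whitespace().any(|r| r.eq_ignore_ascii_case(wanted)) {
                    found.push(url.to_string());
                    break;
                }
            }
        }
    }
    found
}
```

Calling `links_with_rel(header, "hub")` on a header like `<https://hub.example/>; rel="hub"` yields the hub URL, while a response with no such header (YouTube's case) yields an empty list.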

No problem, this should be easy to fix! Just special-case YouTube feeds and inject the correct hub.

// YouTube doesn't advertise a hub, so add the one from its docs by hand.
if url.as_str().starts_with("https://www.youtube.com/feeds/videos.xml?") {
	fetched_feed.links.push_value(
		hyperx::header::LinkValue::new("https://pubsubhubbub.appspot.com/")
			.push_rel(hyperx::header::RelationType::Hub));
}
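With a hub known, the subscription itself is a form-encoded POST to the hub carrying `hub.mode`, `hub.topic` and `hub.callback`. Here is a minimal std-only sketch of building that body; `form_encode` and `subscribe_body` are hypothetical helper names, and actually sending the request is left to whatever HTTP client is in use.

```rust
/// Percent-encodes a string for an application/x-www-form-urlencoded body.
fn form_encode(s: &str) -> String {
    let mut out = String::new();
    for b in s.bytes() {
        match b {
            // Characters that form encoding leaves untouched.
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'*' => {
                out.push(b as char)
            }
            b' ' => out.push('+'),
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Builds the body of a WebSub subscription request for `topic`,
/// asking the hub to deliver pings to `callback`.
fn subscribe_body(callback: &str, topic: &str) -> String {
    format!(
        "hub.mode=subscribe&hub.topic={}&hub.callback={}",
        form_encode(topic),
        form_encode(callback)
    )
}
```

The body is sent with `Content-Type: application/x-www-form-urlencoded`. Optional parameters such as `hub.lease_seconds` and `hub.secret` are omitted here to keep the sketch short.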

Wrong Topic URL

Now FeedMail could subscribe, but it wasn’t receiving updates. There are more layers of weirdness to deal with.

WebSub has the concept of a Topic URL, which is basically the URL of the feed. Unfortunately, I realized that YouTube requires something special. From the documentation:

Set the topic URL to https://www.youtube.com/xml/feeds/videos.xml?channel_id=CHANNEL_ID, where CHANNEL_ID is the YouTube channel ID for which you want to retrieve push notifications.

So instead of using the feed URL (http://www.youtube.com/feeds/videos.xml?channel_id=CHANNEL_ID) you need to use a slightly different URL. Notice that both the scheme (https://) and the path (/xml/) differ.

OK, I guess we can swap that out, and swap it back when handling callbacks.

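// The feed's self link is http:// with no /xml/ prefix; the hub wants
// https:// and /xml/feeds/videos.xml as the topic. The query is kept as-is.
if self_link.as_str().starts_with("http://www.youtube.com/feeds/videos.xml?") {
	let mut topic = self_link.clone();
	topic.set_scheme("https").unwrap();
	topic.set_path("xml/feeds/videos.xml");
	Some(topic)
} else {
	None
}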
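The swap in both directions can be sketched with plain string handling. This is a std-only illustration rather than the url-crate code above; `feed_to_topic` and `topic_to_feed` are hypothetical names, and the prefixes are the ones quoted in this post.

```rust
const FEED_PREFIX: &str = "http://www.youtube.com/feeds/videos.xml?";
const TOPIC_PREFIX: &str = "https://www.youtube.com/xml/feeds/videos.xml?";

/// Maps a YouTube feed URL to the topic URL the hub expects.
/// Returns `None` for any other feed, which should use its own URL as the topic.
fn feed_to_topic(feed_url: &str) -> Option<String> {
    feed_url
        .strip_prefix(FEED_PREFIX)
        .map(|query| format!("{}{}", TOPIC_PREFIX, query))
}

/// Reverses `feed_to_topic` when a callback arrives for a YouTube topic.
fn topic_to_feed(topic_url: &str) -> Option<String> {
    topic_url
        .strip_prefix(TOPIC_PREFIX)
        .map(|query| format!("{}{}", FEED_PREFIX, query))
}
```

Keeping the two directions side by side makes it harder for the subscribe path and the callback path to drift apart.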

Custom Payload

Now we were finally getting notified of new videos! But something was wrong: the feed title was messed up and the body of the email FeedMail sent me was empty.

In a PubSubHubbub or WebSub ping the body should be the content of the feed. The only exception for Atom feeds is that previously existing <entry> elements can be removed. YouTube does not do this. Instead they send a custom payload which, judging by the symptoms above, notably has a different feed title and leaves out the entry content.

At this point I really reconsidered whether it was worth supporting this largely proprietary setup. It may have been the sunk-cost fallacy, but I had already done most of the work. I decided to treat the “WebSub” ping from YouTube purely as a notification and fall back to fetching the source feed to get the content that FeedMail actually sends out.

+ if broken_push(&final_url) {
+ 	crate::fetch_feed(global, &req).await?
+ } else {
  	let feed = feed_rs::parser::parse_with_uri(
  		std::io::Cursor::new(body),
  		Some(final_url.as_str()))?;
  
  	crate::FetchedFeed{
  		feed: Some(feed),
  		links,
  		final_url,
+ 	}
+ }

Part of the reason that I decided to support YouTube is that this type of bug is likely to come up for other hubs as well, so it is nice to have the option to maintain a list of broken feeds that use WebSub for notification only.
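The diff above calls `broken_push` without showing it. One possible shape for such a list is sketched below. This is a guess for illustration, not FeedMail's actual implementation: it takes a `&str` rather than whatever URL type the real function receives, and the constant name is made up.

```rust
/// Feed URL prefixes whose hubs deliver pings that cannot be parsed as the
/// feed itself, so a ping only triggers a normal fetch.
/// Both schemes are listed because the feed is reachable under either.
const NOTIFICATION_ONLY_PREFIXES: &[&str] = &[
    "http://www.youtube.com/feeds/videos.xml?",
    "https://www.youtube.com/feeds/videos.xml?",
];

/// Returns true if pushes for this feed should be treated as
/// notification-only rather than parsed as feed content.
fn broken_push(feed_url: &str) -> bool {
    NOTIFICATION_ONLY_PREFIXES
        .iter()
        .any(|prefix| feed_url.starts_with(prefix))
}
```

Adding another misbehaving hub then only means adding another prefix to the list.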

Well there we go. Working push notifications for YouTube!

Caching and Visibility Issues

To make this all worse, fetching the feed after a ping often returns a feed that doesn’t contain the indicated entry. The exact reason is known only to YouTube, but I suspect that there are at least three causes.

  1. I think there is some sort of caching in play. Often the ping happens as the video is published but it is not present in the feed. It seems likely that a stale feed is returned that doesn’t yet have this video.
  2. I think pings are sent for private videos. For example, I often see pings significantly before a video is published. This is likely while the video is private before publishing. (I wonder if you can pick up unlisted videos this way?)
  3. Old videos that are no longer on the feed are sometimes pinged. My guess is that these videos were updated in some way so a ping is sent.
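Given those causes, one way to cope is to refetch a few times with growing delays and then give up, since pings for private or long-gone videos may never show up in the feed. A minimal sketch follows; the function name, the attempt limit and the 30-second base delay are illustrative assumptions rather than anything FeedMail is described as doing.

```rust
/// How many refetches to try before giving up on a pinged entry (assumed value).
const MAX_ATTEMPTS: u32 = 5;
/// Delay before the first refetch, in seconds (assumed value).
const BASE_DELAY_SECS: u64 = 30;

/// Returns how long to wait before refetch number `attempt` (starting at 0),
/// doubling the delay each time, or `None` once the attempts are used up.
fn next_retry_delay(attempt: u32) -> Option<std::time::Duration> {
    if attempt >= MAX_ATTEMPTS {
        return None;
    }
    Some(std::time::Duration::from_secs(BASE_DELAY_SECS << attempt))
}
```

The cap matters more than the exact schedule: a stale cache will eventually catch up, but a ping for a private or removed video would otherwise be retried forever.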

Implementation Curiosity

The thing that confuses me the most about this is that the hub YouTube uses (pubsubhubbub.appspot.com), while being owned by Google, is a general purpose hub. (In fact it is probably the most popular hub!) It seems like it would need custom code-paths to support this wonky YouTube WebSub. I guess there is a secret endpoint where YouTube can push the content for a specific URL, that way the push can be sent out without the content ever having actually been accessible at that URL.

Stub Curiosity

Even weirder is the stub at the URL you subscribe to.

% curl 'https://www.youtube.com/xml/feeds/videos.xml?channel_id=CHANNEL_ID'
<?xml version="1.0" encoding="utf-8"?>
<!--
  This is a static file. The corresponding feed (identified in the
  <link rel="self"/> tag) should be used as a topic on the
  http://pubsubhubbub.appspot.com hub to subscribe to video updates.
-->
<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="hub" href="http://pubsubhubbub.appspot.com"/>
  <link rel="self" href="https://www.youtube.com/xml/feeds/videos.xml" />
  <title>YouTube video feed</title>
</feed>

This almost looks like a fake feed that you could subscribe to for notifications. However, as the comment suggests, it is just a static file. Most notably, the <link rel="self"> doesn’t contain a channel ID! The comment seems to suggest that the URL is correct, but it clearly doesn’t contain the required information!

I think that if the channel ID were simply included in the self URL, this would be a good enough fake feed that most RSS readers would allow you to subscribe to it, and if they supported WebSub you would even get updates! Without the working link, however, this file is just useless. I wonder if it exists only to bypass some check in pubsubhubbub.appspot.com that would otherwise complain about 404s. But do you know what else would bypass 404s? JUST USING THE FEED URL AS THE TOPIC URL!

YouTube, please fix?

Looking at this, it seems pretty easy to fix. It would take just a couple of changes and wouldn’t break the API for existing users.

  1. Add the hub link to the regular feed.
  2. Ping the hub with the regular feed URL as the topic, in addition to pushing the custom payload to the existing /xml/ topic.

Then the old API can be removed according to YouTube’s usual deprecation policy if desired.

This would make YouTube channel feeds WebSub compatible without affecting existing users. If we can just get the attention of someone at YouTube this would be a nice 20% project. Surely there are no politics around here 😅