Ambrose Li

The future Internet is unsearchable, unarchivable, inaccessible. The future is here

I have stopped regularly reading normal websites for news. For almost half a year I have been getting my news almost solely through YouTube.

I have the La Presse and CBC apps installed on my phone, and used to read La Presse regularly, but now I rarely use them.

I used to not watch videos. That changed about half a year ago when I got so fed up with Facebook that I virtually quit cold turkey. There went my connection to normal news sources, since I only read news articles when friends posted them on Facebook.

I started listening to political commentary from a critic I was following, and started searching for information on random topics on YouTube. To my surprise, a lot of stuff is on YouTube that normal search engines don’t know about.

It’s not just the search. So much of the material is uncaptioned, or poorly or incorrectly captioned, that even if text search worked you wouldn’t be able to find anything. Some content creators intentionally keep their videos uncaptioned; the critic I mentioned earlier does this, to control vandalism.

Referring back to videos you’ve watched is just impossible. Like I mentioned, you can’t really search them. All you can do is keep notes; if you don’t have impeccable notes, you can spend hours looking in vain for a short quote, even though you’re absolutely sure it’s in one of the videos you’ve watched.

I’ve started experimenting with indexing, but indexing videos is a huge amount of work. The critic I mentioned earlier started vlogging about two and a half years ago only because he was cyberbullied, and he posted daily to literally prove he was still alive. At first each video was a few minutes long. Soon, he started posting twice a day, each video about 20 minutes long. For months he has been posting 2–3 videos per day (and he’s now experimenting with 4 videos per day); he was posting so much content that his doctor told him to cut down on work and get some rest. To get a rough idea of how much content came from this one content creator: assuming two videos per day at about 20 minutes each over the past 2.5 years, we’re talking about over 600 hours of video content, all uncaptioned, all unsearchable.
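As a rough check on that figure, here is a back-of-the-envelope sketch in Python. The twice-a-day rate and the 20-minute length come from the description above; treating every day of the 2.5 years as a posting day at that rate is a simplifying assumption, and the function name is only for illustration.

```python
def estimated_hours(videos_per_day, minutes_per_video, years):
    """Rough total hours of video, assuming a steady posting rate every day."""
    days = years * 365  # ignores leap days; this is only an estimate
    return videos_per_day * minutes_per_video * days / 60


# Two videos a day, about 20 minutes each, for 2.5 years
print(round(estimated_hours(2, 20, 2.5)))  # 608
```

The early months of short, once-a-day videos would pull the number down, and the recent 2–3 videos per day would push it up, so "over 600 hours" is a ballpark, not a measurement.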

Continuing my indexing experiment means I can no longer just listen to videos, because I have to index them. Transcriptionists know that transcribing a video takes 3x to 6x the video time; indexing might take less time, but you still need to pause, rewind, write things down, and repeat until you finish. If you’re dealing with premieres, you have to rewatch them just to index. Then you need to go over your index to tweak it.
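To put the 3x to 6x rule of thumb in perspective, here is a small Python sketch applying it to the roughly 600 hours estimated earlier. The 40-hour work week is my own assumption for illustration, and the function name is hypothetical.

```python
def transcription_hours(video_hours, low=3, high=6):
    """Range of working hours needed to transcribe, using the 3x to 6x rule of thumb."""
    return video_hours * low, video_hours * high


low, high = transcription_hours(600)
print(low, high)  # 1800 3600

# Assumed 40-hour work week, purely for scale
print(low / 40, high / 40)  # 45.0 90.0
```

Even at the low end, that is most of a working year for one person, for one creator’s backlog. Indexing would take less, but not by enough to make it a casual side project.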

You might think AI can solve this. If you’ve used auto captions (e.g., at conferences), you know how inaccurate they are. Ten years ago when I volunteered for Coursera things were terrible; today, after ten years, things are less terrible but still terrible.

I recently attended a conference that used some kind of captioning service (perhaps human captioners). The English presentations were captioned and the captions contained errors, as expected; the French presentations were completely uncaptioned.

There you go. The future of the Internet is videos: all unsearchable, unarchived, and unarchivable; and the future is already here.