Adding live coverage and solving topic fatigue

2023-09-24

Three problems bugged News Minimalist from the start: stale news, badly timed updates, and repetitive content. I think I solved them today.

Problems

From the very beginning of the project, the news was grouped by day. The algorithm collected news for 24 hours, rated all articles, and showed the results for the next 24 hours. The system was built around a newsletter that was sent daily. But it had three major problems.

First, the news went stale. Since the website was static for 24 hours, it lagged considerably behind events by the end of that window. The problem was even worse than it appeared: an article could be published at the very start of the 24-hour collection period and then stay on the site for another 24 hours after the update, so in the worst-case scenario you could read news with a 48-hour delay!
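The worst-case arithmetic can be written out as a tiny sketch. The function name `worst_case_delay` is made up for illustration, and the calculation ignores the time spent rating articles:

```python
from datetime import timedelta


def worst_case_delay(collect: timedelta, display: timedelta) -> timedelta:
    """Longest possible gap between an article's publication and a reader seeing it."""
    # Worst case: the article is published right as collection starts, waits out
    # the whole collection period, and is read just before the next update.
    return collect + display


# The old daily schedule: collect for 24 hours, then show for 24 hours.
daily_batch = worst_case_delay(timedelta(hours=24), timedelta(hours=24))
print(daily_batch.total_seconds() / 3600)  # 48.0
```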

Second, the website was updated too late for many parts of the world. If you were in the Americas, the update arrived between 5 and 9 a.m. depending on your time zone, letting you read fresh news in the morning. But for other parts of the world, the update came too late: around lunch in Europe and dinner in Asia.

The third problem was that deduplication only worked within a single 24-hour batch. The next day, the deduplication algorithm analyzed an entirely new batch of articles with no overlap. As a result, people saw news about the same event every day, which caused "topic fatigue" - a single event is covered so much that it stops being interesting and starts causing frustration.

Solution

News Minimalist has switched to live coverage - the website is now updated every hour, letting people learn about new events almost in real time, whenever they find it convenient.

The deduplication algorithm has been improved. It now works continuously, analyzing a sliding 72-hour window of articles instead of non-overlapping 24-hour batches. This extends the purpose of the algorithm: from finding plain duplicates to organizing articles into "stories", each covering one event.

The important change here is that stories are sorted by their start time: new stories at the top, older ones at the bottom. Once the oldest article in a story falls out of the 72-hour window, the next-oldest article takes its place, which keeps the story at the bottom. This should solve the issue of "topic fatigue" - something entirely new and unexpected must happen for an article to break away from its story and form a new one that appears at the top. The older story will still be available at the bottom and will continue to give ongoing coverage of the event.
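Here is a minimal sketch of that sorting rule, not the actual News Minimalist code. It assumes articles have already been grouped into stories (the grouping itself is not shown), and the names `Story`, `story_start`, and `rank_stories` are made up for illustration:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional

WINDOW = timedelta(hours=72)  # sliding window size from the post


@dataclass
class Story:
    title: str
    article_times: List[datetime] = field(default_factory=list)


def story_start(story: Story, now: datetime) -> Optional[datetime]:
    """Time of the oldest article still inside the window, or None if there is none."""
    cutoff = now - WINDOW
    visible = [t for t in story.article_times if cutoff <= t <= now]
    return min(visible) if visible else None


def rank_stories(stories: List[Story], now: datetime) -> List[str]:
    """Titles ordered by story start time, newest story first."""
    started = [(story_start(s, now), s) for s in stories]
    live = [(start, s) for start, s in started if start is not None]
    live.sort(key=lambda pair: pair[0], reverse=True)
    return [s.title for _, s in live]


now = datetime(2023, 9, 24, 12, 0)
# An ongoing story: the 100-hour-old article has already left the window,
# so the 70-hour-old one now defines the story's start.
ongoing = Story("ongoing event", [now - timedelta(hours=h) for h in (100, 70, 40, 2)])
fresh = Story("new event", [now - timedelta(hours=5)])

print(rank_stories([ongoing, fresh], now))  # ['new event', 'ongoing event']
```

Note that the ongoing story has an article from 2 hours ago, more recent than anything in the new story, yet it still ranks lower because its start time is 70 hours back.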

I also made another change that some might find negative: the site no longer keeps historical data going further than 3 days back. Very few people ever browsed more than 3 days back (less than 0.5%). Maintaining it in the new system was too complex, so I decided to remove it. The removal considerably simplifies the architecture, making the site lighter and letting me ship further improvements faster.

Closing notes

This was a major rewrite, and I might have missed some bugs. If you find anything, please send a quick note to hello@mail.newsminimalist.com, and I'll try to fix it as soon as possible.

This is a new chapter for News Minimalist. I hope you'll like it.

Thank you all,
Vadim