I made this by mining the “Related:” comments that people post. I figured they might be a way to link the vast canvas of Hacker News back to itself, and to gather in one place discussions that span time and threads.
My main goal was to see how HN responds to stories that develop over time.
On the technical side, I scanned all 46 million HN items and aggregated clusters of stories linked by these “Related:” comments, which yields a small corpus of 36,500 clusters. I also ran a quick bag-of-words sanity check to make sure the titles within each cluster were reasonably consistent.
There are several ways to sort the clusters, so feel free to explore. Hope you enjoy finding some stuff here!
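For readers curious how the clustering step in the post might work, here is a minimal sketch. It is not the author's code: the post doesn't say how clusters were merged, so treating every “Related:” link as an edge and taking connected components (via union-find) is an assumption. It also assumes items are dicts shaped like the official HN API's item objects (`id`, `type`, `parent`, `text`, `title`, with comment text as HTML). `ITEM_LINK`, `STOPWORDS`, the 0.2 threshold, and all function names are hypothetical, and the title check uses mean pairwise Jaccard overlap as one plausible reading of “bag-of-words sanity check”.

```python
import html
import re
from collections import defaultdict

# Hypothetical pattern for links to other HN items in comment text,
# e.g. https://news.ycombinator.com/item?id=12345
ITEM_LINK = re.compile(r"news\.ycombinator\.com/item\?id=(\d+)")

# Small, illustrative stopword list for the title check (an assumption).
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "for", "on", "is",
             "with", "show", "ask", "hn"}


class UnionFind:
    """Disjoint-set forest used to merge linked stories into clusters."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def root_story(item_id, items):
    """Follow 'parent' pointers up to the top-level story, if it is known."""
    item = items.get(item_id)
    while item is not None and "parent" in item:
        item = items.get(item["parent"])
    return item["id"] if item is not None else None


def cluster_related(items):
    """Union each story with every item its 'Related:' comments link to."""
    uf = UnionFind()
    for item in items.values():
        if item.get("type") != "comment":
            continue
        text = html.unescape(item.get("text", ""))  # comment text is HTML
        if "Related:" not in text:
            continue
        source = root_story(item["id"], items)
        if source is None:
            continue
        for linked in ITEM_LINK.findall(text):
            # Map a linked comment to its story; fall back to the raw id.
            target = root_story(int(linked), items)
            uf.union(source, target if target is not None else int(linked))
    clusters = defaultdict(set)
    for story_id in list(uf.parent):
        clusters[uf.find(story_id)].add(story_id)
    return [c for c in clusters.values() if len(c) > 1]


def title_words(title):
    """Bag of lowercase word tokens, minus stopwords."""
    return set(re.findall(r"[a-z0-9]+", title.lower())) - STOPWORDS


def titles_consistent(cluster, items, threshold=0.2):
    """Keep a cluster if its titles' mean pairwise Jaccard overlap >= threshold."""
    bags = [title_words(items[i].get("title", "")) for i in cluster if i in items]
    pairs = [(a, b) for k, a in enumerate(bags) for b in bags[k + 1:]]
    if not pairs:
        return True
    scores = [len(a & b) / len(a | b) if a | b else 0.0 for a, b in pairs]
    return sum(scores) / len(scores) >= threshold
```

Usage would look like `[c for c in cluster_related(items) if titles_consistent(c, items)]`. Union-find keeps the merge close to linear in the number of links, which matters when the input is tens of millions of items.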
this is fascinating! cool ship
Thanks bud! Yeah, I thought it surfaced some really interesting stories, and it reminded me a bit of PageRank in the way it focuses on the links between things.
I think I could take it further with something like a betweenness centrality analysis to rank the most interlinked stories across Hacker News. But this was a first cut, meant to demonstrate that these human annotations (the “Related:” comments) can usefully cluster stories and produce an interesting presentation. And I think we achieved that!
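The betweenness idea floated in the reply above could be prototyped with Brandes' algorithm, which computes betweenness for every node of an unweighted graph in O(V·E) time. This is a minimal sketch, not something the author has built: it assumes each “Related:” link becomes an undirected edge between two story ids, and `build_graph`, `betweenness`, and `rank_stories` are hypothetical names. Note that betweenness scores a story by how many shortest paths between other stories pass through it, so it favours stories that bridge otherwise separate threads.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Undirected adjacency sets from hypothetical (story_id, story_id) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-links
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph.

    Returns raw (unnormalized) betweenness: for each node, the number of
    shortest paths between other node pairs passing through it, with ties
    between equally short paths split fractionally.
    """
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        # Single-source BFS, counting shortest paths (sigma) and predecessors.
        stack, preds = [], defaultdict(list)
        sigma = dict.fromkeys(graph, 0)
        dist = dict.fromkeys(graph, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


def rank_stories(edges, top=10):
    """Story ids sorted by descending betweenness."""
    scores = betweenness(build_graph(edges))
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

For a graph the size of the whole corpus, NetworkX's `betweenness_centrality` is an off-the-shelf alternative; its `k` parameter samples source nodes to approximate the scores when an exact O(V·E) pass is too slow.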