Researchers mine 2.5M news articles to prove what we already know


A group of British researchers has published the results of a data mining experiment that analyzed nearly 2.5 million articles from 498 newspapers on criteria such as topic selection, writing style and sensationalism, and found — no surprise — that tabloids are the easiest to read and reporters don’t often cover women’s sports. If these findings sound predictable, that was exactly what the researchers were aiming for.

The experiment’s techniques point to a future where researchers are spared the grunt work of poring over thousands of pages of news or watching hundreds of hours of programming, and can instead focus their energy on explaining the findings. As the researchers note in their paper, the real ramifications of this research lie more in what it accomplished than in what it found.

Namely, they demonstrated that with new big data techniques such as machine learning and natural-language processing, it’s possible to accurately analyze millions…

