Automated Detection and Fingerprinting of Censorship Block Pages


One means of enforcing Web censorship is to return a block page, which informs the user that an attempt to access a webpage was unsuccessful. Detecting block pages can provide a more complete picture of Web censorship, but automatically identifying them is difficult because Web content is dynamic, personalized, and may appear in different languages. Previous work has detected and identified block pages manually, an approach that is difficult to reproduce and time-consuming, which in turn makes continuous, longitudinal studies of censorship difficult. This paper presents an automated method both to detect block pages and to fingerprint the filtering products that generate them. Our automated method enables continuous measurement of block pages; we find that it successfully detects 95% of block pages and identifies five filtering tools, including one that had not previously been identified “in the wild”.

In Proceedings of the 2014 Conference on Internet Measurement Conference