ICLab has been running and measuring Internet censorship since late 2016. We are happy to share the analyzed data that we use in our recent paper: ICLab: A Global, Longitudinal Internet Censorship Measurement Platform, accepted to the IEEE Symposium on Security and Privacy 2020.
Our data is hosted on several platforms for public access. Please contact us if you encounter any issue when downloading the data.
Our new public data format is CSV text files encoded in UTF-8. Fields are separated by commas and quoted with ". Caution: some fields can contain commas; use a true CSV parser, don’t just split on commas.
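To illustrate the caution above, here is a minimal sketch using Python's standard csv module. The two-column excerpt and the dns_reason text are made up for illustration (real files have many more columns), and read_rows is a hypothetical helper name.

```python
import csv
import io

# Made-up excerpt: one quoted field contains a comma.
sample = 'url,dns_reason\r\n"http://example.com/","answers differ, no overlap"\r\n'

# Naive splitting breaks the quoted field into two pieces.
naive = sample.splitlines()[1].split(",")

# A real CSV parser respects the quoting.
rows = list(csv.DictReader(io.StringIO(sample, newline="")))


def read_rows(path):
    """Yield each measurement in a data file as a dict keyed by column name."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle)
```

Here the naive split yields three pieces for a two-column row, while csv.DictReader returns the dns_reason value intact.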
filename- Name of the raw data file (for internal use)
server_t- Date and time when the measurement was conducted, in ISO 8601 format (e.g., 2017-01-01T00:03:55.797+00:00)
country- Country where the measurement was conducted, as an ISO 3166-1 alpha-2 country code
as_number- Autonomous System number of the network from which the measurement was conducted
schedule_name- Internal label for the list of URLs tested in this measurement (e.g.,
url- URL on the test list
domain- Domain name of the URL on the test list
final_url- URL reached by the client after following HTTP redirections
sanitized- Whether the location of the VPN is sanitized.
dns- Outcome of the DNS tampering analysis: one of the codewords
dns_reason- Details of the DNS tampering analysis
field_answers- DNS responses from the field measurement.
field_nameserver- The nameserver that the DNS query was sent to in the field measurement.
control_answers- DNS responses from the control measurement.
control_nameserver- The nameserver that the DNS query was sent to in the control measurement.
http_status- HTTP status code for the final page load in the redirection chain (e.g., 200, 451)
none if no blockpage was detected, or an internal identifier for the regular expression that matches this type of blockpage
packet_injection- Outcome of the packet injection analysis: one of the codewords
probably censored, or
packet_field_category- Classification of any anomalous packets observed during this measurement.
ports- The port on which the packet anomaly is happening.
packet_control_category- Classification of any anomalous packets observed during a matching measurement of this site from a control node.
censored- Final assessment of this measurement: true or false, for censored or not censored.
Caution: Our block-page detection regexps are known to trigger on pages from sites that refuse access to clients from specific countries and/or when they detect use of a VPN, as well as on block pages actually injected by a censor in an intermediate network. It is debatable whether refusal of access by a site for these reasons should be considered censorship; we currently count such refusals as censorship in the censored column and in our summary statistics.
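As a rough sketch of how the country and censored columns might be summarized, the following tallies measurements per country. It assumes the censored column holds the literal strings true and false. The rows use placeholder country codes and are not real results, and censored_counts is a hypothetical helper name. Given the caution above, any such tally also includes site-side refusals of access.

```python
import csv
import io
from collections import Counter


def censored_counts(rows):
    """Return {country: (censored measurements, total measurements)}."""
    total, flagged = Counter(), Counter()
    for row in rows:
        total[row["country"]] += 1
        # Assumption: the column stores the strings "true" / "false".
        if row["censored"].strip().lower() == "true":
            flagged[row["country"]] += 1
    return {country: (flagged[country], total[country]) for country in total}


# Placeholder rows for illustration only.
sample = "country,censored\nXX,true\nXX,false\nYY,false\n"
counts = censored_counts(csv.DictReader(io.StringIO(sample)))
```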
Our older data (prior to 2020) is in CSV format with the following columns:
filename: name of raw data file (for internal use)
server_t: timestamp of when the measurement was conducted (e.g., 2017-01-01T00:03:55.797Z)
country: ISO 3166-1 alpha-2 country code
as_number: Autonomous System Number
schedule_name: web test list (i.e., Alexa global top list, CitizenLab, or Berkman Center)
dns_reason: true = manipulated, false = unmanipulated
block: true = blockpages, false = normal
packet_updated: true = injected, false = no injection
censored_updated: true = censored, false = uncensored
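The two formats write the server_t timestamp slightly differently: the older example ends in Z, the newer one in +00:00. A minimal sketch of one way to parse both in Python follows; parse_server_t is a hypothetical helper name. The Z suffix is normalized first because datetime.fromisoformat only accepts it from Python 3.11 onward.

```python
from datetime import datetime, timedelta


def parse_server_t(value):
    """Parse a server_t value from either data format into an aware datetime."""
    # Rewrite a trailing "Z" as an explicit UTC offset for older Python versions.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


old_style = parse_server_t("2017-01-01T00:03:55.797Z")
new_style = parse_server_t("2017-01-01T00:03:55.797+00:00")
```

Both calls return the same timezone-aware instant, so timestamps from old and new files can be compared directly.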