ICLab Data

ICLab

Last updated on Jan 20, 2020

ICLab has been measuring Internet censorship since late 2016. We are happy to share the analyzed data used in our recent paper, "ICLab: A Global, Longitudinal Internet Censorship Measurement Platform," accepted to the IEEE Symposium on Security and Privacy 2020.

Our data is hosted on several platforms for public access. Please contact us if you encounter any issues when downloading the data.

Google Drive
Internet Archive (older dataset only)

Our new public data is distributed as UTF-8-encoded CSV text files. Fields are separated by commas and quoted with ". Caution: some fields can contain commas, so use a true CSV parser rather than splitting lines on /,/. The columns are:

filename - Name of the raw data file (for internal use)
server_t - Date and time when the measurement was conducted, in ISO 8601 format (e.g., 2017-01-01T00:03:55.797+00:00)
country - Country where the measurement was conducted, as an ISO 3166-1 alpha-2 country code
as_number - Autonomous System number of the network from which the measurement was conducted
schedule_name - Internal label for the list of URLs tested in this measurement (e.g., alexa-global, citizenlab-global)
url - URL on the test list
domain - Domain name of the URL on the test list
final_url - URL reached by the client after following HTTP redirections
sanitized - Whether the location of the VPN is sanitized.
dns - Outcome of the DNS tampering analysis: one of the codewords normal, tampered, or uncertain.
dns_reason - Details of the DNS tampering analysis
field_answers - DNS responses from the field measurement.
field_nameserver - The nameserver that the DNS query was sent to in the field measurement.
control_answers - DNS responses from the control measurement.
control_nameserver - The nameserver that the DNS query was sent to in the control measurement.
http_status - HTTP status code for the final page load in the redirection chain (e.g., 200, 451)
blockpage_reason - none if no blockpage was detected, or an internal identifier for the regular expression that matches this type of blockpage
packet_injection - Outcome of the packet injection analysis: one of the codewords censored, missing data, none, not censored, probably censored, or uncertain.
packet_field_category - Classification of any anomalous packets observed during this measurement.
ports - Port on which the packet anomaly was observed.
packet_control_category - Classification of any anomalous packets observed during a matching measurement of this site from a control node.
censored - Final assessment of this measurement: true if censored, false if not censored.
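
As a minimal sketch of the parsing advice above, the following uses Python's standard csv module to read rows and convert two of the fields. The sample row, its values, and the helper name read_rows are made up for illustration and are not taken from the dataset; only a subset of the columns is shown, and the exact spelling of the true/false values in the real files is an assumption.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample for illustration only; real files contain every column
# listed above. Note the comma inside the quoted dns_reason field.
SAMPLE = '''filename,server_t,country,as_number,url,dns_reason,http_status,censored
"a.json","2017-01-01T00:03:55.797+00:00","US","64500","http://example.com/","no answers, control had 2","200","false"
'''

def read_rows(stream):
    """Yield one dict per measurement, converting server_t and censored."""
    for row in csv.DictReader(stream):
        # server_t is ISO 8601 with an explicit UTC offset.
        row["server_t"] = datetime.fromisoformat(row["server_t"])
        # Assumption: the flag is spelled "true"/"false" (case-insensitive).
        row["censored"] = row["censored"].strip().lower() == "true"
        yield row

rows = list(read_rows(io.StringIO(SAMPLE)))
```

For a file on disk, the same function can be given a handle opened with open(path, newline="", encoding="utf-8"), where path is whichever file you downloaded. Splitting the sample line on commas instead would break the dns_reason field in two.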

Caution: Our block-page detection regexps are known to trigger on some sites that refuse access to clients from specific countries and/or when they detect use of a VPN, as well as on block pages actually injected by a censor in an intermediate network.
It is debatable whether refusal of access by a site for these reasons should be considered censorship; we are currently counting them as such in the censored column and our summary statistics.
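
Readers who prefer to treat such refusals separately may want to see which per-technique columns flagged each censored row. The sketch below is one way to do that, assuming rows are dicts of strings as produced by csv.DictReader. The helper names, the sample rows, the blockpage identifier, and the choice of which codewords count as a positive signal are our assumptions for illustration, not the logic ICLab uses to compute the censored column.

```python
from collections import Counter

def detectors(row):
    """Return the set of per-technique columns that flag this row."""
    fired = set()
    if row.get("dns") == "tampered":
        fired.add("dns")
    if row.get("blockpage_reason", "none") != "none":
        fired.add("blockpage")
    # Assumption: treat both of these codewords as a positive signal.
    if row.get("packet_injection") in ("censored", "probably censored"):
        fired.add("packet_injection")
    return fired

def is_censored(row):
    """Assumption: the flag is spelled "true"/"false" (case-insensitive)."""
    return row.get("censored", "").strip().lower() == "true"

# Hypothetical rows, made up for illustration.
sample = [
    {"censored": "true", "dns": "normal",
     "blockpage_reason": "hypothetical_rule_1", "packet_injection": "none"},
    {"censored": "true", "dns": "tampered",
     "blockpage_reason": "none", "packet_injection": "none"},
    {"censored": "false", "dns": "normal",
     "blockpage_reason": "none", "packet_injection": "not censored"},
]

# Count censored rows by the combination of detectors that fired.
counts = Counter(
    "+".join(sorted(detectors(r))) or "no detector"
    for r in sample
    if is_censored(r)
)
```

Rows counted under "blockpage" alone are the ones most likely to include site-side refusals, so they are a natural group to inspect or exclude when that distinction matters.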

Our older data (prior to 2020) is in CSV format with the following columns:

filename: name of raw data file (for internal use)
server_t: timestamp of when the measurement was conducted, in ISO 8601 format (e.g., 2017-01-01T00:03:55.797Z)
country: ISO 3166-1 alpha-2 country code
as_number: Autonomous System number
schedule_name: web test list (i.e., Alexa global top list, CitizenLab, or Berkman Center)
url
dns
dns_reason: true = manipulated, false = unmanipulated
dns_all
dns_reason_all
http_status
block: true = blockpages, false = normal
body_len
http_reason
packet_updated: true = injected, false = no injection
packet_reason
censored_updated: true = censored, false = uncensored
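
The older files write timestamps with a trailing Z, while the new format uses an explicit +00:00 offset. A small sketch for normalizing both, plus the true/false flags, is shown below. The function names are hypothetical, and the assumption that flags appear as the literal strings true and false comes only from the column descriptions above.

```python
from datetime import datetime

def parse_server_t(text):
    """Parse a server_t value from either the older or the new format."""
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)

def parse_flag(text):
    """Map "true"/"false" to bool; return None for anything else."""
    value = text.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    return None

old_style = parse_server_t("2017-01-01T00:03:55.797Z")
new_style = parse_server_t("2017-01-01T00:03:55.797+00:00")
```

Both calls yield timezone-aware datetimes, so measurements from the two formats can be compared or sorted together. Returning None for unexpected flag values keeps empty or malformed cells from being silently counted as false.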

ICLab paper, censorship measurement

by Calipr Networking Group