Global Internet Freedom

About The Workshop

A workshop dedicated to NLP methods that potentially contribute (either positively or negatively) to the free flow of information on the Internet, or to our understanding of the issues that arise in this area.

NLP4IF Proceedings

The workshop is supported by the U.S. National Science Foundation, award No. 1828199. Students can apply for travel grants. For more information, please contact Anna Feldman.

The topics of interest include (but are not limited to) the following:

  • Censorship detection: detecting deleted or edited text; detecting blocked keywords/banned terms;
  • Censorship circumvention techniques: linguistically inspired countermeasures for Internet censorship such as keyword substitution, expanding coverage of existing banned terms, text paraphrasing, linguistic steganography, generating information morphs, etc.;
  • Detection of self-censorship;
  • Identifying potentially censorable content;
  • Disinformation/Misinformation detection: fake news, fake accounts, rumor detection, etc.;
  • Techniques to empirically measure Internet censorship across communication platforms;
  • Investigations on covert linguistic communication and its limits;
  • Identity and private information detection;
  • Passive and targeted surveillance techniques;
  • Ethics in NLP;
  • “Walled gardens”, personalization and fragmentation of the online public space;

We hope that our workshop will promote Internet freedom in countries where access to and sharing of information are strictly controlled by censorship.


    Schedule Detail

    • 09:00-10:00

      Invited talk: Jennifer Pan (Stanford University): How the Chinese Government Fabricates Social Media Posts for Strategic Distraction, Not Engaged Argument [slides]

    • The effect of information controls on developers in China: An analysis of censorship in Chinese open source projects (Jeffrey Knockel, Masashi Crete-Nishihata and Lotus Ruan) [slides, data]

      By Jeff Knockel
    • Coffee break

    • Invited talk: Jed Crandall (University of New Mexico): How to Talk Dirty and Influence Machines [slides]

    • 12:00-12:30

      Linguistic Characteristics of Censorable Language on SinaWeibo (Kei Yin Ng, Anna Feldman, Jing Peng and Chris Leberknight) [slides, data]

    • Invited talk: Nancy Watzman (Dot Connector Studio): What do journalists really want from NLP researchers? How to help build trust in media and democracy by helping journalists make sense of big data [slides]

    • Creative Language Encoding under Censorship (Heng Ji and Kevin Knight) [slides available upon request]

      By Heng Ji
    • Coffee break

    • Panel: NLP and Disinformation (Moderator: Chris Brew; Panelists: Jed Crandall, Heng Ji, Veronica Perez-Rosas, Nancy Watzman)

    Our Speakers

    Dr. Jennifer Pan (Stanford University, CA)

    Assistant Professor

    Dr. Jedidiah Crandall (University of New Mexico)

    Associate Professor

    Nancy Watzman (Dot Connector Studio)

    Managing Editor, Television Archive


    Santa Fe Community Convention Center

    201 W Marcy St, Santa Fe, NM 87501, USA

    The NLP4IF Workshop is held in conjunction with the 27th International Conference on Computational Linguistics (COLING 2018), which will take place in Santa Fe, New Mexico, USA. COLING 2018 will be held at the Santa Fe Community Convention Center from August 20 through 26, 2018.

    Important Dates

    • Workshop submission deadline: May 25, 2018
    • Notification: June 20, 2018
    • Camera-ready submission deadline: June 30, 2018
    • Workshop date: August 20, 2018


    According to a recent report produced by Freedom House, an “independent watchdog organization dedicated to the expansion of freedom and democracy around the world”, Internet freedom declined in 2016 for the sixth consecutive year. 67 percent of all Internet users live in countries where criticism of the government, military, or ruling family is subject to censorship. Social media users face unprecedented penalties, as authorities in 38 countries made arrests based on social media posts over the past year. Globally, 27 percent of all Internet users live in countries where people have been arrested for publishing, sharing, or merely “liking” content on Facebook. Governments are increasingly going after messaging apps like WhatsApp and Telegram, which can spread information quickly and securely.

    Various barriers prevent citizens of a large number of countries from accessing information. Some are infrastructural and economic barriers; others are violations of user rights, such as surveillance, breaches of privacy, and repercussions for online speech and activities, including imprisonment, extralegal harassment, or cyberattacks. Yet another area is limits on content, which include legal regulations on content, technical filtering and blocking of websites, and (self-)censorship. Large Internet providers are effective monopolies and themselves have the power to use NLP techniques to control information flow. Users are suspended or banned, sometimes without human intervention and with little opportunity for redress. Users react by using coded, oblique, or metaphorical language and by taking steps to conceal their identity, such as the use of multiple accounts, which raises questions about who the real originating author of a post actually is. This workshop aims to bring together NLP researchers whose work contributes to the free flow of information on the Internet.
    Submissions should be written in English and anonymized with regard to the authors and/or their institution (no author-identifying information on the title page or anywhere else in the paper), including referencing style as usual. Authors should also ensure that identifying meta-information is removed from files submitted for review. Submissions must use the Word or LaTeX template files provided by COLING 2018 and conform to the format defined by the COLING 2018 style guidelines.

      • Long paper submission: up to 8 pages of content, plus 2 pages for references; final versions of long papers may use one additional page (up to 9 pages), with unlimited pages for references.
      • Short paper submission: up to 4 pages of content, plus 2 pages for references; final versions of short papers may be up to 5 pages, with unlimited pages for references.

    PDF files must be submitted electronically via the START submission system. The recommended style files are available from the COLING repository.

    Double submission policy: Parallel submissions to other meetings or publications are possible, but the workshop contact person must be notified immediately. If a paper is accepted, it can be withdrawn only within two days after notification.
    To register, please go to
  • Chris Brew, Computational Research Scientist, Digital Operatives.
  • Anna Feldman, Professor of Linguistics and Computer Science at Montclair State University.
  • Chris Leberknight, Associate Professor of Computer Science at Montclair State University.
  • Joan Bachenko, Deception Discovery Technologies, NJ
  • Jedidiah Crandall, University of New Mexico, NM
  • Chaya Hiruncharoenvate, Mahasarakham University
  • Lifu Huang, Rensselaer Polytechnic Institute (RPI), NY
  • Zubin Jelveh, The University of Chicago
  • Judith Klavans, Columbia University, NY
  • Jeffrey Knockel, University of New Mexico, NM
  • Will Lowe, Princeton University
  • Rada Mihalcea, University of Michigan, Ann Arbor, MI
  • Prateek Mittal, Princeton University, NJ
  • Rishab Nithyanand, Data and Society, NY
  • Noah Smith, University of Washington
  • Thamar Solorio, University of Houston, TX
  • Mahmood Sharif, Carnegie Mellon University, PA
  • Evan Sultanik, Trail of Bits, NY
  • Svitlana Volkova, Pacific Northwest National Laboratory, WA
  • Brook Wu, NJIT, NJ
  • Mailing list for the workshop (!forum/nlp4if)