Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus

Jesse Dodge | Maarten Sap | Ana Marasović | William Agnew | Gabriel Ilharco | Dirk Groeneveld | Margaret Mitchell | Matt Gardner

Paper Details:

Month: November
Year: 2021
Location: Online and Punta Cana, Dominican Republic
Venue: EMNLP