Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Luca Soldaini | Rodney Kinney | Akshita Bhagia | Dustin Schwenk | David Atkinson | Russell Authur | Ben Bogin | Khyathi Chandu | Jennifer Dumas | Yanai Elazar | Valentin Hofmann | Ananya Jha | Sachin Kumar | Li Lucy | Xinxi Lyu | Nathan Lambert | Ian Magnusson | Jacob Morrison | Niklas Muennighoff | Aakanksha Naik | Crystal Nam | Matthew Peters | Abhilasha Ravichander | Kyle Richardson | Zejiang Shen | Emma Strubell | Nishant Subramani | Oyvind Tafjord | Evan Walsh | Luke Zettlemoyer | Noah Smith | Hannaneh Hajishirzi | Iz Beltagy | Dirk Groeneveld | Jesse Dodge | Kyle Lo

Paper Details:

Month: August
Year: 2024
Location: Bangkok, Thailand
Venue: ACL

Citations

No Citations Yet

URL

  • https://www
  • https://www.regulations
  • https://github.com/commoncrawl/
  • https://ssrn.com/abstract=4634513
  • https://creativecommons.org/
  • https://www.regulations.gov/
  • https://github.com/microsoft/
  • https://huggingface.co/spaces/
  • https://github
  • https://stability.ai/news/
  • https://github.com/

Field of Study