Tokenizer Choice For LLM Training: Negligible or Crucial?

Mehdi Ali | Michael Fromm | Klaudia Thellmann | Richard Rutmann | Max Lübbering | Johannes Leveling | Katrin Klug | Jan Ebert | Niclas Doll | Jasper Buschhoff | Charvi Jain | Alexander Weber | Lena Jurkschat | Hammam Abdelwahab | Chelsea John | Pedro Ortiz Suarez | Malte Ostendorff | Samuel Weinbach | Rafet Sifa | Stefan Kesselheim | Nicolas Flores-Herr

Paper Details:

Month: June
Year: 2024
Location: Mexico City, Mexico
Venue: Findings-NAACL

Citations

No Citations Yet

URL

  • https://arxiv
  • https://oscar-project.org/
  • https://github.com/oscar-project/ungoliant
  • https://metatext.io/datasets/all-the-news-2
  • https://www.bundestag.de/dokumente/
  • https://www.bundesgerichtshof.de/DE/
  • https://pub.cl.uzh.ch/wiki/public/costep/
  • https://joint-research-centre.ec
  • https://www.dnb.de/DE/Professionell/Services/
  • https://researchdata.tuwien.ac.at/records/
  • https://pub.cl.uzh.ch/wiki/public/pacoco/
  • https://opus.nlpl.eu/OpenSubtitles-v2018.php
  • https://www.opensubtitles.org/de/index.cgi
  • https://github.com/EleutherAI/

Field Of Study