Should I try multiple optimizers when fine-tuning a pre-trained Transformer for NLP tasks? Should I tune their hyperparameters?

Nefeli Gkouti | Prodromos Malakasiotis | Stavros Toumpis | Ion Androutsopoulos

Paper Details:

Month: March
Year: 2024
Location: St. Julian’s, Malta
Venue: EACL