Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent

William Merrill | Vivek Ramanujan | Yoav Goldberg | Roy Schwartz | Noah A. Smith

Paper Details:

Month: November
Year: 2021
Location: Online and Punta Cana, Dominican Republic
Venue: EMNLP
