Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention

Biao Zhang | Ivan Titov | Rico Sennrich

Paper Details:

Month: November
Year: 2019
Location: Hong Kong, China
Venue: EMNLP
SIG: SIGDAT
