Performance
The forward! method is optimized for quick execution in the following ways:
- All operations (matrix multiplication, softmax, swiglu) are performed in-place. [#35]
- All operations use views (using the @views macro). [#35]
- The RoPE loop for calculating the positional embeddings supports multithreading using the @threads macro. [#37]
For benchmarks of these optimizations, check out the linked pull requests.
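
The three techniques listed above can be illustrated with a minimal sketch. This is not the package's actual forward! code: the helpers softmax! and rope!, their signatures, and the base value of 10000 are assumptions made for illustration. Only mul!, @views, and @threads are standard Julia.

```julia
using LinearAlgebra: mul!
using Base.Threads: @threads

# Hypothetical in-place softmax: overwrites x instead of allocating a result.
function softmax!(x::AbstractVector{<:AbstractFloat})
    x .= exp.(x .- maximum(x))   # subtract the maximum for numerical stability
    x ./= sum(x)
    return x
end

# Hypothetical RoPE helper: rotates consecutive pairs (x[i], x[i+1]) in place.
# Each pair is independent, so the iterations can be split across threads.
function rope!(x::AbstractVector{T}, pos::Integer; base = 10000f0) where {T<:AbstractFloat}
    n = length(x)
    @threads for i in 1:2:n-1
        θ = T(pos) / T(base)^(T(i - 1) / T(n))
        s, c = sincos(θ)
        a, b = x[i], x[i+1]
        x[i] = a * c - b * s
        x[i+1] = a * s + b * c
    end
    return x
end

W = Float32[1 2; 3 4; 5 6]
x = Float32[1, 1]
buf = zeros(Float32, 4)

# @views turns buf[1:3] into a view, so mul! writes W * x straight into buf
# without copying the slice; the fourth entry is left untouched.
@views mul!(buf[1:3], W, x)
@views softmax!(buf[1:3])

q = Float32[1, 0, 0, 1]
rope!(q, 3)   # rotate q in place for position 3
```

Multithreading only takes effect when Julia is started with more than one thread, for example with `julia --threads=auto`; with a single thread the @threads loop simply runs serially.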