Getting Started
The model weights can be downloaded from Andrej Karpathy's HuggingFace tinyllamas repo. The default tokenizer, which is compatible with stories110M.bin, stories15M.bin, and stories42M.bin, can be downloaded from our repository or from the llama2.c repository.
Add the package to your local environment via Pkg by running

add https://github.com/kleincode/Llama2.jl

Import the package using

julia> using Llama2

Generate text
config, weights = read_karpathy("bin/transformer/stories15M.bin") # replace with path to model weights
state = RunState{Float32}(config)
transformer = Transformer{Float32}(config, weights, state)
tokenizer = Tokenizer("bin/tokenizer/tokenizer.bin", config.vocab_size) # replace with path to tokenizer
sampler = Sampler{Float32}(1.0f0, 0.9f0, 420)
prompt = "Once upon a"
generate(transformer, tokenizer, sampler, prompt)

"Once upon a time, there was a little girl named Lily. She loved to help her mommy in the kitchen. One day, her mommy was making some yummy cookies and asked Lily to help her. Lily was so excited! \nShe put on her apron and stood on a stool so she could reach the cookies. Her mommy was so proud of her independent little girl. They mixed the ingredients together and put the cookies in the oven. \nAfter a little while, the cookies were ready and they smelled delicious. Lily's mommy let her have a slice and they both enjoyed the warm, tasty cookie. From that day on, Lily loved to help her mommy in the kitchen and help cook yummy treats."

The generate method has several optional keyword arguments; check out its docs for more information.
Use the tokenizer
The tokenizer encodes text into vectors of tokens (integers) and decodes tokens back to text. Usually, the token translation table is read from a file, as in this example:
julia> tokenizer = Tokenizer("bin/tokenizer/tokenizer.bin", 32000)
Tokenizer{Float32}(["<unk>", "\n<s>\n", "\n</s>\n", "<0x00>", "<0x01>", "<0x02>", "<0x03>", "<0x04>", "<0x05>", "<0x06>" … "ὀ", "げ", "べ", "边", "还", "黃", "왕", "收", "弘", "给"], Dict("âr" => 28727, " properly" => 6285, "chem" => 14970, " patients" => 22070, " Plan" => 8403, "<0x2A>" => 46, "рос" => 10375, "null" => 4305, "rę" => 15387, "ört" => 21069…), Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … -31731.0, -31732.0, -31733.0, -31734.0, -31735.0, -31736.0, -31737.0, -31738.0, -31739.0, -31740.0])
julia> encode(tokenizer, "Tokens are beautiful! :))")
8-element Vector{Int64}:
2
11891
576
527
9561
29992
585
877
julia> decode(tokenizer, 2, 11891)
"Tok"

For more information, including how to manually define a tokenizer, check out Tokenizer.
Use the sampler
The sampler's job is to sample (select) the next token from the raw output logits generated by the transformer. In the normal case, it applies a softmax to the logits and samples from the resulting multinomial probability distribution. If the temperature is zero, this reduces to taking the token with the maximum logit and becomes deterministic.
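The idea can be sketched in a few lines of standalone Julia (this is an illustration of the technique, not the package's actual implementation):

```julia
using Random

# Sample the index of the next token from raw logits.
function sample_token(logits::Vector{Float64}, temperature::Float64, rng::AbstractRNG)
    # Temperature zero: deterministic argmax over the raw logits.
    temperature == 0.0 && return argmax(logits)
    # Otherwise: softmax over temperature-scaled logits
    # (subtracting the maximum for numerical stability)...
    scaled = logits ./ temperature
    probs = exp.(scaled .- maximum(scaled))
    probs ./= sum(probs)
    # ...then draw one index from the resulting multinomial distribution.
    r = rand(rng)
    cum = 0.0
    for (i, p) in pairs(probs)
        cum += p
        cum >= r && return i
    end
    return length(probs)
end

sample_token([-0.5, 0.5, 0.2], 0.0, MersenneTwister(1))  # always 2 (the argmax)
```

Lower temperatures sharpen the distribution toward the argmax; higher temperatures flatten it, making unlikely tokens more probable.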
julia> sampler_mult = Sampler{Float64}(0.5, 0.0, 1)
Sampler{Float64}(0.5, 0.0, Random.MersenneTwister(1))
julia> [sampler_mult([-0.5, 0.5, 0.2]) for i in 1:10]
10-element Vector{Int64}:
2
2
2
1
2
2
3
3
2
3
julia> sampler_det = Sampler{Float64}(0.0, 0.0, 1)
Sampler{Float64}(0.0, 0.0, Random.MersenneTwister(1))
julia> [sampler_det([-0.5, 0.5, 0.2]) for i in 1:10]
10-element Vector{Int64}:
2
2
2
2
2
2
2
2
2
2
julia> sampler_topp = Sampler{Float64}(1.0, 0.5, 1)
Sampler{Float64}(1.0, 0.5, Random.MersenneTwister(1))
julia> [sampler_topp([-0.5, 0.5, 0.2]) for i in 1:10]
10-element Vector{Int64}:
2
2
2
2
2
2
3
3
2
3

The sampler can work with arbitrary Real types. For more information, check out Sampler.
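The third sampler above uses nucleus (top-p) sampling: only the smallest set of tokens whose cumulative probability exceeds p is kept, and the next token is drawn from that renormalized set. A minimal standalone sketch of the technique (again, not the package's actual implementation):

```julia
using Random

# Nucleus (top-p) sampling over an already-normalized probability vector.
function top_p_sample(probs::Vector{Float64}, p::Float64, rng::AbstractRNG)
    order = sortperm(probs; rev=true)   # token indices by descending probability
    # Find the smallest prefix whose cumulative probability reaches p.
    cum = 0.0
    cutoff = length(order)
    for (k, idx) in enumerate(order)
        cum += probs[idx]
        if cum >= p
            cutoff = k
            break
        end
    end
    nucleus = order[1:cutoff]
    # Draw from the nucleus, renormalized to its total mass.
    r = rand(rng) * sum(probs[nucleus])
    acc = 0.0
    for idx in nucleus
        acc += probs[idx]
        acc >= r && return idx
    end
    return nucleus[end]
end

# With p = 0.5, the nucleus is just token 2 (probability 0.6), so it is always chosen.
top_p_sample([0.05, 0.6, 0.35], 0.5, MersenneTwister(1))  # always 2
```

Cutting off the low-probability tail this way suppresses rare, often incoherent tokens while preserving randomness among the plausible ones.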