llama.cpp build

✅ Why?

While commercial tools like ChatGPT and Copilot offer significant benefits, security concerns make them unsafe to use when prompts contain confidential information. Even with the RAG threat…
For fine-tuning, llama.cpp is needed to quantize or convert the model, as sketched below.
This is especially true when using the GPU on your desktop at home rather than a GPU in Colab.
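
To make the quantize step concrete, here is a minimal sketch of the convert-and-quantize workflow using the tools built later in this post; the model path is a placeholder, and it assumes the Python requirements from the llama.cpp repo are installed.

### convert a fine-tuned Hugging Face model to GGUF (path is a placeholder)
python convert_hf_to_gguf.py /path/to/finetuned-model --outfile model-f16.gguf --outtype f16

### quantize the f16 GGUF down to 4-bit (Q4_K_M)
./build/bin/llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M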

✅ On Ubuntu 22.04 LTS

✅ Preparation

### if apt update fails, reset the package lists and try again
sudo rm -rf /var/lib/apt/lists/*
sudo apt clean
sudo apt update
sudo apt upgrade

### normal update
sudo apt update && sudo apt upgrade -y

### dependencies
sudo apt install -y build-essential cmake git libcurl4-openssl-dev

### case 1) NVIDIA CUDA 12.4 from NVIDIA's repository (recommended)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-4

### case 2) NVIDIA 11.x from the Ubuntu repo - don't do this; it is outdated
sudo apt install -y nvidia-cuda-toolkit

### NVIDIA cuDNN archive download and install
mkdir repos
cd repos
# on WSL, open the directory in Windows Explorer
explorer.exe .
### download cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz into the repos directory, or copy it from your repo

tar -xvf cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz
cd cudnn-linux-x86_64-8.9.7.29_cuda12-archive
sudo cp include/* /usr/local/cuda/include
sudo cp lib/* /usr/local/cuda/lib64

### add CUDA to PATH (and its libraries to the library path) in ~/.bashrc, then reload
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

nvcc --version # 12.4.x
nvidia-smi
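
As a quick sanity check that the cuDNN headers were copied correctly, you can read the version defines out of the installed header; this assumes the cuDNN 8.x layout, where the version lives in cudnn_version.h.

### verify the copied cuDNN version (cuDNN 8.x layout assumed)
grep -A 2 '#define CUDNN_MAJOR' /usr/local/cuda/include/cudnn_version.h
# expected: CUDNN_MAJOR 8, CUDNN_MINOR 9, CUDNN_PATCHLEVEL 7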

✅ llama.cpp build

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
mkdir build && cd build
cmake .. -DGGML_CUDA=ON -DCMAKE_CUDA_STANDARD=17
cmake --build . --config Release -j$(nproc)

### download a quantized GGUF model
cd ..
cd models
wget https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
cd ..
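
Before starting the server, a quick smoke test from the llama.cpp root confirms the CUDA build works; --n-gpu-layers 35 matches the server command below, but adjust it to your VRAM.

### generate a few tokens; the startup log should report CUDA offload
./build/bin/llama-cli -m ./models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf -p "Hello" -n 32 --n-gpu-layers 35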

✅ Run llama.cpp

./build/bin/llama-server \
  -m ./models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
  --port 8080 \
  --host 0.0.0.0 \
  --threads $(nproc) \
  --n-gpu-layers 35
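
Once the server is up, it exposes an OpenAI-compatible HTTP API; a minimal request against the chat endpoint looks like this (the prompt is just an example):

### query the OpenAI-compatible chat endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello in one sentence."}]}'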


This post is licensed under CC BY 4.0 by the author.