Loose-Info.com
Last Update 2026/02/15
Measured comparison of local LLMs: Llama 3.1 [English prompt]

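This is a record of running local LLMs on a relatively low-spec PC.
The runs were made while other workloads, such as virtual machines, were active, so the system was under some load.
This page is not intended to evaluate LLM performance as a benchmark.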

Test PC

OS: Debian GNU/Linux 12 (bookworm)
CPU: Intel(R) Core(TM) i5-14400F
GPU: GeForce RTX 3060 12GB
Memory: DDR4 PC4-25600 32GB × 4
SSD: crucial P310 CT1000P310SSD8-JP


Environment: Docker + Ollama (default configuration, no special settings)

Test prompt

Could you please recommend some great places in the US to see beautiful scenery? Around 10 places in all four directions.
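The page does not state the exact command used to collect the figures. The field names in the results (total_duration, eval_count, and so on) match those returned by Ollama's /api/generate endpoint, so one way to reproduce a run might look like the sketch below. The URL and port (Ollama's default, 11434) and the helper names build_request, extract_stats and run are assumptions made for illustration, not the author's script.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default port on the local machine.
OLLAMA_URL = "http://localhost:11434/api/generate"

PROMPT = (
    "Could you please recommend some great places in the US to see "
    "beautiful scenery? Around 10 places in all four directions."
)

# Timing fields reported in the results on this page (durations are in nanoseconds).
STAT_KEYS = (
    "total_duration",
    "load_duration",
    "prompt_eval_count",
    "prompt_eval_duration",
    "eval_count",
    "eval_duration",
)


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request so the stats arrive in one JSON object."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_stats(response: dict) -> dict:
    """Keep only the timing and token-count fields from a generate response."""
    return {key: response[key] for key in STAT_KEYS if key in response}


def run(model: str, prompt: str = PROMPT) -> dict:
    """Send the prompt to a running Ollama server and return its stats."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return extract_stats(json.load(resp))


# Example (requires a running Ollama server with the model pulled):
# stats = run("llama3.1:8b-instruct-q4_K_M")
```

Setting "stream" to False keeps the response in a single JSON object, which makes the duration fields easy to read in one step.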

Llama 3.1 [English prompt]

Without GPU
llama3.1:8b-instruct-q4_K_M (8.37 TPS)   llama3.1:8b-instruct-q5_K_M (7.26 TPS)   llama3.1:70b-instruct-q4_K_M (0.93 TPS)
With GPU
llama3.1:8b-instruct-q4_K_M (60.9 TPS)   llama3.1:8b-instruct-q5_K_M (56.4 TPS)   llama3.1:70b-instruct-q4_K_M (1.17 TPS)

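・TPS (tokens/s) is calculated as eval_count / eval_duration, with eval_duration converted from nanoseconds to seconds
・Runs with the model already loaded are omitted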
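As a worked example of the TPS formula, the following minimal sketch applies it to the figures of the llama3.1:8b-instruct-q4_K_M run without GPU. The function name tokens_per_second is chosen here for illustration; the only assumption is that eval_duration is reported in nanoseconds, as the parenthesised second values in the results indicate.

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Return generation speed in tokens/s.

    eval_duration_ns is in nanoseconds, so it is converted to seconds first.
    """
    return eval_count / (eval_duration_ns / 1e9)


# eval_count and eval_duration from the 8b q4_K_M run without GPU
tps = tokens_per_second(496, 59_285_927_366)
print(f"{tps:.2f} TPS")  # 8.37 TPS
```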

llama3.1:8b-instruct-q4_K_M (without GPU)

Model
  architecture        llama
  parameters          8.0B
  context length      131072
  embedding length    4096
  quantization        Q4_K_M

2026-02-13
total_duration (total time)                   : 63924360819 (63.924s)
load_duration (model load time)               : 3109494448 ( 3.109s)
prompt_eval_count (prompt tokens evaluated)   : 34
prompt_eval_duration (prompt evaluation time) : 1219880703 ( 1.220s)
eval_count (generated tokens)                 : 496
eval_duration (generation time)               : 59285927366 (59.286s)

real 1m3.943s
user 0m0.046s
sys 0m0.012s

Memory usage (RSS) : 5437780 KB

llama3.1:8b-instruct-q5_K_M (without GPU)

Model
  architecture        llama
  parameters          8.0B
  context length      131072
  embedding length    4096
  quantization        Q5_K_M

2026-02-13
total_duration (total time)                   : 73449560134 (73.450s)
load_duration (model load time)               : 3621821265 ( 3.622s)
prompt_eval_count (prompt tokens evaluated)   : 34
prompt_eval_duration (prompt evaluation time) : 1576060505 ( 1.576s)
eval_count (generated tokens)                 : 493
eval_duration (generation time)               : 67879482440 (67.879s)

real 1m13.469s
user 0m0.044s
sys 0m0.017s

Memory usage (RSS) : 6223936 KB

llama3.1:70b-instruct-q4_K_M (without GPU)

Model
  architecture        llama
  parameters          70.6B
  context length      131072
  embedding length    8192
  quantization        Q4_K_M

2026-02-13
total_duration (total time)                   : 531644723586 (531.645s)
load_duration (model load time)               : 17161674882 ( 17.162s)
prompt_eval_count (prompt tokens evaluated)   : 34
prompt_eval_duration (prompt evaluation time) : 11315010263 ( 11.315s)
eval_count (generated tokens)                 : 470
eval_duration (generation time)               : 502771484317 (502.771s)

real 8m51.666s
user 0m0.065s
sys 0m0.057s

Memory usage (RSS) : 42952936 KB

llama3.1:8b-instruct-q4_K_M (with GPU)

Model
  architecture        llama
  parameters          8.0B
  context length      131072
  embedding length    4096
  quantization        Q4_K_M

2026-02-13
total_duration (total time)                   : 11301169878 (11.301s)
load_duration (model load time)               : 3190650091 ( 3.191s)
prompt_eval_count (prompt tokens evaluated)   : 34
prompt_eval_duration (prompt evaluation time) : 33134246 ( 0.033s)
eval_count (generated tokens)                 : 473
eval_duration (generation time)               : 7762711667 ( 7.763s)

real 0m11.312s
user 0m0.021s
sys 0m0.010s

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03             Driver Version: 535.261.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        On  | 00000000:01:00.0  On |                  N/A |
|  0%   51C    P2             170W / 170W |   5705MiB / 12288MiB |     96%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1239      G   /usr/lib/xorg/Xorg                          118MiB |
|    0   N/A  N/A      1914      G   xfwm4                                         2MiB |
|    0   N/A  N/A      2426      G   /usr/bin/x-www-browser                      165MiB |
|    0   N/A  N/A    101142      C   /usr/bin/ollama                            5406MiB |
+---------------------------------------------------------------------------------------+

Memory usage (RSS) : 735056 KB

llama3.1:8b-instruct-q5_K_M (with GPU)

Model
  architecture        llama
  parameters          8.0B
  context length      131072
  embedding length    4096
  quantization        Q5_K_M

2026-02-13
total_duration (total time)                   : 12566435438 (12.566s)
load_duration (model load time)               : 3205207353 ( 3.205s)
prompt_eval_count (prompt tokens evaluated)   : 34
prompt_eval_duration (prompt evaluation time) : 35527481 ( 0.036s)
eval_count (generated tokens)                 : 507
eval_duration (generation time)               : 8996829750 ( 8.997s)

real 0m12.577s
user 0m0.024s
sys 0m0.006s

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03             Driver Version: 535.261.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        On  | 00000000:01:00.0  On |                  N/A |
|  0%   56C    P2             169W / 170W |   6409MiB / 12288MiB |     97%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1239      G   /usr/lib/xorg/Xorg                          112MiB |
|    0   N/A  N/A      1914      G   xfwm4                                         2MiB |
|    0   N/A  N/A      2426      G   /usr/bin/x-www-browser                      163MiB |
|    0   N/A  N/A    101268      C   /usr/bin/ollama                            6118MiB |
+---------------------------------------------------------------------------------------+

Memory usage (RSS) : 795276 KB

llama3.1:70b-instruct-q4_K_M (with GPU)

Model
  architecture        llama
  parameters          70.6B
  context length      131072
  embedding length    8192
  quantization        Q4_K_M

2026-02-13
total_duration (total time)                   : 445856618680 (445.857s)
load_duration (model load time)               : 2674379876 ( 2.674s)
prompt_eval_count (prompt tokens evaluated)   : 34
prompt_eval_duration (prompt evaluation time) : 2722225396 ( 2.722s)
eval_count (generated tokens)                 : 517
eval_duration (generation time)               : 440080046194 (440.080s)

real 7m25.868s
user 0m0.047s
sys 0m0.037s

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.261.03             Driver Version: 535.261.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        On  | 00000000:01:00.0  On |                  N/A |
| 33%   56C    P2              53W / 170W |  10978MiB / 12288MiB |     18%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1239      G   /usr/lib/xorg/Xorg                          118MiB |
|    0   N/A  N/A      1914      G   xfwm4                                         2MiB |
|    0   N/A  N/A      2426      G   /usr/bin/x-www-browser                      162MiB |
|    0   N/A  N/A    101353      C   /usr/bin/ollama                           10682MiB |
+---------------------------------------------------------------------------------------+

Memory usage (RSS) : 42991976 KB
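
To put the six runs side by side, the sketch below recomputes tokens/s from the raw eval_count and eval_duration values listed above and derives the GPU-to-CPU ratio for each model. The RUNS table and the function names tps and speedups are introduced here only for illustration. Under these figures the ratio comes out at roughly 7x for the two 8B models and a little over 1.2x for the 70B model.

```python
# (eval_count, eval_duration in nanoseconds) copied from the runs on this page
RUNS = {
    "llama3.1:8b-instruct-q4_K_M": {
        "cpu": (496, 59_285_927_366),
        "gpu": (473, 7_762_711_667),
    },
    "llama3.1:8b-instruct-q5_K_M": {
        "cpu": (493, 67_879_482_440),
        "gpu": (507, 8_996_829_750),
    },
    "llama3.1:70b-instruct-q4_K_M": {
        "cpu": (470, 502_771_484_317),
        "gpu": (517, 440_080_046_194),
    },
}


def tps(eval_count: int, eval_duration_ns: int) -> float:
    """Tokens per second, with the duration given in nanoseconds."""
    return eval_count * 1e9 / eval_duration_ns


def speedups(runs: dict) -> dict:
    """Return the GPU TPS divided by the CPU TPS for each model."""
    return {
        model: tps(*result["gpu"]) / tps(*result["cpu"])
        for model, result in runs.items()
    }


for model, ratio in speedups(RUNS).items():
    cpu_tps = tps(*RUNS[model]["cpu"])
    gpu_tps = tps(*RUNS[model]["gpu"])
    print(f"{model}: CPU {cpu_tps:.2f} TPS, GPU {gpu_tps:.2f} TPS, ratio {ratio:.2f}x")
```

Working from the raw counts rather than the rounded TPS values avoids compounding rounding error in the ratios.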