Installation - miniVLLM

System requirements

Requirement	Version
Python	`>=3.11, <3.12` (exactly Python 3.11)
GPU	CUDA-capable (required for Triton kernels)
Package manager	`uv`

Only Python 3.11.x is supported. The pyproject.toml pins requires-python = ">=3.11,<3.12", so older releases (3.10 and earlier) and newer ones (3.12 and later) are both rejected. uv sync will refuse to use an interpreter outside this range.
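The version constraint can be checked before running uv sync. The following is a minimal sketch, not part of miniVLLM: the function name python_version_ok is hypothetical, and the bounds are copied from the requires-python pin quoted above.

```python
import sys

# Bounds copied from requires-python = ">=3.11,<3.12" in pyproject.toml.
MIN_VERSION = (3, 11)  # inclusive lower bound
MAX_VERSION = (3, 12)  # exclusive upper bound


def python_version_ok(version=None):
    """Return True if `version` (major, minor, ...) satisfies >=3.11,<3.12."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if python_version_ok() else "not supported"
    print(f"Python {found}: {status}")
```

Running this with the interpreter you intend to use shows whether it falls inside the supported range.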

Step-by-step installation

Install uv

miniVLLM uses uv for reproducible, isolated dependency management. Install it with the official installer:

curl -LsSf https://astral.sh/uv/install.sh | sh

Restart your shell or run source $HOME/.local/bin/env to make the uv command available.

You can verify the installation with uv --version.

Clone the repository

git clone https://github.com/Wenyueh/MinivLLM.git
cd MinivLLM

Sync dependencies

uv sync

uv sync reads pyproject.toml, resolves the dependency graph, and installs everything into a project-local virtual environment (.venv/). You do not need to create or activate a virtualenv manually.

To also install the optional development dependencies:

uv sync --extra dev

Verify the installation

uv run python main.py

If the engine initializes and begins printing throughput statistics, installation is successful.
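If the engine does not start, a lighter check is to confirm that the core packages this guide lists can be located in the environment. This is a sketch under assumptions: the helper name missing_packages is hypothetical, and the package list mirrors the dependency table in this guide rather than being read from pyproject.toml.

```python
import importlib.util

# Core packages listed in this guide; adjust if pyproject.toml changes.
CORE_PACKAGES = ("torch", "transformers", "xxhash", "vllm")


def missing_packages(names=CORE_PACKAGES):
    """Return the names that cannot be located in the current environment."""
    missing = []
    for name in names:
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All core packages are installed.")
```

Saved to a file and run through uv run python, this reports on the managed .venv/ rather than the system interpreter.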

Dependencies explained

The core dependencies declared in pyproject.toml are:

Package	Purpose
`torch`	GPU tensor operations and the base for all model compute
`transformers`	Model tokenizers and config loading (e.g. `AutoTokenizer`)
`xxhash`	Fast hashing used by the block manager for KV cache prefix matching
`vllm>=0.15.0`	Provides reference kernels and utilities that miniVLLM builds on
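To illustrate what hashing for KV cache prefix matching can look like, here is a generic sketch of chained block hashing. It is not taken from miniVLLM's block manager: the function name block_hashes, the block size of 4, and the byte layout are all assumptions made for illustration. It falls back to hashlib when xxhash is not installed so that it runs anywhere.

```python
import hashlib

try:
    import xxhash

    def _hash64(data: bytes) -> int:
        return xxhash.xxh64(data).intdigest()
except ImportError:
    # Fallback so the sketch runs without xxhash installed.
    def _hash64(data: bytes) -> int:
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "little")


def block_hashes(token_ids, block_size=4):
    """Hash each full block of token ids, chaining in the previous block's hash.

    Two sequences share the hash of block i only if they agree on every
    token up to the end of block i, which is what makes prefix reuse safe.
    """
    hashes = []
    prev = 0
    full_len = len(token_ids) - len(token_ids) % block_size
    for start in range(0, full_len, block_size):
        block = token_ids[start:start + block_size]
        payload = prev.to_bytes(8, "little")
        payload += b"".join(t.to_bytes(8, "little") for t in block)
        prev = _hash64(payload)
        hashes.append(prev)
    return hashes
```

Sequences with a common prefix produce identical leading hashes, so a cache keyed on these values can detect which blocks are reusable. A trailing partial block is left unhashed.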

Optional dev dependencies

Install these when contributing to or testing the project:

Package	Purpose
`pytest>=7.0`	Test runner
`black>=23.0`	Code formatter
`isort>=5.0`	Import sorter

Install with:

uv sync --extra dev

uv sync vs pip install

miniVLLM is designed to be used with uv sync, not pip install. The uv run prefix ensures commands run inside the managed virtual environment without needing to activate it manually. If you prefer a traditional workflow, you can activate .venv/ with source .venv/bin/activate and then run commands directly.

Do not use pip install -r requirements.txt — there is no requirements.txt. All dependencies are declared in pyproject.toml and managed exclusively through uv.

Troubleshooting

CUDA not found / no CUDA-capable device

miniVLLM requires a CUDA GPU. If you see an error like AssertionError: CUDA is not available, check:

Your machine has an NVIDIA GPU.
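The NVIDIA driver is installed and working. Run nvidia-smi to confirm that the GPU is visible. (nvidia-smi ships with the driver, not the CUDA toolkit; to check the toolkit itself, run nvcc --version.)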
Your torch installation includes CUDA support. Run:

uv run python -c "import torch; print(torch.cuda.is_available())"

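If this prints False, the installed PyTorch build cannot see a CUDA device: it may be a CPU-only build, or the driver may be too old for it. Install a CUDA-enabled PyTorch build that matches your driver (see pytorch.org).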
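A slightly fuller diagnostic than the one-liner can help separate a missing PyTorch from a CPU-only build or an invisible GPU. This is a sketch only, and the helper name cuda_report is hypothetical. It uses standard PyTorch attributes and returns early if torch is not installed.

```python
def cuda_report():
    """Collect basic facts about the PyTorch/CUDA setup without raising."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None here means the build was not compiled with CUDA support.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }
    if report["cuda_available"]:
        report["device_0"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If built_with_cuda is None, the build itself is the problem. If it is set but cuda_available is False, look at the driver and GPU visibility instead.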

Wrong Python version

If uv sync reports a Python version conflict, check your active Python version:

python --version

You need Python 3.11 specifically. Install it via your system package manager or python.org, or let uv download it with uv python install 3.11. Then tell uv to use it:

uv sync --python 3.11

uv command not found after installation

The uv installer places the uv binary in ~/.local/bin/. If your shell does not pick it up, add that directory to your PATH:

export PATH="$HOME/.local/bin:$PATH"

Add this line to your ~/.bashrc or ~/.zshrc to make it permanent.
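To confirm whether the directory is actually on PATH, a small check like the following can be used. It is a sketch with a hypothetical helper name (dir_on_path) that compares normalized entries and does not follow symlinks.

```python
import os
import shutil


def dir_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries in a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.normpath(os.path.expanduser(directory))
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return any(
        os.path.normpath(os.path.expanduser(entry)) == target
        for entry in entries
    )


if __name__ == "__main__":
    print("~/.local/bin on PATH:", dir_on_path("~/.local/bin"))
    # shutil.which returns the resolved executable path, or None if not found.
    print("uv resolved to:", shutil.which("uv"))
```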

ModuleNotFoundError: No module named 'myvllm'

miniVLLM’s source lives under src/. When you run scripts directly with python instead of uv run python, the src/ directory may not be on Python's import path, so the myvllm package cannot be found. Prefer:

uv run python main.py

Or, if using an activated virtualenv, add src/ to your path manually:

export PYTHONPATH="$(pwd)/src${PYTHONPATH:+:$PYTHONPATH}"
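The same effect can be had from inside a script by adjusting sys.path before importing the package. This is a sketch assuming the src/ layout described above. The helper name ensure_src_on_path is hypothetical and is not something miniVLLM provides.

```python
import sys
from pathlib import Path


def ensure_src_on_path(project_root="."):
    """Prepend <project_root>/src to sys.path if it exists.

    Returns the added path as a string, or None if there is no src/ directory.
    """
    src_dir = (Path(project_root) / "src").resolve()
    if not src_dir.is_dir():
        return None
    src_str = str(src_dir)
    if src_str not in sys.path:
        sys.path.insert(0, src_str)
    return src_str
```

Calling ensure_src_on_path() from the repository root before import myvllm is one way to use it. uv run remains the simpler option.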
