System requirements

Requirement      Version
Python           >=3.11, <3.12 (exactly Python 3.11)
GPU              CUDA-capable (required for Triton kernels)
Package manager  uv
Only Python 3.11 is supported. The pyproject.toml pins requires-python = ">=3.11,<3.12", so uv sync fails against Python 3.10 and earlier as well as Python 3.12 and later.
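The version pin can be checked before running uv sync. The following is a minimal sketch that hard-codes the bounds from the requires-python value quoted above; the helper name is_supported is hypothetical and not part of miniVLLM.

```python
import sys

# Bounds copied from the requires-python pin quoted above: ">=3.11,<3.12".
MIN_VERSION = (3, 11)  # inclusive lower bound
MAX_VERSION = (3, 12)  # exclusive upper bound


def is_supported(version: tuple) -> bool:
    """Return True when (major, minor) falls inside >=3.11,<3.12."""
    return MIN_VERSION <= tuple(version) < MAX_VERSION


if __name__ == "__main__":
    current = sys.version_info[:2]
    status = "supported" if is_supported(current) else "not supported"
    print(f"Python {current[0]}.{current[1]} is {status}")
```

Run it with the interpreter you intend to use. Only a 3.11.x interpreter reports "supported".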

Step-by-step installation

1. Install uv

miniVLLM uses uv for reproducible, isolated dependency management. Install it with the official installer:
curl -LsSf https://astral.sh/uv/install.sh | sh
Restart your shell or run source $HOME/.local/bin/env to make the uv command available.
You can verify the installation with uv --version.
2. Clone the repository

git clone https://github.com/Wenyueh/MinivLLM.git
cd MinivLLM
3. Sync dependencies

uv sync
uv sync reads pyproject.toml, resolves the dependency graph, and installs everything into a project-local virtual environment (.venv/). You do not need to create or activate a virtualenv manually.
To also install the optional development dependencies:
uv sync --extra dev
4. Verify the installation

uv run python main.py
If the engine initializes and begins printing throughput statistics, installation is successful.

Dependencies explained

The core dependencies declared in pyproject.toml are:
Package        Purpose
torch          GPU tensor operations and the base for all model compute
transformers   Model tokenizers and config loading (e.g. AutoTokenizer)
xxhash         Fast hashing used by the block manager for KV cache prefix matching
vllm>=0.15.0   Provides reference kernels and utilities that miniVLLM builds on
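To illustrate what "hashing for KV cache prefix matching" can look like, here is a minimal sketch of chained block hashing, a common approach to prefix caching. It is not taken from miniVLLM's block manager: BLOCK_SIZE, hash_block, block_hashes, and count_shared_prefix_blocks are hypothetical names, and the hashlib fallback exists only so the sketch runs without xxhash installed.

```python
import hashlib
from typing import List, Optional

try:
    import xxhash  # the hashing dependency listed in the table above

    def _digest(data: bytes) -> int:
        return xxhash.xxh64(data).intdigest()
except ImportError:  # fallback so the sketch runs without xxhash
    def _digest(data: bytes) -> int:
        raw = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(raw, "little")

BLOCK_SIZE = 4  # hypothetical number of tokens per KV cache block


def hash_block(token_ids: List[int], parent_hash: Optional[int]) -> int:
    """Hash one full block, chained to the hash of the preceding block."""
    prefix = b"" if parent_hash is None else parent_hash.to_bytes(8, "little")
    payload = b"".join(t.to_bytes(4, "little") for t in token_ids)
    return _digest(prefix + payload)


def block_hashes(token_ids: List[int]) -> List[int]:
    """Hash every complete block; a trailing partial block is skipped."""
    hashes: List[int] = []
    parent: Optional[int] = None
    for start in range(0, len(token_ids) - BLOCK_SIZE + 1, BLOCK_SIZE):
        parent = hash_block(token_ids[start:start + BLOCK_SIZE], parent)
        hashes.append(parent)
    return hashes


def count_shared_prefix_blocks(a: List[int], b: List[int]) -> int:
    """Count leading blocks whose chained hashes match in both sequences."""
    shared = 0
    for hash_a, hash_b in zip(block_hashes(a), block_hashes(b)):
        if hash_a != hash_b:
            break
        shared += 1
    return shared
```

Because each block hash includes its parent's hash, two blocks with identical tokens match only when everything before them matches too. That property is what makes a single hash lookup sufficient to identify a reusable prefix.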

Optional dev dependencies

Install these when contributing to or testing the project:
Package      Purpose
pytest>=7.0  Test runner
black>=23.0  Code formatter
isort>=5.0   Import sorter
Install with:
uv sync --extra dev

uv sync vs pip install

miniVLLM is designed to be used with uv sync, not pip install. The uv run prefix ensures commands run inside the managed virtual environment without needing to activate it manually. If you prefer a traditional workflow, you can activate .venv/ with source .venv/bin/activate and then run commands directly.
Do not use pip install -r requirements.txt — there is no requirements.txt. All dependencies are declared in pyproject.toml and managed exclusively through uv.

Troubleshooting

miniVLLM requires a CUDA GPU. If you see an error like AssertionError: CUDA is not available, check:
  1. Your machine has an NVIDIA GPU.
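  2. The NVIDIA driver is installed and working. Run nvidia-smi to confirm that the GPU is visible. nvidia-smi ships with the driver, so it does not verify the CUDA toolkit; use nvcc --version for that.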
  3. Your torch installation includes CUDA support. Run:
python -c "import torch; print(torch.cuda.is_available())"
If this prints False, reinstall PyTorch with the correct CUDA version from pytorch.org.
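The checks above can be run in order with a short script. This is a sketch rather than a tool shipped with miniVLLM: diagnose_cuda is a hypothetical helper, and it reports only the first check that fails.

```python
import importlib.util
import shutil


def diagnose_cuda() -> str:
    """Return a label for the first failing check, or "ok"."""
    # nvidia-smi ships with the NVIDIA driver; if it is missing, the
    # driver is probably not installed or not on PATH.
    if shutil.which("nvidia-smi") is None:
        return "nvidia-smi not found"
    # Look for torch without importing it, so a missing package is
    # reported cleanly instead of raising ImportError.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch

    if not torch.cuda.is_available():
        return "torch cannot see a CUDA device"
    return "ok"


if __name__ == "__main__":
    print(diagnose_cuda())
```

Run it inside the project environment (for example with uv run python) so that it inspects the same torch build miniVLLM uses.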
If uv sync reports a Python version conflict, check your active Python version:
python --version
You need Python 3.11 specifically. Install it via your system package manager, python.org, or uv python install 3.11, then tell uv to use it:
uv sync --python 3.11
The uv installer places the uv binary in ~/.local/bin/. If your shell cannot find the uv command, add that directory to your PATH:
export PATH="$HOME/.local/bin:$PATH"
Add this line to your ~/.bashrc or ~/.zshrc to make it permanent.
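To tell a missing installation apart from a PATH problem, a small check like the following can help. It is a sketch that assumes the default install location mentioned above; uv_status is a hypothetical helper name.

```python
import os
import shutil
from pathlib import Path


def uv_status(path_env: str, install_dir: Path) -> str:
    """Classify the uv setup given a PATH string and an install directory."""
    found = shutil.which("uv", path=path_env)
    if found is not None:
        return f"found: {found}"
    # The binary exists but the shell cannot see it: a PATH problem.
    if (install_dir / "uv").is_file():
        return f"installed in {install_dir} but missing from PATH"
    return "not installed"


if __name__ == "__main__":
    default_dir = Path.home() / ".local" / "bin"  # location named above
    print(uv_status(os.environ.get("PATH", ""), default_dir))
```

If the script reports that uv is installed but missing from PATH, the export line above is the fix. If it reports "not installed", rerun the installer.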
miniVLLM’s source lives under src/. When running scripts directly with python instead of uv run python, the src/ directory may not be on your PYTHONPATH. Always use:
uv run python main.py
Or, if using an activated virtualenv, add src/ to your path manually:
export PYTHONPATH="$(pwd)/src${PYTHONPATH:+:$PYTHONPATH}"
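For a one-off script, the same effect can be had from inside Python by prepending src/ to sys.path before importing project modules. This is a sketch assuming the src/ layout described above; add_src_to_path is a hypothetical helper, not part of miniVLLM.

```python
import sys
from pathlib import Path


def add_src_to_path(project_root: Path) -> Path:
    """Prepend <project_root>/src to sys.path once and return that path."""
    src_dir = (project_root / "src").resolve()
    if str(src_dir) not in sys.path:
        sys.path.insert(0, str(src_dir))
    return src_dir


if __name__ == "__main__":
    # Assumes the script is started from the repository root.
    print(f"Added {add_src_to_path(Path.cwd())} to sys.path")
```

This affects only the current process, so uv run python main.py remains the simpler option for everyday use.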