GenAI with LLiMa

LLiMa is the GenAI toolkit in Model Compiler for compiling, testing, benchmarking, deploying, and running LLM, VLM, and ASR models on Modalix.

LLiMa supports three input formats:

Hugging Face safetensors — standard LLM and VLM model directories
GGUF files — LLM models packaged in GGUF format
Compressed tensor models — pre-quantized GPTQ/AWQ-style safetensor models
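
As an illustration of how the three input formats differ on disk, the following is a minimal sketch that classifies a path before it is handed to the compiler. It is not part of LLiMa; `detect_input_format` is a hypothetical helper. It relies on two assumptions: GGUF files begin with the four ASCII bytes `GGUF`, and pre-quantized Hugging Face checkpoints record a `quantization_config` entry in `config.json`.

```python
import json
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # GGUF files start with these four bytes


def detect_input_format(path: str) -> str:
    """Hypothetical helper: return 'gguf', 'compressed-tensors',
    'safetensors', or 'unknown' for a model file or directory."""
    p = Path(path)

    # A single file is only recognized if it carries the GGUF magic.
    if p.is_file():
        with p.open("rb") as f:
            return "gguf" if f.read(4) == GGUF_MAGIC else "unknown"

    # A directory needs at least one safetensors weight file.
    if p.is_dir() and any(p.glob("*.safetensors")):
        config_file = p / "config.json"
        if config_file.is_file():
            config = json.loads(config_file.read_text())
            # Assumption: pre-quantized (GPTQ/AWQ-style) checkpoints
            # record their scheme under "quantization_config".
            if "quantization_config" in config:
                return "compressed-tensors"
        return "safetensors"

    return "unknown"
```

A call such as `detect_input_format("/path/to/model")` returns one of the four labels. The helper only inspects file names and metadata, so it says nothing about whether the model architecture itself is supported.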

SiMa.ai also publishes precompiled GenAI models on Hugging Face. Start there when a suitable model already exists.

For concrete GenAI demos, check our Neat Apps portal.

LLiMa Availability

LLiMa compilation tools are installed by default in Model Compiler. The LLiMa runtime is installed natively on Modalix as part of the Neat runtime. See Neat Framework installation for the runtime installation flow.

Supported Models

The following table lists the supported model architectures, their type (LLM or VLM), and the supported parameter sizes:

Model Architecture	Type	Supported Sizes
Llama 2	LLM	7b
Llama 3.1	LLM	8b
Llama 3.2	LLM	1b, 3b
Gemma 1	LLM	2b, 7b
Gemma 2	LLM	2b, 9b
Gemma 3	LLM	1b, 4b
Phi 3.5 mini	LLM	3.8b
Qwen 2.5	LLM	0.5b, 1.5b, 3b, 7b
Qwen 3	LLM	0.6b, 1.7b, 4b, 8b
Mistral 1	LLM	7b
LFM 2	LLM	350m, 1.2b, 2.6b
Llava 1.5	VLM	7b
PaliGemma	VLM	3b
Gemma 3	VLM	4b
Qwen 2.5 VL	VLM	3b, 7b
Qwen 3 VL	VLM	2b, 4b, 8b
LFM 2	VLM	450m, 1.6b, 3b
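
For scripted pre-checks, the table above can be transcribed into a small lookup. This is only a sketch: `SUPPORTED` and `is_supported` are hypothetical names, not part of LLiMa, and the data simply mirrors the table at the time of writing.

```python
# Supported sizes keyed by (architecture, type), transcribed from the table.
SUPPORTED = {
    ("llama 2", "llm"): {"7b"},
    ("llama 3.1", "llm"): {"8b"},
    ("llama 3.2", "llm"): {"1b", "3b"},
    ("gemma 1", "llm"): {"2b", "7b"},
    ("gemma 2", "llm"): {"2b", "9b"},
    ("gemma 3", "llm"): {"1b", "4b"},
    ("phi 3.5 mini", "llm"): {"3.8b"},
    ("qwen 2.5", "llm"): {"0.5b", "1.5b", "3b", "7b"},
    ("qwen 3", "llm"): {"0.6b", "1.7b", "4b", "8b"},
    ("mistral 1", "llm"): {"7b"},
    ("lfm 2", "llm"): {"350m", "1.2b", "2.6b"},
    ("llava 1.5", "vlm"): {"7b"},
    ("paligemma", "vlm"): {"3b"},
    ("gemma 3", "vlm"): {"4b"},
    ("qwen 2.5 vl", "vlm"): {"3b", "7b"},
    ("qwen 3 vl", "vlm"): {"2b", "4b", "8b"},
    ("lfm 2", "vlm"): {"450m", "1.6b", "3b"},
}


def is_supported(architecture: str, model_type: str, size: str) -> bool:
    """Hypothetical helper: case-insensitive lookup against the table."""
    key = (architecture.strip().lower(), model_type.strip().lower())
    return size.strip().lower() in SUPPORTED.get(key, set())
```

For example, `is_supported("Qwen 3", "LLM", "4b")` returns `True`, while `is_supported("Gemma 3", "VLM", "1b")` returns `False` because only the 4b size is listed for the Gemma 3 VLM.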

Limitations

Limitation Type	Description
Model Architecture	Only models based on the architectures listed above are supported.
Model Parameters	Only models with fewer than 10B parameters are supported.
HF Models	Models must be downloaded from Hugging Face and contain `config.json`, `tokenizer.json`, `tokenizer_config.json`, `generation_config.json`, and weights in safetensors format.
GGUF Models	GGUF format is supported for LLMs only. VLMs must be compiled from the Hugging Face safetensors format. Performance may be lower than when compiling from Hugging Face safetensors.
Compressed Tensor Models	Pre-quantized safetensor models (GPTQ/AWQ) created with llm-compressor are supported for LLMs only. The model must use symmetric quantization. These models can achieve better accuracy than standard INT4 quantization while maintaining high performance.
Gemma 3 VLM	Supported with a modified SigLip 448 vision encoder.
Llama 3.2 Vision	Llama 3.2 Vision models are not supported.
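
The HF Models requirement above can be checked before starting a compile. The following is a minimal sketch, assuming the weights are stored as one or more `*.safetensors` files at the top level of the model directory; `missing_hf_files` is a hypothetical helper, not an LLiMa API.

```python
from pathlib import Path

# Files that the HF Models limitation lists as required.
REQUIRED_FILES = (
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "generation_config.json",
)


def missing_hf_files(model_dir: str) -> list[str]:
    """Hypothetical helper: return the required entries that are absent
    from model_dir. An empty list means the directory looks complete."""
    root = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # Assumption: weights sit at the top level as *.safetensors files.
    if not any(root.glob("*.safetensors")):
        missing.append("*.safetensors")
    return missing
```

An empty result only confirms that the expected files exist. It does not verify the architecture or parameter-count limitations.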
