GenAI with LLiMa
LLiMa is the GenAI toolkit in Model Compiler for compiling, testing, benchmarking, deploying, and running LLM, VLM, and ASR models on Modalix.
LLiMa supports three input formats:
- Hugging Face safetensors — standard LLM and VLM model directories
- GGUF files — LLM models packaged in GGUF format
- Compressed tensor models — pre-quantized GPTQ/AWQ-style safetensor models
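As a rough illustration of how the three formats differ on disk, the sketch below guesses the format from a local path. It is not part of LLiMa: `guess_input_format` is a hypothetical helper, and it assumes that a pre-quantized checkpoint records its scheme under a `quantization_config` key in `config.json`.

```python
import json
from pathlib import Path


def guess_input_format(path):
    """Guess which of the three input formats a path looks like.

    Hypothetical helper, not part of LLiMa. Returns "gguf",
    "compressed-tensors", "safetensors", or None if nothing matches.
    """
    p = Path(path)
    if p.is_file():
        # GGUF models ship as a single file.
        return "gguf" if p.suffix.lower() == ".gguf" else None
    if not p.is_dir() or not any(p.glob("*.safetensors")):
        return None
    config_path = p / "config.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text())
        # Assumption: pre-quantized checkpoints record their scheme under
        # a "quantization_config" key in config.json.
        if "quantization_config" in config:
            return "compressed-tensors"
    return "safetensors"
```

The result is only a guess based on file layout. It does not confirm that the model architecture or quantization scheme is supported.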
SiMa.ai also publishes precompiled GenAI models on Hugging Face. Start there when a suitable model already exists.
For concrete GenAI demos, check our Neat Apps portal.
LLiMa Availability
LLiMa compilation tools are installed by default in Model Compiler. The LLiMa runtime is installed natively on Modalix as part of the Neat runtime. See Neat Framework installation for the runtime installation flow.
Supported Models
The following table lists the supported model architectures, their type (LLM or VLM), and the supported sizes:
| Model Architecture | Type | Supported Sizes |
|---|---|---|
| Llama 2 | LLM | 7b |
| Llama 3.1 | LLM | 8b |
| Llama 3.2 | LLM | 1b, 3b |
| Gemma 1 | LLM | 2b, 7b |
| Gemma 2 | LLM | 2b, 9b |
| Gemma 3 | LLM | 1b, 4b |
| Phi 3.5 mini | LLM | 3.8b |
| Qwen 2.5 | LLM | 0.5b, 1.5b, 3b, 7b |
| Qwen 3 | LLM | 0.6b, 1.7b, 4b, 8b |
| Mistral 1 | LLM | 7b |
| LFM 2 | LLM | 350m, 1.2b, 2.6b |
| Llava 1.5 | VLM | 7b |
| PaliGemma | VLM | 3b |
| Gemma 3 | VLM | 4b |
| Qwen 2.5 VL | VLM | 3b, 7b |
| Qwen 3 VL | VLM | 2b, 4b, 8b |
| LFM 2 | VLM | 450m, 1.6b, 3b |
Limitations
| Limitation Type | Description |
|---|---|
| Model Architecture | Only models based on the architectures listed above are supported. |
| Model Parameters | Only models with fewer than 10B parameters are supported. |
| HF Models | Models must be downloaded from Hugging Face and contain config.json, tokenizer.json, tokenizer_config.json, generation_config.json, and weights in safetensors format. |
| GGUF Models | GGUF format is supported for LLMs only. VLMs must be compiled from the Hugging Face safetensors format. Performance may be lower than when compiling from Hugging Face safetensors. |
| Compressed Tensor Models | Pre-quantized safetensor models (GPTQ/AWQ) created with llm-compressor are supported for LLMs only. The model must use symmetric quantization. These models can achieve better accuracy than standard INT4 quantization while maintaining high performance. |
| Gemma 3 VLM | Supported with a modified SigLIP 448 vision encoder. |
| Llama 3.2 Vision | Llama 3.2 Vision models are not supported. |
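The HF Models requirement above can be checked before compilation with a short script. This is a minimal sketch, not part of LLiMa: `missing_hf_files` is a hypothetical helper, and it assumes the weight files sit at the top level of the model directory with a `.safetensors` extension.

```python
from pathlib import Path

# File names taken from the "HF Models" limitation above.
REQUIRED_FILES = (
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "generation_config.json",
)


def missing_hf_files(model_dir):
    """Return the names of required files that are absent from model_dir.

    Hypothetical helper, not part of LLiMa. The entry "*.safetensors" is
    appended when no safetensors weight file is found.
    """
    root = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # Assumption: weights are stored as top-level *.safetensors files.
    if not any(root.glob("*.safetensors")):
        missing.append("*.safetensors")
    return missing
```

An empty list means the directory has the expected files. The check does not validate the architecture or the parameter count, which are covered by separate limitations.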