GenAI with LLiMa

LLiMa is the GenAI toolkit in Model Compiler for compiling, testing, benchmarking, deploying, and running LLM, VLM, and ASR models on Modalix.

LLiMa supports three input formats:

  • Hugging Face safetensors — standard LLM and VLM model directories
  • GGUF files — LLM models packaged in GGUF format
  • Compressed tensor models — pre-quantized GPTQ/AWQ-style safetensor models
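
The three input formats above can usually be told apart by inspecting the model path. The following is a minimal sketch, not part of LLiMa: `detect_input_format` is a hypothetical helper. It relies on the GGUF magic bytes (`GGUF` at the start of the file) and assumes that llm-compressor exports record `"quant_method": "compressed-tensors"` under `quantization_config` in `config.json`. Verify that assumption against your own model before relying on it.

```python
import json
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # GGUF files start with these four bytes


def detect_input_format(path):
    """Return 'gguf', 'compressed-tensors', 'safetensors', or 'unknown'.

    Hypothetical helper for illustration; LLiMa may detect formats differently.
    """
    p = Path(path)
    if p.is_file():
        # A single file is only a candidate for GGUF.
        with p.open("rb") as f:
            return "gguf" if f.read(4) == GGUF_MAGIC else "unknown"
    if p.is_dir() and any(p.glob("*.safetensors")):
        config_path = p / "config.json"
        if config_path.is_file():
            config = json.loads(config_path.read_text(encoding="utf-8"))
            quant = config.get("quantization_config") or {}
            # Assumption: pre-quantized llm-compressor exports set this field.
            if quant.get("quant_method") == "compressed-tensors":
                return "compressed-tensors"
        return "safetensors"
    return "unknown"
```

For example, `detect_input_format("models/my-model")` (a hypothetical path) returns one of the four strings, which a script could use to choose the matching compilation flow.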

SiMa.ai also publishes precompiled GenAI models on Hugging Face. Start there when a suitable model already exists.

For concrete GenAI demos, check our Neat Apps portal.

LLiMa Availability

LLiMa compilation tools are installed by default in Model Compiler. The LLiMa runtime is installed natively on Modalix as part of the Neat runtime. See Neat Framework installation for the runtime installation flow.

Supported Models

The following table lists the supported model architectures, their type (LLM or VLM), and the supported sizes:

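| Model Architecture | Type | Supported Sizes      |
|--------------------|------|----------------------|
| Llama 2            | LLM  | 7b                   |
| Llama 3.1          | LLM  | 8b                   |
| Llama 3.2          | LLM  | 1b, 3b               |
| Gemma 1            | LLM  | 2b, 7b               |
| Gemma 2            | LLM  | 2b, 9b               |
| Gemma 3            | LLM  | 1b, 4b               |
| Phi 3.5 mini       | LLM  | 3.8b                 |
| Qwen 2.5           | LLM  | 0.5b, 1.5b, 3b, 7b   |
| Qwen 3             | LLM  | 0.6b, 1.7b, 4b, 8b   |
| Mistral 1          | LLM  | 7b                   |
| LFM 2              | LLM  | 350m, 1.2b, 2.6b     |
| Llava 1.5          | VLM  | 7b                   |
| PaliGemma          | VLM  | 3b                   |
| Gemma 3            | VLM  | 4b                   |
| Qwen 2.5 VL        | VLM  | 3b, 7b               |
| Qwen 3 VL          | VLM  | 2b, 4b, 8b           |
| LFM 2              | VLM  | 450m, 1.6b, 3b       |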

Limitations

| Limitation Type          | Description |
|--------------------------|-------------|
| Model Architecture       | Only models based on the architectures listed above are supported. |
| Model Parameters         | Only models with fewer than 10B parameters are supported. |
| HF Models                | Models must be downloaded from Hugging Face and contain config.json, tokenizer.json, tokenizer_config.json, generation_config.json, and weights in safetensors format. |
| GGUF Models              | GGUF format is supported for LLMs only. VLMs must be compiled from the Hugging Face safetensors format. Note that performance may be lower than with Hugging Face safetensors compilation. |
| Compressed Tensor Models | Pre-quantized safetensors models (GPTQ/AWQ) created with llm-compressor are supported for LLMs only. The model must use symmetric quantization. These models can achieve better accuracy than standard INT4 quantization while maintaining high performance. |
| Gemma 3 VLM              | Supported with a modified SigLip 448 vision encoder. |
| Llama 3.2 Vision         | Vision models are not supported. |
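
Before starting a compilation, it can help to check a downloaded Hugging Face model directory against the file and size limitations above. The sketch below is illustrative only and is not a LLiMa API: `missing_hf_files` and `count_safetensors_parameters` are hypothetical helper names. The parameter count is read from the safetensors file headers (an 8-byte little-endian header length followed by a JSON header with tensor shapes). Treat it as an estimate: pre-quantized models store packed tensors, so their stored shapes may not match the logical parameter count, and the compiler may count parameters differently.

```python
import json
import struct
from pathlib import Path

# Files the limitations table requires alongside the safetensors weights.
REQUIRED_FILES = (
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "generation_config.json",
)


def missing_hf_files(model_dir):
    """Return the required files that are absent from model_dir."""
    d = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (d / name).is_file()]
    if not any(d.glob("*.safetensors")):
        missing.append("*.safetensors")
    return missing


def count_safetensors_parameters(model_dir):
    """Estimate the parameter count from safetensors headers (no weights loaded)."""
    total = 0
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        with shard.open("rb") as f:
            (header_len,) = struct.unpack("<Q", f.read(8))
            header = json.loads(f.read(header_len))
        for name, info in header.items():
            if name == "__metadata__":  # optional free-form metadata entry
                continue
            count = 1
            for dim in info["shape"]:
                count *= dim
            total += count
    return total
```

A script might call `missing_hf_files(model_dir)` first and only proceed when the list is empty, then compare `count_safetensors_parameters(model_dir)` against the 10B limit (`10_000_000_000`) as a rough sanity check.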