Models and Providers

Koharu uses both vision models and language models. The vision stack prepares the page; the language stack handles translation.

If you want the architecture-level explanation of how these pieces fit together, read Technical Deep Dive after this page.

Vision models

Koharu automatically downloads the required vision models when you use them for the first time.

The default stack consists of the models listed in the table below.

Converted model weights are hosted on Hugging Face in safetensors format for Rust compatibility and performance.

What each vision model is

Model                      Model type              Why Koharu uses it
PP-DocLayoutV3             layout detector         finds text-like regions and reading order
comic-text-detector        segmentation network    produces a text mask for cleanup
PaddleOCR-VL-1.5           vision-language model   reads cropped text into text tokens
lama-manga                 inpainting network      reconstructs the image after text removal
YuzuMarker.FontDetection   classifier / regressor  estimates font and style hints for rendering

The important design choice is that Koharu does not use a single model for every page task. Layout, segmentation, OCR, and inpainting all need different output shapes:

  • layout wants regions and order
  • segmentation wants per-pixel masks
  • OCR wants text
  • inpainting wants restored pixels
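Those four output shapes can be sketched as one typed result per stage. The names below are illustrative only, not Koharu's actual types or API:

```python
from dataclasses import dataclass, field

@dataclass
class Region:
    """One text-like region with its position in reading order (layout stage)."""
    x: int
    y: int
    w: int
    h: int
    order: int

@dataclass
class PageOutputs:
    """One field per pipeline stage; no single shape covers all four."""
    regions: list[Region] = field(default_factory=list)  # layout: regions and order
    mask: list[list[int]] = field(default_factory=list)  # segmentation: per-pixel mask
    texts: list[str] = field(default_factory=list)       # OCR: text per region
    restored: bytes = b""                                # inpainting: restored pixels

# Separate models fill separate fields; the app merges them into one record.
page = PageOutputs(regions=[Region(10, 20, 100, 30, order=0)], texts=["こんにちは"])
```

The point of the sketch is that a region list, a pixel mask, a string, and an image are structurally incompatible outputs, so one model per task is the natural design.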

Local LLMs

Koharu supports local GGUF models through llama.cpp. These models run on your machine and are downloaded on demand when you select them in the LLM picker.

In practice, the local models are usually quantized decoder-only transformers. GGUF is the file format; llama.cpp is the inference runtime.

Suggested local models for English output

  • vntl-llama3-8b-v2: around 8.5 GB in Q8_0 form, best when translation quality matters most
  • lfm2-350m-enjp-mt: very small and useful for low-memory systems or quick previews
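As a rough sanity check on those download sizes, a GGUF file weighs approximately the parameter count times the quantization's effective bits per weight. The ~8.5 bits-per-weight figure for Q8_0 (8-bit weights plus per-block scales) is an assumption for this sketch, not a value from Koharu's docs:

```python
def gguf_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough GGUF file size: parameters x bits per weight, converted to gigabytes."""
    # 1e9 params * bits, divided by 8 bits per byte, gives ~GB
    return params_billions * bits_per_weight / 8

# An 8B-parameter model at ~8.5 effective bits per weight lands near the
# ~8.5 GB quoted above for vntl-llama3-8b-v2 in Q8_0 form.
print(round(gguf_size_gb(8.0, 8.5), 1))  # → 8.5
```

The same arithmetic explains why a 350M-parameter model like lfm2-350m-enjp-mt stays small enough for low-memory systems at any common quantization.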

Suggested local models for Chinese output

Suggested local model for broader language coverage

Remote providers

Koharu can translate through remote or self-hosted APIs instead of downloading a local model.

Supported providers include:

  • OpenAI
  • Gemini
  • Claude
  • DeepSeek
  • OpenAI-compatible APIs such as LM Studio, OpenRouter, or any endpoint that exposes /v1/models and /v1/chat/completions
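Every endpoint that speaks this protocol accepts the same request shape, which is what makes these providers interchangeable. A minimal sketch of the URL and JSON body such a client would POST; the helper name, base URL, model name, and system prompt are illustrative, not Koharu's actual implementation:

```python
def build_chat_request(base_url: str, model: str, ocr_text: str) -> tuple[str, dict]:
    """Build the URL and JSON body for an OpenAI-compatible chat completion call."""
    url = base_url.rstrip("/") + "/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "Translate the following text to English."},
            {"role": "user", "content": ocr_text},
        ],
    }
    return url, payload

# The same request works against LM Studio, OpenRouter, or any compatible server;
# only base_url, model, and the API key differ.
url, body = build_chat_request("http://localhost:1234", "my-local-model", "こんにちは")
```

`/v1/models` serves the complementary role: clients call it first to list the model names the server accepts in the `model` field.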

Remote providers are configured in Settings > API Keys.

For a step-by-step setup guide for LM Studio, OpenRouter, and similar endpoints, see Use OpenAI-Compatible APIs.

Choosing between local and remote

Use local models when you want:

  • the most private setup
  • offline operation after downloads complete
  • tighter control over hardware usage

Use remote providers when you want:

  • to avoid large local model downloads
  • to reduce local VRAM or RAM usage
  • to connect to a hosted or self-managed model service

Note

When you use a remote provider, Koharu sends the OCR text you select for translation to the provider you configured.

Background reading

For theory and diagrams behind the model categories on this page, see Technical Deep Dive.