MCP Tools Reference
Koharu exposes MCP tools at:
http://127.0.0.1:<PORT>/mcp
These tools operate on the same runtime state as the GUI and HTTP API.
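As a rough illustration of how a client might address this endpoint: MCP is built on JSON-RPC 2.0, and tool invocations use the `tools/call` method. The sketch below only builds the URL and the request body. The port `9999` is a placeholder, the helper names are hypothetical, and transport details such as session initialization and headers are not covered.

```python
import json
from typing import Optional


def mcp_endpoint(port: int) -> str:
    """Return the local MCP URL for a given port."""
    return f"http://127.0.0.1:{port}/mcp"


def tool_call_request(name: str, arguments: Optional[dict] = None, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for one tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# Placeholder port; use the port your Koharu instance actually listens on.
url = mcp_endpoint(9999)
body = json.dumps(tool_call_request("get_document", {"index": 0}))
```

In practice an MCP client library would usually build and send these messages for you; the sketch is only meant to show what a tool call looks like on the wire.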
General behavior
Important implementation details:
- image-based tools can return text plus inline image content
- `open_documents` replaces the current document set rather than appending
- `process` starts the full pipeline but does not itself stream progress
- `llm_load` and `process` currently accept local-model-style parameters and do not expose every HTTP API field
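In MCP, a tool result carries a `content` list of typed items: `text` items hold a string, and `image` items hold base64 `data` plus a `mimeType`. A minimal sketch of separating the two, assuming Koharu's image-based tools follow that standard shape (the helper name and the sample result are illustrative, not taken from Koharu):

```python
import base64


def split_content(result: dict):
    """Split an MCP tool result into text strings and decoded images."""
    texts, images = [], []
    for item in result.get("content", []):
        if item.get("type") == "text":
            texts.append(item["text"])
        elif item.get("type") == "image":
            # Image payloads are base64-encoded in the MCP content format.
            images.append((item.get("mimeType", ""), base64.b64decode(item["data"])))
    return texts, images


# Synthetic result for illustration only; real results come from the server.
sample = {
    "content": [
        {"type": "text", "text": "layer: rendered"},
        {
            "type": "image",
            "mimeType": "image/png",
            "data": base64.b64encode(b"fake-png-bytes").decode("ascii"),
        },
    ]
}
texts, images = split_content(sample)
```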
Inspection tools
| Tool | What it does | Key parameters |
|---|---|---|
| `app_version` | get the application version | none |
| `device` | get ML device and GPU-related info | none |
| `get_documents` | get the number of loaded documents | none |
| `get_document` | get one document's metadata and text blocks | `index` |
| `list_font_families` | list available render fonts | none |
| `llm_list` | list translation models | none |
| `llm_ready` | check whether an LLM is currently loaded | none |
Image and block preview tools
| Tool | What it does | Key parameters |
|---|---|---|
| `view_image` | preview a whole document layer | `index`, `layer`, optional `max_size` |
| `view_text_block` | preview one cropped text block | `index`, `text_block_index`, optional `layer` |
Valid `view_image` layers:
- `original`
- `segment`
- `inpainted`
- `rendered`
Valid `view_text_block` layers:
- `original`
- `rendered`
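Because the two preview tools accept different layer sets, a client may want to validate the layer name before sending a call. The following is a small sketch of that idea; the helper names are hypothetical, and the argument names come from the table above.

```python
VIEW_IMAGE_LAYERS = ("original", "segment", "inpainted", "rendered")
VIEW_TEXT_BLOCK_LAYERS = ("original", "rendered")


def view_image_args(index, layer, max_size=None):
    """Build view_image arguments, rejecting unknown layer names."""
    if layer not in VIEW_IMAGE_LAYERS:
        raise ValueError(f"unknown view_image layer: {layer!r}")
    args = {"index": index, "layer": layer}
    if max_size is not None:
        args["max_size"] = max_size
    return args


def view_text_block_args(index, text_block_index, layer=None):
    """Build view_text_block arguments; the layer is optional."""
    if layer is not None and layer not in VIEW_TEXT_BLOCK_LAYERS:
        raise ValueError(f"unknown view_text_block layer: {layer!r}")
    args = {"index": index, "text_block_index": text_block_index}
    if layer is not None:
        args["layer"] = layer
    return args
```

Note that `segment` and `inpainted` are valid for `view_image` but are rejected here for `view_text_block`, matching the two lists above.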
Document and export tools
| Tool | What it does | Key parameters |
|---|---|---|
| `open_documents` | load image files from disk and replace the current set | `paths` |
| `export_document` | write the rendered document to disk | `index`, `output_path` |
`open_documents` expects filesystem paths, not uploaded file blobs.
`export_document` currently exports only the rendered image. PSD export is available through the HTTP API but does not currently have a dedicated MCP tool.
Pipeline tools
| Tool | What it does | Key parameters |
|---|---|---|
| `detect` | run text detection and font prediction | `index` |
| `ocr` | run OCR on detected blocks | `index` |
| `inpaint` | remove text using the current mask | `index` |
| `render` | draw translated text back onto the page | `index`, optional `text_block_index`, `shader_effect`, `font_family` |
| `process` | start detect -> OCR -> inpaint -> translate -> render | optional `index`, `llm_model_id`, `language`, `shader_effect`, `font_family` |
`process` is the coarse-grained convenience tool. If you need more control or easier debugging, use the stage tools separately.
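The two approaches can be sketched side by side: one helper builds arguments for a single `process` call, the other lists the equivalent stage calls in the same order. This assumes `llm_generate` serves as the translate stage and that omitted optional parameters fall back to server defaults. The helper names are hypothetical, and the `language` value shown in the usage line is a placeholder whose accepted format is not specified here.

```python
def process_args(index=None, llm_model_id=None, language=None,
                 shader_effect=None, font_family=None):
    """Build arguments for `process`, leaving out anything not provided."""
    candidates = {
        "index": index,
        "llm_model_id": llm_model_id,
        "language": language,
        "shader_effect": shader_effect,
        "font_family": font_family,
    }
    return {key: value for key, value in candidates.items() if value is not None}


def staged_calls(index, language=None):
    """List the stage tools in the order `process` runs them."""
    translate_args = {"index": index}
    if language is not None:
        translate_args["language"] = language
    return [
        ("detect", {"index": index}),
        ("ocr", {"index": index}),
        ("inpaint", {"index": index}),
        ("llm_generate", translate_args),  # assumed to be the translate stage
        ("render", {"index": index}),
    ]


# Placeholder language value for illustration.
coarse = ("process", process_args(index=0, language="English"))
fine = staged_calls(0, language="English")
```

Running the stages individually makes it possible to inspect intermediate state, for example with `get_document` after `ocr`, before moving on.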
LLM tools
| Tool | What it does | Key parameters |
|---|---|---|
| `llm_load` | load a translation model | `id`, optional `temperature`, `max_tokens`, `custom_system_prompt` |
| `llm_offload` | unload the current model | none |
| `llm_generate` | translate one block or all blocks | `index`, optional `text_block_index`, `language` |
`llm_generate` expects an LLM to already be loaded. Use `llm_ready` to check, and `llm_load` to load a model if none is loaded.
Text-block editing tools
| Tool | What it does | Key parameters |
|---|---|---|
| `update_text_block` | patch text, translation, box geometry, or style | `index`, `text_block_index`, optional text and style fields |
| `add_text_block` | add a new empty text block | `index`, `x`, `y`, `width`, `height` |
| `remove_text_block` | remove one text block | `index`, `text_block_index` |
The current update tool can change:
- `translation`
- `x`
- `y`
- `width`
- `height`
- `font_families`
- `font_size`
- `color`
- `shader_effect`
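A client can guard against unsupported fields by checking a patch against the list above before calling `update_text_block`. This is a minimal sketch under the assumption that the fields are passed as top-level arguments next to `index` and `text_block_index`; the helper name and the example values are hypothetical, and value types are not specified here.

```python
UPDATABLE_FIELDS = frozenset({
    "translation", "x", "y", "width", "height",
    "font_families", "font_size", "color", "shader_effect",
})


def update_text_block_args(index, text_block_index, **changes):
    """Build update_text_block arguments, rejecting fields not in the list."""
    unknown = sorted(set(changes) - UPDATABLE_FIELDS)
    if unknown:
        raise ValueError(f"unsupported fields: {', '.join(unknown)}")
    return {"index": index, "text_block_index": text_block_index, **changes}


# Example values are illustrative only.
patch = update_text_block_args(0, 2, translation="Hello", font_size=24)
```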
Mask and cleanup tools
| Tool | What it does | Key parameters |
|---|---|---|
| `dilate_mask` | expand the current text mask | `index`, `radius` |
| `erode_mask` | shrink the current text mask | `index`, `radius` |
| `inpaint_region` | re-inpaint a specific rectangle only | `index`, `x`, `y`, `width`, `height` |
These are useful when the automatic segmentation mask is close but still needs manual cleanup.
Suggested prompt flow
For reliable agent behavior, this sequence works well:
1. `open_documents`
2. `get_documents`
3. `detect`
4. `ocr`
5. `get_document`
6. `llm_load`
7. `llm_generate`
8. `inpaint`
9. `render`
10. `view_image`
11. `export_document`
If you need to inspect a problem block, use `view_text_block` before asking the agent to patch layout or translation.
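For scripted use, the suggested sequence can be written down as an ordered plan and run through whatever function actually sends tool calls. The sketch below assumes `paths` takes a list of filesystem paths and passes only the required parameters listed in the tables. Both `suggested_flow` and `run_flow` are hypothetical helpers, `call` stands in for your MCP client's tool-call function, and the paths and model id are placeholders.

```python
def suggested_flow(paths, model_id, output_path, index=0):
    """Return the suggested tool sequence as (tool name, arguments) pairs."""
    def doc():
        # Fresh dict per step so later edits to one step do not leak into others.
        return {"index": index}

    return [
        ("open_documents", {"paths": list(paths)}),
        ("get_documents", {}),
        ("detect", doc()),
        ("ocr", doc()),
        ("get_document", doc()),
        ("llm_load", {"id": model_id}),
        ("llm_generate", doc()),
        ("inpaint", doc()),
        ("render", doc()),
        ("view_image", {"index": index, "layer": "rendered"}),
        ("export_document", {"index": index, "output_path": output_path}),
    ]


def run_flow(steps, call):
    """Run each step in order through `call(name, arguments)`."""
    return [(name, call(name, arguments)) for name, arguments in steps]


# Placeholder inputs for illustration.
steps = suggested_flow(["/path/to/page.png"], "example-model-id", "/path/to/out.png")
```

Keeping the plan as data makes it easy to insert a `view_text_block` check, or to stop after `get_document` when debugging OCR output.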