

Adding English Translation to My Blog! (Hexo Multilingual Setup with LLM Translation)


My GitHub projects have always had a non-negligible number of overseas users, and many people have sent me emails in English. However, my blog’s visitors are still predominantly from China — clearly because all the articles are in Chinese. Given that most of my writings are already niche, I’ve decided to add English support to attract potential international readers and expand my audience.

This post documents my journey of adding multilingual support to my blog, divided into two main parts: content translation and Hexo framework modifications.

Content Translation

First, to have an English blog, I need English content.

All blog posts are written in Markdown and need to be translated into English while preserving the original formatting. My initial idea was a more traditional approach: parse the Markdown, extract text segments, send them to a translation API, then reconstruct the Markdown.
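
Concretely, that pipeline would have looked roughly like the sketch below, where translate_segment() is a hypothetical placeholder for whatever translation API gets called:

import re

# Hypothetical placeholder: a real implementation would call a translation API here
def translate_segment(text: str) -> str:
    return text

# Split on fenced code blocks so that only the prose between them is translated;
# a real implementation would also need to protect inline code, links, tables,
# front matter, and so on
FENCE = re.compile(r'(```.*?```)', re.DOTALL)

def translate_markdown(src: str) -> str:
    parts = FENCE.split(src)
    return ''.join(p if p.startswith('```') else translate_segment(p) for p in parts)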

However, this method risks losing context, causing inconsistent translations, and being overly constrained by Markdown syntax. Plus, traditional machine translation quality is, well, predictably mediocre.

So what’s the alternative? Bring in the LLMs, of course. I opted to use a large language model to translate the entire Markdown file in one go, preserving formatting. My initial prompt:

Please translate the above Markdown into English. No deep thinking needed. Preserve all formatting. Do not modify any links or code (only comments in code may be translated). Ensure your output remains valid Markdown. Do not wrap the result in a code block — output directly.

I tested various state-of-the-art models via OpenRouter, including international models like Gemini 2.5 Pro, Claude Sonnet 4.5, GPT-5, and domestic ones like Qwen3 235B A22B Instruct 2507, Qwen3 30B A3B Instruct 2507, DeepSeek V3.1 Terminus, DeepSeek R1, and GLM 4.6.

All models demonstrated sufficient prompt-following ability to understand my request and produce syntactically valid Markdown. However, every model capable of “deep thinking” ignored my instruction of “no deep thinking” and engaged in lengthy internal monologue before outputting the result, wasting a significant number of tokens. Using them would require additional measures to suppress such behavior.
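
For what it's worth, one such measure exists for Qwen3's hybrid checkpoints: their model card documents /think and /no_think soft switches that can be appended to the user turn. A minimal sketch, assuming an OpenAI-compatible endpoint (the URL and model name here are placeholders, not my actual setup):

import requests

content = '# 标题\n\n正文……'  # the Markdown to translate

# On Qwen3 hybrid models, "/no_think" disables the thinking phase for this turn
response = requests.post(
    'http://localhost:8008/v1/chat/completions',
    json={
        'model': 'qwen3-235b-a22b',  # placeholder
        'messages': [{
            'role': 'user',
            'content': content + '\n\n请将以上 Markdown 翻译成英文…… /no_think',
        }],
    },
).json()

(The Instruct-2507 variants are non-thinking models to begin with, which sidesteps the issue entirely.)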

In terms of translation quality, each model exhibited a different linguistic style, but I couldn't clearly pick a winner. The only model I found unsatisfactory was DeepSeek R1, which hallucinated noticeably on long inputs and omitted substantial parts of the original text. The overseas models struggled to understand Chinese internet slang and cultural references, sometimes producing unexpected or awkward translations; they were also significantly more expensive and lacked open weights.

After weighing the options, I chose Qwen3 235B A22B Instruct 2507, quantized with Q6_K and locally deployed on an EPYC 9654 server. I also refined the prompt:

Please translate the above Markdown into English. No deep thinking needed. Preserve all formatting. When encountering Chinese memes or jokes, feel free to adapt the translation to better fit English expression conventions. Otherwise, do not modify any links or code (only comments in code may be translated). Ensure your output remains valid Markdown. Do not wrap the result in a code block — output directly.

The final script (using the Qwen3-recommended sampling parameters temperature=0.7, top_p=0.8, top_k=20) is as follows:

import os
import time

import requests

# Map Chinese category directories to their English names
dir_translate = {
    '学习日记': 'Learning Notes',
    '其他': 'Other',
    '软件研究=)': 'Exploration',
    # ... etc.
}

# Running token counters and a timer for progress reporting
complete = 0
prompt = 0
t = time.time()

for dirpath, dirnames, filenames in os.walk('source/_posts'):
    for filename in filenames:
        if filename.endswith('.md'):
            file_path = os.path.join(dirpath, filename)
            en_path = file_path

            # Translate the category directory names in the output path
            for k, v in dir_translate.items():
                en_path = en_path.replace(k, v)
            print(en_path)

            os.makedirs(os.path.dirname('en/' + en_path), exist_ok=True)
            with open(file_path, 'r', encoding='utf-8') as file:
                content = file.read()

            # Skip posts that have already been translated
            if os.path.exists('en/' + en_path):
                print('File already exists: ' + en_path)
                continue

            print("Translating: " + en_path)
            response = requests.post(
                url="http://localhost:8008/v1/chat/completions",
                json={
                    "model": "Qwen3-235B-A22B-Instruct-2507-GGUF/Q6_K/Qwen3-235B-A22B-Instruct-2507-Q6_K-00001-of-00004.gguf",
                    # Cap output length relative to the source length
                    "max_tokens": int(len(content)),
                    # Qwen3-recommended sampling parameters
                    "temperature": 0.7,
                    "top_p": 0.8,
                    "top_k": 20,
                    "min_p": 0.00,
                    "messages": [
                        {
                            "role": "user",
                            # The translation prompt (in Chinese), appended after the post content
                            "content": content + '\n\n' + '请将以上 Markdown 翻译成英文,不用深度思考,保留所有格式,但遇到一些中文的梗或玩笑时,可以灵活的调整翻译内容,使其更加符合英语表达习惯。除此之外,不要修改任何链接或代码(最多翻译代码中的注释),确保你的输出还是合法的 Markdown,不用放在一个代码块中,直接输出结果。'
                        }
                    ],
                }
            ).json()

            if 'usage' not in response:
                print(response)

            print(response['usage'], time.time() - t)
            t = time.time()
            prompt += response['usage']['prompt_tokens']
            complete += response['usage']['completion_tokens']
            print(f'Input: {prompt}, Output: {complete}, Total: {prompt + complete}')
            # Make sure generation wasn't cut off by the max_tokens limit
            assert response['choices'][0]['finish_reason'] == 'stop'

            with open('en/' + en_path, 'w', encoding='utf-8') as file:
                file.write(response['choices'][0]['message']['content'])

I could have designed a benchmark to evaluate translation quality across models, but I decided to take an iterative approach — release a version first and improve later. If you have relevant experience, feel free to comment and share your thoughts.
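
Short of that, even a cheap automated check would catch the worst failures, such as dropped links or the large omissions DeepSeek R1 produced. A rough sketch (the 0.5 length threshold is an arbitrary guess):

import re

LINK = re.compile(r'\]\((\S+?)\)')  # Markdown link targets

def check(src_path: str, dst_path: str) -> None:
    with open(src_path, encoding='utf-8') as f:
        src = f.read()
    with open(dst_path, encoding='utf-8') as f:
        dst = f.read()

    # Every link target in the original should survive verbatim
    missing = set(LINK.findall(src)) - set(LINK.findall(dst))
    if missing:
        print(f'{dst_path}: missing links {missing}')

    # A translation far shorter than the original suggests omitted content
    if len(dst) < 0.5 * len(src):
        print(f'{dst_path}: suspiciously short ({len(dst)} vs {len(src)} chars)')

check('source/_posts/example.md', 'en/source/_posts/example.md')  # hypothetical paths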

In total, the translation consumed 223k input tokens and 198k output tokens. Even with an online API the cost would be quite reasonable: at OpenRouter's pricing for Qwen3 235B A22B, it works out to less than 1 RMB.

Hexo Hackery

Hexo and my chosen Cactus theme don’t support i18n out of the box. I faced several options:

  • Build a new static site generator from scratch
  • Switch to a different SSG with i18n support
  • Modify Hexo to properly support multilingual blogs
  • Generate Chinese and English versions separately, then merge them manually

The first three would require substantial effort. I've been using Hexo for years and rely on several Hexo-specific plugins, so I decisively went with the fourth option.

The idea: first run hexo generate to build the original Chinese site, then replace the source posts with their English counterparts, generate the English site, place it under /en/, and add bidirectional links for language switching.

Build Script

I place the English site under the /en/ directory, so https://blog.lyc8503.net/en serves the English version. Hexo works fine under subdirectories (as long as _config.yml has the correct URL — many plugins depend on this).

# Build Chinese site
rm -rf db.json
npx hexo generate
mv public public_cn

# Build English site
cp en/_config.yml _config.yml
cp en/_config_theme.yml themes/cactus/_config.yml
cp -r source/friends source/categories source/search en/source/

rm -rf source
cp -r en/source .

rm -rf db.json
npx hexo generate
mv public public_cn/en
mv public_cn public

# Copy shared assets (images, etc. — en/source contains only translated posts, no binaries)
rsync -av --prune-empty-dirs --exclude='*.html' post/ en/post/
rsync -av img/ en/img/

One downside: shared assets like JS and images are duplicated under both / and /en/, potentially causing cache misses. Ideally, the English site would reference root assets directly, but Hexo doesn’t support that natively — so this is a temporary compromise.

Merge Sitemaps, Update robots.txt

This setup generates two sitemaps. I merge them into one at the root so search engines can discover English pages.

cat sitemap.txt en/sitemap.txt > merged.txt
{ echo -e '<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'; sed -n '/<url>/,/<\/url>/p' sitemap.xml en/sitemap.xml; echo '</urlset>'; } > merged.xml

mv merged.txt sitemap.txt
mv merged.xml sitemap.xml
rm en/sitemap.txt en/sitemap.xml

After merging, I verified the XML with online validators — the format is correct.
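
The same well-formedness check can also be scripted locally; Python's xml.etree raises a ParseError on malformed XML (though it doesn't validate against the sitemap schema):

import xml.etree.ElementTree as ET

# Raises xml.etree.ElementTree.ParseError if the merged file is not well-formed
ET.parse('sitemap.xml')
print('sitemap.xml is well-formed')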

I also manually updated robots.txt to allow search engines to index all paths under /en.

With the English site functional, I can now access it by prepending /en to the path. But relying solely on SEO isn’t enough — I need a manual toggle.

Additionally, I inject <link rel="alternate"> tags into the <head> to help search engines understand the language relationship between pages, serve the correct version to users, and avoid duplicate content penalties.

hexo.extend.filter.register('after_render:html', function(htmlString, data) {
    // Category subpages don't map one-to-one across languages, so the
    // alternate-language URL would 404; skip them
    if (data.page.path.startsWith('categories/') && data.page.path !== 'categories/index.html') {
        return htmlString;
    }

    const baseUrl = hexo.config.url.replace(/\/en/, '/');
    const zhUrl = new URL(data.page.path, baseUrl).href;
    const enUrl = new URL('en/' + data.page.path, baseUrl).href;

    // Tell search engines about the language variants of this page
    const hreflangTags = `
        <link rel="alternate" hreflang="zh-CN" href="${zhUrl}" />
        <link rel="alternate" hreflang="en" href="${enUrl}" />
        <link rel="alternate" hreflang="x-default" href="${zhUrl}" />
    `;

    htmlString = htmlString.replace('</head>', hreflangTags.replaceAll('\n', '').trim() + '</head>');

    // Build the language switcher shown in the top-left corner
    const isEnglish = hexo.config.url.includes('/en');
    let switcherContent = '';

    if (isEnglish) {
        const targetUrl = '/' + data.page.path;
        switcherContent = `
            <a href="${targetUrl}" style="color: #c9cacc; text-decoration: none;">简体中文</a>
            <span style="color: #c9cacc;">/</span>
            <span style="color: #2bbc8a;">[English]</span>
        `;
    } else {
        const targetUrl = '/en/' + data.page.path;
        switcherContent = `
            <span style="color: #2bbc8a;">[简体中文]</span>
            <span style="color: #c9cacc;">/</span>
            <a href="${targetUrl}" style="color: #c9cacc; text-decoration: none;">English</a>
        `;
    }

    const switcherContainerStyle = `
        position: absolute;
        top: 15px;
        left: 15px;
        z-index: 9999;
        font-size: 14px;
        font-family: Menlo, 'Meslo LG', monospace;
    `.replace(/\s\s+/g, ' ').trim();

    const switcherHtml = `<div style="${switcherContainerStyle}">${switcherContent.replace(/\s\s+/g, ' ').trim()}</div>`;
    const finalHtml = htmlString.replace(/<body(.*?)>/i, `<body$1>${switcherHtml}`);
    return finalHtml;
});

Add Warning Notice

Since I don’t have time to manually review and correct every translated article, and because LLMs will inevitably make mistakes, I chose to add a warning at the beginning of each post. Implementation is similar to how I previously added CC license notices at the end:

hexo.extend.filter.register('before_post_render', function(data) {
    if (data.layout != "post") return data;
    data.content = '> ⚠️ This article is currently an experimental machine translation and may contain errors. If anything is unclear, please refer to the original Chinese version. I am continuously working to improve the translation.\n\n' + data.content;
    return data;
}, 5);

Conclusion

That’s about it! This is the blog you’re reading right now. Now, let’s see how many overseas readers show up.
