This article is currently an experimental machine translation and may contain errors. If anything is unclear, please refer to the original Chinese version. I am continuously working to improve the translation.
My GitHub projects have always had a non-negligible number of overseas users, and many people have sent me emails in English. However, my blog’s visitors are still predominantly from China — clearly because all the articles are in Chinese. Given that most of my writings are already niche, I’ve decided to add English support to attract potential international readers and expand my audience.
This post documents my journey of adding multilingual support to my blog, divided into two main parts: content translation and Hexo framework modifications.
Content Translation
First, to have an English blog, I need English content.
All blog posts are written in Markdown and need to be translated into English while preserving the original formatting. My initial idea was a more traditional approach: parse the Markdown, extract text segments, send them to a translation API, then reconstruct the Markdown.
However, this method risks losing context, causing inconsistent translations, and being overly constrained by Markdown syntax. Plus, traditional machine translation quality is, well, predictably mediocre.
So what’s the alternative? Bring in the LLMs, of course. I opted to use a large language model to translate the entire Markdown file in one go, preserving formatting. My initial prompt:
1 | Please translate the above Markdown into English. No deep thinking needed. Preserve all formatting. Do not modify any links or code (only comments in code may be translated). Ensure your output remains valid Markdown. Do not wrap the result in a code block — output directly. |
I tested various state-of-the-art models via OpenRouter, including international models like Gemini 2.5 Pro, Claude Sonnet 4.5, GPT-5, and domestic ones like Qwen3 235B A22B Instruct 2507, Qwen3 30B A3B Instruct 2507, DeepSeek V3.1 Terminus, DeepSeek R1, and GLM 4.6.
All models demonstrated sufficient prompt-following ability to understand my request and produce syntactically valid Markdown. However, every model capable of “deep thinking” ignored my instruction of “no deep thinking” and engaged in lengthy internal monologue before outputting the result, wasting a significant number of tokens. Using them would require additional measures to suppress such behavior.
In terms of translation quality, each model exhibited different linguistic styles, but I couldn’t clearly distinguish a winner. The only model I found unsatisfactory was DeepSeek R1, which showed noticeable hallucinations on long inputs and omitted substantial parts of the original text. Overseas models struggled with understanding Chinese internet slang and cultural references, sometimes producing unexpected or awkward translations. They were also significantly more expensive and lacked open weights.
After weighing the options, I chose Qwen3 235B A22B Instruct 2507, quantized with Q6_K and locally deployed on an EPYC 9654 server. I also refined the prompt:
1 | Please translate the above Markdown into English. No deep thinking needed. Preserve all formatting. When encountering Chinese memes or jokes, feel free to adapt the translation to better fit English expression conventions. Otherwise, do not modify any links or code (only comments in code may be translated). Ensure your output remains valid Markdown. Do not wrap the result in a code block — output directly. |
The final script (using Qwen3-recommended parameters temp=0.7; top_p=0.8; top_k=20) is as follows:
1 | import os |
I could have designed a benchmark to evaluate translation quality across models, but I decided to take an iterative approach — release a version first and improve later. If you have relevant experience, feel free to comment and share your thoughts.
In total, the translation consumed 223k input tokens and 198k output tokens. Even using online APIs, the cost is quite reasonable (estimated via OpenRouter pricing for Qwen3 235B A22B, costing less than 1 RMB).
Hexo Hackery
Hexo and my chosen Cactus theme don’t support i18n out of the box. I faced several options:
- Build a new static site generator from scratch
- Switch to a different SSG with i18n support
- Modify Hexo to properly support multilingual blogs
- Generate Chinese and English versions separately, then merge them manually
The first three would require substantial effort. I’ve been using Hexo for years and rely on several Hexo-specific plugins, so I果断 chose the fourth option.
The idea: first run hexo generate to build the original Chinese site, then replace the source posts with their English counterparts, generate the English site, place it under /en/, and add bidirectional links for language switching.
Build Script
I place the English site under the /en/ directory, so https://blog.lyc8503.net/en serves the English version. Hexo works fine under subdirectories (as long as _config.yml has the correct URL — many plugins depend on this).
1 | # Build Chinese site |
One downside: shared assets like JS and images are duplicated under both / and /en/, potentially causing cache misses. Ideally, the English site would reference root assets directly, but Hexo doesn’t support that natively — so this is a temporary compromise.
Merge sitemap, update robots.txt
This setup generates two sitemaps. I merge them into one at the root so search engines can discover English pages.
1 | cat sitemap.txt en/sitemap.txt > merged.txt |
After merging, I verified the XML with online validators — the format is correct.
I also manually updated robots.txt to allow search engines to index all paths under /en.
Add Language Switcher & Alternate Links
With the English site functional, I can now access it by prepending /en to the path. But relying solely on SEO isn’t enough — I need a manual toggle.
Additionally, I inject <link rel="alternate"> tags into the <head> to help search engines understand the language relationship between pages, serve the correct version to users, and avoid duplicate content penalties.
1 | hexo.extend.filter.register('after_render:html', function(htmlString, data) { |
Add Warning Notice
Since I don’t have time to manually review and correct every translated article, and because LLMs will inevitably make mistakes, I chose to add a warning at the beginning of each post. Implementation is similar to how I previously added CC license notices at the end:
1 | hexo.extend.filter.register('before_post_render', function(data) { |
Conclusion
That’s about it! This is the blog you’re reading right now. Now, let’s see how many overseas readers show up.
This article is licensed under the CC BY-NC-SA 4.0 license.
Author: lyc8503, Article link: https://blog.lyc8503.net/en/post/hexo-add-english/
If this article was helpful or interesting to you, consider buy me a coffee¬_¬
Feel free to comment in English below o/