

Running Your Own ISP Part.4 - Building Your Own Anycast Network

 

This article is currently an experimental machine translation and may contain errors. If anything is unclear, please refer to the original Chinese version. I am continuously working to improve the translation.

Introduction

Let’s temporarily skip the various extensions mentioned at the end of the previous part (IGP/iBGP/DN42, etc.). These aren’t strictly necessary in a small AS, and readers who are interested can explore them on their own.

Instead, let’s dive into something else — the Anycast network. Many international services already use this technology — for example, CDN offerings from Cloudflare, Azure, and Google, or DNS resolvers like 1.1.1.1 and 8.8.8.8. (Meanwhile, domestic providers, due to the inaction of the big three carriers, are still stuck with traditional DNS load balancing)

Currently, our AS only has one VPS node in Germany. As shown in the previous ping.pe test results, our service has very low latency in Europe, but much higher latency in the Americas and the Asia-Pacific region.

So let’s try adding a few more stateless nodes to our AS and build a global Anycast network.

Finding More Upstream Providers

Actually, building an Anycast network is incredibly simple: just find more upstream providers and announce the same IP ranges from each. End of article. (x)

For diversity and to avoid single points of failure, let’s try a different provider this time — Vultr. They occasionally offer generous short-term trial credits for new users. When I signed up, I only had to add $5 via PayPal (or link a bank card) and received a $250 trial credit valid for 30 days. Let’s use that credit for our experiments.

First, register a Vultr account and go to the BGP tab in the Dashboard. There, you can upload your LOA and complete email verification. Once verified, a support ticket is created automatically, and after manual review your ASN shows up as approved.

Now you can create VPS instances and start announcing your IP ranges. The steps are almost identical to Part 2 of this series. For provider-specific details, refer to Vultr’s documentation.

For this setup, I chose Full table mode, so Vultr provides the complete Internet routing table. There is one pitfall that Vultr's docs don't mention: with multihop BGP, all routes may become unreachable unless you manually add a static route to the BGP neighbor via the default gateway (see the sketch below and the static_v6 block in the config).
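To find the link-local gateway for that static route, check the kernel routing table on the VPS first. A minimal sketch, assuming the interface is enp1s0 as in the config below:

# Show the IPv6 default route; note the fe80:: next hop and the interface name
ip -6 route show default
# Output looks roughly like: default via fe80::fc00:5ff:fe17:ba85 dev enp1s0 ...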

My final BIRD configuration looks something like this:

# Other parts are the same as in part2, omitted here
protocol static static_v6 {
    ipv6;
    route 2a14:7c0:4d00::/40 unreachable;
    route 2001:19f0:ffff::1/128 via fe80::fc00:5ff:fe17:ba85%enp1s0; # Route to the BGP neighbor via the default gateway seen in `ip -6 route`
}

protocol bgp bgp_vultr_v6 {
    local 2001:19f0:6001:58ee:5400:05ff:fe17:ba85 as ASN;
    neighbor 2001:19f0:ffff::1 as 64515;
    ipv6 {
        import filter import_filter_v6;
        export filter export_filter_v6;
        export limit 10;
    };
    multihop 2; # Vultr requires multihop
    password "114514"; # This is a user-defined MD5 password
    graceful restart;
}

I deployed two VPS instances — one in Los Angeles, USA, and one in Singapore — configured them, and started announcing our prefix. (p.s. If you have more hosts or complex deployment needs, consider using automation tools like Ansible. I’m being lazy and doing it manually this time.)
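For a handful of nodes, even a small shell loop does the job of pushing per-node configs and reloading BIRD. A rough sketch with placeholder hostnames, where each node keeps its own local address and gateway route:

# Push each node's BIRD config and reload it in place (birdc "configure" does not drop established sessions)
for host in lax.example.net sgp.example.net; do
    scp "bird-$host.conf" "root@$host:/etc/bird/bird.conf"
    ssh "root@$host" birdc configure
done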

After waiting a bit for BGP convergence, we can see the new upstreams on BGP Tools:

AS20473 is our newly added upstream
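Besides external looking glasses, you can confirm on each node that the session is established and the prefix is actually being exported. A quick check with BIRD's CLI, using the protocol name from the config above:

birdc show protocols bgp_vultr_v6        # state should read "Established"
birdc show route export bgp_vultr_v6     # should include 2a14:7c0:4d00::/40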

Anycast for Low Latency

Now let’s ping 2a14:7c0:4d00::1 again using ping.pe. We can clearly see that latency in the US and Singapore has dropped significantly, while performance in Europe remains solid. MTR traces also clearly show the routing differences.

ping.pe results before Anycast (from Part 2)

ping.pe results after Anycast

MTR trace from San Francisco:

0.  localhost                   Loss%  Snt   Last  Avg   Best  Wrst  StDev AS Name                         PTR
1 ??? 100.0 20 0.0 0.0 0.0 0.0 0.0 -
2 fd00:0:9::26a 0.0% 20 0.6 0.6 0.4 2.5 0.4 -
3 2604:a880:ffff:8::1:34 0.0% 20 2.3 3.1 0.8 38.7 8.4 -
4 2a03:b0c0:fffd::44 0.0% 20 0.4 0.5 0.3 0.8 0.0 -
5 2a03:b0c0:fffe::14e 0.0% 20 1.4 1.2 1.1 1.6 0.0 -
6 2a03:b0c0:fffe::131 0.0% 20 0.9 1.1 0.8 3.4 0.5 -
7 2001:418:0:5000::56c 0.0% 20 1.1 3.1 0.9 22.4 4.9 2914 US NTT-LTD-2914 ae-38.a04.snjsca04...
8 2001:418:0:2000::260 0.0% 20 1.7 12.7 1.1 47.9 15.8 2914 US NTT-LTD-2914 ae-0.r24.snjsca04...
9 2001:418:0:2000::112 50.0% 20 10.4 14.8 10.0 35.2 8.7 2914 US NTT-LTD-2914 ae-3.r25.lsanca07...
10 2001:418:0:2000::c5 0.0% 20 9.8 10.4 9.8 19.1 2.0 2914 US NTT-LTD-2914 ae-1.a03.lsanca07...
11 2001:418:0:5000::1012 0.0% 20 10.9 13.8 10.1 41.9 9.5 2914 US NTT-LTD-2914 xe-3-5-0-1.a03.lsa...
12 ??? 100.0 20 0.0 0.0 0.0 0.0 0.0 -
13 2001:19f0:6000::a44:c2 0.0% 20 11.5 11.5 11.3 11.9 0.0 20473 US CHOOPA
14 ??? 100.0 20 0.0 0.0 0.0 0.0 0.0 -
15 2a14:7c0:4d00::1 0.0% 20 11.3 11.3 11.2 12.1 0.0 214775 CN

MTR trace from Paris:

0.  localhost                   Loss%  Snt   Last  Avg   Best  Wrst  StDev AS Name                         PTR
1 ??? 100.0 20 0.0 0.0 0.0 0.0 0.0 -
2 2a00:dd80:3e:10::32 0.0% 20 0.3 1.3 0.2 19.7 4.3 36236 US NETACTUATE
3 2001:7f8:43::6939:1 25.0% 20 1.3 2.0 1.3 5.4 0.9 -
4 ??? 100.0 20 0.0 0.0 0.0 0.0 0.0 -
5 ??? 100.0 20 0.0 0.0 0.0 0.0 0.0 -
6 2001:470:0:52e::2 0.0% 20 12.5 13.3 12.0 26.7 3.1 6939 US HURRICANE e0-36.core2.dus1.h...
7 2001:7f8:9e::c8e:0:2 0.0% 20 12.1 12.1 11.9 13.0 0.0 -
8 2a09:0:1337:c0c0::2 0.0% 20 12.5 12.1 12.0 12.5 0.0 -
9 2a14:7c0:4d00::1 0.0% 20 12.7 12.6 12.4 13.0 0.0 214775 CN

Running tcpdump icmp6 on all three hosts in our AS, we can observe ping requests coming from different regions. Thanks to ping.pe’s PTR records, we can clearly see where each request originates.
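The capture command is roughly the following on each host (the interface name is an assumption; substitute the node's actual NIC, and the host filter is optional):

# Watch ICMPv6 hitting the anycast address; PTR lookups stay enabled so sources are easy to identify
tcpdump -i enp1s0 icmp6 and host 2a14:7c0:4d00::1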

For example, the Singapore Vultr instance received pings from a DigitalOcean node in Singapore:

06:57:07.542784 IP6 sg-digitalocean.ping.pe > 2a14:7c0:4d00::1: ICMP6, echo request, id 45771, seq 48123, length 64
06:57:07.542845 IP6 2a14:7c0:4d00::1 > sg-digitalocean.ping.pe: ICMP6, echo reply, id 45771, seq 48123, length 64

The US Vultr instance received pings from India, the US, and elsewhere:

07:11:23.955198 IP6 ping-pe-us-ca-sf-digitalocean-do.localdomain > 2a14:7c0:4d00::1: ICMP6, echo request, id 50875, seq 48123, length 64
07:11:23.955213 IP6 2a14:7c0:4d00::1 > ping-pe-us-ca-sf-digitalocean-do.localdomain: ICMP6, echo reply, id 50875, seq 48123, length 64
07:11:24.340371 IP6 ping-pe-netactuate-chennai-india.localdomain > 2a14:7c0:4d00::1: ICMP6, echo request, id 11214, seq 48123, length 64
07:11:24.340410 IP6 2a14:7c0:4d00::1 > ping-pe-netactuate-chennai-india.localdomain: ICMP6, echo reply, id 11214, seq 48123, length 64

Now, as long as we provide the same service on all three hosts, users worldwide get low-latency, high-bandwidth, and stable access. For example, I could install nginx on every server to host my static blog. (Though to be honest, Cloudflare's free tier still performs better.) Set up nginx reverse proxies or DNS servers on these hosts instead, and we've essentially built our own simple Anycast CDN or Anycast DNS.
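As a rough illustration rather than my exact setup, the nginx side only needs the same minimal server block and the same content on every node; which node actually answers a given user is decided by BGP, not by nginx:

# Identical vhost deployed on all three hosts (domain and paths are hypothetical)
server {
    listen [::]:80;                # the anycast address is just another local address
    server_name blog.example.net;  # hypothetical domain whose AAAA record is 2a14:7c0:4d00::1
    root /var/www/blog;            # identical static content on every node
}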


However, we also notice that our Singapore Vultr instance receives the least traffic — almost only from local Singapore sources — while other Asia-Pacific regions tend to route through the US. This happens because upstream ISPs don’t always choose routes based purely on hop count or physical distance. They also consider path priority, congestion, and cost.

Network connectivity in the Asia-Pacific region is generally worse than in Europe and North America: inter-carrier settlement costs are much higher, bandwidth is smaller, and congestion is frequent. Under such conditions, traffic often detours, for example taking a roundabout path via the US. As a result, routes announced in poorly connected regions struggle to propagate to neighboring areas.

To solve connectivity issues in APAC, you either need to add more servers across the region or pay upstream providers for higher-priority routes (like the infamous “CN2 GIA” in China). This also explains why VPS services offering “optimized routing” for China or other APAC regions are usually expensive and come with lower bandwidth or data caps.

Anycast for High Availability

Anycast doesn’t just reduce latency — it also enhances availability. If any node goes offline, the BGP session drops, the route is withdrawn, and traffic automatically reroutes to the remaining nodes. As long as at least one node is up, your stateless service stays online.

Let’s test this in practice: I pinged 2a14:7c0:4d00::1 from Suzhou Mobile — traffic was routed to the German v.ps node. Meanwhile, Shanghai Unicom pings were hitting the US Vultr node. While keeping the ping running, I performed a hard reset on the US Vultr instance to simulate a server failure. Here’s the result from Shanghai Unicom:

2024/9/12 21:21:52 - Reply from 2a14:7c0:4d00::1: time=348ms
2024/9/12 21:21:53 - Reply from 2a14:7c0:4d00::1: time=551ms
2024/9/12 21:21:54 - Reply from 2a14:7c0:4d00::1: time=595ms
2024/9/12 21:21:55 - Reply from 2a14:7c0:4d00::1: TTL exceeded in transit.
2024/9/12 21:21:56 - Reply from 2a14:7c0:4d00::1: TTL exceeded in transit.
2024/9/12 21:21:57 - Reply from 2a14:7c0:4d00::1: TTL exceeded in transit.
2024/9/12 21:21:58 - Destination net unreachable.
2024/9/12 21:22:00 - Reply from 2a14:7c0:4d00::1: time=500ms
2024/9/12 21:22:01 - Reply from 2a14:7c0:4d00::1: time=479ms
2024/9/12 21:22:02 - Reply from 2a14:7c0:4d00::1: time=631ms
2024/9/12 21:22:03 - Reply from 2a14:7c0:4d00::1: time=498ms
2024/9/12 21:22:04 - Reply from 2a14:7c0:4d00::1: time=411ms
2024/9/12 21:22:05 - Reply from 2a14:7c0:4d00::1: time=552ms

We can see that traffic rerouted within 5 seconds. I tested this multiple times — downtime was consistently around 5 seconds. Meanwhile, the Suzhou Mobile pings (which were already routed to the German v.ps node) remained completely unaffected.
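Those few seconds are essentially the time it takes the upstream to notice the dead session and withdraw the route. If you want to experiment with faster detection, BIRD lets you propose shorter BGP timers; I have not verified what Vultr accepts, so treat this as a sketch to try rather than a recommendation:

protocol bgp bgp_vultr_v6 {
    # ... same as the config above ...
    keepalive time 5;   # send keepalives more often
    hold time 15;       # proposed hold time; BGP uses the lower of the two sides' values
}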

The original route from Shanghai Unicom (before failure):

1   2408:84e2:440:a210::70    AS17621  China Shanghai Shanghai  chinaunicom.cn
2.51 ms / 200.58 ms / 251.75 ms
2 *
3 fc00:1000::61 * RFC4193
37.79 ms / * ms / * ms
4 *
5 *
6 *
7 2408:8000:2:38b:: AS4837 China Beijing chinaunicom.cn China Unicom
169.25 ms / 67.16 ms / 117.78 ms
8 2408:8000:2:8044::1 AS4837 China Beijing chinaunicom.cn China Unicom
253.78 ms / * ms / * ms
9 *
10 *
11 2001:1900:2100:32::29 AS3356 USA California Los Angeles lumen.com
6-1-2.ear1.LosAngeles6.Level3.net 386.38 ms / 333.42 ms / 281.89 ms
12 2001:1900:2100::40da AS3356 USA Louisiana Monroe lumen.com
CHOOPA-LLC.ear1.LosAngeles6.Level3.net 696.69 ms / 748.12 ms / 646.49 ms
13 2001:19f0:fc00::a44:10e * USA California Los Angeles
ethernetae5-sr2.lax3.constant.com 544.16 ms / 489.78 ms / 425.65 ms
14 2001:19f0:fc00::a44:a6 * USA California Los Angeles
ethernetswp55-ds1-m1-r0305-a.lax3.constant.com 651.93 ms / 705.55 ms / 583.08 ms
15 *
16 2a14:7c0:4d00::1 AS214775 Europe
389.20 ms / 525.20 ms / 936.49 ms

New route after the US Vultr instance went down:

1   2408:84e2:440:a210::70    AS17621  China Shanghai Shanghai  chinaunicom.cn
83.69 ms / 34.61 ms / 200.55 ms
2 *
3 fc00:1000::61 * RFC4193
132.71 ms / 248.39 ms / 198.26 ms
4 2408:8000:9000:408f::d4 AS17621 China Shanghai Shanghai chinaunicom.cn
2.54 ms / 217.61 ms / 154.24 ms
5 2408:8000:9000:20e6::52 AS17621 China Shanghai Shanghai chinaunicom.cn
42.33 ms / 202.64 ms / 133.66 ms
6 2408:8000:9000:20e6::2b2 AS17621 China Shanghai chinaunicom.cn China Unicom
30.97 ms / * ms / * ms
7 2408:8000:2:38b:: AS4837 China Beijing chinaunicom.cn China Unicom
558.28 ms / 505.24 ms / * ms
8 *
9 *
10 *
11 *
12 *
13 *
14 *
15 *
16 *
17 *
18 *
19 *
20 *
21 2001:470:0:52e::2 AS6939 Germany North Rhine-Westphalia Düsseldorf he.net
e0-36.core2.dus1.he.net 338.38 ms / 274.45 ms / 242.29 ms
22 2001:7f8:8::c8e:0:1 AS3214 Germany North Rhine-Westphalia Düsseldorf MegaIX Dusseldorf - xTom GmbH - 10Gbps xtom.com
262.57 ms / 250.95 ms / 462.66 ms
23 2a09:0:1337:c0c0::2 * Germany North Rhine-Westphalia Düsseldorf
r1dus.v.ps 346.01 ms / 296.37 ms / 244.45 ms
24 2a14:7c0:4d00::1 AS214775 Europe
245.00 ms / 446.79 ms / 391.97 ms

Summary

And just like that, we’ve built our own Anycast network!

Unlike traditional AS expansion, our approach doesn’t require internal connectivity between hosts or IGP protocols for internal routing. Instead, each node operates independently, forming an Anycast network to serve stateless applications. Which host a user reaches depends entirely on their network environment. Keep in mind: we can’t guarantee a user will always hit the same node. If you need to provide stateful services, you’ll need to synchronize state across all nodes yourself (e.g., via a distributed database) — but that’s a topic for another day.

That was a lot to write… This series should wrap up in one final part. Stay tuned.

This article is licensed under the CC BY-NC-SA 4.0 license.

Author: lyc8503, Article link: https://blog.lyc8503.net/en/post/asn-4-anycast-network/
If this article was helpful or interesting to you, consider buying me a coffee ¬_¬
Feel free to comment in English below o/