A few days ago I found myself explaining on Reddit how classic FTP (and by extension FTPS) works. Recollecting it here might be a good way to explain the problems involved in modern networks, especially when working through firewalls and NATs. Actually I’m only going to focus on NAT. I’m bad (read: too lazy) at drawing diagrams so I’ll try to explain in writing.
Terminology
- FTP – the good old protocol going back to the 70s
- FTPS – classic FTP over TLS, not much else
- SFTP – a totally different, SSH-based protocol
Why?
FTP is OLD. Almost everything supports it, one way or another. You probably have some requirements around it. Wonky devices, old processes, forgotten agreements and partnerships. Whatever, it has to work.
Those recommending switching to <insert your solutions> just don’t understand how the world works. SFTP might seem an easy replacement for end-users but IMHO it has 2 main limitations:
- No builtin support in Windows. Freetards will blame Microsoft for everything wrong in this world but in truth SSH-based stuff is largely confined to power-user and/or UNIX world – please call me when the year of Linux desktop arrives.
Disclaimer: I use it daily but normal people use Windows and don’t understand stuff. To be honest, Windows doesn’t natively do even FTPS, but I digress.
- No support for X.509, which is IMHO the biggest problem of the whole SSH ecosystem (I do know PKIX-SSH exists but it’s niche stuff). You might have heard of SSH certificates but sorry to disappoint you – it’s a totally independent and incompatible concept from certificates as you probably know them (as in TLS, SmartCards, PKI etc…). Effectively every user ever has to click through an SSH certificate/key warning with no practical way to mitigate it.
- It’s not even related to FTP. OK, not really relevant
How it works – basics
You connect to an FTP site, for example… ftp.adobe.com. No special reason to choose it, it just works. Your FTP client creates a control connection to this FTP site. This is what carries most of the commands you see in your FTP client, for example:
220 Welcome to Adobe FTP services
USER anonymous
331 Please specify the password.
PASS *********************
230 Login successful.
OPTS UTF8 ON
This control session goes from the client to the server’s TCP port 21. It’s pure Telnet emulation, nothing fancy here. Now when data is involved (even listing directory contents), a second TCP session is created, called the data connection.
Interlude history session – Active and Passive mode
Active mode is how it originally worked. The client tells the server to connect back to the client for the data connection. Now, there are people who call the protocol stupid or dumb for this. I’d put these people in the same category as those asking why NASA didn’t use the Space Shuttle to save Apollo 13.
FTP was created in the era of end-to-end connectivity: no firewalls, no NAT. It made perfect sense in the 80s, up to the turn of the century. NAT only became a thing as the Internet exploded in popularity, ISPs started putting out broadband gateways with NAT, and personal firewalls became widespread (Windows XP and newer).
Passive mode reverses the direction. The client creates the data connection to the server. In the real world it works much better and now (year 2020) Active mode is only used by the most obscure things (such as the Windows command-line client, which does not support Passive mode) and is generally not used over the public Internet.
We’re going to assume Passive mode from here.
How it works – continued
To list directory contents, a data connection is created. The server tells the client how to create it:
CWD /
250 Directory successfully changed.
PWD
257 "/"
PASV
227 Entering Passive Mode (193,104,215,67,82,74)
LIST
150 Here comes the directory listing.
226 Directory send OK.
(193,104,215,67,82,74) – that’s IP 193.104.215.67, port 82*256+74=21066. If your client connects there, it gets sent the directory listing through that TCP session. Between response codes 150 and 226, this data connection was opened, used and closed. Same for file transfers, just the commands and push-pull directions differ. In theory, server/client may even present a different IP than its own, for a protocol quirk known as FXP. If you’re interested, Google it, it’s not relevant here. There’s no authentication in the data connection so I’m fairly sure that FTP servers tie the data connection’s source IP to the control connection’s source IP. Otherwise FXP would work anywhere when in fact it doesn’t.
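The arithmetic above is easy to get wrong by hand, so here is a minimal Python sketch of how a client might decode that 227 reply. The function name parse_pasv is made up for illustration and this is not taken from any particular FTP client.

```python
import re

def parse_pasv(reply: str) -> tuple:
    """Extract (ip, port) from a 227 reply line."""
    # Six comma-separated numbers in parentheses: h1,h2,h3,h4,p_hi,p_lo
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError(f"not a PASV reply: {reply!r}")
    h1, h2, h3, h4, p_hi, p_lo = (int(g) for g in match.groups())
    # The port is sent as two bytes, high byte first
    return f"{h1}.{h2}.{h3}.{h4}", p_hi * 256 + p_lo

print(parse_pasv("227 Entering Passive Mode (193,104,215,67,82,74)"))
# prints ('193.104.215.67', 21066)
```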
Let’s throw in NAT
Server ftp.mysite.com has private IP 10.0.0.1 and public IP 1.2.3.4. This is arguably the most popular scenario in modern times.
The client would connect to ftp.mysite.com and resolve it to 1.2.3.4. A control connection is made to 1.2.3.4, login happens and you try to list the root folder. What IP should the server give to the client? By default it’ll give 10.0.0.1.
The client happily connects to 10.0.0.1 and fails. It’s not Internet routable.
Solution 0 – modern FTP client
In reality many modern 3rd party FTP clients ignore RFC1918 IPs for Passive mode over the Internet and silently use the control connection’s IP instead. However, you absolutely cannot rely on this behavior, as Windows Explorer, probably your most popular client, does not support it. Your old business processes with old tools and old clients probably don’t either.
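A rough Python sketch of what such a client-side workaround might look like. The exact rule differs between clients; the logic and the names is_rfc1918 and data_target are my own assumptions for illustration.

```python
import ipaddress

# The three private ranges defined by RFC1918
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_rfc1918(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918)

def data_target(pasv_ip: str, control_ip: str) -> str:
    """Pick the address to open the data connection to."""
    # Server advertised a private IP although we reached it on a public one:
    # assume NAT in between and reuse the control connection's IP.
    if is_rfc1918(pasv_ip) and not is_rfc1918(control_ip):
        return control_ip
    return pasv_ip

print(data_target("10.0.0.1", "1.2.3.4"))   # NAT case from above
print(data_target("10.0.0.1", "10.0.0.1"))  # internal client, nothing to fix
```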
Solution 1 – NAT Support configuration
I made that name up, there’s no standard name for this functionality
Basically all modern FTP servers have support for this mode. Your garden variety Microsoft IIS FTP site has an “FTP Firewall Support” feature that allows you to specify the FTP site’s public IP (in this case 1.2.3.4) and port range (you’ll still need to do dNAT/port forward as well). Now, when you connect, the server tells the client to connect to 1.2.3.4 and it works out fine. This does realistically require static IPs on the public end so it’s not very usable for home connections with dynamic IPs.
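Conceptually the server just puts the configured public IP into the 227 reply instead of its own. A minimal Python sketch, assuming a hypothetical pasv_reply helper (this is not how IIS or any specific server implements it):

```python
def pasv_reply(advertised_ip: str, port: int) -> str:
    """Build a 227 reply advertising the given IPv4 address and port."""
    octets = advertised_ip.split(".")
    if len(octets) != 4 or not 0 < port < 65536:
        raise ValueError("need a dotted IPv4 address and a valid TCP port")
    # Port is split into high and low byte, same as in the example above
    return (f"227 Entering Passive Mode "
            f"({','.join(octets)},{port // 256},{port % 256})")

# Server really listens on 10.0.0.1:21066, but advertises the public IP
print(pasv_reply("1.2.3.4", 21066))
# prints 227 Entering Passive Mode (1,2,3,4,82,74)
```

The port has to come from the range you forwarded on the firewall, otherwise the client gets the right IP and still cannot connect.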
Solution 2 – Application Layer Gateway
Most enterprise firewalls (and some cheaper stuff) support FTP ALG. This means that the firewall will rewrite IPs in the control connection in real time. You don’t have to enter the public IP in the FTP server configuration but when a client connects, the firewall will replace the private IP with the public one transparently.
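To give an idea of what the firewall does to the control connection, here is a heavily simplified Python sketch. The function alg_rewrite is hypothetical; a real ALG works on packets, may also translate the port and has to cope with the reply changing length, all of which is ignored here.

```python
import re

# Prefix up to the opening parenthesis, then the six PASV numbers
PASV_RE = re.compile(r"^(227 .*\()(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")

def alg_rewrite(line: str, private_ip: str, public_ip: str) -> str:
    """Replace the private IP in a 227 reply with the public one."""
    m = PASV_RE.match(line)
    if m is None:
        return line  # not a PASV reply, pass through untouched
    if ".".join(m.group(2, 3, 4, 5)) != private_ip:
        return line  # advertises some other address, leave it alone
    return (f"{m.group(1)}{public_ip.replace('.', ',')},"
            f"{m.group(6)},{m.group(7)}){line[m.end():]}")

print(alg_rewrite("227 Entering Passive Mode (10,0,0,1,82,74)",
                  "10.0.0.1", "1.2.3.4"))
# prints 227 Entering Passive Mode (1,2,3,4,82,74)
```

Note that this only works because the firewall can read the control connection as plain text.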
NAT support with a twist
Let’s make it harder. Now, let’s connect from an internal client to the internal IP 10.0.0.1 on a Microsoft IIS FTP site. The server would tell us to create the data connection to 1.2.3.4. Huh… this will probably not work by default with most enterprise firewalls. Or, if you don’t have one doing ALG, it will break.
Solution 3 – RFC1918 exceptions
Some FTP servers will allow a different data connection IP to be presented depending on the client IP. The simplest case would present the real IP to clients matching RFC1918 and the NAT’s public IP to everyone else. Microsoft IIS does not support it.
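The simplest case could be sketched in Python like this. The constants and the advertised_ip function are assumptions using the example addresses from this post, not the configuration syntax of any real server.

```python
import ipaddress

REAL_IP = "10.0.0.1"    # server's own private address (example from above)
PUBLIC_IP = "1.2.3.4"   # NAT's public address (example from above)

RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def advertised_ip(client_ip: str) -> str:
    """Choose which IP to put into the 227 reply for this client."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in RFC1918):
        return REAL_IP   # internal client, reachable directly
    return PUBLIC_IP     # everyone else comes in through NAT

print(advertised_ip("10.0.0.50"))  # internal client
print(advertised_ip("8.8.8.8"))    # client on the Internet
```

Since the decision is made on the server itself, it does not depend on anything reading the control connection in between.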
Throw in TLS
Boom, we have encryption. If you have IIS, the firewall’s ALG will give up as it can’t do squat on an encrypted control connection. Choose internal or external connectivity.
Solution 4 – Hairpin NAT
…will work perfectly (also with the RFC1918 exceptions case), with the caveat of some extra load on the firewall. In most firewalls you have to specifically allow this mode. I’m not going to explain the principles of Loopback NAT/Hairpin NAT but this will save your day. In fact you could/should tell internal clients to connect to the public address only and it’ll work fine.
Conclusion
The only thing that will work in every case is Hairpin NAT. There are schools of thought fighting over Split DNS and Hairpin NAT, and in this case Hairpin NAT wins hands down: Split DNS would not help, as the FTP data connection knows nothing about DNS, only raw IPs.
PS. IMHO Hairpin NAT always wins but what do I know, potayto-potahto.
You could run internal FTP on a different IP with a cloned configuration (except for the firewall/NAT configuration), possibly with a Split DNS record, but I don’t consider this a better solution, especially with a more complex server configuration.
Do Hairpin NAT, please.
Please do not ask me how to configure your firewall.