Cross-forest Kerberos custom SPN routing

One day I was working on setting up a kerberized service with a custom SPN across multiple trusted AD forests. Cross-forest Kerberos is not unusual, but I had never worked with custom/disjoint SPNs/domains.

Users in the forest containing domainA.com needed to authenticate to an SPN under myservice.com, located in the resource forest containing domain forestB.com.

By default you get suffix routing for the default UPN/domain name/Kerberos realm name. As neither the client nor the DC will search the GC of other forests for SPNs, I scratched my head a bit and went back to documentation that I probably hadn’t read for over a decade.

Thank god Microsoft hasn’t yet deleted all the Windows 2003 era stuff. While initially reading about name suffix routing I was a bit puzzled, as it talked about SPNs and UPNs together, yet I could not remember nor find anything specific to SPNs. Cross-forest access pointed me to Trust Domain Objects (TDO). This is a bit of a dead end, as the important msDS-TrustForestTrustInfo attribute is a blob and I didn’t really want to start parsing it.

However, as UPN and SPN suffixes were again being discussed together, I added a custom UPN suffix myservice.com to forestB.com. After enabling routing – lo and behold – it worked for SPNs as well. This clicked some memories of browsing old AD environments and seeing UPN suffixes that made no sense. Probably the same case.

This didn’t leave me satisfied, as there definitely had to be a more elegant solution. UPNs show up in user management and this may not be something that you want.

After digging through the schema and some more searches I came upon the ms-DS-SPN-Suffixes attribute, which seemed interesting and actually did the job when manually edited. Yet more searches led me (back) to the Set-ADForest cmdlet, which has a parameter to modify SPN suffixes. Huh, never noticed that.
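
For reference, a minimal sketch of that route, using the example names from above (run against the resource forest; the suffix still has to be enabled for routing on the trust afterwards):

# Publish the custom SPN suffix on the resource forest so it becomes a routable name suffix
Set-ADForest -Identity forestB.com -SPNSuffixes @{Add="myservice.com"}
# Verify
(Get-ADForest -Identity forestB.com).SPNSuffixes
# The user forest side still needs the new suffix enabled for routing on the trust
# (Name Suffix Routing tab in AD Domains and Trusts, or netdom trust /namesuffixes and /togglesuffix)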

Now, looking back and knowing what to look for, it turns out this has been described in great detail for 10 years. The linked blog post has some other great details and options.

TL;DR: For cross-forest Kerberos with custom SPN suffixes you need to use Set-ADForest to set routing hints for trusts.

Azure MFA plugin for NPS is all or nothing

Some time ago, a customer wanted to use Azure MFA for only some NPS authentication requests (network policies). It turns out that the plugin affects all authentication attempts. Imagine a (quite real-world) scenario where an NPS server or NPS farm should service:

  • VPN appliance authentication backend, that should have MFA
  • 802.1x with EAP-TLS

Well, you can’t pick and choose. All requests get the MFA treatment, though I’d say you don’t need (or want) it for internal 802.1X. I haven’t found any information or documentation saying otherwise.

So if you need some network policies to have MFA and others not to, you need to look at other solutions or just deploy a separate NPS server or farm for MFA.

ADMT Password Export Service and fixing RPC Server Unavailable

I was working on a forest migration and stumbled on this infamous error a few days ago: 1722 RPC Server Unavailable.

Huh… I tried a few simple things that Googling suggested:

  • Check firewall for ports
  • Reinstall
  • Check permissions
  • Check registry flags
  • Check if service is running
  • Change service account
  • Probably something else that I don’t remember

Debugging

When none of that helped, I started to dig deeper. Password Export Server (PES) is quite a niche tool, so there’s very little information on how it actually works. Portqry showed that RPC was listening on a named pipe. Therefore it only talks over SMB (TCP 445), and firewall access to high ephemeral ports is irrelevant.
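
For the curious, the endpoint mapper dump looks roughly like this (PortQry is a separate Microsoft download; the server name is a placeholder):

# Query the endpoint mapper and list registered RPC endpoints --
# in my case PES only advertised an ncacn_np (named pipe) binding, i.e. SMB over TCP 445
& .\PortQry.exe -n dc01.forestB.com -e 135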

I tried rpcping and rpcdump from the old Windows 2003 Resource Kit; the RPC part seemed to work just fine.

Well, Wireshark it is. After some packet capture it was clear that the initial pipe setup was successful, but the error was coming from an FSCTL issued after opening the pipe, FSCTL_PIPE_TRANSCEIVE. Searching for this has exactly one useful result, about Microsoft Exchange. That rabbit hole led nowhere – let me save you the trouble, the ACL problem described there is irrelevant.

At this point it was getting quite interesting. I had exhausted the simple debugging options, so I started to investigate how this actually works. First I opened the installation MSI in Orca. What I saw is that it installs 3 files, including a library (DLL) that does not seem to be used anywhere. I analyzed some dependencies and it seemed independent. Interesting…

Further analysis of the MSI led me to an interesting CustomAction function call, “AddToLsaNotificationPkgValue”. The CustomAction binary itself is a DLL. I don’t have the skills to decompile it, but I saw some function calls that caught my interest.

LSA? Notification Packages? PasswordChangeNotify? Sounds like Active Directory password filters! A quick check with NirSoft DLL Export Viewer confirmed that this PwMig.dll is indeed a password filter, with only the 3 compulsory functions exported. At this point it clicked how PES actually works – back to this later, let’s continue with debugging.

To register a password filter, you have to place the relevant DLL file in the System32 folder and add a registry value. Password filters are quite rare in the real world; they have 2 real-world uses, neither of which is very widespread:

  • Password synchronization (from Active Directory)
  • Custom password complexity requirement enforcement

However, this environment had a password synchronization solution in use, so the value already existed. Also, the registry value was of the wrong type: documentation requires REG_MULTI_SZ, but REG_SZ was actually used.

Fix

The real fix was recreating the Notification Packages value with the correct type and adding the existing password sync DLL name together with the PES DLL name. I’m not sure if the real problem was the wrong value type or simply the value already existing. Either way, the setup didn’t handle the error and gave no feedback that the registry update had failed.
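
Roughly what the fix looked like (a sketch; ‘MyPwSyncFilter’ stands in for whatever password sync DLL was already registered, and by convention entries are DLL base names without the .dll extension):

$lsa = 'HKLM:\SYSTEM\CurrentControlSet\Control\Lsa'
# Keep whatever was already registered (typically scecli/rassfm plus the existing sync filter, e.g. MyPwSyncFilter)...
$old = (Get-ItemProperty -Path $lsa -Name 'Notification Packages').'Notification Packages' -split '\s+'
# ...then recreate the value with the correct REG_MULTI_SZ type and append the PES filter (PwMig)
Remove-ItemProperty -Path $lsa -Name 'Notification Packages'
New-ItemProperty -Path $lsa -Name 'Notification Packages' -PropertyType MultiString -Value (@($old) + 'PwMig')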

You must restart the DC as the DLL only gets loaded at LSASS start.

How it works

By design you cannot read an object’s password hash in Active Directory. At least this is what the documentation tells you. However, for migration scenarios it is very useful to be able to migrate the password (as a hash).

LSASS itself can of course read the hash, for password validation. There is just no API for a user to access it. Getting code into LSASS also tends to be quite hard without injection.

My suspicion is that Microsoft used a password filter to… let’s say legally/officially… get code into LSASS. The password filter part is probably a red herring. I don’t have the skills to investigate, but those 3 functions likely do nothing. On the other hand, the DLL probably exposes a new interface (or backdoor?) to read hashes. I had noticed earlier that PES holds a handle to LSASS, so PES is probably just a proxy to LSASS. The RPC error is probably returned by LSASS, which doesn’t have the DLL loaded and therefore has no interface for PES to access. PES merrily proxies the error and you are left scratching your head. Now you know.

Veeam Legend

Apparently I’ve been named a Veeam Legend for occasionally writing about stuff on their forum. Well, can’t hurt.

Maybe I’ll take some time to update the blog as well; the update frequency has been disgraceful. There’s a dozen half-completed articles in the pipeline since forever, but kids and free time don’t mix well.

Joining VMware templates to custom Organizational Unit with customization specification

By default, the customization specification has a domain join function. The sad part is that it doesn’t allow selecting a custom organizational unit. Also, you can’t upload your own custom unattended XML and preserve the option of entering the desired VM name during template deployment. Therefore you’re stuck with the default CN=Computers container, or wherever this is redirected. In bigger environments this might be an issue, as you might need to join templates to different OUs depending on different requirements.

One option is enabling autologon for the built-in Administrator once and using RunOnce commands to run NetDom.

netdom.exe join %COMPUTERNAME% /domain:my.domain.com /userd:NETBIOS\domainjoinserviceaccount /passwordd:PaS$W0rd /ou:"OU=my,OU=custom,OU=Organizational Unit,DC=my,DC=domain,DC=com" /reboot

This is old news and used to work fine until a few months ago, when I unexpectedly discovered that variable substitution was done before the computer name was changed, so NetDom used the name from the template (something random, as it is by default), causing it to fail (as it realistically needs to be the local computer name).

After some head scratching, the simple workaround was to wrap it in PowerShell to hide the batch variable so it doesn’t get substituted until the last moment. I might have used a native cmdlet, but it’d likely require a very complex one-liner to prepare a credential object.

powershell netdom.exe join $env:computername /domain:my.domain.com /userd:NETBIOS\domainjoinserviceaccount /passwordd:PaS$W0rd /ou:"OU=my,OU=custom,OU=Organizational Unit,DC=my,DC=domain,DC=com" /reboot
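
For the curious, the native route would look roughly like this (a sketch using Add-Computer; it shares the plaintext password problem, and quoting all of it through a single RunOnce entry is its own adventure):

# Prepare a credential object and join the domain into the custom OU
$password = ConvertTo-SecureString 'PaS$W0rd' -AsPlainText -Force
$cred = New-Object System.Management.Automation.PSCredential('NETBIOS\domainjoinserviceaccount', $password)
Add-Computer -DomainName my.domain.com -OUPath 'OU=my,OU=custom,OU=Organizational Unit,DC=my,DC=domain,DC=com' -Credential $cred -Restart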

The main problem with this approach (either variant) is that plaintext passwords are written to unattended.xml, which is not cleaned up after the process completes. Windows cleans up explicit unattended domain join credentials after specialization, but credentials in RunOnce commands get left behind.

The first try was to just delete the file in the next RunOnce command; however, unattended.xml still seems to be in use during command execution and you can’t simply delete it. One option would be to leave a custom script in the template that registers unattended.xml in PendingFileRenameOperations to be deleted on restart. A simpler way is to apply a GPO that deletes the answer file.
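
A sketch of that delete-on-restart idea (MoveFileEx with MOVEFILE_DELAY_UNTIL_REBOOT is what populates PendingFileRenameOperations under the hood; the answer file path is an assumption – point it at wherever your unattended.xml actually lands):

Add-Type -Namespace Win32 -Name NativeMethods -MemberDefinition @'
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
public static extern bool MoveFileEx(string lpExistingFileName, string lpNewFileName, int dwFlags);
'@
# 0x4 = MOVEFILE_DELAY_UNTIL_REBOOT; a null target means "delete this file at next boot"
[Win32.NativeMethods]::MoveFileEx('C:\Windows\Panther\unattend.xml', $null, 0x4)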

Don’t leak your privileged credentials.

FTP and firewalls and NAT

A few days ago I found myself explaining on Reddit how classic FTP (and by extension FTPS) works. Recollecting it here might be a good way to explain the problems involved in modern networks, especially when working through firewalls and NAT. Actually, I’m only going to focus on NAT. I’m bad (read: too lazy) at drawing diagrams, so I’ll try to explain in writing.

Terminology

  • FTP – the good old protocol going back to the 70s
  • FTPS – classic FTP over TLS, not much else
  • SFTP – an SSH-based, totally different protocol

Why?

FTP is OLD. Almost everything supports it, one way or another. You probably have some requirements around it. Wonky devices, old processes, forgotten agreements and partnerships. Whatever, it has to work.

Those recommending switching to <insert your solutions> just don’t understand how the world works. SFTP might seem an easy replacement for end-users, but IMHO it has 2 main limitations:

  • No built-in support in Windows. Freetards will blame Microsoft for everything wrong in this world, but in truth SSH-based stuff is largely confined to the power-user and/or UNIX world – please call me when the year of the Linux desktop arrives.
    Disclaimer: I use it daily, but normal people use Windows and don’t understand this stuff. To be honest, Windows doesn’t natively do even FTPS, but I digress.
  • No support for X.509, which is IMHO the biggest problem of the whole SSH ecosystem (I do know PKIX-SSH exists, but it’s niche stuff). You might have heard of SSH certificates but, sorry to disappoint you, it’s a totally independent and incompatible concept from certificates as you probably know them (as in TLS, SmartCards, PKI, etc.). Effectively every user ever has to click through the SSH certificate/key warning with no practical way to mitigate it.
  • It’s not even related to FTP. OK, not really relevant.

How it works – basics

You connect to an FTP site, for example… ftp.adobe.com. No special reason to choose it, it just works. Your FTP client creates a control connection to this FTP site. This is what carries most commands you see in your FTP client, for example:


220 Welcome to Adobe FTP services
USER anonymous
331 Please specify the password.
PASS *********************
230 Login successful.
OPTS UTF8 ON

This control session goes from the client to the server’s TCP port 21 – pure Telnet emulation, nothing fancy here. Now, when data is involved (even listing directory contents), a second TCP session is created, called the data connection.

History interlude – Active and Passive mode

Active mode is how it originally worked: the client tells the server to establish a connection back to the client for the data connection. Now, there are people who call the protocol stupid or dumb for this. I’d put them in the same category as people asking why NASA didn’t use the Space Shuttle to save Apollo 13.

FTP was created in the era of end-to-end connectivity – no firewalls, no NAT. It made perfect sense in the 80s and up to the turn of the century. NAT only became a thing as the Internet exploded in popularity, ISPs started putting out broadband gateways with NAT, and personal firewalls became widespread (Windows XP and newer).

Passive mode reverses the direction: the client creates the data connection to the server. In the real world it works much better, and now (year 2020) Active mode is only used by the most obscure things (such as the Windows command-line client, which does not support Passive mode) and is generally not used over the public Internet.

We’re going to assume Passive mode from here.

How it works – continued

To list directory contents, a data connection is created. The server tells the client how to create it:

CWD /
250 Directory successfully changed.
PWD
257 "/"
PASV
227 Entering Passive Mode (193,104,215,67,82,74)
LIST
150 Here comes the directory listing.
226 Directory send OK.

(193,104,215,67,82,74) – that’s IP 193.104.215.67, port 82*256+74=21066. If your client connects there, it gets sent the directory listing through that TCP session. Between response codes 150 and 226, this connection was completed. The same goes for file transfers, just the commands and push/pull directions differ. In theory, the server/client may even present an IP other than its own, for a protocol quirk known as FXP. If you’re interested, Google it, it’s not relevant here. There’s no authentication in the data connection, so I’m fairly sure that FTP servers tie the data connection’s source IP to the control connection’s source IP. Otherwise FXP would work anywhere, when in fact it doesn’t.
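
If you ever need to decode these by hand, the arithmetic is trivial (a quick PowerShell sketch):

$reply = '227 Entering Passive Mode (193,104,215,67,82,74)'
if ($reply -match '\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)') {
    $ip   = '{0}.{1}.{2}.{3}' -f $Matches[1..4]
    $port = [int]$Matches[5] * 256 + [int]$Matches[6]
    "Data connection goes to ${ip}:${port}"   # 193.104.215.67:21066
}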

Let’s throw in NAT

Server ftp.mysite.com has private IP 10.0.0.1, public IP 1.2.3.4. This is arguably the most popular scenario in modern times.

The client resolves ftp.mysite.com to 1.2.3.4. A control connection is made to 1.2.3.4, login happens and you try to list the root folder. What IP should the server give to the client? By default it’ll give 10.0.0.1.

The client happily tries to connect to 10.0.0.1 and fails – it’s not Internet-routable.

Solution 0 – modern FTP client

In reality, many modern third-party FTP clients ignore RFC1918 IPs for Passive mode over the Internet and silently use the control connection’s IP instead. However, you absolutely cannot rely on this behavior, as probably the most popular client of all, Windows Explorer, does not do it. Your old business processes with old tools and old clients probably don’t either.

Solution 1 – NAT Support configuration

I made that name up; there’s no standard name for this functionality.

Basically all modern FTP servers support this mode. Your garden-variety Microsoft IIS FTP site has “FTP Firewall Support”, which allows you to specify the FTP site’s public IP (in this case 1.2.3.4) and a data channel port range (you’ll still need to do the dNAT/port forward as well). Now, when a client connects, the server tells it to connect to 1.2.3.4 and it works out fine. This does realistically require a static IP on the public end, so it’s not very usable for home connections with dynamic IPs.
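
For IIS specifically this can also be scripted (a sketch using appcmd; the IP and port range are examples, the data channel port range is a server-level setting and the external IP here goes into the site defaults):

$appcmd = "$env:windir\System32\inetsrv\appcmd.exe"
# Data channel port range handed out for Passive mode (server-wide)
& $appcmd set config /section:system.ftpServer/firewallSupport /lowDataChannelPort:5000 /highDataChannelPort:5100
# External (public) IP presented in PASV replies
& $appcmd set config /section:system.applicationHost/sites "/siteDefaults.ftpServer.firewallSupport.externalIp4Address:1.2.3.4" /commit:apphost
# Restart the FTP service and remember to dNAT/forward 5000-5100 on the firewall
Restart-Service ftpsvc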

Solution 2 – Application Layer Gateway

Most enterprise firewalls (and some cheaper stuff) support an FTP ALG. This means the firewall will rewrite IPs in the control connection in real time. You don’t have to enter the public IP in the FTP server configuration; when a client connects, the firewall transparently replaces the private IP with the public one.

NAT support with a twist

Let’s make it harder. Now, let’s connect from an internal client to internal IP 10.0.0.1 on a Microsoft IIS FTP site. The server will tell us to create the data connection to 1.2.3.4. Huh… this will probably not work by default with most enterprise firewalls. Or, if you don’t have one doing ALG, it will simply break.

Solution 3 – RFC1918 exceptions

Some FTP servers allow a different data connection IP to be presented depending on the client IP. The simplest case would present the real IP to RFC1918-matching clients and the NAT’s public IP to everyone else. Microsoft IIS does not support this.

Throw in TLS

Boom, we have encryption. If you have IIS, the firewall gives up, as it can’t do squat with an encrypted control connection. Choose internal or external connectivity.

Solution 4 – Hairpin NAT

…will work perfectly (also in the RFC1918 exceptions case), with the caveat of some extra load on the firewall. In most firewalls you have to specifically allow this mode. I’m not going to explain the principles of Loopback NAT/Hairpin NAT, but this will save your day. In fact you could/should tell internal clients to connect to the public address only and it’ll work fine.

Conclusion

The only thing that will work in every case is Hairpin NAT. There are schools of thought fighting over Split DNS versus Hairpin NAT, and in this case Hairpin NAT wins hands down, because Split DNS would not help – the FTP data connection knows nothing about DNS, only raw IPs.

PS. IMHO Hairpin NAT always wins but what do I know, potayto-potahto.

You could run an internal FTP site on a different IP with a cloned configuration (except the firewall/NAT configuration), possibly with a Split DNS record, but I don’t consider this a better solution, especially in the case of a more complex server configuration.

Do Hairpin NAT, please.

Please do not ask me how to configure your firewall.

Loading certificates from SK LDAP for Estonian ID-Kaart SmartCard authentication to Active Directory – the old way

Phew, that’s a long title. But to the point: many years ago I promised to release this script. In the meantime the ID-Kaart PKI topology has changed, but I think the script remains quite relevant, as it should be quite easy to fix up.

About the LDAP interface: I think you need to query both, as not all cards from the old root have expired.

The official doc for configuring ID-Kaart login:

Unfortunately it lacks mass-loading. Using ADUC per-certificate is just… not scalable at all.

Remarks:

  • It was originally written… I guess about 7 or 8 years ago, for exactly that reason – manual loading of certificates is just impossible in all but the smallest of environments. The first attempt used commercial cmdlets, as native LDAP in PowerShell used to require (still does?) some native .NET binding and it was easier that way.
  • There were a few commercial products for mass-loading, but I guess I just closed their businesses, if they even still exist (didn’t check).
  • In the olden days you required a contract with SK, as LDAP was (is?) throttled for those without whitelisted IPs. Too many queries got you blocked for some time. Maybe a few sleeps here and there help…
  • As usual, some logging and cruft have been removed.
  • I’m not going to discuss all the requirements for SmartCard login, SK’s document has a pretty good overview.
  • But you CAN use one certificate with several accounts, contrary to what SK’s document states. Maybe more on this later.
  • I don’t remember exactly where I got the LDAP code from, but I think it was some SDK example for C# or something. Who knows, MS keeps dropping useful docs all the time, so it’s probably gone anyway.
  • Maybe one day I’ll fix it up for the new topology, perhaps with one query per person or other optimizations…
  • Not supported, not tested (after a few changes just now),  a bit of code rot (not used by me for years) – understand what you are doing


Function Get-AuthenticationCertificate {
    param(
        [long]$IDCode,
        [string]$Type
    )
    $Filter = "serialnumber=$IDCode"
    $BaseDN = "ou=Authentication,o=$Type,c=EE"
    $Attribute = "usercertificate;binary"
    $Scope = [System.DirectoryServices.Protocols.SearchScope]::subtree
    $Request = New-Object System.DirectoryServices.Protocols.SearchRequest -ArgumentList $BaseDN, $Filter, $Scope, $Attribute
    $Response = $LdapConnection.SendRequest($Request, (New-Object System.Timespan(0,0,120))) -as [System.DirectoryServices.Protocols.SearchResponse]
    If ($Response.Entries.Attributes.$Attribute) {
        $Certificate = [System.Security.Cryptography.X509Certificates.X509Certificate2] [byte[]]$Response.Entries.Attributes.$Attribute[0] #Cast byte array to certificate object
        Return ("X509:<I>" + $Certificate.GetIssuerName().Replace(", ",",") + "<S>" + $Certificate.GetName().Replace(", ",",")) #Probably string replacement is not needed, just following empirical behavior from ADUC.
    }
}

#Contains all useful SK LDAP Certificate branches
$SKCertificateBranches = @("ESTEID","ESTEID (DIGI-ID)")
Add-Type -AssemblyName System.DirectoryServices.Protocols
$LdapConnection = New-Object System.DirectoryServices.Protocols.LdapConnection "ldap.sk.ee" 
$LdapConnection.AuthType = [System.DirectoryServices.Protocols.AuthType]::Anonymous
$LdapConnection.SessionOptions.SecureSocketLayer = $false #New one uses TLS
$LdapConnection.Bind()
#Loads AD Users. For example you store ID code in extensionAttribute1.
#There is no validation or filter IF actually user has ID-code stored. That's a task left to you as it's quite environment dependent. For example refer to my article about ID-code validation
$ADUsers = Get-ADUser -Filter * -SearchBase "DC=my,DC=domain,DC=com" -Properties altSecurityIdentities,extensionAttribute1
ForEach ($ADUser in $ADUsers) {
    $UserSKCerts = @()
    ForEach ($SKCertificateBranch in $SKCertificateBranches) {
        $UserSKCert = Get-AuthenticationCertificate $ADUser.extensionAttribute1 $SKCertificateBranch #Positional parameters
        If ($UserSKCert) {
            $UserSKCerts += $UserSKCert #Slow but whatever, it's a small array
        }
    }
    #Arrays must be sorted before compare because they are retrieved in undetermined order
    If (Compare-Object -ReferenceObject $UserSKCerts -DifferenceObject $ADUser.altSecurityIdentities) {
        Set-ADUser $ADUser -Replace @{"altSecurityIdentities"=$UserSKCerts}
    }
}
$LdapConnection.Dispose()

Quirks in permission management with vCenter Content Libraries

First of all, Content Libraries are a pretty useful concept in larger environments. I especially use them to automatically sync content between physically separated vCenters. It also saves the user (usually a clueless sys/app admin) from browsing and finding files, replacing it with a flat list of items. Great, huh?

Now the bad parts. Read all the way through because some things have implications and workarounds below.

No default access

That is normal. The annoying thing is that you also don’t get a default role for regular users (as in content consumers, not managers). I’m going to save you the hassle. You need a custom role with these privileges (a PowerCLI sketch follows the list):

  • Content Library – Download files
  • Content Library – Read storage
  • Content Library – View configuration settings
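
Something like this in PowerCLI should get you there (a sketch; the privilege display names are taken straight from the list above and may need tweaking to match the exact names/IDs in your vCenter version):

# Build a "content consumer" role from the three Content Library privileges
$privileges = Get-VIPrivilege -Name 'Download files','Read storage','View configuration settings'
New-VIRole -Name 'ContentLibraryConsumer' -Privilege $privileges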

Global Permissions with required inheritance

Content Library permissions are Global Permissions only, and Content Libraries only inherit permissions – they do not have any explicit permissions of their own, so you are forced to use the “Propagate to children” flag.

Why is this a problem? Several things:

  • No privilege separation between libraries. You can’t have internal… “tenants” with separated content – it’s all shared. Yes, there are overarching products for that, but I’m talking basic vCenter functionality.
  • If you have several vCenters (with ELM), all permissions propagate to all libraries in all vCenters.

And the thing I hate the most: it’s impossible to create a custom role without implicit “Read-only” privileges. Believe me, I’ve tried with different APIs. If you create a role, it always includes read-only privileges. Try it out and check the results in PowerCLI (see the one-liner below). There are some privileges that cannot be removed. According to GSS, it’s by design.
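
For example (using the role name from the sketch above):

# List what the freshly created role actually contains -- note the read-only/system privileges that sneak in
Get-VIPrivilege -Role (Get-VIRole -Name 'ContentLibraryConsumer') | Select-Object Id, Name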

The implication is that it’s harder to have delegated minimal permissions on objects. Everybody that needs library access will see every object in all vCenters due to the inherited implicit read-only, even if you haven’t delegated any permissions. Pretty bad (confidentiality between delegated users) or just annoying (seeing possibly thousands of objects that have no relevance to the user), depending on your environment.

Luckily there’s a simple workaround – overwrite the permission with “No access” at the vCenter level (every vCenter, that is). This built-in role is the only one that does not include read-only. That is, as long as your delegated permissions don’t require vCenter-level permissions – I can’t see a reason they would right now. As you probably have delegated permissions somewhere below, they will overwrite “No access” again and delegated access will keep working. Funny thing – if you clone the “No access” role, the new role gets read-only added…
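
In PowerCLI the workaround looks roughly like this (a sketch; the group name is a placeholder and this needs to be repeated on every vCenter):

# Overwrite the inherited global permission with "No access" at the vCenter root, propagated
# so it covers the whole inventory; explicit permissions further down the tree win again
$root = Get-Folder -NoRecursion
New-VIPermission -Entity $root -Principal 'DOMAIN\ContentLibraryUsers' -Role (Get-VIRole -Name 'NoAccess') -Propagate:$true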

ISO mount requires “Read-only” on Content Library datastore

This took some thinking and experimenting to figure out. Let’s say your library is stored on a VMFS datastore that is not visible to your users. Sounds reasonable, it’s backend stuff after all and users should have no business there directly.

Now, deploying templates from this library on the hidden datastore will work fine. However, when you want to mount ISOs, you get an empty list. If you add “Read-only” to this datastore, it’ll start working. Keep in mind that this role will only show object metadata (the datastore will show up in datastore lists with no actionable features), but users can’t see its contents or change/write anything.
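
Again as a PowerCLI sketch (datastore and group names are placeholders):

# Grant Read-only on the backing datastore so ISO items show up in the mount dialog
New-VIPermission -Entity (Get-Datastore -Name 'ContentLibraryDatastore') -Principal 'DOMAIN\ContentLibraryUsers' -Role (Get-VIRole -Name 'ReadOnly') -Propagate:$false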

Maybe I’ll update this if I find something else.

FeatureSettingsOverride bitmap

If I understood the information here correctly, you can currently play with the following mitigation values. More will surely show up over time.

Value  Platform  CVE            Notes
1      Intel     CVE-2017-5715  Disables Spectre Variant 2 mitigation
2      Intel     CVE-2017-5754  Disables Meltdown mitigation
8      Intel     CVE-2018-3639  Enables Speculative Store Bypass mitigation
64     AMD       CVE-2017-5715  Enables Spectre Variant 2 mitigation on AMD

Combination values that are commonly seen:

  • 0 – enable Spectre/Meltdown mitigations on Intel
  • 3 = 2 + 1 – disable Spectre/Meltdown mitigations on Intel

By adding bits together, you could create your custom mitigations. For example:

  • 72 = 64 + 8 – enable all mitigations on all platforms
  • 11 = 8 + 2 + 1 – enable the CVE-2018-3639 mitigation but disable CVE-2017-5715 and CVE-2017-5754

I’m not sure if these values would make any sense or work at all, but my guess is that they will not crash anything. By observation, I think each mitigation is optional and gets enabled automatically if hardware/microcode supports it. I don’t have an AMD at hand, but someone could try out these homebrew combinations.
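
If you want to try, the knobs live in the registry (a sketch; 72 is the homebrew example from above and the companion FeatureSettingsOverrideMask value of 3 follows Microsoft’s published guidance – adjust at your own risk and reboot afterwards):

$mm = 'HKLM:\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management'
Set-ItemProperty -Path $mm -Name FeatureSettingsOverride -Type DWord -Value 72
Set-ItemProperty -Path $mm -Name FeatureSettingsOverrideMask -Type DWord -Value 3
# After a reboot, Get-SpeculationControlSettings (from the SpeculationControl module) shows what actually took effect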