I recently fixed a bug resolving Domain Local groups in Winbind. I was asked how to reproduce it with a more complex setup, so I had to dig through the Winbind code to understand everything in more detail. I have documented my findings here, in order to retain what I’ve learned and to help others understand how Winbind works.
We have a forest with two AD domains: level1.discworld.site and level2.discworld.site. The two domains have a two way trust. User accounts are created on LEVEL1, groups and machine accounts are on LEVEL2. We have a Linux machine named ‘linux’, with Winbind joined to LEVEL2. I will describe everything from the perspective of Winbind, so LEVEL2 inside of Winbind is also referred to as ‘own domain’.
LEVEL2\samba (members: LEVEL1\ab, LEVEL2\asn, LEVEL2\gd)
Lets assume we have successfully joined the machine ‘linux’ to LEVEL2 and then start Winbind. There is a parent Winbind process which delegates work to Winbind children. The parent forks a child for each logical domain, so in this setup there are 4 domain child processes: LEVEL1, LEVEL2, BUILTIN and SAMBA (local SAM). LEVEL1 and LEVEL2 will connect to their corresponding AD domain controllers.
Querying information from AD domain controllers
If we want to obtain information about users or groups we have to query a Domain Controller. There are two ways to lookup this information. If the corresponding user is not logged in, then we create queries for this information using the machine account. The machine account has limited permissions to query information, especially on Domain Controllers of trusted domains, so most of the time this information is incomplete (or may be incorrect), as we cannot provide more than what the AD domain controllers allow us to read. Often these queries are expensive, so caching is important to reduce the load on the domain controllers. Correct information about e.g. group memberships for a user is obtained when we authenticate as this user. The domain controller will then collect the information with the token of the user and send it to Winbind (netr_LogonSamLogon or Kerberos PAC). In Winbind we cache this information. We have an issue here. If you get the information about a user with the machine account and cache it. Then authenticate as the user and get the information again, we may return the information from the cache. If the groups of a user change while he is logged in, we will not get an update until the user logs in again.
If Winbind authenticates a user there are normally two routes. It could do a normal netr_LogonSamLogon or a samlogon with kerberos. If you want to authenticate a user using samlogon you can do this using ‘wbinfo -a
So if a login is initiated, the main Winbind process gets a pam authentication request. The authentication request is normally sent to the DC Winbind has been joined to. So if LEVEL1+asn is trying to login, the Winbind child handling LEVEL1 will do a LogonSamLogon using the Netlogon PIPE to the domain controller. The domain controller is responsible for collecting all required information about the user and will return all information about group memberships in the info3 structure of the LogonSamLogon response.
If Kerberos is involved the Winbind child handling LEVEL1 will authenticate the user talking to the KDC of the domain controller. All information will be stored in the PAC (Privilege Attribute Certificate) of the Kerberos ticket (which is similar to the info3 structure in the LogonSamLogon response).
id and getent
Information retrieval for ‘id’ or ‘getent’ will take a special route if we are only able to gather it using the machine account. The results collected with the machine account can differ from the results obtained during user authentication! Normally the information sent back from the Domain Controller during authentication is much more detailed and complete. It is possible the results differ between querying information as a user and as a machine account. With the limited resources of a machine account we can only try to get the basic information from the domain controller that the user is a member of. We will not contact trusted domains as enumeration is expensive and often not allowed with a machine account.
Lets look which functions will be called in Winbind if a user runs the command ‘id LEVEL1+asn’. For ‘id’ to be working we assume that nsswitch has been correctly configured to talk to Winbind. There is a libnss_winbind.so module which talks to the parent Winbind process over a unix pipe. The parent Winbind process handles all nsswitch function calls (POSIX functions) coming over the pipe asynchronously. We do not discuss id mapping here, it will get too complex, we will just look on the flow of information.
‘id LEVEL1+asn’ will calls several POSIX functions which are sent over the UNIX pipe to the main Winbind process. These functions are getpwnam, getgrgid and getgroups. We assume that we have cold caches and need to handle these requests using machine account privileges.
The first thing ‘id’ calls is the getpwnam function. This will retrieve basic information about the user like the primary group id, the home directory and shell. The main Winbind process sends three queries to the LEVEL1 child for this: lookupname to get the SID of the user, a second lookupname to translate the SID to the username for verification, and finally a QueryUser call to get the basic information (primary gid, …). The first lookupname is a lsa_LookupNames call to the DC’s LSA server. The second is a LookupSids call to translate the SID to a name again. The QueryUser command is a LDAP query. Normally we always try to get the information with the fastest method and fall back to slower mechanisms if that fails.
After we collect all information and also store them in the cache (the child processes are responsible for the caches) we return the information to nsswitch. Now the ‘id’ command needs to get the name for the primary group and calls ‘getgrgid 1000000’ (1000000 being the gid of the primary group).
The parent Winbind will connect to the LEVEL1 domain child and call lookupname. LEVEL1 will then connect over RPC to the LSA pipe and call LookupSids3 to translate the SID to the name (the idmapping knows about the SID for the gid, the details are left out here).
As we have the important user information it is time to ask for additional group memberships of the user. This results in the following call:
The request is received by the main Winbind process, which needs to resolve the groups on three domains. It is always the same even if the machine is joined to a different domain the user is a member of.
a) The domain the user is a member of (LEVEL1)
b) The local SAM Authority (SAMBA)
c) The BUILTIN domain
We will need the SID of the user first so we ask the LEVEL1 child to resolve the name to a SID. Then for each of the domains we ask the corresponding child to do a LookupUserGroups. The LEVEL1 child will do a LDAP search to get a list of SIDs the user is a member of. Then it will talk to the DC LSA Server and call LookupSids3() to translate the SID into a name for each group. The information is sent back to the parent which will ask the local domain (SAMBA) if there are any aliases that the user is a member of. It will send a LookupUserAliases to the SAMBA child which will lookup the information using pdb. The final step is to talk to the BUILTIN domain for user aliases.
After all of the above POSIX calls were successful id will print the information it collected.
If you login as the user using kerberos first, then the information about the user are cached by the domain child serving the users domain. If you now call getpwnam then query will be filled with the information stored in the cache. Only the SID to name translation requires a LookupSids3 LSA call to the DC if it is not cached yet. The same for get getgrgid or getgroups call. We already got the information from the DC in the PAC which groups the user is a member of. We just need to translate the SIDs to names.
To be continued …