-- by Tom Henderson
I know several good reasons to use multiple domains in an enterprise. The first is to logically delineate the resources in an organization. This is often done by geography: Location-based domains are the most popular and logical choice. Another reason is to cut the amount of administrative detail it takes to manage a locale or group of users and servers.
Unfortunately, multiple domains within an organization have become commonplace for another reason: the mistaken belief that multiple domains automatically increase security by providing an added layer of partitioning.
The myth of partitioning has a bit of credence. The fewer people who have keys to resources, the fewer people who can mess them up or provide inadvertent security entrance points. Trusted domains link different NT server domains, allowing a home or "master" domain to link to other domains. Users within the trusted domain receive access to the foreign domain's resources according to the usual set of rules.
However, there are several ways that trusted domains can become untrusting. A former associate of mine who administers a large WAN recently found out how the relationships can fall apart. Her catastrophe occurred after a holiday weekend when several pieces of the network changed. A new router went online, a relational database server received a hardware upgrade and the corporate WAN received an upgrade.
Her company's network operations center (NOC) consists of several servers that support two primary business activities, as well as data gathering from branches, and e-mail and electronic data interchange (EDI) links to their business partners.
Users were warned that log-in services would be suspended and Internet mail might not be delivered over the long holiday weekend. Administrators also advised that WAN and remote-access services would be intermittent.
The first step in the equipment deployment came when the new routers went online. Higher-speed circuits to both the Internet and a major branch were installed. This adjustment to the network backbone took most of the servers in the NOC offline. The NT servers, which included a domain controller, a backup domain controller and a Microsoft Exchange server within one domain, were taken offline. A primary domain controller (PDC) and a SQL Server in another domain were also taken offline; the SQL Server received two additional 6GB drives, and the database administrator restarted the server.
So far everything looked great. The router backbone and the WAN were brought up. Lots of green blinking lights lit up the backbone racks as routers started talking to each other from across the country. A fat HP9000 was brought up, and then the NT servers were powered on. That's when the trouble started.
The database administrator started a session on a Citrix server in the NOC and tried to link to the PDC for the SQL Server. The SQL Server's location was adjacent in the NOC, but in a different domain. No go. In fact, the trusted relationships for all three domains in the NOC failed. The walls between the domains had grown infinitely high.
The passwords for the trusted and trusting domain accounts were reentered but only brought an error message. It was Sunday at 7 p.m., and things looked grim. That's when my phone rang. I received a fast education that night.
The establishment of initial trusted relationships between domains ignites a sequence of events that might surprise you. After about a week, the initial password that's used to establish a trusted relationship is discarded. A secure channel and remote procedure calls (RPCs) continue the link. The secure channel links only the trusted domains' PDCs, not the backup domain controllers (BDCs), to send information about user accounts and other Security Accounts Manager (SAM) information. If the PDCs can't communicate with each other during an update cycle for passwords, the passwords fall out of revision synchronization. The trusted relationships fail not because the administrator-entered password is incorrect, but because the private password (Microsoft goes so far as to call it "secret") no longer matches.
The BDC doesn't carry this information, because the PDC is the link between domains in this case. There are three possible cures:
1) Delete the trusted relationships in the "Permitted to trust this domain" and Trusted Domain sections of the User Manager/Policy/Trust Relationships dialog box. Enter the domains permitted to trust in each failed PDC. Having finished each of the PDCs, reenter the trusted domain section for each of them. This reestablishes the initial link.
2) Promote a server in the domain rapidly from server to BDC, then PDC. Don't hesitate, because the BDC will try to populate the freshly promoted PDC with erroneous data. This usually happens when user information gets corrupted. After the promotion, you can restore the access control list (ACL) and SAM information from a previous backup that's known to be good.
3) Edit certain Registry entries. I won't list the entries because of current security gaps in NT Web servers. The information is available, but I have to force you to hunt for it. Sorry.
The passwords change randomly. In this case, they were about to change when the router was down and communication between the servers ceased. We chose Fix 1 because of the small number of discrete domains. By midnight, everything was talking to everything.
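The failure and Fix 1 can be sketched as a toy model. This is an illustration of the behavior described in this column, not NT's actual implementation: the class names, the `link_up` flag and the choice of which side generates the new secret are all assumptions made for the sketch.

```python
import secrets
from dataclasses import dataclass


@dataclass
class PDC:
    """Hypothetical stand-in for a primary domain controller."""
    name: str
    trust_secret: str = ""


class TrustLink:
    """Toy model of one trust relationship between two PDCs."""

    def __init__(self, trusting: PDC, trusted: PDC, initial_password: str):
        self.trusting = trusting
        self.trusted = trusted
        self.link_up = True  # False while the router or WAN is down
        self.establish(initial_password)

    def establish(self, password: str) -> None:
        # Fix 1: delete and re-create the trust so both PDCs start
        # again from the same administrator-entered password.
        self.trusting.trust_secret = password
        self.trusted.trust_secret = password

    def rotate_secret(self) -> None:
        # Assumption: one side picks a new machine-generated secret and
        # hands it to its partner over the secure channel.
        new_secret = secrets.token_hex(16)
        self.trusting.trust_secret = new_secret
        if self.link_up:
            self.trusted.trust_secret = new_secret
        # With the link down, the partner keeps the old secret and the
        # two sides fall out of synchronization.

    def authenticate(self) -> bool:
        # The trust works only if the link is up and the secrets match.
        return (self.link_up
                and self.trusting.trust_secret == self.trusted.trust_secret)


if __name__ == "__main__":
    # Domain names here are made up for the demonstration.
    link = TrustLink(PDC("BRANCH"), PDC("NOC"), "admin-typed-password")
    link.link_up = False      # holiday-weekend router swap
    link.rotate_secret()      # scheduled change happens anyway
    link.link_up = True
    print("trust works after outage:", link.authenticate())
    link.establish("new-admin-typed-password")   # Fix 1
    print("trust works after Fix 1:", link.authenticate())
```

In this model, retyping the old administrator password on one side alone would not help, because the secret the other PDC holds has already moved on. Resetting both ends together is what restores the match.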
Why does Microsoft use this method when there's a chance that relationships could fail? Remember that authentication takes place between servers over the network wires with the secret passwords and accounts (there's another clue to Fix 3). Users with shareware network protocol analyzers can easily find the password that permits trust by watching network traffic. Random changes force expiration of passwords without additional administrator intervention. Administrators change the passwords used at the top layer, which can still force a password change at the layer underneath.
Encrypted passwords can't guarantee security. It's becoming common to see session hijacks where passwords are grabbed and then used in authentication hacks. I'll live with the trade-off until something more usable arrives.
Until Active Directory and Lightweight Directory Access Protocol (LDAP) support roll around in Microsoft products sometime in early 1998, we're stuck with trusted domains' nuances and oddities. Active Directory, when additionally deployed and supported inside of network routers, will add an extra layer of protection. The Microsoft-Cisco venture may bring about products that trap access at the router level in front of network security.
The Microsoft Internet Information Server (IIS) problem I wrote about last month has become an ongoing story. The final solution to the problem was different from what I expected. Although IIS 2.0 and 3.0 can't handle HTML 3.2, there's another setting that has stabilized the site. It's a Registry entry.
Downstream from our connection was a black hole router. In a Web conversation, a browser accesses a server that returns a page. The server delivers the page as a set of packets. When a router's maximum packet size is smaller than the packets the server is sending, the router is supposed to notify the HTTP server that it can't accept packets of that size. The router should send its ceiling, called the Maximum Transmission Unit (MTU), to the HTTP server, which responds with smaller packets. A black hole router trashes packets that are too big but never notifies the server. The result is an infinite wait for packets.
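The difference between a well-behaved router and a black hole router can be sketched in a small simulation. Everything here is illustrative: the `Router` class, the function names, the 1500-byte starting size and the 576-byte fallback are assumptions chosen for the sketch. The optional fallback (retry with a smaller packet after repeated silence) is one way a sender might cope, and it is not something this column spells out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Router:
    mtu: int                   # largest packet this hop will forward
    black_hole: bool = False   # True: oversized packets vanish silently


def send_once(size: int, path: List[Router]) -> Optional[int]:
    """Send one packet that may not be fragmented along the path.

    Returns 0 if the packet is delivered, the hop's MTU if a router
    reports that the packet is too big, or None if the packet is
    dropped without any notification.
    """
    for hop in path:
        if size > hop.mtu:
            return None if hop.black_hole else hop.mtu
    return 0


def discover_packet_size(path: List[Router], start: int = 1500,
                         fallback: int = 576, max_silent: int = 3,
                         detect_black_hole: bool = False,
                         max_attempts: int = 20) -> Optional[int]:
    """Return a packet size that gets through, or None if the sender stalls."""
    size, silent = start, 0
    for _ in range(max_attempts):
        reply = send_once(size, path)
        if reply == 0:
            return size                 # delivered
        if reply is not None:
            size, silent = reply, 0     # router told us its ceiling
            continue
        silent += 1                     # no answer at all
        if silent < max_silent:
            continue                    # retransmit at the same size
        if not detect_black_hole or size <= fallback:
            return None                 # the "infinite wait", cut short
        size, silent = fallback, 0      # assume a black hole, go smaller
    return None


if __name__ == "__main__":
    polite = [Router(1500), Router(1400)]
    hole = [Router(1500), Router(1400, black_hole=True)]
    print(discover_packet_size(polite))
    print(discover_packet_size(hole))
    print(discover_packet_size(hole, detect_black_hole=True))
```

A real sender would keep retransmitting rather than return `None`; the sketch gives up after a few silent drops only so that it terminates.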
The Registry setting is:
(REG_DWORD, 0=disabled, 1=enabled)
Microsoft technical support (in Raleigh, N.C.) made the catch. A detailed explanation is available at the Microsoft Web site at http://www.microsoft.com/kb/ with the Knowledge Base article Q136970.
Contributing editor Tom Henderson is senior vice president of engineering for Indianapolis-based Unitel. Contact Tom in WINDOWS Magazine's areas on America Online and CompuServe, or care of the editor at the addresses on page 20.