[ Go to July 1997 Table of Contents ]
-- by Tom Henderson
I'm not sure I want to write about this. But let me describe a recent woeful Web experience while I'm still feeling the pain. Maybe I should look on the bright side: I've honed my detective skills and learned a valuable lesson about mixing Netscape Navigator Gold 3.0 with Microsoft Internet Information Server (IIS) 3.0.
My company hosts three Web sites (two are sites owned by other companies) on IIS 3.0 with an NT 4.0 Server platform underneath. It's easy to do. If you right-click on NT Server's Network Neighborhood, click on Properties and look at the TCP/IP protocol properties, you'll notice an Advanced box. If you click that box, you'll see you can add up to five additional addresses and bind them to the network card's properties.
Once Microsoft's multiprotocol router, code-named Steelhead, arrives, this box will become an important place to add routing information. But for now, this procedure is a way to use one network card to answer to several IP addresses on different Web-server root directories, and it's how we run three sites from a single server. The IIS Manager software allows you to create either home directories or virtual root directories associated with the correct IP address. You can check the configuration using a command line utility from the WINNT directory called IPCONFIG.EXE.
While NT's ipconfig doesn't take the same command-line arguments that the UNIX ifconfig command does, it will at least verify the network cards' configurations. The UNIX ifconfig allows options to be set on the command line. Windows 95 users have another utility, called WINIPCFG.EXE, that does a better job of displaying IP configuration information, mostly because it's a Windows application.
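The one-card, several-addresses arrangement described above boils down to a lookup: the address a request arrives on decides which root directory answers it. Here is a minimal Python sketch of that idea. It is only a model, not how IIS works internally, and the addresses (taken from the 192.0.2.0/24 documentation range) and directory paths are hypothetical.

```python
# Hypothetical bindings: each IP address on the single network card
# maps to the root directory of one hosted site.
SITE_ROOTS = {
    "192.0.2.10": r"C:\InetPub\wwwroot\our-site",
    "192.0.2.11": r"C:\InetPub\wwwroot\partner-a",
    "192.0.2.12": r"C:\InetPub\wwwroot\partner-b",
}


def root_for(local_ip: str) -> str:
    """Return the root directory for the address a request arrived on."""
    try:
        return SITE_ROOTS[local_ip]
    except KeyError:
        # No site is bound to this address.
        raise LookupError(f"no site bound to {local_ip}") from None
```

A request arriving on 192.0.2.11 would be served from the partner-a directory, while an address that isn't in the table raises an error rather than silently falling back to another site.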
The sites lived for many months without a lot of traffic, updates or headaches. Then an update came for one of the hosted sites.
We keep the Web server outside of our firewall, on a collision domain with the router that connects us to our ISP. The router has a T1 line on one side, and the Web server, our firewall and some 56Kbps lines to the sites we host on the other side. The Compatible Systems routers do a great job of connecting to the ISP and are a breeze to configure.
When it was time to update the site, I used ftp to upload the new data files onto the virtual directory associated with the partner site. Once the files were installed, I could see the new site content from a PC behind the firewall, and it looked fine. Then came word from the site owner asking if the Web site was down. I checked it from my PC, and it still looked fine.
Can't see the site
Then we heard that the site couldn't be seen from any dial-up ISP. Strange, I thought. I used CompuServe as a dial-up Internet connection, invoked Internet Explorer and saw the problem for myself. A browser looks up a site's IP address through the Domain Name System server listed in the client's configuration files. At the bottom of Internet Explorer 3.0, it usually first says "Finding site, please wait." Then, after finding the IP address (and presumably a route through the Internet), it loads the page. But in this case, I received the message "Web site found, waiting for reply."
I waited forever ... or at least for 20 minutes until CompuServe finally timed out and hung up on me.
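The stages the browser reported (finding the site, then waiting for a reply) can be told apart with a small probe. What follows is a rough Python sketch that uses only the standard library. The function name `probe_site`, the stage labels and the default timeout are assumptions made for illustration, and this is not what Internet Explorer does internally.

```python
import http.client
import socket


def probe_site(host, port=80, path="/", timeout=10.0):
    """Report which stage of fetching a page fails, or 'ok'."""
    # Stage 1: the name lookup ("Finding site").
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failed"

    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # Stage 2: open the TCP connection to the server.
        try:
            conn.connect()
        except OSError:
            return "connect-failed"

        # Stage 3: send the request and wait for the reply
        # ("Web site found, waiting for reply").
        try:
            conn.request("GET", path)
            response = conn.getresponse()
            response.read()
        except (OSError, http.client.HTTPException):
            # Includes the timeout case: connected, but no usable reply.
            return "connected-no-reply"
        return "ok"
    finally:
        conn.close()
```

Run from a dial-up connection, a probe like this would be expected to report the third stage for the troubled site, since the lookup and the connection both succeeded and only the reply never came.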
I called my ISP and asked somebody there to try the site. He said it worked fine for him. Yet any dial-in call incurred the World Wide Wait problem. There's an algorithm, credited to Van Jacobson, that TCP (the transport underneath HTTP) uses to pace packets. Sometimes it causes latency problems, depending on how the operating system vendor and the network card maker's drivers interact. But the other two sites on the same server had no problems, regardless of the connection's latency.
So I tried some tools. From a dial-up connection, I could ping or ftp the site without problems. The other sites on the same server were fine, though a bit slow compared to my direct connection. I used Net.Medic (http://www.vitalsigns.com) to see if its diagnostic capabilities could pinpoint a problem. Although Net.Medic does many things and points fairly accurate fingers at problem sources, this problem escaped it.
I checked DNS to see if our ISP had unwittingly moved the site's IP address somehow. No, it was listed correctly. I checked the InterNIC Registration Services (http://rs.internic.net/rs-internic.html) thinking that maybe our accounts payable department had overlooked the registration bill and we were slowly being turned off. Nope, everything was in order.
Frustrated, I connected a protocol analyzer to the network segment to find out the differences in packets between a local browser connecting to the site and a remote one. The test had to run late at night, so I could be guaranteed a quiescent network and no stray packets. A trace of the packets showed that any browser on a low-latency connection got the page, but any connection with more than 70 milliseconds of delay didn't get the page. It was that simple. Even more interesting: The other two hosted sites continued to work well with any browser.
I used telnet to check the routers on both my side and the ISP's side to make sure that the routing tables hadn't self-destructed or mutated. I used their ping utilities to check connection latencies, and none seemed abnormal. Everything appeared to be normal except for dial-in connections to that specifically troubled IP address.
I used a Lynx text-only Internet browser in an attempt to see the site. Lynx doesn't load graphics, so it's great for people with visual disabilities or excruciatingly slow modem connections. I thought that perhaps the site's graphics content (several small GIF files) might be causing problems somehow. Lynx worked inside the router connection, but failed outside the router connection, just like other browsers at the troubled site.
Since the site had been down for several days, it was time for desperate action. I wiped the server and downgraded to IIS 2.0, but the problem persisted. I tried different browsers. Still no luck, and I had wasted an entire weekend.
After a week of tracking down the problem and sincere yet fruitless help from Microsoft's tech support, I started to experiment. I put one site's content on the virtual root directory for the site that wasn't correctly performing dial-up access. I copied the DEFAULT.HTML file from one site's directory into the problem site's virtual directory. The replacement content worked great! I could dial in and see the accurate replacement content. The delivery chain was working, and that narrowed the problem considerably.
Then I thought the problem might be in the Web-page code itself. When I looked at the problem home page's HTML, I discovered a weird header that I hadn't seen before. But since I haven't hand-coded HTML pages in nearly a year, I thought that it must be some cool feature from the site's content authors.
The authors of the troubled site's content used Netscape Navigator Gold 3.0, which produces HTML 3.2 pages. Microsoft verified that IIS 3.0 (and 2.0) supports HTML code only up to version 3.0. I deleted the only apparent HTML 3.2 line of code, which was the header in question, and the site immediately came alive via a dial-up ISP connection. Deleting the line didn't seem to affect the page's appearance.
Here's the offending line that nearly caused psychosis in an IIS Webmaster.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
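Deleting that line by hand is a one-time fix. For a site with many pages, the same edit could be scripted. Below is a minimal Python sketch, assuming the declaration is a single tag with no `>` character inside it, as in the line above. The function name `strip_doctype` is made up for this example.

```python
import re

# Matches one DOCTYPE declaration plus the whitespace around it.
# Assumes the declaration contains no '>' before its closing bracket.
DOCTYPE_RE = re.compile(r"\s*<!DOCTYPE[^>]*>\s*", re.IGNORECASE)


def strip_doctype(html: str) -> str:
    """Return the page with its first DOCTYPE declaration removed."""
    return DOCTYPE_RE.sub("", html, count=1)
```

A page with no DOCTYPE passes through unchanged, so running the function over every file in a site's directory shouldn't touch pages that were never affected.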
I'm still growling when I think of the ordeal. Whose fault is it? Is it the author of the code, who chose to use Netscape? Is it Microsoft's fault for not identifying indigestible HTML? Is it merely a revision synchronization problem? Or is it just a good reason to buy Maalox by the quart?
Contributing editor Tom Henderson is senior vice president of engineering for Indianapolis-based Unitel. Contact Tom in WINDOWS Magazine's areas on America Online and CompuServe, or at the e-mail addresses here.