
Save As... HTML
A corporate intranet can put all your shared information just a browser and a mouse click away, but the road to HTML can be a rocky one.

-- by Lynn Ginsburg

Examine a diagram of a company's installed computers, and they may look like well-tuned machines brought into harmony by an embracing network. But take a walk through the trenches, and you'll most likely find hostile encampments supporting divergent operating systems. A typical corporation is a house divided, with a variety of computing platforms under one roof, including systems that speak only Windows, Macintosh or various dialects of UNIX. Making the mix work as one can be an IS department's darkest nightmare.

Enter the intranet. The term's been bandied about enough to gain buzzword status, but, for once, there's plenty of substance behind the hype. To put it simply, an intranet is an in-house network that combines the benefits of Internet access and protocols with the security and privacy of a corporate LAN. Most importantly, by providing a common data format based on HTML, an intranet can bring about a detente in the platform cold war.

Intranet users need only a browser to access HTML data; regardless of operating system or hardware, they'll all see exactly the same data in the same format. That bridges the platform gap, but it also gives rise to a new set of problems: How do you get that enormous backlog of corporate documents, files, forms and publications into HTML? Solutions abound for new documents, with dozens of capable HTML design and editing programs available to create intranet-ready documents. But the reams of legacy data tucked away on file servers and hard drives throughout the company call for a different approach, and different tools.

You'll need software to convert your company's documents from various proprietary formats into HTML. The solutions are likely to vary, depending on the types of documents you need to convert. For example, you may be able to convert your word processing documents to HTML using the program that created them, or another word processor. But for documents with more complex formatting and layout (with a mix of data, images, fonts and other design elements), you'll have to employ more sophisticated means to create picture-perfect HTML replicas.

We'll examine several file formats and document types, and explore the best ways to convert them to HTML. No matter which operating system camp you've planted your flag in, you'll quickly find yourself becoming a follower of the unifying force of HTML.

Introducing Your Intranet

The infrastructure for your corporate intranet is probably already in place. An intranet can use the network you've already installed and can run over virtually any kind of wiring, on virtually any network OS. You'll probably need a system to host the required Web server software, but an existing server and an inexpensive (or even free) Web server program might suffice. So initial outlay for your intranet could be modest enough to put only a small dent in your IS budget.

With the back end of your intranet in place, you'll need to standardize on a browser so your users can access HTML documents. Standardizing on a single browser is important, as a document's layout and design are typically optimized for a particular browser's capabilities. For most Web authors, Netscape Navigator remains the de facto standard for document layout, although Internet Explorer is gaining ground quickly.

Browsers are designed to read and display HTML documents. However, they typically can display or interpret a variety of other standard file types as well, including ASCII text, sound, graphics, video and VRML. Navigator and Explorer support plug-ins that allow users to display intact proprietary formats, such as Shockwave (which displays Director, Authorware and FreeHand files), RealAudio (streaming audio) and Adobe Acrobat (for displaying formatted Acrobat PDF files).

Managing Change

To illustrate the conversion issues related to different types of documents, we looked at three common document types (word processing files, spreadsheets and highly formatted, complex documents like newsletters) to determine the software and strategies you'll need to render them into HTML.

To perform the first task, converting a word processing document, you simply export plain text to HTML. Many word processing programs, including WordPerfect and Microsoft Word, offer a Save As HTML option that can quickly turn plain text into HTML. However, it may take some extra work to maintain even basic formatting options, such as layout and spacing; italicized or bold text; special fonts; and simple graphics elements like borders and lines.
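As a rough illustration of why plain text is the easy case, here is a minimal Python sketch of a text-to-HTML export. It is not how WordPerfect or Word performs the conversion; the function name `text_to_html` and its `title` parameter are hypothetical, and the sketch assumes paragraphs are separated by blank lines.

```python
import html
import re


def text_to_html(text, title="Untitled"):
    """Wrap blank-line-separated paragraphs of plain text in minimal HTML.

    Hypothetical helper for illustration only; it carries over no
    formatting (fonts, spacing, bullets), just the words themselves.
    """
    # Treat one or more blank lines as a paragraph break.
    chunks = re.split(r"\n\s*\n", text)
    paragraphs = [c.strip() for c in chunks if c.strip()]

    body = "\n".join(
        # Escape &, < and > so the text is not mistaken for markup,
        # and collapse line breaks inside a paragraph into single spaces.
        "<p>{}</p>".format(html.escape(" ".join(p.split())))
        for p in paragraphs
    )
    return (
        "<html>\n<head><title>{}</title></head>\n<body>\n{}\n</body>\n</html>"
    ).format(html.escape(title), body)


if __name__ == "__main__":
    print(text_to_html("Profits & losses\nfor Q1\n\nSecond paragraph.", title="Memo"))
```

Everything beyond the words (layout, spacing, special fonts, borders and lines) has no place in this sketch, which is the same formatting that takes extra work to preserve in a real conversion.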

For our test, we used a word processor to create a document containing a basic logo, bulleted and indented text, and italicized text. We created the same document in both WordPerfect 7 and Microsoft Word 97.

When we converted the document into HTML in WordPerfect, we had a couple of options using the program's included Internet Publisher (available on the File menu). We could choose to convert the document to HTML and instantly see the results; or we could publish it to the Web, converting the document and saving it to a specified location, while keeping the document live in WordPerfect format.

We viewed the results using Netscape Navigator. The HTML version, while not completely faithful to the original, was better than what Word 97 would later yield. With WordPerfect we lost the custom horizontal graphics lines in our logo (the program instead inserted a generic horizontal embossed rule), as well as the logo's font. We expected the font switch, because browsers display only a few standard fonts. Also, WordPerfect replaced the square bullets we had used with round ones, but it did a remarkably good job of maintaining our original's layout and spacing. Italics and indented text also survived the process intact.

The Word on Word

We then converted the same basic document we'd created using Word. Word offers a single conversion choice: Save As HTML. The conversion output was less successful than WordPerfect's. The logo's horizontal graphics lines and font weren't reinterpreted; they were lost completely. Word also lost the document's spacing, placing large gaps between lines. We could tighten the spacing by going back into the original Word document and making adjustments so the HTML version would more closely resemble the original.

If your company uses Microsoft's Office suite, you'll probably want to make Internet Explorer your standard-issue browser. A helpful tie-in between these programs can eliminate the need to convert native Office formats like Word and Excel to HTML, while still making them available over your intranet. If the application that created an Office document is installed, Internet Explorer can open the document while maintaining its format perfectly. Office 97 can pull off a similar stunt in Netscape: If you open a Word document in Navigator, the program will launch Word (if it's installed on your system) to let you view the document. If your users don't have the appropriate Office 97 applications installed, they'll need special viewers that work within IE.

In both cases, to restore the document to its complete original format, we had to re-create as graphics objects the logo elements that the word processing programs couldn't faithfully convert. To do this, we used a good shortcut: We pressed the keyboard's Print Screen button to capture an image of the original document while displaying it in the word processing program. Next, we used an image-editing program (Paint works fine) to crop out the individual elements we needed and save them as bitmap images. Finally, we put the bitmaps in place using an HTML editing program (such as FrontPage or NetObjects Fusion) to fully restore the layout and formatting.

The most pleasant surprise in our testing involved converting an Excel spreadsheet to HTML. We created an Excel 97 spreadsheet and selected Save As HTML from the File menu. At that point, a wizard popped up. We specified a range from the spreadsheet, chose whether to create a standalone HTML document or to insert the spreadsheet into an existing HTML document, and gave the document a header and a title. That was it. A quick look at the sheet with Navigator confirmed that it was completely intact, with its original columns, spacing and layout.
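One plausible reason a spreadsheet survives the trip so well is that a range of cells maps directly onto the rows and cells of an HTML table. The Python sketch below shows that mapping under stated assumptions: the function name `rows_to_html_table` is hypothetical, the input is a plain list of rows rather than a real Excel file, and the output is not meant to match what Excel's wizard generates.

```python
import html


def rows_to_html_table(rows, header=True):
    """Render a list of rows (each a list of cell values) as an HTML table.

    Hypothetical helper for illustration; when header is True the first
    row is emitted as <th> heading cells, the rest as <td> data cells.
    """
    lines = ['<table border="1">']
    for index, row in enumerate(rows):
        tag = "th" if header and index == 0 else "td"
        cells = "".join(
            # str() handles numbers; escape() protects &, < and > in labels.
            "<{0}>{1}</{0}>".format(tag, html.escape(str(value)))
            for value in row
        )
        lines.append("  <tr>{}</tr>".format(cells))
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    sample_range = [["Region", "Sales"], ["East", 1200], ["R&D", 15]]
    print(rows_to_html_table(sample_range))
```

Because the grid structure carries over one-to-one, columns and layout need no manual repair, unlike the logo and spacing in the word processing test.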

The final document conversion test proved the most difficult: a newsletter that included graphics, custom logos, banners and formatted columnar text.

A viable option for converting complex, highly formatted documents for Web use is to sidestep HTML and use Adobe Acrobat. Acrobat can take any printable file and convert it intact to Acrobat's proprietary PDF format.

The Acrobat PDF Writer captures the print stream from an application and converts it to the PDF format. When working with applications that create more richly formatted documents, such as QuarkXPress, PageMaker, Illustrator or FreeHand, you use Acrobat Distiller, which converts PostScript output from these programs to the PDF format. You can view any PDF file with a browser that has the Acrobat Reader plug-in installed.

Another key component of Adobe's software is Acrobat Capture, which accepts scanned documents and OCRs them to create fully formatted, searchable PDF files. Capture converts recognizable text into live characters and retains graphics as a separate layer in bitmaps. This is a great way to get printed documents onto your intranet.

To convert our newsletter to PDF, we selected the Acrobat Distiller Assistant from within the desktop publishing program (in our tests, QuarkXPress) and "printed" the document to a PostScript file. We then opened the PostScript file in Distiller. We also tested scanning the same document using Capture to convert it to PDF. In both cases, Distiller and Capture successfully converted the text to the proper fonts and retained the graphics elements (logos, images and banners) as bitmap images.

Keeping Up Appearances

We converted the same newsletter to HTML using two different HTML design programs, NetObjects Fusion and Microsoft FrontPage. To do this, we first had to individually export each of the newsletter's various elements, including the text, banners, logo and image. To retain the style of some of the original text elements (such as specific fonts, or text that had specified formatting such as kerning, leading and small caps), we also had to use an image-editing program to create bitmaps of those elements.

Of the two HTML design programs we looked at, Fusion is the better choice for laying out richly formatted, complex documents. Unlike most other HTML programs, Fusion uses a desktop publishing paradigm, so you can create and edit objects and place them using drag-and-drop. You can organize elements spatially on a grid, just like in a desktop publishing program, so you can place objects anywhere, using any alignment.

FrontPage requires you to insert objects at the cursor position. You then must use either general alignment options such as "left," "right" or "center," or numerical controls that can be exasperating to enter and reenter as you try to place an object exactly.

The toughest challenge for the HTML editors was recreating the newsletter's layout and spacing. The newsletter was laid out in two columns. In the left-hand column were three boxes-one each for body copy, an image and a caption. A bulleted list ran the full length of the right-hand column. In Fusion, we just drew individual text boxes for the body copy, list and caption text, and drew rectangles for the graphics elements. Then we aligned and placed the elements using the grid, reproducing a variation of the document that closely resembled our original. Finally, we reoriented the page from the vertical view of a printed document to the horizontal aspect of a Web browser.

Repeating the process using FrontPage was much more laborious. Although FrontPage has many other strengths as an HTML design program, Fusion simply outshines it for layout. In FrontPage, we had to construct the page layout using a linear process, inserting each element, one at a time, at the cursor, rather than spatially laying out elements free-form on the page. We also couldn't duplicate the style of our original newsletter. Specifically, we couldn't create a two-column layout and designate spacing and alignment to subdivide the columns into individual elements. However, if you don't need Fusion's more advanced layout capabilities, you'll want to note FrontPage's price of $149 vs. Fusion's $495.

Another option is to use Microsoft Publisher for more complex layouts. It's an excellent program for quickly creating HTML documents with free-form formatting. Publisher's layout tools aren't as sophisticated as Fusion's, but it's a good low-cost, easy-to-use alternative to FrontPage for a complex document like a newsletter.
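For readers curious what a two-column page like our newsletter can look like in raw HTML, one common technique is a table with one cell per column. The Python sketch below builds such a page fragment for the layout described above (body copy, image and caption stacked on the left, a bulleted list on the right). This is an assumption for illustration, not necessarily the markup that Fusion, FrontPage or Publisher produces, and the function name `two_column_page` and its parameters are hypothetical.

```python
import html


def two_column_page(body_copy, image_src, caption, bullets):
    """Build a two-column HTML fragment using a one-row, two-cell table.

    Hypothetical sketch: the left cell stacks body copy, an image and a
    caption; the right cell holds a bulleted list.
    """
    left = "\n".join([
        "<p>{}</p>".format(html.escape(body_copy)),
        '<p><img src="{}" alt="{}"></p>'.format(
            html.escape(image_src), html.escape(caption)
        ),
        "<p><i>{}</i></p>".format(html.escape(caption)),
    ])
    items = "\n".join("<li>{}</li>".format(html.escape(b)) for b in bullets)
    right = "<ul>\n{}\n</ul>".format(items)

    # valign="top" keeps both columns starting at the same height.
    return (
        '<table width="100%" cellpadding="8">\n'
        '<tr valign="top">\n'
        '<td width="50%">\n{}\n</td>\n'
        '<td width="50%">\n{}\n</td>\n'
        "</tr>\n"
        "</table>"
    ).format(left, right)


if __name__ == "__main__":
    print(two_column_page(
        "This month's news in brief.",
        "photo.gif",
        "The new office",
        ["First item", "Second item"],
    ))
```

The sketch also hints at why a cursor-based editor makes this laborious: the position of every element depends on which table cell contains it, not on where it is dropped on the page.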

Conversion Kit

Even if you tend to buck the latest trends, the corporate intranet is one that's worth your attention.

It offers a low-cost, high-tech way to unite your company's warring platforms with the single file format of HTML, as viewed through standard browsers.

Making the conversion to HTML is often less daunting than it first appears. With many of the tools already on your desktop, you can convert your company's treasure trove of data and documents to HTML. Just add a good HTML design program and try some of the tips we've provided for overcoming the last few nail-biting obstacles, and your company should be well on its way to HTML unity.

Lynn Ginsburg is a freelance writer based in Boulder, Colo. Contact her care of the editor at the e-mail addresses here.

SIDEBAR: 10 Tips for Quick Conversions

1. Portrait-Perfect Pages

Most documents are designed to look their best as vertical pages. And while you want your HTML conversions to be true to the originals, keep in mind that most Web documents are horizontal (or landscape) pages when viewed with a browser. So you may have to make some stylistic tweaks, or sacrifices, when adapting documents to a browser's viewing area.

2. Contemplate Templates

Try to use page layout templates for your new Web pages. This will speed up the process and give your intranet documents a more uniform appearance.

3. One from Column A ...

Many programs that make it easy to convert legacy documents to HTML can only produce single-column pages. For multiple-column pages, you'll need an HTML design program that mimics the layout and design tools of traditional desktop publishing programs, like NetObjects Fusion or Microsoft Publisher.

4. Familiar Fonts

Your original document may have a variety of specialty fonts, but most Web browsers recognize only a few fonts. Stick with simple type styles like Times Roman.

5. Fantastic Fonts

If you must use special fonts for elements such as logos or headlines, convert the elements to bitmaps and place them on your Web pages as graphics.

6. Screen Swipes

Press Print Screen to grab elements from your original document that won't convert properly. Then paste the captured image into any paint package, crop it and place it into your HTML document as a bitmap.

7. Bitmap Bloat

Use graphics prudently. Images, bitmapped fonts and other graphics elements can add richness to HTML pages, but they can slow their downloading, too.

8. Image Conscious

The universal image formats for Web pages are JPEG and GIF. Images on a page should be converted to JPEGs or GIFs when you save the page in HTML format. Some programs handle the conversion automatically.

9. Put the Squeeze on Pictures

You can compress JPEG images so that they become smaller files. But as JPEGs get smaller, they lose quality. Compress JPEGs as much as necessary to ensure speedy downloads, but be careful not to sacrifice too much of the image's fidelity.

10. Is There an Explorer in the Office?

If your company uses Microsoft Office and Internet Explorer, you may not have to convert some documents to HTML, because IE can display Word and Excel documents in their native formats.
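To put rough numbers on tips 7 and 9, the Python sketch below estimates how long an image takes to download at a given connection speed. The link speeds and file sizes are hypothetical examples chosen for illustration, not measurements from our tests, and the estimate ignores protocol overhead and network congestion.

```python
def download_seconds(size_bytes, bits_per_second):
    """Estimate transfer time as file size in bits divided by link speed.

    Idealized: ignores protocol overhead, latency and shared bandwidth.
    """
    if bits_per_second <= 0:
        raise ValueError("bits_per_second must be positive")
    return size_bytes * 8 / bits_per_second


# Assumed link speeds for illustration only.
ASSUMED_LINKS = {
    "28.8 kbps modem": 28_800,
    "10 Mbps Ethernet": 10_000_000,
}

if __name__ == "__main__":
    # Hypothetical sizes: the same image before and after heavier compression.
    for label, size in [("original JPEG", 90_000), ("compressed JPEG", 30_000)]:
        for link, speed in ASSUMED_LINKS.items():
            seconds = download_seconds(size, speed)
            print("{} over {}: {:.2f} s".format(label, link, seconds))
```

Under these assumptions, a 90,000-byte image takes 25 seconds over the modem link but well under a second on the office LAN, so the users who benefit most from tighter compression are those dialing in from outside.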

Windows Magazine, June 1997, page 228.
