Move Web site log data to a SQL Server database for easier analysis
Microsoft Internet Information Services (IIS) 5.0 logs Web-server events by default. You can choose to turn logging off or ignore log data, but more likely, you want to collect and analyze the data for insight into how users use your Web site. Like Internet Information Server (IIS) 4.0, IIS 5.0 gives you a choice of log-file formats and lets you log to a file or directly to a database. My preferred method of collecting data is to log to World Wide Web Consortium (W3C) Extended Log File Format files, then import the files' data into a Microsoft SQL Server database for analysis. In the paragraphs that follow, I show you how to set up such a logging process and point out some minor differences between IIS 5.0 and IIS 4.0.
Logging to the W3C Format
To configure logging (which is enabled by default), you use Internet Services Manager (ISM) 5.0, which functions similarly to Internet Service Manager (ISM) 4.0. IIS 5.0 logging operations are also quite similar to IIS 4.0, and IIS 5.0 supports the same log formats as IIS 4.0:
- W3C Extended Log File Format (exyymmdd.log)
- Microsoft IIS Log File Format (inyymmdd.log)
- National Center for Supercomputing
- Applications (NCSA) Common Log
- File Format (ncyymmdd.log)
- ODBC Logging format
The filename format for logs closed on a daily basis is in parentheses after the log format. According to the Microsoft article "W3SVC and IIS Log File Names Are Listed with NCSA Format in HTMLA" (http://support.microsoft.com/support/ kb/articles/q240/0/27.asp), the IIS 5.0 documentation shows an incorrect prefix for the W3C and Microsoft IIS filename formats. However, the names are correct in my system's documentation.
One difference between IIS 5.0 and IIS 4.0 is that IIS 5.0 supports the ODBC logging option only in Windows 2000 Server products, not in Win2K Professional. For more information about IIS 4.0 logging and the four logging formats, see "IIS 4.0 Event Logging," March 2001.
The W3C Extended Log File Format is probably the most-used log format because it lets you log more information than the other formats and because it's flexible (i.e., you can specify which information to log). By default, IIS 5.0 W3C logging uses Universal Time Coordinate (UTC) time, which is a new name but is the same time as IIS 4.0's default Greenwich Mean Time (GMT). Thus, IIS 5.0 uses UTC rather than local time to determine when to create a new log file, and all the times in the log file are based on UTC. You might want to use UTC if you have many servers spread across several states or countries and you want to synchronize them on one time.
However, if all your servers are in one location, you probably prefer to use local time. IIS 5.0 gives you an easy way to specify that you want to use local system time for naming and rolling over log files. Open ISM, click the appropriate server, right-click the appropriate Web site, and select Properties. Click the Web Site tab, ensure that the W3C log format is selected, and click Properties. On the General Properties tab, which Figure 1 shows, select the Use local time for file naming and rollover check box. You can use local system time in IIS 4.0 Service Pack 4 (SP4) or later, but to do so, you must edit the registry.
Figure 2 shows the window that you use to choose which of the W3C format's extended properties to log. This IIS 5.0 window displays the properties in a tree format instead of the fixed list that IIS 4.0 uses. The IIS 5.0 format also displays in parentheses the property names as they appear in log files. This cross-reference is really handy when you need to compare log entries with the properties you chose to log.
IIS 5.0's extended properties are almost the same as IIS 4.0's, with a couple of exceptions. IIS 4.0's HTTP Status property is Protocol Status in IIS 5.0, although the IIS 5.0 documentation shows the property as HTTP Status. The IIS 5.0 Host property is new and contains the server name. If you turn on Process Accounting for an IIS 5.0 Web site, the Web server displays an additional set of W3C log entries.
The W3C "Logging Properties Reference" section of the IIS 5.0 online documentation is cryptic and somewhat confusing. I've reproduced its tables here, fixing a few errors, modifying some text, and changing the column headings to make the tables more readable. Table 1 defines the prefixes for the W3C property names as they appear in the logs. Each prefix identifies the type of action the property relates to. Table 2 describes the W3C properties and corrects the Protocol Status property information. Table 3, page 60, shows the new W3C Process Accounting properties, and Table 4, page 60, describes the values for Process Accounting's Process Event property.
When IIS 5.0 generates a W3C log file, it stamps at the start of the file a header like the one that Figure 3, page 61, shows. The header's first line identifies the server type, the second line identifies the log-file version, the third line is the date and time IIS 5.0 created the log file, and the last line is a space-delimited header line containing all the fields in a file row. Be aware that if you change the fields that are logged, IIS 5.0 stamps a new header in the log file, then continues adding rows consisting of the new fields to the file. If you're importing logs into a database, you'll need to make sure the database accommodates all the new log fields.
Importing Logs into SQL Server
If you want to be able to analyze log data, you need to import the data into a database where you can perform queries against it. If you choose to work directly with the log data that IIS 5.0 generates, several options are available to you. You could use the ODBC log format (instead of the W3C format) to log Web site data directly to SQL Server or another ODBC-compliant database. However, this method writes log entries to the database as events happen, so it can be resource-intensive. I prefer to set up a Data Transformation Services (DTS) package in SQL Server to load W3C log files. The package reads from a text file that contains the log data that IIS 5.0 generates and imports the data into SQL Server. Because the package runs only after IIS 5.0 has closed a log, the update can occur in batch mode and not bog down SQL Server or IIS 5.0 with updates each time IIS 5.0 logs an entry.
To routinely import W3C logs into a SQL Server database by using a DTS package, you need to set up a task within SQL Server that performs several jobs. You can schedule IIS 5.0 to close the log file at a certain time each day and schedule the task to run just after the log file is closed. The task must first copy the closed log file to a set input file or rename the log file to the input file.
Next, the task should strip the header information from the input file, deleting completely the first three header lines and deleting "#Fields:" from the fourth header line. The file now has only one header row, which contains the name of each field. Figure 4, page 61, shows the first two linesthe header line and one record lineof an input file with six fields. Finally, the task should run the DTS package to load the input file data into the database.