This article describes the predefined log formats in Angelfish. The formats available by default are:
- W3C
- IIS
- Common Log Format (CLF)
- NCSA Combined
- NCSA Combined + Cookie
If you're not sure which format your log file uses, copy a single line from the log file and test it with the Custom Log Format utility in Configure - Global - Log Formats.
If you need help, please open a support ticket. Our support team can usually identify the log format from a single log line.
W3C
All W3C logs contain a #Fields: header that shows the contents and position of each field in the log. Angelfish uses the #Fields: directive to determine the contents of each field. The W3C format is used by IIS websites and by other vendors, such as Akamai.
Example W3C log file header:
#Software: Microsoft Internet Information Server 7.5
#Date: 2013-10-01 00:00:08
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status
Example log line:
2008-06-24 00:00:07 192.168.100.66 GET /index.aspx - 80 - 126.96.36.199 Mozilla/5.0+(compatible;+Charlotte/1.1;+http://www.example.com/support/) - 200 0 2
W3C is the preferred log format for Angelfish.
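The role of the #Fields: directive can be illustrated with a short Python sketch. This is not how Angelfish is implemented; the function name `parse_w3c` is hypothetical, and the sketch assumes that fields are separated by whitespace and that values never contain spaces (as in the example above, where spaces in the user agent are logged as `+`).

```python
def parse_w3c(lines):
    """Yield one dict per log entry, keyed by the names in the #Fields: directive."""
    fields = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive lines start with '#'; only #Fields: changes how entries are read.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip entries that do not match the most recent header
        yield dict(zip(fields, values))


sample = [
    "#Software: Microsoft Internet Information Server 7.5",
    "#Date: 2013-10-01 00:00:08",
    "#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port "
    "cs-username c-ip cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status",
    "2008-06-24 00:00:07 192.168.100.66 GET /index.aspx - 80 - 126.96.36.199 "
    "Mozilla/5.0+(compatible;+Charlotte/1.1;+http://www.example.com/support/) - 200 0 2",
]

entries = list(parse_w3c(sample))
print(entries[0]["cs-uri-stem"], entries[0]["sc-status"])
```

Because each entry is keyed by field name rather than by position, the same sketch keeps working when a server logs a different set or order of fields, as long as the header is present.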
IIS
The IIS log format is a logging option on IIS servers, and uses a fixed format that cannot be customized. Unlike the W3C format, the IIS format does not use the #Fields: directive.
Microsoft's Spec for the IIS format:
Common Log Format (CLF)
CLF logs only contain basic HTTP request data and don't contain the referrer or user agent fields, which limits the amount of data Angelfish can provide. This format is infrequently used, although some products like the Google Search Appliance use a modified version of it.
The fields in the Common Log Format are:
host logname username date:time request statuscode bytes
Example log line:
188.8.131.52 - - [20/Oct/2012:15:03:25 -0700] "GET /index.html HTTP/1.1" 200 1402
host (188.8.131.52)
The remote IP address or hostname of the client that made the request.
logname (-)
The remote logname (from identd, if supplied).
username (-)
The username associated with the client making the request, if the client is logged in or otherwise authenticated.
date:time timezone ([20/Oct/2012:15:03:25 -0700])
The time of the request, formatted as [dd/MMM/yyyy:hh:mm:ss +-hhmm].
request ("GET /index.html HTTP/1.1")
The HTTP request made by the client.
statuscode (200)
The numeric code indicating the success, failure, or redirection of the HTTP request.
bytes (1402)
The size, in bytes, of the response returned to the client.
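As a rough illustration of the field layout described above, the following Python sketch splits a CLF line into its seven fields with a regular expression. The names `CLF_PATTERN` and `parse_clf` are hypothetical, and the sketch assumes that the request field contains no embedded double quotes and that month names are in English.

```python
import re
from datetime import datetime

# One named group per CLF field: host logname username date:time request statuscode bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<username>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<statuscode>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf(line):
    """Return a dict of CLF fields, or None if the line is not in CLF."""
    match = CLF_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # [dd/MMM/yyyy:hh:mm:ss +-hhmm] becomes a timezone-aware datetime.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["statuscode"] = int(entry["statuscode"])
    # Some servers log "-" instead of 0 when no bytes were sent.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry


entry = parse_clf(
    '188.8.131.52 - - [20/Oct/2012:15:03:25 -0700] "GET /index.html HTTP/1.1" 200 1402'
)
print(entry["host"], entry["request"], entry["statuscode"], entry["bytes"])
```

A line that carries extra fields after the byte count (such as a referrer) returns None here, because `fullmatch` requires the whole line to fit the CLF layout.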
NCSA Combined
This log format includes the same fields as the Common Log Format and adds the referrer and user agent fields to the log line.
The fields in the NCSA Combined log format are:
host rfc931 username date:time request statuscode bytes referrer user_agent
Example log line:
184.108.40.206 - - [20/Oct/2012:15:03:25 -0700] "GET /index.html HTTP/1.1" 200 1402 "http://www.example.com" "(Windows NT 6.1; WOW64; rv:16.0) Gecko/20100101 Firefox/16.0"
referrer field: "http://www.example.com"
The URL of the page which linked the client to the site. Angelfish uses this to calculate referral information for a variety of reports.
user_agent field: "(Windows NT 6.1; WOW64; rv:16.0) Gecko/20100101 Firefox/16.0"
This field contains the browser and platform used by the visitor to the site.
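To show how the two extra fields can be read, here is a minimal Python sketch that matches an NCSA Combined line and pulls the hostname out of the referrer. It is only an illustration of the idea, not the way Angelfish calculates referral data; `COMBINED_PATTERN` and `referrer_host` are hypothetical names, and the pattern assumes that quoted fields contain no embedded double quotes.

```python
import re
from urllib.parse import urlsplit

# CLF fields followed by the quoted referrer and user_agent fields.
COMBINED_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<rfc931>\S+) (?P<username>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<statuscode>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def referrer_host(line):
    """Return the hostname of the referring page, or None if there is none."""
    match = COMBINED_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    referrer = match.group("referrer")
    if referrer in ("", "-"):
        return None  # direct request, or the referrer was not sent
    return urlsplit(referrer).hostname


line = (
    '184.108.40.206 - - [20/Oct/2012:15:03:25 -0700] "GET /index.html HTTP/1.1" '
    '200 1402 "http://www.example.com" '
    '"(Windows NT 6.1; WOW64; rv:16.0) Gecko/20100101 Firefox/16.0"'
)
print(referrer_host(line))
```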
NCSA Combined + Cookie
This log format includes the same fields as the Common Log Format and adds the referrer, user agent, and cookie fields. This isn't an officially sanctioned NCSA log format; we created it for Angelfish.
The fields in this log format are:
host rfc931 username date:time request statuscode bytes referrer user_agent cookie
Example log line:
220.127.116.11 - - [20/Oct/2012:15:03:25 -0700] "GET /index.html HTTP/1.1" 200 1402 "http://www.example.com" "(Windows NT 6.1; WOW64; rv:16.0) Gecko/20100101 Firefox/16.0" "WRUID=770623994.1162914482"
cookie field: "WRUID=770623994.1162914482"
This log format is recommended for any datasources (log sources) migrated from Urchin that contain UTM data.
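The cookie field can be broken into individual name/value pairs once it has been pulled off the end of the line. The Python sketch below assumes the cookie field is the last quoted field on the line and that multiple cookies are separated by semicolons, as in an HTTP Cookie header; both function names and the second cookie in the usage example are hypothetical.

```python
def cookie_field(line):
    """Return the contents of the last quoted field on the line (the cookie field)."""
    parts = line.rstrip().rsplit('"', 2)
    # A line ending in a quoted field splits into [prefix, field, ""].
    if len(parts) != 3 or parts[2] != "":
        return None
    return parts[1]


def parse_cookies(cookie):
    """Split 'name=value; name2=value2' into a dict of cookie names and values."""
    pairs = {}
    for part in cookie.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep:  # ignore fragments without '='
            pairs[name] = value
    return pairs


line = (
    '220.127.116.11 - - [20/Oct/2012:15:03:25 -0700] "GET /index.html HTTP/1.1" '
    '200 1402 "http://www.example.com" '
    '"(Windows NT 6.1; WOW64; rv:16.0) Gecko/20100101 Firefox/16.0" '
    '"WRUID=770623994.1162914482"'
)
cookies = parse_cookies(cookie_field(line))
print(cookies["WRUID"])
```

With more than one cookie in the field, for example `parse_cookies("WRUID=770623994.1162914482; lang=en")`, each name becomes its own key in the returned dict.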