| Your hosting account
comes with HTTP-Analyze preinstalled and configured.
HTTP-Analyze is a log analyzer
for web servers. It analyzes the logfile of a web server and
creates a comprehensive summary report from the information
found there. http-analyze has been optimized to process large
logfiles as fast as possible.
In easier-to-understand terms,
HTTP-Analyze is a powerful traffic analyzer that quickly
and efficiently delivers statistics on the traffic that your
web pages have generated. It has a user-friendly graphical user
interface (GUI) that produces your traffic reports at the
click of a mouse button.
Below we explain in more detail
how this software works with your web site, and we
define the terms used in the reports you'll receive.
The web server is a program
running on a networked machine, waiting for connections from the
outside world and serving documents in response to requests
from a browser.
To communicate, the server and
the browser use an asynchronous communication method called
HTTP (Hypertext Transfer Protocol). It works as follows:
1. The user starts the browser
and types in a URL.
2. The browser connects to the
given host and requests the specified document.
3. The web server handles the
request and sends out a response:
a. if this document exists,
the web server delivers it;
b. if it does not exist or if
access is not permitted, the web server sends back an error
message instead.
4. The document delivered as an
answer to this request may contain inline objects. Inline
objects are simply URLs pointing to another resource, either
a document, an image, an applet, a video/audio stream, or
any other addressable HTML object.
5. The browser then requests all
inline objects of the current page from the server using
steps 2 and 3 above, before it can display the content of
that page.
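To make the relationship between one page and its inline objects concrete, here is a minimal Python sketch. It is not how any real browser works internally: the `InlineObjectCollector` class, the `requests_for_page` helper, and the set of tags treated as inline objects are all assumptions chosen for illustration.

```python
from html.parser import HTMLParser


class InlineObjectCollector(HTMLParser):
    """Collect the src URLs of inline objects found in an HTML page."""

    # Illustrative subset of tags whose src attribute names an inline object.
    INLINE_TAGS = {"img", "script", "embed", "iframe", "audio", "video"}

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in self.INLINE_TAGS:
            for name, value in attrs:
                if name == "src" and value:
                    self.urls.append(value)


def requests_for_page(page_url, html):
    """Return the page URL followed by the URLs of its inline objects."""
    collector = InlineObjectCollector()
    collector.feed(html)
    return [page_url] + collector.urls


page = '<html><body><img src="/logo.gif"><p>Hello</p><img src="/photo.jpg"></body></html>'
print(requests_for_page("/index.html", page))
# prints ['/index.html', '/logo.gif', '/photo.jpg']
```

A page with two embedded images therefore leads to three requests, and the server answers each of them separately.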
This communication method is
called asynchronous because the browser sends out many requests
for inline objects at once (without waiting for a response
from the server before sending the next request), using different
communication channels.
Since the browser's requests are
often handled by different server processes or different threads
of a server process, there is no fixed relationship between
the logfile entries the server writes for a document and those
it writes for its inline objects. For example,
the order in which the server logs the successful transmission
of the document itself and the inline images contained therein
is not predictable and depends on the type of documents,
objects, server speed, system and network load, and many other
parameters.
Q: What is logged?
A: Each and every response from the server - whether it
indicates success, an error, or even a timeout (i.e. no
response) - gets logged in the server's logfile. Since the
server was hit by a request, such a response is called a Hit. In
other words, the total number of hits must equal the total
number of lines in the logfile minus the number of corrupt and
empty lines. A typical logfile entry in the Common Logfile
Format looks like:
hostname - - [01/Feb/1998:10:10:00 +0100] "GET /index.html HTTP/1.0" 200 4839
The hostname field contains the
fully qualified domain name (FQDN) of the site accessing your
server. The next two fields usually contain a minus ('-')
to indicate that those fields are empty. The date is surrounded
by square brackets ('[' and ']'). The next field contains
the request. It consists of the request method ('GET' for example),
the name of the requested document (URL), and the protocol
specification ('HTTP/1.0').
The following field contains the
server's response code ('200' stands for 'OK', while '404'
would mean 'Document not found', for example). The last field
contains the size of the document. Some servers log the number
of bytes actually transferred, while other servers log the size
of the document, which makes a difference if the user interrupts
the transfer before the document could be transmitted
completely.
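The field layout described above can be turned into a small parser. The following Python sketch assumes the usual space-separated Common Logfile Format; the names `CLF_PATTERN`, `parse_clf`, and `count_hits` are made up for this example and are not part of http-analyze.

```python
import re

# One CLF entry: host, two usually-empty fields, [date], "request", status, size.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line):
    """Return a dict of fields for one logfile line, or None if it is corrupt or empty."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # The request consists of method, URL, and protocol; pad if parts are missing.
    parts = entry.pop("request").split()
    entry["method"], entry["url"], entry["protocol"] = (parts + [None, None, None])[:3]
    entry["status"] = int(entry["status"])
    # Some servers log '-' when no size is known; treat that as 0 here (an assumption).
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


def count_hits(lines):
    """Count hits: every line that parses, skipping corrupt and empty lines."""
    return sum(1 for line in lines if parse_clf(line) is not None)


sample = 'hostname - - [01/Feb/1998:10:10:00 +0100] "GET /index.html HTTP/1.0" 200 4839'
entry = parse_clf(sample)
print(entry["host"], entry["method"], entry["url"], entry["status"], entry["size"])
# prints hostname GET /index.html 200 4839
print(count_hits([sample, "", "not a logfile line"]))
# prints 1
```

Counting only the lines that parse mirrors the rule that the number of hits equals the number of logfile lines minus the corrupt and empty ones.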
There are two other logfile
formats, the Combined and the Extended Logfile Format. Those formats
add the user-agent (browser type) and the referrer URL (the
page that contains a link to the requested document, if the
request was generated by following a
link) to the logfile entry. They append
these two fields to the Common Logfile Format
(CLF) in one of two usual ways:
- CLF Mozilla/2.0 (X11; IRIX
6.3; IP22) http://foo/bar.html
- CLF "http://foo/bar.html"
"Mozilla/2.0 (X11; IRIX 6.3; IP22)"
Note that in the second form, the
user-agent and the referrer URL are surrounded by double quotes,
which makes them ambiguous in certain cases, such as erroneous
referrer URLs that themselves contain double quotes. Therefore, the first
form should be preferred if possible.
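The ambiguity of the quoted form can be seen in a short Python sketch. It assumes the two quoted fields are appended to an otherwise ordinary CLF entry; `QUOTED_TAIL` and `split_quoted_tail` are names invented for this illustration, and a real analyzer may handle malformed lines differently.

```python
import re

# CLF part, then two double-quoted fields that must not contain double quotes.
QUOTED_TAIL = re.compile(r'(?P<clf>.*) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"')


def split_quoted_tail(line):
    """Split a line of the second form into (clf, referrer, user_agent), or return None."""
    match = QUOTED_TAIL.fullmatch(line.strip())
    if match is None:
        return None
    return match.group("clf"), match.group("referrer"), match.group("agent")


clf = 'hostname - - [01/Feb/1998:10:10:00 +0100] "GET /index.html HTTP/1.0" 200 4839'
good = clf + ' "http://foo/bar.html" "Mozilla/2.0 (X11; IRIX 6.3; IP22)"'
# An erroneous referrer URL that contains a double quote of its own.
bad = clf + ' "http://foo/ba"r.html" "Mozilla/2.0 (X11; IRIX 6.3; IP22)"'

print(split_quoted_tail(good)[1:])
# prints ('http://foo/bar.html', 'Mozilla/2.0 (X11; IRIX 6.3; IP22)')
print(split_quoted_tail(bad))
# prints None
```

With the stray quote, this strict pattern can no longer tell where the referrer ends, so the sketch rejects the line rather than guess.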
The entries shown above are the
only information the server records in the logfile. Much more
information may be transferred from the browser to
the server, but although this additional information is
available to CGI scripts running on your server, it is
not written to the logfile. Therefore, http-analyze can only show
you a summary of the information in the logfile.
Definition of Terms
The statistics report contains,
among other things, the following information:
- The number of hits, 304's,
files, pageviews, sessions, data sent (in KB)
- The amount of data requested,
transferred, and saved by cache (in KB)
- The number of unique URLs,
sites, and sessions per month
- The number of all response
codes other than 200 (OK)
- The average hits per weekday
and for last week
- The maximum/average hits per
day and per hour
- The number of hits, files,
304's, sites, data sent by day
- The top 5 days, 24 hours, 5
minutes and 5 seconds of the summary period
- The top 30 most commonly
accessed URLs (hits, 304's, data sent)
- The 10 least frequently
accessed URLs (hits, 304's, data sent)
- The top 30 client domains
accessing your server most often
- The top 30 browser types
- The top 30 referrer hosts
- The overview/detailed list of
all files requested
- The overview/detailed list of
all sites by domain and reverse domain
- The overview/detailed list of
all browser types
- The overview/detailed list of
all referrer URLs
The following table summarizes
the meaning of all terms in the statistics report which are not
self-explanatory:
| Term | Meaning |
| Hits |
A hit is any
response from the server on behalf of a request sent
from a browser. This includes any response from the
server, not only text files or documents. If, for
example, a HTML page has two images embedded, the server
generates three hits if this page is requested: one hit
for the HTML page itself and two hits for the two inline
images. |
| Files |
If the user requests a
document and the server successfully sends back a file
for this request, this is counted as a Code 200 (OK)
response. Any such response is counted as a file.
Again, "file" here means any kind of file. |
| Code 304 |
A Code 304 (Not Modified)
response is generated by the server if a document hasn't
been updated since the last time it was requested by the
user and therefore there was no need to actually send
the files for this document. This happens if the browser
(or a caching proxy server between the browser and your
web server) still has an up-to-date copy of the page in
its local storage (cache) and therefore can display the
page without requesting the actual content. This
technique is used to reduce network traffic, but it also
causes an inaccuracy in the statistics reports regarding
the number of visitors, because the browser or proxy
usually sends only one such conditional request per
user session if it still holds an up-to-date copy of the
file. However, the ratio between files and 304's
reflects the efficiency of overall caching mechanisms,
at least for those hits that made their way to the
server. |
| Pageviews |
Pageviews are all files
which either have a text file suffix (.html, .text) or
which are directory index files. This number lets you
estimate the number of "real" documents
transmitted by your server. If defined correctly, the
analyzer rates text files (documents) as pageviews.
Pageviews do not include images, CGI scripts, Java
applets, or any other HTML objects; only files
ending with one of the pre-defined pageview suffixes,
such as .html or .text, are counted. |
| Other responses |
There are many more
response codes than just Code 200 (OK) and Code 304
(Not Modified), especially in the coming
standard, the HTTP 1.1 protocol specification. For
example, the server could generate a Code 302
(Redirected) response if a page has moved, a Code
401 (Unauthorized Request) response if access to the
document is denied, or a Code 404 (Not Found)
response if the requested page does not exist on this
server. |
| KBytes transferred |
This is the amount of data
sent during the whole summary period as reported by the
server. Note that some servers log the size of a
document instead of the actual number of bytes
transferred. In most cases this is the same, but if a
user interrupts the transmission by pressing the
browser's stop button before the page has been received
completely, some servers (for example all Netscape web
servers) do not log the amount of data transferred but
the amount of data which would have been transferred if
the user had loaded the page completely. |
| KBytes requested |
This is the amount of data
requested during the whole summary period. http-analyze
computes this number by summing up the values of KBytes
transferred and KBytes saved by cache (see
below). |
| KBytes saved by cache |
The amount of data saved
by various caching mechanisms such as in proxy servers
or in browsers. This value is computed by multiplying
the number of Code 304 (Not Modified) responses
per file by the size of the corresponding file. Note:
Because http-analyze can determine the size of a file
only if the file has been requested at least once in the
same summary period, the values for KBytes saved by
cache and KBytes requested are just
approximations of the real values. |
| Unique URLs |
Unique URLs
is the number of distinct,
valid URLs requested in a given summary period. This
shows you how many different files were requested at
least once in the corresponding summary period. |
| Unique sites |
This is the number of
unique hosts accessing the server during a given
time-window. The time-window is hardwired to the length
of the current month. This means that even if a host accesses
your server very often, it gets counted only once during
the whole month. Only the total of unique hosts per
month is listed in the statistics report. |
| Sessions |
Similar to unique sites,
this is the number of unique hosts accessing the server
during a given time-window. This time-window is one day
by default for backward compatibility, but it can be
changed with the option -u or the Session directive in
the configuration file. For example, if the time-window
is two hours, all accesses from a certain host in less
than 2 hours after the first access from this host are
lumped together into one session. All following accesses
more than 2 hours apart from the first access will be
counted as a new session. This way you may get an
estimated number of how many sessions are started on
different sites to access your server. |
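Two of the definitions above, Sessions and KBytes saved by cache, describe small calculations. The Python sketch below follows the wording of the table and is not http-analyze's actual code; the function names, the input tuples, the treatment of an access exactly one time-window after the session start as a new session, and 1 KB = 1024 bytes are all assumptions.

```python
from datetime import datetime, timedelta


def count_sessions(accesses, window=timedelta(days=1)):
    """Count sessions from (host, timestamp) pairs sorted by time.

    An access opens a new session if the host has no session yet, or if the
    host's current session started at least `window` ago (assumed boundary).
    """
    session_start = {}
    sessions = 0
    for host, when in accesses:
        start = session_start.get(host)
        if start is None or when - start >= window:
            session_start[host] = when
            sessions += 1
    return sessions


def kbytes_saved_by_cache(entries):
    """Estimate KBytes saved by cache from (url, status, size_in_bytes) tuples.

    Multiplies the number of 304 responses per file by the file's size. The
    size is only known if the file was sent with Code 200 at least once.
    """
    size_of = {}
    not_modified = {}
    for url, status, size in entries:
        if status == 200:
            size_of[url] = size
        elif status == 304:
            not_modified[url] = not_modified.get(url, 0) + 1
    saved = sum(count * size_of.get(url, 0) for url, count in not_modified.items())
    return saved / 1024  # assumes 1 KB = 1024 bytes


day = datetime(1998, 2, 1)
accesses = [
    ("a.example.com", day.replace(hour=10)),
    ("b.example.com", day.replace(hour=10, minute=5)),
    ("a.example.com", day.replace(hour=11, minute=30)),
    ("a.example.com", day.replace(hour=12, minute=30)),
]
print(count_sessions(accesses, timedelta(hours=2)))  # prints 3
print(count_sessions(accesses))                      # prints 2

entries = [
    ("/index.html", 200, 4096),
    ("/index.html", 304, 0),
    ("/index.html", 304, 0),
    ("/logo.gif", 304, 0),  # size unknown: never sent with Code 200
]
print(kbytes_saved_by_cache(entries))  # prints 8.0
```

The `/logo.gif` entry shows why the cache figures are approximations: its 304 response contributes nothing because its size never appears in the summary period.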
|