This article lists recommendations for optimizing the performance of your Angelfish instance. These recommendations are especially relevant for large-scale, high-traffic environments.
If you have a question about anything on this page, please open a support ticket.
At a basic level, Angelfish is a database application that benefits from better hardware resources. You can install Angelfish on older hardware and the application will run, but it may be slow.
- Disk I/O - Typically, the first bottleneck you will encounter is disk I/O. We recommend storing both Angelfish's application files and its data directory on local storage. Angelfish installations that use network-attached storage are known to have performance issues. Our tests have also shown that I/O contention is almost eliminated with SSD storage, which allows you to take advantage of advanced settings like max_threads.
- CPU - We recommend a minimum of 4 CPU cores for your Angelfish instance. Larger environments with a lot of report data, Profiles, or Users will benefit from 8+ cores. Angelfish uses 2 CPU cores per active processing job, and each individual API request uses 1 or more cores (see max_threads, below). You'll also see better performance with faster clock speeds and larger CPU caches.
- RAM - Active data processing jobs and API requests use RAM, and you want to have enough RAM so your OS doesn't use the page file. We recommend a minimum of 4 GB (8 GB is better), increasing as your environment dictates.
Reduce Page Cardinality
Most processing jobs and API requests (a.k.a. report requests) use pageview data, and Angelfish doesn't enforce any database row limits. This means the more unique pages your profile has, the longer processing jobs and API requests will take to complete.
Here are some tips for reducing page cardinality:
- Limit Pageview Query Parameters - Most Query Parameters are useful for the web server but are irrelevant from a reporting perspective. If you need Query Parameters in your reports, we recommend using the Include option and only adding necessary query parameters.
- Reduce Pageview File Types - If you use a log-based tracking method (SID, USR, IPUA or IP), you must specify the file types that will be counted as pageviews. This is configured in the "Pageview File Types" field in the Settings tab. We recommend using the Include option with a short list, as it's easier to manage and will automatically skip over obscure file types. This is especially important if you've migrated profile data from Urchin: Urchin's default "Pageview Mimes Match" setting is outdated, and it is carried over into Angelfish as-is during migration.
- Strip Unique Strings from Pages - Some web servers stuff a session ID in the URL, which greatly increases Page cardinality. You can use an Advanced filter to strip these strings from your Pages.
- Enable "Ignore Inflated Visits" - This feature is enabled via a checkbox in the Settings tab, and is intended to automatically exclude visits & pageviews from robots / crawlers / scanners. The threshold is configurable (default = 100 pageviews), and this feature alleviates the need to manually maintain a list of IPs & user agents and exclude them with a filter.
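To illustrate why the first and third tips matter, here is a minimal Python sketch of how an include list for query parameters and a session-ID strip collapse several raw URLs into one Page. It is not Angelfish's implementation or filter syntax. The parameter names in INCLUDED_PARAMS, the jsessionid pattern, and the sample URLs are assumptions chosen for the example.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical include list: only these query parameters are kept in reports.
INCLUDED_PARAMS = {"id", "category"}

# Hypothetical session-ID pattern, e.g. ";jsessionid=A1B2" embedded in the path.
SESSION_ID = re.compile(r";jsessionid=[^/?#]*", re.IGNORECASE)


def normalize_page(url: str) -> str:
    """Strip the session ID and drop query parameters not on the include list."""
    parts = urlsplit(url)
    path = SESSION_ID.sub("", parts.path)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in INCLUDED_PARAMS
    ]
    return urlunsplit(("", "", path, urlencode(kept), ""))


raw_pages = [
    "/products/view.jsp;jsessionid=A1B2?id=7&utm_source=mail",
    "/products/view.jsp;jsessionid=C3D4?id=7&utm_source=web",
    "/products/view.jsp?id=7",
]

# Three distinct raw URLs become a single Page after normalization.
unique_pages = {normalize_page(page) for page in raw_pages}
print(len(raw_pages), "raw URLs ->", len(unique_pages), "unique page(s)")
```

Running a similar check against a sample of your own log URLs can show how much cardinality a given include list or strip pattern would remove before you change any profile settings.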
Edit agf.conf settings
The agf.conf file is located in the root Angelfish directory and contains configuration options for the Angelfish instance. The agf.conf settings are loaded when Angelfish starts, so you'll need to restart Angelfish for any changes to take effect.
If you make any changes to agf.conf, we recommend testing the changes after restarting Angelfish - you don't want to overwhelm your own server!
- cache_size - This variable affects the amount of memory that will be used for each API thread. The default setting (50000) equates to 800 MB, which is adequate for most environments. Increasing this number will speed up API requests, but your OS may start paging if the number is too high. We typically set cache_size to 10% of available memory (e.g. if the server has 8 GB of RAM, cache_size=50000, or roughly 800 MB), assuming no other applications are on the server.
- max_threads - API requests are split into X threads of equal size, where X is the value of the max_threads variable. Each thread will run on a separate CPU core (if available). The default value is 1, and we only recommend increasing this value if your Angelfish instance has enough CPU cores to absorb additional threads and disk I/O is not a bottleneck.
- max_log_processors - This variable sets the number of profiles that Angelfish will process simultaneously (default value is 1). Angelfish spawns 2 processes for each processing job, and each process will use a separate CPU core (if available). Most environments can safely increase the max_log_processors value to 2. We've found that systems can become I/O bound in the 3-4 range, although this depends on your hardware resources.
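The numbers above can be turned into a rough sizing check before you edit agf.conf. The Python sketch below is only a back-of-the-envelope aid, not an official formula. It assumes cache_size scales linearly from the 8 GB example, and that the worst case is every processing job and API thread running at the same time. The function names and the concurrent_api_requests parameter are hypothetical, and the sketch does not model per-thread memory use.

```python
# Values taken from the article: cache_size=50000 (about 800 MB) on an 8 GB server.
REFERENCE_CACHE_SIZE = 50000
REFERENCE_RAM_GB = 8


def suggested_cache_size(ram_gb: float) -> int:
    """Scale the article's 8 GB example linearly (an assumption, not a documented rule)."""
    return int(REFERENCE_CACHE_SIZE * ram_gb / REFERENCE_RAM_GB)


def peak_cores(max_log_processors: int, max_threads: int,
               concurrent_api_requests: int = 1) -> int:
    """Worst-case core demand: 2 cores per processing job, plus one per API thread."""
    return 2 * max_log_processors + max_threads * concurrent_api_requests


if __name__ == "__main__":
    available_cores = 8  # hypothetical server
    demand = peak_cores(max_log_processors=2, max_threads=2)
    print("suggested cache_size for 16 GB:", suggested_cache_size(16))
    print("peak core demand:", demand, "of", available_cores, "available")
```

If the peak core demand exceeds the cores on the server, or disk I/O is already saturated, raising max_threads or max_log_processors is unlikely to help.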
Disable IT Reports
Disabling IT Reports makes sense when you have several profiles that read the same logs (e.g. a profile for each department) and you don't need IT Reports data in each profile. You can disable the Hit Info / Downloads / Stolen Bandwidth reports by clicking the "Disable" option for each in the Settings tab.
Use a Raw Filter
Raw filters are applied to each hit, before any visit logic is applied. You can apply a Raw filter to IP addresses, user agents, cookie values, or any field in your log file. Internal crawlers & monitoring agents (e.g. Google Search Appliances) are good candidates for Raw - Exclude Filters.
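Conceptually, a Raw - Exclude Filter drops matching hits before they can be grouped into visits. The Python sketch below illustrates that ordering only. It is not Angelfish's filter syntax or internal logic, and the field names, patterns, and sample hits are assumptions made for the example.

```python
import re
from typing import Dict, Iterable, Iterator

# Hypothetical exclude rules: (log field, pattern to match against that field).
RAW_EXCLUDE = [
    ("user_agent", re.compile(r"gsa-crawler", re.IGNORECASE)),
    ("ip", re.compile(r"^10\.0\.0\.5$")),  # e.g. an internal monitoring agent
]


def apply_raw_exclude(hits: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield only the hits that match none of the exclude rules."""
    for hit in hits:
        if any(pattern.search(hit.get(field, "")) for field, pattern in RAW_EXCLUDE):
            continue  # dropped here, so the hit never reaches visit logic
        yield hit


hits = [
    {"ip": "203.0.113.9", "user_agent": "Mozilla/5.0", "page": "/index.html"},
    {"ip": "10.0.0.7", "user_agent": "gsa-crawler (Enterprise)", "page": "/index.html"},
    {"ip": "10.0.0.5", "user_agent": "HealthCheck/1.0", "page": "/status"},
]

remaining = list(apply_raw_exclude(hits))
print(len(hits), "hits in,", len(remaining), "hits passed on to visit logic")
```

Because excluded hits are discarded up front, they add nothing to the visit and pageview data that later processing and report requests have to handle.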