Squid could not be used in an ISP environment without a sophisticated access control system. Indeed, Squid should not be used in ANY environment without some kind of basic access control. It is amazing how fast other Internet users will find out that they can relay requests through your cache, and then proceed to do so.
Why? Sometimes to obfuscate their real identity, and other times because they have a fast line to you, but a slow line to the remainder of the Internet.
In many cases only the most basic level of access control is needed. If you have a small network, and do not wish to use things like user/password authentication or blocking by destination domain, you may find that this small section is sufficient for all your access control setup. If not, you should read chapter 6, where access control is discussed in detail.
The simplest way of restricting access is to only allow IPs that are on your network. If you wish to implement different access control, it's suggested that you put this in place later, after Squid is running. In the meantime, set it up, but only allow access from your PC's IP address.
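As a minimal sketch of this starting point, the lines below admit a single machine. The address 192.168.1.10 and the acl name mypc are placeholders, not values from the default file. The sketch also assumes that the default squid.conf still ends its HTTP rules with http_access deny all, so the allow line has to appear before that line.

```
# hypothetical address: replace 192.168.1.10 with your own PC's IP
# a netmask of 255.255.255.255 matches exactly one address
acl mypc src 192.168.1.10/255.255.255.255

# must come before the default file's final "http_access deny all"
http_access allow mypc
```

Once this works, the single address can be widened to your whole network range.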
Example access control entries are included in the default squid.conf. The included entries should help you avoid some of the more obscure problems, such as bandwidth-chewing loops, cache tunneling with SSL CONNECTs and other strange access problems. In chapter 6 we work through the config file's default config options, since some of them are pretty complex.
Access control is done on a per-protocol basis: when Squid accepts an HTTP request, the list of HTTP controls is checked. Similarly, when an ICP request is accepted, the ICP list is checked before a reply is sent.
Assume that you have a list of IP addresses that are to have access to your cache. If you want them to be able to access your cache with both HTTP and ICP, you would have to enter the list of IP addresses twice: you would have lines something like this:
Example 4-2. Theoretical Access List
http_access deny 10.0.1.0/255.255.255.0
http_access allow 10.0.0.0/255.0.0.0
icp_access allow 10.0.0.0/255.0.0.0
Rule sets like the above are great for small organisations: they are straightforward.
For large organizations, though, things are more convenient if you can create classes of users. You can then allow or deny classes of users in more complex relationships. Let's look at an example like this, where we duplicate the above example with classes of users:
Example 4-3. Access Lists using Classes
# classes
acl mynet src 10.0.0.0/255.0.0.0
acl servernet src 10.0.1.0/255.255.255.0

# what HTTP access to allow classes
http_access deny servernet
http_access allow mynet

# what ICP access to allow classes
icp_access deny servernet
icp_access allow mynet
Sure, it's more complex for this example. The benefits only become apparent if you have large access lists, or when you want to integrate refresh-times (which control how long objects are kept) and the sources of incoming requests. I am getting quite far ahead of myself, though, so let's skip back.
We need some terminology to discuss access control lists, otherwise this could become a rather long chapter. So: lines beginning with acl are (appropriately, I believe) acl lines. The lines that use these acls (such as http_access and icp_access in the above example) are called acl-operators. An acl-operator can either allow or deny a request.
So, to recap: acls are used to define classes. When Squid accepts a request it checks the list of acl-operators specific to the type of request: an HTTP request causes the http_access lines to be checked; an ICP request checks the icp_access lists.
Acl-operators are checked in the order that they occur in the file (i.e. from top to bottom). The first acl-operator line that matches causes Squid to drop out of the acl list. Squid stops at that line whether it allows or denies the request, so later acl-operators are not consulted.
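To make the effect of ordering concrete, here is a small sketch that reuses the mynet and servernet classes from Example 4-3. The client address 10.0.1.5 is an illustrative assumption: it falls inside both ranges, so the result depends entirely on which line comes first.

```
acl mynet src 10.0.0.0/255.0.0.0
acl servernet src 10.0.1.0/255.255.255.0

# Order A: a request from 10.0.1.5 matches servernet first and is denied
http_access deny servernet
http_access allow mynet

# Order B (reversed, shown commented out): 10.0.1.5 would match mynet
# first and be allowed, so the deny line would never be reached for it
# http_access allow mynet
# http_access deny servernet
```

The general habit this suggests is to put the narrower range before the wider one that contains it.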
In the previous example, we used a src acl: this checks that the source of the request is within the given IP range. The src acl-type accepts IP address lists in many formats, though we used the subnet/netmask in the earlier example. CIDR (Classless Inter-Domain Routing) notation can also be used here. Here is an example of the same address range in either notation:
Example 4-4. CIDR vs Netmask Source-IP Notation
acl mynet1 src 10.1.0.0/255.255.0.0
acl mynet2 src 10.1.0.0/16
Access control lists fall back to a default when there is no matching acl. If all acl-operators in the file are checked, and no match is found, Squid applies the opposite of the last acl-operator checked: if the last line was an allow, the request is denied, and if it was a deny, the request is allowed. This can be confusing, so it's normally a good idea to place a final "catch-all" acl-operator at the end of the list. The simplest way to create such an operator is to create an acl that matches any IP address. This is done with a src acl with a netmask of all 0's. When the netmask arithmetic is done, Squid will find that any IP matches this acl.
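A short sketch of such a catch-all, placed after a single allow rule. The mynet range is borrowed from the earlier examples. If your squid.conf already defines an acl named all, that definition should be used instead of adding a second one.

```
# the network that is allowed to use the cache
acl mynet src 10.0.0.0/255.0.0.0

# netmask of all 0's: every source address matches this acl
acl all src 0.0.0.0/0.0.0.0

http_access allow mynet

# final catch-all, so that nothing depends on the implicit default
http_access deny all
```

With the explicit deny in last place, a request that matches no earlier line is rejected by a rule you can see in the file.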
Your cache server is probably on a network that appears in the allow lists on your cache. If you run the client on the cache machine itself (as opposed to another machine somewhere on your network), the above acl and http_access rules would therefore allow you to test the cache. In many cases, however, a program running on the cache server will end up connecting to (and from) the address 127.0.0.1 (also known as localhost). Your cache should thus allow requests from the address 127.0.0.1/255.255.255.255. In the example below we don't allow ICP requests from the localhost address, since there is no reason to run two caches on the same machine.
The squid.conf file that comes with Squid includes acls that deny all HTTP requests. To use your cache, you need to explicitly allow incoming requests from the appropriate range. The squid.conf file includes text that reads:
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
To allow your client machines access, you need to add rules similar to those below in this space. The default access-control rules stop people exploiting your cache, so it's best to leave them in.
Example 4-5. Example Complete ACL list
#
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
#

# acls for my network addresses
acl my-iplist-1 src 192.168.1.0/24
acl my-iplist-2 src 10.0.0.0/255.255.0.0

# Check that requests are from users on our network
http_access allow my-iplist-1
http_access allow my-iplist-2
icp_access allow my-iplist-1
icp_access allow my-iplist-2

# allow requests from the local machine (for testing and the like)
http_access allow localhost

# End of locally-inserted rules
http_access deny all
Acl-operator lines are not only used for authentication. In an earlier section we discussed communication with other cache servers. Acl lines are used to ensure that requests for specific URLs are handled by your cache, not passed on to another (further away) cache.
If you don't use a parent cache (for example, a firewall proxy or a parent cache at your ISP), you can probably skip this section.
Let's assume that you connect to your ISP's cache server as a parent. A client machine (on your local network) connects to your cache and requests http://www.yourdomain.example/. Your cache server will look in the local cache store. If the page is not there, Squid will connect to its configured parent (your ISP's cache, across your serial link) and request the page from there. The problem, though, is that there is no need to connect across your Internet line: the web server is sitting a few feet from your cache in the machine room.
Squid cannot know that it's being very inefficient unless you give it a list of sites that are "near by". This is not the only way around this problem though: your browser could be configured to ignore the cache for certain IPs and domains, and the request will never reach the cache in the first place. Browser config is covered in Chapter 5, but in the meantime here is some info on how to configure Squid to communicate directly with internal machines.
The acl-operators always_direct and never_direct determine whether to pass the connection to a parent or to proceed directly.
The following set of operators is based on the final configuration created in the previous section, but adds the never_direct and always_direct operators. It is assumed that all servers that you wish to connect to directly are in the same address ranges as the my-iplist acls. In some cases you may run a web server on the same machine as the cache server, and the localhost acl is thus also considered local.
The always_direct and never_direct tags are covered in more detail in Chapter 7, where we cover hierarchies in detail.
Example 4-6. Using always_direct and never_direct
# acls for my network addresses
acl my-iplist-1 src 192.168.1.0/24
acl my-iplist-2 src 10.0.0.0/255.255.0.0

# the same ranges as destinations: a src acl matches the client's
# address, while a dst acl matches the address of the requested server
acl my-servers-1 dst 192.168.1.0/24
acl my-servers-2 dst 10.0.0.0/255.255.0.0

# Various programs running on the cache box connect to Squid, so it's
# useful to allow connections from the localhost address.
acl localhost src 127.0.0.1/255.255.255.255

# used to deny all requests: Since the netmask is all 0's, any request
# matches this acl
acl all src 0.0.0.0/0.0.0.0

# Check that requests are from users on our network
http_access allow my-iplist-1
http_access allow my-iplist-2
icp_access allow my-iplist-1
icp_access allow my-iplist-2

# check the localhost acl as a special case
http_access allow localhost

# If the request comes from any other IP, deny all access.
http_access deny all

# always go direct to local machines
always_direct allow my-servers-1
always_direct allow my-servers-2

# never go direct to other hosts
never_direct allow all
Squid always attempts to cache pages. If you have a large Intranet system, it's a waste of cache store disk space to cache your Intranet. Controlling which URLs and IP ranges not to cache is covered in detail in chapter 6, using the no_cache acl operator.
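As a rough preview of what such a rule might look like, here is a minimal sketch. The acl name intranet and the domain are placeholders, and the dstdomain acl type is used as one plausible way to pick out Intranet URLs. Check the directive name against the documentation for your Squid version, since it may differ between releases.

```
# hypothetical Intranet domain: the leading dot also covers subdomains
acl intranet dstdomain .yourdomain.example

# "deny" here means "do not cache": matching objects are fetched
# but not stored in the cache
no_cache deny intranet
```

Requests for other sites are unaffected and are cached as usual.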