Linux.com

Feature

A parent's guide to Linux Web filtering

By Joe Bolin on July 01, 2004 (8:00:00 AM)

Having converted quite a few people to the world of GNU/Linux, I am often asked by parents, "Can I set up parental Web filters for my children using Linux?" The answer is yes, and here's how.

A Web filter is software that controls the type of content a Web browser displays. The filter checks the content of a Web page against a set of rules and replaces any unwanted content with an alternative Web page, usually an "Access Denied" page. The type of content to be filtered is usually controlled by a systems administrator or a parent. Web filters are used in schools, libraries, and homes to safeguard children from obscene content on the Internet.

Before you begin, you should be familiar with some basic networking concepts:

  • A server, as in "Web server," is nothing more than an application that runs on a computer and listens for incoming requests. It sends back, or serves, information to the source that requested the information. This information can be anything from Web pages to databases. Each server communicates through the use of an IP address and a port number.
  • Ports are logical addresses that applications on a computer use in a way similar to how we use phone numbers. Each server program must have a unique port that it uses for communications.
  • Every computer connected to the Internet has both an external IP (Internet Protocol) address, usually assigned by an Internet service provider, and an internal address of 127.0.0.1. The internal address allows the computer to "listen" and "talk" to itself and is referred to as the loopback address. Normally a server is set up to accept requests from other computers on the Internet by listening on its external address. Since this can present a security risk for our single computer, we will use the loopback address instead. This will cause our server to only listen for requests from the computer that the server resides on.
  • A firewall is an application that controls the types of communication your computer can send and receive. GNU/Linux has an excellent firewall called netfilter/iptables, or simply iptables, built right into the kernel, which we will make use of to redirect users' Web surfing through our Web filter.

Getting the software

The only software you need to set up parental filters under GNU/Linux is iptables, DansGuardian, and Squid.

DansGuardian is the actual filtering software. It supports phrase matching, which allows you to block Web sites that contain certain phrases or words; PICS filtering, which blocks content that has been labeled as possibly objectionable by the creator of the Web site; URL filtering, which blocks content from specific sites that are known to contain offensive material; and blacklists, or lists of sites that contain content you want to block. Blacklists usually come from third parties, though you can create and maintain your own.

Squid is a Web proxy server that acts as a middleman between your computer and the Internet. You need a proxy server because DansGuardian isn't able to fetch Web pages by itself. We'll configure Squid as a transparent proxy, meaning we'll hijack network traffic and redirect it to a new destination -- our filter program, in this case -- without the need for the user to know that it is happening.

Most modern distributions have packaged versions of Squid and DansGuardian available. If yours doesn't, you will need to install them from source code. Both the Squid and DansGuardian Web sites have complete instructions for compiling and installing the programs from source.

Iptables is the firewall management tool used with the 2.4.x and higher kernels. Most modern distributions provide iptables. If yours doesn't, you will need to compile a new kernel and enable iptables, which is beyond the scope of this article (and probably beyond the abilities of most parents). You'd probably be better off upgrading to a newer Linux distribution.

Configuring Squid

The default location for the Squid configuration file on most systems is /etc/squid/squid.conf. While most of the default settings for Squid are all right for our usage, you will need to edit the configuration file just a bit.

You will need to become the root user in order to make the changes and issue the commands shown in this article. You can do this by either logging in as root or with the su command.

Add or edit the following line to have Squid listen only on the loopback device on port 3128. This makes Squid act as a proxy server for this computer only and assigns it a specific port number to listen on:

http_port 127.0.0.1:3128

To configure Squid as a transparent proxy, add the following lines to squid.conf:

httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on

Your system should have created a user and a group named squid when you installed Squid. If it didn't, you should create them yourself by using the following two commands from the command line:

groupadd -r squid
useradd -g squid -d /var/spool/squid -s /bin/false -r squid

Since Squid is normally started by the system and run as root, you need to add the next two lines to /etc/squid/squid.conf in order to make Squid run with squid's user and group IDs:

cache_effective_user squid
cache_effective_group squid

We will later use this to identify Squid to our firewall. Then we will allow the user squid to access the Internet while we redirect all other Web traffic through our filter.
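
Once the file is edited, it is easy to miss a line. The following is a minimal sketch, not part of Squid itself, that reports which of the directives shown above are present in a configuration file. The function name check_squid_conf is hypothetical, and the list assumes the Squid release this article was written for; other releases may configure transparent proxying differently, so adjust the list to match your version.

```shell
# check_squid_conf [FILE]
# Hypothetical helper: report which of the directives used in this
# article appear in a Squid configuration file.
# Returns 0 if all are present, 1 otherwise.
check_squid_conf() {
    conf=${1:-/etc/squid/squid.conf}
    missing=0
    # Read one expected directive per line from the list below.
    while IFS= read -r want; do
        # Collapse runs of blanks and trim the ends, so that a line such as
        # "http_port   127.0.0.1:3128" still matches the expected text.
        if sed -e 's/[[:space:]][[:space:]]*/ /g' -e 's/^ //' -e 's/ $//' \
               "$conf" 2>/dev/null | grep -qxF -- "$want"; then
            echo "ok:      $want"
        else
            echo "missing: $want"
            missing=1
        fi
    done <<'EOF'
http_port 127.0.0.1:3128
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
cache_effective_user squid
cache_effective_group squid
EOF
    return "$missing"
}
```

Run it as root with check_squid_conf /etc/squid/squid.conf; any line reported as "missing" still needs to be added or corrected.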

Configuring DansGuardian

Our next step is to configure DansGuardian. The default location, on most systems, for the configuration files is /etc/dansguardian/dansguardian.conf. Once again, most of the default values are fine, but we need to make a few changes.

First, add or edit the following line to make the filter use HTML templates, which are static Web pages that our filter will use to display the "Access Denied" page instead of the inappropriate sites. Using HTML templates keeps us from having to set up a Web server to display the "Access Denied" information.

reportinglevel = 3

Next, add or edit the following lines to make DansGuardian listen on the loopback address and port 8080:

filterip = 127.0.0.1
filterport = 8080

Add or edit the following lines to tell DansGuardian which address and port Squid is listening on. This enables our filter to fetch the requested Web content through the proxy.

proxyip = 127.0.0.1
proxyport = 3128

Again, to keep your filter from running as root you need to change the user that it will run as. For simplicity, we will reuse the user and group that we previously set up for Squid. Add or edit the following to make DansGuardian run with UID and GID of squid:

daemonuser = 'squid'
daemongroup = 'squid'
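
The settings above can be double-checked with a short script. This is a sketch under stated assumptions: dg_get and check_dg_conf are hypothetical helper names, the parser only understands simple "key = value" lines with optional single quotes, and when a key appears more than once it keeps the last occurrence, which may not be how DansGuardian itself resolves duplicates.

```shell
# dg_get KEY [FILE]
# Hypothetical helper: print the value of one "key = value" setting
# from a dansguardian.conf-style file.
dg_get() {
    awk -v key="$1" -v q="'" '
        /^[ \t]*#/ { next }                 # skip comment lines
        index($0, "=") == 0 { next }        # skip lines with no "="
        {
            k = substr($0, 1, index($0, "=") - 1)
            gsub(/[ \t]/, "", k)
            if (k != key) next
            v = substr($0, index($0, "=") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", v)  # trim surrounding blanks
            gsub("^" q "|" q "$", "", v)    # drop surrounding single quotes
            val = v                         # this sketch keeps the last match
        }
        END { if (val != "") print val }
    ' "${2:-/etc/dansguardian/dansguardian.conf}"
}

# check_dg_conf [FILE]
# Compare the file against the values used in this article.
# Returns 0 if every value matches, 1 otherwise.
check_dg_conf() {
    conf=${1:-/etc/dansguardian/dansguardian.conf}
    rc=0
    for pair in reportinglevel=3 filterip=127.0.0.1 filterport=8080 \
                proxyip=127.0.0.1 proxyport=3128 \
                daemonuser=squid daemongroup=squid; do
        name=${pair%%=*}
        want=${pair#*=}
        got=$(dg_get "$name" "$conf")
        if [ "$got" = "$want" ]; then
            echo "ok:       $name = $got"
        else
            echo "mismatch: $name = '$got' (article uses '$want')"
            rc=1
        fi
    done
    return "$rc"
}
```

The important pairing is that proxyport here matches the http_port set in squid.conf; if the two differ, DansGuardian cannot reach Squid.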

While DansGuardian provides an excellent filter all by itself, you may want to exercise further control over the Web filtering by editing the other files in the /etc/dansguardian directory that contain external blacklists. Blacklists from squidGuard and URLBlacklist work perfectly with DansGuardian. Each file contains a brief explanation for its contents to make configuration easier.

Putting it in action

Once you have Squid and DansGuardian set up, the final step is to implement a transparent proxy using iptables. Use the following commands at the command line to add rules to the firewall to allow the user squid to access both the Internet and the Squid proxy we set up.

iptables -t nat -A OUTPUT -p tcp --dport 80 -m owner --uid-owner squid -j ACCEPT

iptables -t nat -A OUTPUT -p tcp --dport 3128 -m owner --uid-owner squid -j ACCEPT

If you want a user to be exempt from filtering -- a parent, for example -- issue the following command. Replace EXEMPT_USER with the username that you wish to exempt from filtering:

iptables -t nat -A OUTPUT -p tcp --dport 80 -m owner --uid-owner EXEMPT_USER -j ACCEPT

The next command redirects Internet traffic from all users, other than squid and any exempt users, to the filter on port 8080:

iptables -t nat -A OUTPUT -p tcp --dport 80 -j REDIRECT --to-ports 8080

Since we have a proxy server set up, a user could configure a Web browser to bypass the filter and access the proxy directly. The Squid proxy is listening for requests from the computer, and it doesn't care which user sends the request. We could set up our firewall to deny all access to the proxy except from our filter, but let's be a little sneakier. Let's set it up so that direct requests to the Squid proxy server, except from our filter, get redirected through the filter. To do this, use the following command:

iptables -t nat -A OUTPUT -p tcp --dport 3128 -j REDIRECT --to-ports 8080
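
Because iptables checks the rules in a chain from top to bottom and stops at the first ACCEPT or REDIRECT that matches, the order of the commands above matters: the ACCEPT rules for squid and any exempt users must be added before the two REDIRECT rules. The sketch below simply collects the same commands into one function so they are always added in that order. The names ipt, apply_filter_rules, and the DRY_RUN variable are hypothetical conveniences, not part of iptables.

```shell
# ipt ARGS...
# Run iptables, or only print the command when DRY_RUN=1 is set.
ipt() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "iptables $*"
    else
        iptables "$@"
    fi
}

# apply_filter_rules [EXEMPT_USER ...]
# Add the article's rules in the required order. Must be run as root
# unless DRY_RUN=1 is set.
apply_filter_rules() {
    # 1. The squid user (Squid and DansGuardian) may reach the Web and the proxy.
    ipt -t nat -A OUTPUT -p tcp --dport 80 -m owner --uid-owner squid -j ACCEPT || return 1
    ipt -t nat -A OUTPUT -p tcp --dport 3128 -m owner --uid-owner squid -j ACCEPT || return 1

    # 2. Users named on the command line bypass the filter.
    for user in "$@"; do
        ipt -t nat -A OUTPUT -p tcp --dport 80 -m owner --uid-owner "$user" -j ACCEPT || return 1
    done

    # 3. Everything else, including direct requests to Squid, goes to the filter.
    ipt -t nat -A OUTPUT -p tcp --dport 80 -j REDIRECT --to-ports 8080 || return 1
    ipt -t nat -A OUTPUT -p tcp --dport 3128 -j REDIRECT --to-ports 8080 || return 1
}
```

To preview the commands without touching the firewall, set DRY_RUN=1 and run apply_filter_rules followed by any usernames to exempt. Running it without DRY_RUN adds the rules for real.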

Some systems, such as MandrakeLinux, utilize an application called Shorewall to manage firewall rules. For these systems, place the above firewall rules in /etc/shorewall/start, to use the filtering when Shorewall starts, and in /etc/shorewall/stop, to make them stick if you should stop Shorewall for some reason. To implement the new rules simply restart Shorewall using the following command:

service shorewall restart

For systems using Shorewall, your firewall rules are set. For all other systems, you'll need to perform the next two steps in order to get the new firewall rules started at boot time. Issue the following command to save your firewall rules:

iptables-save > /etc/sysconfig/iptables

Now issue the following to make sure iptables is started at boot time and to start the iptables firewall:

chkconfig iptables on
service iptables restart

You may also need to make sure that DansGuardian and Squid get started at boot by using the following two commands:

chkconfig squid on
chkconfig dansguardian on

To get the filtering started, you can now enter the following commands:

service squid restart
service dansguardian restart
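
Before testing in a browser, it can help to confirm that both daemons are actually listening on the loopback ports configured earlier. The following sketch assumes a tool such as netstat -ltn or ss -ltn is available to list listening TCP sockets; is_listening is a hypothetical helper that only scans that listing for an address and port.

```shell
# is_listening ADDR:PORT
# Hypothetical helper: read a listing of TCP sockets on standard input
# (for example from "netstat -ltn" or "ss -ltn") and succeed if some
# LISTEN line contains the given address and port as a separate field.
is_listening() {
    awk -v want="$1" '
        /LISTEN/ {
            for (i = 1; i <= NF; i++)
                if ($i == want) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

For example, netstat -ltn | is_listening 127.0.0.1:3128 should succeed once Squid is up, and the same check with 127.0.0.1:8080 should succeed once DansGuardian is up. If the first check fails, DansGuardian will not be able to connect to its parent proxy.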

[Figure: the "Access Denied" screen]

Now when users enter a forbidden Web address they will be presented with an "Access Denied" page instead of the offending site. You can customize the look of the "Access Denied" page by editing the template.html file in the appropriate language section located in /etc/dansguardian/languages.

Final thoughts

While the setup discussed in this article is intended for use on a single computer, this method of Web filtering can be applied to a wide range of scenarios. These tools can be easily and successfully implemented on a small home network, a large business infrastructure, or any environment that needs to comply with the Children's Internet Protection Act.

Bear in mind that Web filtering software of any kind is not 100% failsafe, nor is it a substitute for parental supervision. Along with installing filtering software, educate yourself and your children about the Internet.

Comments

on A parent's guide to Linux Web filtering

Note: Comments are owned by the poster. We are not responsible for their content.

Re:Further info

Posted by: Anonymous Coward on June 19, 2006 08:41 PM
Sorry, possibly a silly question...
Does this mean I will have to recompile the kernel, reinstall, etc.?
Or is this something I can switch on?

#

Re:Further info

Posted by: Anonymous Coward on June 19, 2006 10:28 PM
Depends if that feature is enabled in the default kernel! If it is not then you will have to recompile. AFAIK most distros have it enabled by default. I encountered the problem because I was using a kernel I had compiled myself, and had not enabled that feature.

#

Re:Followed procedure and nothing works

Posted by: Anonymous Coward on July 09, 2006 07:03 AM
I had the same problem setting up DG on a dedicated firewall/filter box. DG would hang on startup and spit the error about connecting to parent proxy. The parent proxy is, of course, squid (if you're using squid). Squid is listening (by default) on port 3128. It makes sense that if DG cannot get a tcp connection with squid on port 3128, it will spit this error and quit.

Some ideas as to why this would happen:

1. Squid hasn't started.

#ps aux | grep squid

2. Either squid or DG is misconfigured (DG is talking to or squid is listening to some other port).

# nano /etc/dansguardian/dansguardian.conf
# nano /etc/squid/squid.conf

3. Your firewall is not allowing the connection.

In my case, the problem was that my iptables firewall was not allowing the connection. When I made a rule allowing this particular connection from 127.0.0.1 to 127.0.0.1 with --dport 3128, DG started up happily.

#

Re:Followed procedure and nothing works

Posted by: Anonymous Coward on September 23, 2006 06:32 PM
Check your init files. It is a matter of which should run first, Squid or DansGuardian. Of course Squid should run first, so check the initrd.
To isolate the problem, run squid from the shell and then dansguardian.

#

Further info

Posted by: Administrator on September 02, 2004 05:19 AM
Nice article. To do transparent proxy on a single machine (as discussed here) needs IP_NF_NAT_LOCAL enabled in the kernel. In some distributions this is not enabled by default. It allows rerouting by nat on packets that originated in the box that iptables is being run on. Without this, iptables only operates on packets that are forwarded through the machine (i.e. acting as a router).

The rules I added to the Shorewall start file were:

iptables -t nat -A OUTPUT -p tcp --dport 80 -m owner ! --uid-owner squid -j REDIRECT --to-ports 8080
iptables -t nat -A OUTPUT -p tcp --dport 3128 -m owner ! --uid-owner squid -j REDIRECT --to-ports 8080

James

#

Re:Further info

Posted by: Administrator on February 03, 2006 03:49 PM
http://www.s2ii.com/blog/index.php/?2005/12/23/50-filtrage-de-contenu-web-avec-squid-et-dansguardian

This link may be useful for Debian-based distributions.

#

Followed procedure and nothing works

Posted by: Administrator on June 06, 2006 01:23 PM
I am using Kanotix 2.6.17 KDE 3.5.2 on an AMD 1100 512MB box. I am trying to make DansGuardian run on top of Squid, but when I try to start DG I get "Restarting DansGuardian: Error connecting to parent proxy".

I have read that if Squid is set up correctly without any filter on top of it, pointing your browser at the proxy (127.0.0.1:3128) should get you through, and a log of what is going on will appear in access.log. I set up Squid as directed and set up iptables as directed, but I have no access to the Internet and nothing shows up in access.log.

There has to be an answer out there somewhere. I am a newbie to Linux, but I have visited dozens of sites and they all say basically the same thing. I must be missing something or have something else configured wrong, because although Squid is loaded, it is not doing anything. Is there anyone who knows something that can help? This is the first time I have come across a problem that no one seems to know the answer to.

Here is what I did in Konsole, followed by my Squid and DG configuration:
# iptables -F
# iptables -X
# iptables -t nat -F
# iptables -t nat -X
# iptables -t mangle -F
# iptables -t mangle -X
# iptables -t nat -A OUTPUT -p tcp --dport 80 -m owner --uid-owner squid -j ACCEPT
# iptables -t nat -A OUTPUT -p tcp --dport 3128 -m owner --uid-owner squid -j ACCEPT
# iptables -t nat -A OUTPUT -p tcp --dport 80 -j REDIRECT --to-ports 8080
# iptables -t nat -A OUTPUT -p tcp --dport 3128 -j REDIRECT --to-ports 8080
# iptables-save > /etc/sysconfig/iptables
# /etc/init.d/squid restart
Restarting Squid HTTP proxy: squid.
# /etc/init.d/dansguardian restart
Restarting DansGuardian: Error connecting to parent proxy


squid.conf uncommented lines

http_port 3128
udp_incoming_address 192.168.7.151
udp_outgoing_address 255.255.255.255
hierarchy_stoplist cgi-bin ?
acl QUERY urlpath_regex cgi-bin \?
no_cache deny QUERY
cache_mem 32 MB
maximum_object_size 8192 KB
cache_dir ufs /var/spool/squid 100 16 256
cache_access_log /var/log/squid/access.log
cache_log /var/log/squid/cache.log
emulate_httpd_log off
log_ip_on_direct on
client_netmask 255.255.255.0
hosts_file /etc/hosts
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern . 0 20% 4320
acl all src 127.0.0.1/255.255.255.255
acl manager proto cache_object
acl localhost src 0.0.0.0/0.0.0.0
acl to_localhost dst 127.0.0.1/32
acl purge method PURGE
acl CONNECT method CONNECT
http_access allow manager localhost
http_access deny manager
http_access allow purge localhost
http_access deny purge
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access deny to_localhost
acl lan src 192.168.7.0/24
http_access allow lan
http_access allow localhost
http_access deny all
http_reply_access allow all
icp_access allow all
miss_access allow all
visible_hostname 'hostname'
unique_hostname 'hostname'
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_single_host on
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
coredump_dir /var/spool/squid
cache_effective_group squid
cache_effective_user squid


dansguardian.conf uncommented lines

reportinglevel = 3
languagedir = '/etc/dansguardian/languages'
language = 'ukenglish'
loglevel = 1
logexceptionhits = on
logfileformat = 1
loglocation = '/var/log/dansguardian/access.log'
filterip =127.0.0.1
filterport = 8080
proxyip = 127.0.0.1
proxyport = 3128
accessdeniedaddress = 'http://www.cbc.ca/'
nonstandarddelimiter = on
usecustombannedimage = 1
custombannedimagefile = '/etc/dansguardian/transparent1x1.gif'
filtergroups = 1
filtergroupslist = '/etc/dansguardian/filtergroupslist'
bannediplist = '/etc/dansguardian/bannediplist'
exceptioniplist = '/etc/dansguardian/exceptioniplist'
banneduserlist = '/etc/dansguardian/banneduserlist'
exceptionuserlist = '/etc/dansguardian/exceptionuserlist'
showweightedfound = on
weightedphrasemode = 2
urlcachenumber = 2000
urlcacheage = 900
phrasefiltermode = 2
preservecase = 0
hexdecodecontent = 0
forcequicksearch = 0
reverseaddresslookups = off
reverseclientiplookups = off
createlistcachefiles = on
maxuploadsize = -1
maxcontentfiltersize = 256
usernameidmethodproxyauth = on
usernameidmethodident = off
preemptivebanning = on
forwardedfor = off
usexforwardedfor = off
logconnectionhandlingerrors = on
maxchildren = 120
minchildren = 8
minsparechildren = 4
preforkchildren = 6
maxsparechildren = 32
maxagechildren = 500
ipcfilename = '/tmp/.dguardianipc'
urlipcfilename = '/tmp/.dguardianurlipc'
nodaemon = off
nologger = off
daemonuser = squid
daemongroup = squid
softrestart = off
virusscan = on
virusengine = 'clamav'
tricklelength = 32768
firsttrickledelay = 30
followingtrickledelay = 60
exceptionvirusmimetypelist = '/etc/dansguardian/exceptionvirusmimetypelist'
maxcontentscansize = 262144
exceptionvirusextensionlist = '/etc/dansguardian/exceptionvirusextensionlist'
downloaddir = '/tmp/dgvirus'
virusscanexceptions = on
urlcachecleanonly = on
virusscannertimeout = 60
localsocket = '/tmp/clamd'
clmaxfiles = 1500
clmaxreclevel = 3
clmaxfilesize = 10485760
clmaxratio = 250

#

transparent filtering plus monitoring

Posted by: Anonymous [ip: 192.168.2.2] on March 08, 2008 10:31 PM
I have put together a description of how to set up transparent filtering (as described in the above article) plus a monitoring service that periodically emails a list of the Web addresses that were visited. Read it at http://www.zephyrsoft.net/filter.

#
