apache2-mpm-peruser [mirror / folk]
Peruser is an Apache 2.x module based on metuxmpm, a working implementation of the perchild MPM. The fundamental concept behind all of them is to run each apache child process as its own user and group, each handling its own set of virtual hosts. Peruser and recent metuxmpm releases can also chroot() apache processes. The result is a sane and secure web server environment for your users, without kludges like PHP’s safe_mode.
Metuxmpm creates one child process per unique user and group, which then spawns threads to handle requests. This requires you to use multithreaded versions of PHP, as well as Perl and Python if you want to use mod_perl and mod_python. Between the three of them, and all the third-party modules and libraries they link to, there can be a lot of non-threadsafe code involved. That can cause nasty crashes that are very hard to reproduce and diagnose.
Currently, a non-threaded Apache along with non-threaded PHP, Perl, and Python is the most stable solution for hosting services. Unfortunately, simply removing thread support from metuxmpm leaves you with just one Apache child handling requests for one or more virtual hosts. Peruser provides multiple processes for each unique user/group/chroot. It’s working well so far, but there is still a lot of room for improvement. Leave a comment if you have questions, suggestions, or patches 🙂
Peruser is distributed as a patch to the Apache source: httpd-2.2.16-peruser-0.4.0rc2.patch
You may browse the complete download directory containing the mirror backups of the original sources and my Debian Apache2 and PHP5 patches for Squeeze and Wheezy under https://flo.sh/4ufiles/webhosting/apache2-mpm-peruser/.
So now that you have the peruser MPM installed, you want to configure it. For this, it helps to know how peruser works internally, so you can make the most of it.
To get a minimal copy working, you need to specify the server environments and attach them to the virtualhosts. Note that the server environments must be specified before the virtualhosts. A single server environment can be applied to multiple virtualhosts.
To create a simple server environment, write this to your apache configuration file:

    <Processor myserver>
        User john
        Group john
    </Processor>

This creates a server environment named myserver running under the john user and group. To apply this server environment to a virtualhost, add the ServerEnvironment directive to that virtualhost:

    <VirtualHost *:80>
        ServerName john.example.com
        ServerEnvironment myserver
    </VirtualHost>

So here you go, all requests for john.example.com will be using the myserver server environment and will be run under the john user and group.
Now just remember that every virtualhost needs to have a server environment specified. A virtualhost without a server environment will have all of its requests dropped with 503 (Service Unavailable).
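The two rules above (environments are defined before the virtualhosts that use them, and every virtualhost names one) can be checked mechanically. The following is only a minimal sketch: `check_environments` and `SAMPLE` are hypothetical names, and the line-by-line regex parsing assumes one directive per line and no Include files, which is far simpler than what Apache itself accepts.

```python
import re

PROCESSOR_RE = re.compile(r"<Processor\s+([^>\s]+)\s*>", re.IGNORECASE)
VHOST_OPEN_RE = re.compile(r"<VirtualHost\b", re.IGNORECASE)
VHOST_CLOSE_RE = re.compile(r"</VirtualHost\s*>", re.IGNORECASE)
SERVER_NAME_RE = re.compile(r"ServerName\s+(\S+)", re.IGNORECASE)
SERVER_ENV_RE = re.compile(r"ServerEnvironment\s+(\S+)", re.IGNORECASE)


def check_environments(config):
    """Return a list of problems found in a peruser configuration text."""
    defined = set()          # <Processor> names seen so far, in file order
    problems = []
    in_vhost = False
    name = env = None
    env_known = False
    for raw in config.splitlines():
        line = raw.strip()
        processor = PROCESSOR_RE.match(line)
        server_name = SERVER_NAME_RE.match(line)
        server_env = SERVER_ENV_RE.match(line)
        if processor:
            defined.add(processor.group(1))
        elif VHOST_OPEN_RE.match(line):
            in_vhost, name, env, env_known = True, "(unnamed)", None, False
        elif VHOST_CLOSE_RE.match(line):
            if env is None:
                problems.append(f"{name}: no ServerEnvironment")
            elif not env_known:
                problems.append(f"{name}: {env} is not defined earlier")
            in_vhost = False
        elif in_vhost and server_name:
            name = server_name.group(1)
        elif in_vhost and server_env:
            env = server_env.group(1)
            # Only environments defined above this line count as known.
            env_known = env in defined
    return problems


SAMPLE = """
<Processor myserver>
    User john
    Group john
</Processor>
<VirtualHost *:80>
    ServerName john.example.com
    ServerEnvironment myserver
</VirtualHost>
<VirtualHost *:80>
    ServerName jane.example.com
</VirtualHost>
"""

problems = check_environments(SAMPLE)
print(problems)
```

Here the second virtualhost (a made-up jane.example.com) is reported because it has no ServerEnvironment, which is the case that would otherwise show up as 503 responses at runtime.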
Now that you have your server environments set up, you should specify the process limits. Here are your default limits:

    ServerLimit 256
    MaxClients 256
    MinProcessors 0
    MaxProcessors 10
    MinSpareProcessors 2
    MaxSpareProcessors 0
    MinMultiplexers 3
    MaxMultiplexers 10
    MultiplexerIdleTimeout 0
    MaxRequestsPerChild 1000
    ExpireTimeout 1800
    IdleTimeout 900
    ProcessorWaitTimeout 5 10

- ServerLimit and MaxClients determine the total limit of all children. You should set them the same.
- MinProcessors and MaxProcessors set the minimum and maximum worker limits for a single server environment.
- MinSpareProcessors and MaxSpareProcessors set the limits of idle workers in a single server environment.
- MinMultiplexers and MaxMultiplexers set the minimum and maximum multiplexer count.
- MultiplexerIdleTimeout sets the idle timeout (the time a multiplexer can be idle before it is stopped). 0 = disabled.
- ExpireTimeout sets the maximum time a child can spend handling a single request (set this high if you want to allow long downloads). 0 = disabled.
- IdleTimeout sets the maximum time a child can be idle. 0 = disabled.
- ProcessorWaitTimeout sets the amount of time the multiplexer waits for the processor if it is busy. The first argument sets the time in seconds; the second sets the number of levels between not waiting and waiting the maximum time. See Waiting for the processor.
Peruser also allows setting the next directives inside a <Processor> tag, to make them server-environment-specific:
Now that you have your server set up and running, you want to know how peruser is behaving and whether the limits you specified are enough for the load the server has to handle.
To do this, peruser has a built-in server-status hook, which displays the list of all children and statistics about them. To see it, you need to enable the server-status page and ExtendedStatus:

    <Location /server-status>
        SetHandler server-status
    </Location>
    ExtendedStatus On

After enabling this, restart the server and go to the server-status page. Under the request list, you should see the peruser status.
An example of the peruser status is below:
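
| ID | PID | STATUS | SB STATUS | Type | Processor | Pss | AVAIL |
|----|------|---------|------------|-------------|-------------|--------|-------|
| 0 | 0 | STANDBY | DEAD | PROCESSOR | senv1 | 0/0/30 | 100% |
| 1 | 0 | STANDBY | DEAD | PROCESSOR | senv2 | 0/0/30 | 100% |
| 2 | 0 | STANDBY | DEAD | PROCESSOR | senv3 | 0/0/30 | 100% |
| 3 | 2653 | ACTIVE | BUSY_WRITE | PROCESSOR | senv4 | 5/3/30 | 100% |
| 4 | 2654 | READY | READY | MULTIPLEXER | Multiplexer | 2/2/10 | 100% |
| 5 | 2655 | READY | READY | MULTIPLEXER | Multiplexer | 2/2/10 | 100% |
| 6 | 2656 | READY | READY | WORKER | senv4 | 5/3/30 | 100% |
| 7 | 2657 | READY | READY | WORKER | senv4 | 5/3/30 | 100% |
| 8 | 2658 | READY | READY | WORKER | senv4 | 5/3/30 | 100% |
| 9 | 2659 | ACTIVE | BUSY_WRITE | WORKER | senv4 | 5/3/30 | 100% |
| 10 | 0 | STANDBY | DEAD | UNKNOWN | (null) | 0/0/0 | 0% |
| 11 | 0 | STANDBY | DEAD | UNKNOWN | (null) | 0/0/0 | 0% |
| 12 | 0 | STANDBY | DEAD | UNKNOWN | (null) | 0/0/0 | 0% |
| 13 | 0 | STANDBY | DEAD | UNKNOWN | (null) | 0/0/0 | 0% |
| 14 | 0 | STANDBY | DEAD | UNKNOWN | (null) | 0/0/0 | 0% |
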
- The ID and PID fields should be self-explanatory.
- STATUS displays the peruser status of the child. It can be one of:
  - STANDBY (child is dying or is dead)
  - STARTING (child is starting up)
  - READY (child is idle)
  - ACTIVE (child is handling a request)
- The SB STATUS is the child’s scoreboard status (just for reference).
- Type displays the child type (MULTIPLEXER, PROCESSOR, WORKER or UNKNOWN). See the process types for detailed explanation.
- Processor displays the server environment.
- Pss displays the server environment’s children (alive/idle/max).
- AVAIL displays the server environment’s availability. This should be 100% at all times. If this gets below 100%, then there aren’t enough free children for the server environment and the multiplexers have dropped requests to save the rest of the server from collapsing.
Also, in this list you can see some odd entries with STANDBY status and UNKNOWN child type. These slots mean that these were previously in use, but are currently free, so it’s completely normal to have these.
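If you want to watch these fields from a script, a row can be split into its columns and the Pss field into its alive/idle/max parts. This is a minimal sketch under an assumption: the status page is HTML, so it presumes the rows have already been extracted as whitespace-separated text like the example above. `ChildRow`, `parse_row` and `is_free_slot` are hypothetical names, not part of peruser.

```python
from typing import NamedTuple


class ChildRow(NamedTuple):
    slot_id: int
    pid: int
    status: str        # peruser status: STANDBY, STARTING, READY or ACTIVE
    sb_status: str     # scoreboard status
    child_type: str    # MULTIPLEXER, PROCESSOR, WORKER or UNKNOWN
    environment: str
    alive: int
    idle: int
    maximum: int
    avail: int         # availability in percent


def parse_row(line):
    """Parse one whitespace-separated row of the peruser status list."""
    slot_id, pid, status, sb_status, child_type, env, pss, avail = line.split()
    alive, idle, maximum = (int(part) for part in pss.split("/"))
    return ChildRow(int(slot_id), int(pid), status, sb_status, child_type,
                    env, alive, idle, maximum, int(avail.rstrip("%")))


def is_free_slot(row):
    """Free slots are the STANDBY/UNKNOWN entries described above."""
    return row.status == "STANDBY" and row.child_type == "UNKNOWN"


rows = [parse_row(line) for line in (
    "3 2653 ACTIVE BUSY_WRITE PROCESSOR senv4 5/3/30 100%",
    "9 2659 ACTIVE BUSY_WRITE WORKER senv4 5/3/30 100%",
    "10 0 STANDBY DEAD UNKNOWN (null) 0/0/0 0%",
)]

# Environments that have dropped requests show an availability below 100%.
# Free slots report 0% and are excluded, since that is normal for them.
degraded = sorted({r.environment for r in rows
                   if not is_free_slot(r) and r.avail < 100})
print(degraded)
```

With the sample rows nothing is degraded, and the free slot's 0% is ignored rather than reported as a problem.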
As of Peruser 0.4.0rc2, it is possible to see statistics about each server environment. To see them, add the peruser_stats query parameter to the server-status URL, for example /server-status?peruser_stats.
This displays the list of server environments set in the server, ordered by activity, with the server environment that has the most children alive first. In the list you can see the count of handled connections, handled requests and the number of requests that have been dropped because of hitting the child limit.
Under the Hood
Here’s a description of how peruser works internally and how requests are handled inside it.
The request cycle
Here’s an example of how a simple request makes its way to processing:
- Client connects to the HTTP port and sends the request
- Multiplexer receives the connection, reads the request and checks which virtualhost it is for
- Multiplexer checks if a worker is available for the request, then forwards it.
  - If no worker is available, it waits for some time (see Waiting for the processor)
- Multiplexer returns to waiting for new connections
- Worker receives the connection and handles the request
- If KeepAlive is enabled, the worker listens on the connection for more requests:
  - If the client happens to send a request for another virtualhost (which the worker cannot handle) within the same connection, the worker sends the connection back to the multiplexer
- Worker returns to waiting for new connections
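The hand-off between multiplexer and worker on a keep-alive connection can be traced with a toy model. This is only an illustration of the steps listed above, not how peruser is implemented: the virtualhost table, the second host and the function name are made up, and socket passing, waiting and timeouts are left out entirely.

```python
# Hypothetical mapping of virtualhost name to server environment.
VHOST_TO_ENV = {
    "john.example.com": "myserver",
    "jane.example.com": "janeserver",
}


def trace_connection(hosts):
    """Trace who handles each request sent over one keep-alive connection.

    `hosts` is the sequence of Host names the client asks for, in order.
    """
    trace = []
    current_env = None   # worker pool currently holding the connection
    for host in hosts:
        env = VHOST_TO_ENV[host]
        if env != current_env:
            # First request, or the worker handed the connection back because
            # it cannot serve this virtualhost: the multiplexer forwards it.
            trace.append(f"multiplexer -> {env}")
            current_env = env
        trace.append(f"{env} handles {host}")
    return trace


steps = trace_connection(
    ["john.example.com", "john.example.com", "jane.example.com"]
)
for step in steps:
    print(step)
```

The second request stays with the same worker pool, while the third one, for a different virtualhost, goes through the multiplexer again.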
The main process never handles any connections from the outside, but only handles the maintenance of the children and spawns new children if required.
This process runs under root privileges as these are required to switch users after forking.
Multiplexers listen on the public port (:80) and read the request to determine which virtualhost it is for. Once the virtualhost is known, the multiplexer passes the request to the server environment (worker pool) assigned to that virtualhost. Multiplexers run under the user/group defined by the “User” and “Group” directives.
If the request is made to an SSL-enabled IP/port, then the virtualhost determination is skipped and the socket is passed directly to the worker pool of the first virtualhost defined for that IP/port. This also means that the multiplexer does no SSL handshaking; this is all done by the worker.
Processor / worker
A worker is the process where all requests are finally executed. Workers run under the user/group defined in the <Processor> tag.
A worker receives connections from the multiplexer and processes them.
Internally each <Processor> tag creates a PROCESSOR type worker in the child table that never gets cleaned up. This means that even if the server limit has been reached, a single worker can still handle requests for its virtualhosts.
Waiting for the processor
In Peruser 0.3.0 and earlier, the multiplexer would send the connection to the worker pool without checking whether any workers were free to receive it. This would leave the multiplexer blocked until some worker came around and received the connection. If there is a major problem with the virtualhost (e.g. the MySQL server is not responding), then no worker may become available until one of them is killed by the parent when its ExpireTimeout is reached. In the meantime, every new request made to that virtualhost leaves another multiplexer blocked, which gradually brings the server to a halt.
To fix this problem, the multiplexers now check whether there are any free workers in the pool before passing the socket, and try to wait for them if there aren’t (or drop the request if none becomes available).
If the workers are busy when the multiplexer starts to pass the connection, it will try to wait for them to finish. The maximum time to wait, in seconds, is calculated by the formula (availability / 100) * ProcessorWaitTimeout, where availability is 100 by default and ProcessorWaitTimeout is the directive in the configuration (default is 5).
If the workers are still busy after waiting the maximum time, the multiplexer reduces the availability of the worker pool by 10 and drops the request with error 503 (SERVICE UNAVAILABLE). However, if any of the workers becomes available, the availability is reset to 100 and the request is passed.
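As a worked example of the formula, the sketch below follows a worker pool through four consecutive requests that all time out. It is a model of the description above, not peruser's code. The function names are made up, and the link between the step of 10 and the second ProcessorWaitTimeout argument (100 divided by 10 levels) is an assumption based on the defaults.

```python
def max_wait_seconds(availability, wait_timeout=5.0):
    """Maximum time a multiplexer waits for a busy worker pool.

    Same value as (availability / 100) * ProcessorWaitTimeout, written with
    the multiplication first to avoid floating-point rounding surprises.
    """
    return availability * wait_timeout / 100


def after_timeout(availability, levels=10):
    """Availability after a request timed out and was dropped with 503.

    Assumption: the step is 100 / levels, which matches the documented
    reduction of 10 when the second ProcessorWaitTimeout argument is 10.
    """
    step = 100 // levels
    return max(0, availability - step)


availability = 100
waits = []
for _ in range(4):                      # four requests in a row time out
    waits.append(max_wait_seconds(availability))
    availability = after_timeout(availability)

print(waits)         # [5.0, 4.5, 4.0, 3.5]
print(availability)  # 60

# As soon as a worker becomes available, availability is reset to 100.
availability = 100
```

Each dropped request shortens the wait for the next one, so a stuck virtualhost ties up the multiplexers for less and less time.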
DISCLAIMER: The software and patches are provided “as is” and you use the stuff at your own risk. The information above and the original source code is a copy of the deprecated peruser.org (trac) website. Apache2-mpm-peruser was written by Stefan Klingner (Hey! Many thanks for this great code 🙂 ) – I’ve “only” patched the stuff for Debian Squeeze and Wheezy.