PHP and long running processes
It seems this question keeps coming up on the PHP newsgroups and, now
that I've plugged into Stack Overflow, I keep seeing it there too:
How do I start a PHP program which takes a long time to complete and
how do I track its progress?
While these questions tend to attract lots of replies, the answers
are usually wrong.
The first thing to consider is that you need to separate the thing
which takes a long time from its initiation, the ongoing monitoring
and whatever final reporting is required.
Since we're talking about PHP it's fair to assume that in most cases,
the initiation will be a PHP script running in a webserver. However
this is not a good place to keep a long-running program:
1) Webservers are all about turning around requests quickly - indeed
most have failsafe mechanisms to prevent one request hanging about
too long.
2) The webserver ties the request to both the execution of the script
and to the client socket connection. Typically, NOT having to keep a
browser window open somewhere waiting for the job to complete is an
objective of the exercise. Although the dependence on the client
connection can be reduced via ignore_user_abort(), that was never its
intended purpose.
3) Long-running typically means the job will have quite different
resource requirements from a typical web page script - e.g. lots of
file handles being opened and closed, more memory being consumed.
Most commentators come back with the suggestion of spawning a separate
thread of execution, either using fork or via the shell. The former
obviously does not solve the webserver-related issues if the
interpreter is running as a module - you're just going to fork the
webserver process. You've not solved any of the web-related issues
and you've created a whole lot of new ones.
You need to create a new process, certainly.
The obvious type of process to create would be a standalone PHP
interpreter to process the long-running job. So is there a standalone
interpreter available to the webserver? The prospective implementor
would need to check (and also check whether the webserver runs
chrooted). So let's assume there is, and our coder writes:
print shell_exec('/usr/bin/php -q longThing.php &');
A brave attempt. However they will soon find that this doesn't behave
as well as they expected and keeps stopping. Why? Because although the
process they created runs concurrently with the PHP which created it,
it is still a child of that process. Now this is where it starts to
get complicated. In our example above, the webserver process finishes
with the user's script immediately after it creates the new process -
however it will probably hang around waiting to be assigned a new
request to deal with. At some point the controller for the webserver
processes will decide to terminate it - either as a matter of policy
because it has dealt with a certain number of requests (for Apache:
MaxRequestsPerChild) or because there are too many idle processes
(Apache's MaxSpareServers). However the webserver process should not
stop until all its child processes have terminated. How this is dealt
with varies by operating system and, of course, webserver. Regardless,
the coder has created a situation which should not have arisen.
But on a Unix system there are lots of jobs which run independently
for long periods of time. They achieve this as follows:
1) They are first started, say as pid 1234, and fork, say to pid
1235. After calling fork, pid 1234 exits.
2) pid 1235 will become the daemon - it closes all its open fds
including those for stdin, stdout and stderr.
3) pid 1235 now calls setsid(). This dissociates the process from the
session and controlling terminal of the processes which led to its
creation (and, since pid 1234 has already exited, it typically ends up
as a child of the 'init' process).
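The three steps above can be sketched in PHP roughly as follows. This
is a minimal illustration, not a hardened implementation: it assumes
the command-line interpreter with the pcntl and posix extensions
loaded, and the function name daemonize() is my own invention.

```php
<?php
// Hypothetical helper (the name is illustrative only).
// Assumes PHP CLI with the pcntl and posix extensions available.
function daemonize(): void
{
    // Step 1: fork. The original process (pid 1234 in the text) exits,
    // leaving only the child (pid 1235) running.
    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('fork failed');
    }
    if ($pid > 0) {
        exit(0); // parent process ends here
    }

    // Step 2: the child closes its standard streams.
    // After this point the script must not echo/print; log to a file.
    fclose(STDIN);
    fclose(STDOUT);
    fclose(STDERR);

    // Step 3: start a new session, detaching from the old one.
    if (posix_setsid() === -1) {
        exit(1); // could not detach; nothing sensible left to report to
    }
}
```

A long-running command-line script would call daemonize() once, near
the top, before starting its real work. Do not call it from code
running inside the webserver, for the reasons given earlier.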
You can do all this in a PHP script, assuming you've got the posix and
pcntl extensions. However in my experience it's usually a lot simpler
to ask an existing daemon to run the script for you:
print `echo /usr/bin/php -q longThing.php | at now`;
But how do you get progress information? Simple: just get your
long-running script to report its progress to a file or a database,
and use another, web-based script to read the progress / show the
final result.
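As a rough sketch of the file-based variant: the long-running script
calls a writer every so often, and the web-based script calls a
reader. The function names, the JSON layout and the idea of writing
via a temporary file are my own assumptions for illustration, not a
prescribed format.

```php
<?php
// Hypothetical helpers; names and file layout are illustrative only.

// Called by the long-running job to record how far it has got.
function writeProgress(string $file, int $done, int $total): void
{
    $payload = json_encode([
        'done'    => $done,
        'total'   => $total,
        'updated' => time(),
    ]);
    // Write to a temporary file, then rename it into place, so the
    // reader never sees a half-written file.
    $tmp = $file . '.tmp';
    file_put_contents($tmp, $payload);
    rename($tmp, $file);
}

// Called by the web-based script to fetch the latest progress.
// Returns null if the job has not reported anything yet.
function readProgress(string $file): ?array
{
    if (!is_file($file)) {
        return null;
    }
    $data = json_decode((string) file_get_contents($file), true);
    return is_array($data) ? $data : null;
}
```

The web page can then poll the reader (for example on a page refresh)
and show done / total as a percentage, or the final result once the
job has recorded that it is complete.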
Troubleshooting (updated Sep 2014)
Following on from the feedback I've received, there are a couple of
things to check if it doesn't go according to plan.
The atd has its own permissions system implemented via /etc/at.allow and /etc/at.deny - see your man pages for more info.
On Red Hat machines, the apache uid is configured with shell /bin/nologin - this will silently discard any jobs submitted to it, hence a more complete solution is:
putenv("SHELL=/bin/bash");
print `echo /usr/bin/php -q longThing.php | at now 2>&1`;
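If the script path is not a fixed string, it is worth quoting it
before it reaches the shell. The helper below is a sketch under my
own assumptions (the name buildAtCommand() is hypothetical); it only
builds the same command line as above with the arguments escaped, it
does not run anything.

```php
<?php
// Hypothetical helper: builds the "echo ... | at now" command line
// with the interpreter and script paths shell-quoted.
function buildAtCommand(string $phpBinary, string $script): string
{
    // The job text that at will read from stdin and hand to a shell.
    $job = escapeshellarg($phpBinary) . ' -q ' . escapeshellarg($script);

    // Quote the whole job again so echo passes it through unchanged.
    return 'echo ' . escapeshellarg($job) . ' | at now 2>&1';
}
```

It would be used in place of the literal string, e.g.
print shell_exec(buildAtCommand('/usr/bin/php', 'longThing.php'));
after the putenv() call shown above.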