We utilise a variety of custom socket servers to support our applications. Most of our apps have at least one. We write RPC servers to interact with repository storage in Deploy and Codebase, the new Deploy Agent has a socket server for users to connect to, and AppMail runs its own SMTP server.
Restarting these services poses a problem. You can't start a new version and then kill the old one, as the new version will be unable to bind to the socket. You can't kill the old service and then start a new one, as you'll have downtime while the new service starts. You can tell your clients to retry, but this only works if you control all of the clients.
As a result, these services only get restarted when absolutely necessary, and usually manually. If someone updates a server, they must remember to restart the service after deploying it. This is a sure way to end up with outdated code running in production.
In a perfect world, restarts would be seamless. The old service goes away and the new one immediately starts serving requests.
Making the world more perfect
To demonstrate how we built our super-slick restarting services, we'll create a simple service that writes "Hello World!" to a TCP socket and disconnects. Nothing fancy, no concurrency. Connect, write, close.
# super_simple_service.rb
require 'socket'

class SuperSimpleService
  attr_reader :bind_address, :bind_port

  def initialize
    @bind_address = 'localhost'
    @bind_port = 12345
  end

  def run
    @socket_server = TCPServer.new(bind_address, bind_port)
    loop do
      client_socket = @socket_server.accept # blocks until a new connection is made
      client_socket.puts "Hello World!"
      client_socket.close
    end
  end
end

SuperSimpleService.new.run
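To poke at the service by hand, a tiny client is useful. The following is a minimal sketch rather than part of the service: `fetch_greeting` is a hypothetical helper name, and the host and port in the commented usage line are assumed to match the defaults above.

```ruby
require 'socket'

# Hypothetical helper: connect, read the single line the service writes,
# and always close the client socket afterwards.
def fetch_greeting(host, port)
  socket = TCPSocket.new(host, port)
  socket.gets
ensure
  socket&.close
end

# With the service running locally on its default port:
#   puts fetch_greeting('localhost', 12345)
```

Running this in a loop while restarting the service is a simple way to see whether any connections are refused or dropped.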
Controlling the restart
We've established that we need something smarter than simply stopping and starting our service. To address this we're going to hand over control of our restarts to the service itself.
Instead of sending our service a TERM signal to stop it, we're going to send it a USR1 signal. USR1 is a user-defined signal with no fixed meaning. We're going to catch it and use it to restart our server. For more information on signals, Tim Uruski has a great blog post on catching signals in Ruby.
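As a quick illustration of catching a user-defined signal, separate from the service itself, a process can trap USR1 and then signal itself. This is a sketch that assumes a POSIX system; `catch_usr1` is a made-up helper name.

```ruby
# Hypothetical helper: install a USR1 handler, signal ourselves, and report
# whether the handler ran.
def catch_usr1
  received = false
  Signal.trap('USR1') { received = true } # keep handlers tiny: just set a flag

  Process.kill('USR1', Process.pid)

  # Ruby runs trap handlers at its next safe point, so poll briefly.
  20.times do
    break if received
    sleep 0.05
  end
  received
end

puts catch_usr1 ? 'caught USR1' : 'no signal seen'
```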
Restarting will involve spawning a new copy of the service in a fork. The new fork will then kill the old version once it's taken over the socket connection.
# super_simple_service.rb
class SuperSimpleService
  # ...

  def run
    @socket_server = TCPServer.new(bind_address, bind_port)
    kill_parent if ENV['RESTARTED']
    setup_signal_traps
    # ...
  end

  def setup_signal_traps
    trap('USR1') { hot_restart }
  end

  def hot_restart
    fork do
      # close_others: false leaves open file descriptors available to the new process
      exec("RESTARTED=true ruby super_simple_service.rb", close_others: false)
    end
  end

  def kill_parent
    # The restarted service is exec'd inside a fork of the old one,
    # so the old service is our parent process.
    parent_process_id = Process.ppid
    Process.kill('TERM', parent_process_id)
  end
end
A new copy of the service will be spawned whenever USR1 is received. This new service is passed a RESTARTED flag in its environment variables. Upon seeing this flag, the new service sends a TERM to its parent (the old copy of the service).
To prevent the old server from exiting immediately when receiving TERM and dropping any existing connections, a graceful shutdown is implemented. This allows any active connections to complete before exiting.
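class SuperSimpleService
  def run
    # ...
    loop do
      client_socket = @socket_server.accept # blocks until a new connection is made
      begin
        @connection_active = true # keeps track of whether we have an active connection
        client_socket.puts "Hello World!"
      ensure
        client_socket.close
        @connection_active = false
      end
      Process.exit(0) if @shutting_down # TERM arrived mid-connection; exit now that it is done
    end
  end

  def setup_signal_traps
    # ...
    trap('TERM') { graceful_shutdown }
  end

  def graceful_shutdown
    @shutting_down = true
    @socket_server.close # Stop listening for new connections
    # Trap handlers run on the main thread, so sleeping here would also stall
    # the connection we are waiting for. Exit immediately only when idle;
    # otherwise the run loop exits once the active connection completes.
    Process.exit(0) unless @connection_active
  end
end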
The service now keeps a flag, @connection_active, which indicates whether a connection is currently being processed. On TERM the service will now stop accepting new connections and wait for any existing connection to complete before exiting cleanly.
Sharing sockets
We've got our restarting process down. Unfortunately, when we send USR1 to a running service we'll get the following error:
➜ pkill -USR1 -f super_simple_service.rb
super_simple_service.rb:13:in `initialize': Address already in use - bind(2) for "localhost" port 12345 (Errno::EADDRINUSE)
from super_simple_service.rb:13:in `new'
from super_simple_service.rb:13:in `run'
from super_simple_service.rb:54:in `<main>'
This error is caused when the new version of the service attempts to bind to the port. The port is still in use by the original version, so the binding fails.
We require a way to rebind to an already open socket. Fortunately, in the world of POSIX, every open file gets a numeric ID assigned to it: a file descriptor. Open sockets are treated as files, so they are also given a file descriptor. You can find the file descriptor of any IO object in Ruby by calling #fileno on it.
File.open('/tmp/my_file.txt').fileno
# => 19
Conveniently, Ruby's BasicSocket class includes a .for_fd method, which opens a socket based on a passed file descriptor. The built-in TCPServer and UNIXServer classes both inherit from BasicSocket and so support binding to a descriptor out of the box.
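The effect of .for_fd can be seen within a single process. The sketch below is illustrative only: `rewrap_listener` is a made-up helper name, and port 0 is used so the OS picks a free port. It wraps an existing listening socket's descriptor in a second TCPServer object and accepts a connection through it.

```ruby
require 'socket'

# Hypothetical helper: build a second TCPServer object around the descriptor
# of an already-listening socket. No new bind(2) call is made.
def rewrap_listener(original)
  rewrapped = TCPServer.for_fd(original.fileno)
  # Both objects now share one descriptor, so only one of them should close it.
  rewrapped.autoclose = false
  rewrapped
end

original = TCPServer.new('127.0.0.1', 0) # port 0: let the OS choose a free port
port = original.addr[1]
rewrapped = rewrap_listener(original)

client = TCPSocket.new('127.0.0.1', port)
connection = rewrapped.accept # accepted via the re-wrapped object
connection.puts 'Hello World!'
connection.close

puts client.gets # prints "Hello World!"
client.close
original.close
```

Turning autoclose off on the second object matters here because both Ruby objects refer to the same underlying descriptor.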
Passing the file descriptor to the newly spawned service will allow it to rebind to the existing port. We already pass the RESTARTED flag to the new server in the environment. Instead of this, we can pass the file descriptor to signify a restart, and bind to this. We also need to set a couple of options on the socket to prevent it from being closed when we start the new server.
class SuperSimpleService
  # ...

  def run
    if ENV['SOCKET_FD']
      # Restarted: reuse the listening socket inherited from the old service
      @socket_server = TCPServer.for_fd(ENV['SOCKET_FD'].to_i)
      kill_parent
    else
      @socket_server = TCPServer.new(bind_address, bind_port)
    end
    @socket_server.autoclose = false     # don't close the descriptor when this object is finalized
    @socket_server.close_on_exec = false # keep the descriptor open across exec
    # ...
  end

  def hot_restart
    fork do
      # close_others: false leaves open file descriptors available to the new process
      exec("SOCKET_FD=#{@socket_server.fileno} ruby super_simple_service.rb", close_others: false)
    end
  end
end
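The hand-off between processes can also be tried in isolation. The following sketch is not the service itself: it uses Process.spawn rather than fork plus exec, an inline child script, and a hypothetical `serve_from_child` helper, purely to show a child process answering on a socket that its parent bound. It assumes a POSIX system.

```ruby
require 'socket'
require 'rbconfig'

# Child script: rebuild the server from the inherited descriptor and answer once.
CHILD_SCRIPT = <<~'RUBY'
  server = TCPServer.for_fd(ENV.fetch('SOCKET_FD').to_i)
  client = server.accept
  client.puts 'Hello from the child'
  client.close
RUBY

# Hypothetical helper: start a child that serves one connection on the
# parent's listening socket, and return what a client receives.
def serve_from_child(server)
  server.close_on_exec = false # Ruby marks descriptors close-on-exec by default
  pid = Process.spawn(
    { 'SOCKET_FD' => server.fileno.to_s },
    RbConfig.ruby, '-rsocket', '-e', CHILD_SCRIPT,
    close_others: false # leave non-standard descriptors open in the child
  )
  client = TCPSocket.new('127.0.0.1', server.addr[1])
  reply = client.gets # the parent never calls accept, so the child answers
  client.close
  Process.wait(pid)
  reply
end

server = TCPServer.new('127.0.0.1', 0) # port 0: let the OS choose a free port
puts serve_from_child(server)
server.close
```

The parent binds the port but never accepts on it, so the reply can only have come from the child using the inherited descriptor.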
Voila! Your server is complete; you can see it in all of its finished glory here. However, it's grown from 20 lines to over 60, which is a lot of extra code. If only someone could make this into a nice, easy-to-use gem.
Enter Uninterruptible
Uninterruptible is a gem that takes all of the pain out of the hot-restart process. It manages all of the signals, ports and file descriptors. It supports both UNIX and TCP sockets for maximum flexibility. Implementing our SuperSimpleService with Uninterruptible is simple:
class SuperSimpleService
  include Uninterruptible::Server

  def handle_request(client_socket)
    client_socket.puts("Hello World!")
  end
end

server = SuperSimpleService.new
server.configure do |config|
  config.bind = "tcp://localhost:12345"
  config.start_command = 'ruby super_simple_service.rb'
end
server.run
Uninterruptible is currently running many of the socket servers across our applications and is production ready. If you've got any questions, give us a shout; contact details are below.