Resque

Introduction

Resque (pronounced like "rescue") is a Redis-backed library for creating background jobs, placing those jobs on multiple queues, and processing them later.

For the backstory, philosophy, and history of Resque's beginnings, please see the blog post (2009).

Background jobs can be any Ruby class or module that responds to perform. Your existing classes can easily be converted to background jobs or you can create new classes specifically to do work. Or, you can do both.

Resque is heavily inspired by DelayedJob (which rocks) and comprises three parts:

A Ruby library for creating, querying, and processing jobs

A Rake task for starting a worker which processes jobs

A Sinatra app for monitoring queues, jobs, and workers.

Resque workers can be given multiple queues (a "queue list"), distributed between multiple machines, run anywhere with network access to the Redis server, support priorities, are resilient to memory bloat / "leaks," tell you what they're doing, and expect failure.

Resque queues are persistent; support constant time, atomic push and pop (thanks to Redis); provide visibility into their contents; and store jobs as simple JSON packages.

The Resque frontend tells you what workers are doing, what workers are not doing, what queues you're using, what's in those queues, provides general usage stats, and helps you track failures.

Resque now supports Ruby 2.3.0 and above. We will also only be supporting Redis 3.0 and above going forward.

Note on the future of Resque

Would you like to be involved in Resque? Do you have thoughts about what Resque should be and do going forward? There's currently an open discussion here on just that topic, so please feel free to join in. We'd love to hear your thoughts and/or have people volunteer to be a part of the project!

Example

Resque jobs are Ruby classes (or modules) which respond to the perform method. Here's an example:

``` ruby
class Archive
@queue = :file_serve
def self.perform(repo_id, branch = 'master')
repo = Repository.find(repo_id)
repo.create_archive(branch)
end
end
```

The @queue class instance variable determines which queue Archive jobs will be placed in. Queues are arbitrary and created on the fly - you can name them whatever you want and have as many as you want.

To place an Archive job on the file_serve queue, we might add this to our application's pre-existing Repository class:

``` ruby
class Repository
def async_create_archive(branch)
Resque.enqueue(Archive, self.id, branch)
end
end
```

Now when we call repo.async_create_archive('masterbrew') in our application, a job will be created and placed on the file_serve queue.

Later, a worker will run something like this code to process the job:

``` ruby
klass, args = Resque.reserve(:file_serve)
klass.perform(*args) if klass.respond_to? :perform
```

Which translates to:

``` ruby
Archive.perform(44, 'masterbrew')
```

Let's start a worker to run file_serve jobs:

``` sh
$ cd app_root
$ QUEUE=file_serve rake resque:work
```

This starts one Resque worker and tells it to work off the file_serve queue. As soon as it's ready it'll try to run the Resque.reserve code snippet above and process jobs until it can't find any more, at which point it will sleep for a small period and repeatedly poll the queue for more jobs.

Installation

Add the gem to your Gemfile:

``` sh
gem 'resque'
```

Next, install it with Bundler:

``` sh
$ bundle
```

Rack

In your Rakefile, or some other file in lib/tasks (ex: lib/tasks/resque.rake ), load the resque rake tasks:

``` ruby
require 'resque'
require 'resque/tasks'
require 'your/app' # Include this line if you want your workers to have access to your application
```

Rails

To make resque specific changes, you can override the resque:setup job in lib/tasks (ex: lib/tasks/resque.rake ). GitHub's setup task looks like this:

``` ruby
task "resque:setup" => :environment do
Grit::Git.git_timeout = 10.minutes
end
```

We don't want the git_timeout as high as 10 minutes in our web app, but in the Resque workers it's fine.

Running Workers

Resque workers are rake tasks that run forever. They basically do this:

``` ruby
start
loop do
if job = reserve
job.process
else
sleep 5 # Polling frequency = 5
end
end
shutdown
```

Starting a worker is simple:

``` sh
$ QUEUE=* rake resque:work
```

Or, you can start multiple workers:

``` sh
$ COUNT=2 QUEUE=* rake resque:workers
```

This will spawn two Resque workers, each in its own process. Hitting ctrl-c should be sufficient to stop them all.

Priorities and Queue Lists

Resque doesn't support numeric priorities but instead uses the order of queues you give it. We call this list of queues the "queue list."

Let's say we add a warm_cache queue in addition to our file_serve queue. We'd now start a worker like so:

``` sh
$ QUEUES=file_serve,warm_cache rake resque:work
```

When the worker looks for new jobs, it will first check file_serve. If it finds a job, it'll process it then check file_serve again. It will keep checking file_serve until no more jobs are available. At that point, it will check warm_cache. If it finds a job it'll process it then check file_serve (repeating the whole process).

In this way you can prioritize certain queues. At GitHub we start our workers with something like this:

``` sh
$ QUEUES=critical,archive,high,low rake resque:work
```

Notice the archive queue - it is specialized and in our future architecture will only be run from a single machine.

At that point we'll start workers on our generalized background machines with this command:

``` sh
$ QUEUES=critical,high,low rake resque:work
```

And workers on our specialized archive machine with this command:

``` sh
$ QUEUE=archive rake resque:work
```

Running All Queues

If you want your workers to work off of every queue, including new queues created on the fly, you can use a splat:

``` sh
$ QUEUE=* rake resque:work
```

Queues will be processed in alphabetical order.

Or, prioritize some queues above * :

``` sh
# QUEUE=critical,* rake resque:work
```

Running All Queues Except for Some

If you want your workers to work off of all queues except for some, you can use negation:

``` sh
$ QUEUE=*,!low rake resque:work
```

Negated globs also work. The following will instruct workers to work off of all queues except those beginning with file_ :

``` sh
$ QUEUE=*,!file_* rake resque:work
```

Note that the order in which negated queues are specified does not matter, so `QUEUE=,!file_ and QUEUE=!file_,` will have the same effect.

Process IDs (PIDs)

There are scenarios where it's helpful to record the PID of a resque worker process. Use the PIDFILE option for easy access to the PID:

``` sh
$ PIDFILE=./resque.pid QUEUE=file_serve rake resque:work
```

Running in the background

There are scenarios where it's helpful for the resque worker to run itself in the background (usually in combination with PIDFILE). Use the BACKGROUND option so that rake will return as soon as the worker is started.

``` sh
$ PIDFILE=./resque.pid BACKGROUND=yes QUEUE=file_serve rake resque:work
```

Polling frequency

You can pass an INTERVAL option which is a float representing the polling frequency. The default is 5 seconds, but for a semi-active app you may want to use a smaller value.

``` sh
$ INTERVAL=0.1 QUEUE=file_serve rake resque:work
```

The Front End

Resque comes with a Sinatra-based front end for seeing what's up with your queue.

Standalone

If you've installed Resque as a gem running the front end standalone is easy:

``` sh
$ resque-web
```

It's a thin layer around rackup so it's configurable as well:

``` sh
$ resque-web -p 8282
```

If you have a Resque config file you want evaluated just pass it to the script as the final argument:

``` sh
$ resque-web -p 8282 rails_root/config/initializers/resque.rb
```

You can also set the namespace directly using resque-web :

``` sh
$ resque-web -p 8282 -N myapp
```

or set the Redis connection string if you need to do something like select a different database:

``` sh
$ resque-web -p 8282 -r localhost:6379:2
```

Passenger

Using Passenger? Resque ships with a config.ru you can use. See Phusion's guide:

Apache: https://www.phusionpassenger.com/library/deploy/apache/deploy/ruby/ Nginx: https://www.phusionpassenger.com/library/deploy/nginx/deploy/ruby/

Rack::URLMap

If you want to load Resque on a subpath, possibly alongside other apps, it's easy to do with Rack's URLMap :

``` ruby
require 'resque/server'
run Rack::URLMap.new \
"/" => Your::App.new,
"/resque" => Resque::Server.new
```

Check examples/demo/config.ru for a functional example (including HTTP basic auth).

Rails

You can also mount Resque on a subpath in your existing Rails app by adding require 'resque/server' to the top of your routes file or in an initializer then adding this to routes.rb :

``` ruby
mount Resque::Server.new, :at => "/resque"
```

Jobs

What should you run in the background? Anything that takes any time at all. Slow INSERT statements, disk manipulating, data processing, etc.

At GitHub we use Resque to process the following types of jobs:

Warming caches

Counting disk usage

Building tarballs

Building Rubygems

Firing off web hooks

Creating events in the db and pre-caching them

Building graphs

Deleting users

Updating our search index

As of writing we have about 35 different types of background jobs.

Keep in mind that you don't need a web app to use Resque - we just mention "foreground" and "background" because they make conceptual sense. You could easily be spidering sites and sticking data which needs to be crunched later into a queue.

Persistence

Jobs are persisted to queues as JSON objects. Let's take our Archive example from above. We'll run the following code to create a job:

``` ruby
repo = Repository.find(44)
repo.async_create_archive('masterbrew')
```

The following JSON will be stored in the file_serve queue:

``` js
{
'class': 'Archive',
'args': [ 44, 'masterbrew' ]
}
```

Because of this your jobs must only accept arguments that can be JSON encoded.

So instead of doing this:

``` ruby
Resque.enqueue(Archive, self, branch)
```

do this:

``` ruby
Resque.enqueue(Archive, self.id, branch)
```

This is why our above example (and all the examples in examples/ ) uses object IDs instead of passing around the objects.

While this is less convenient than just sticking a marshaled object in the database, it gives you a slight advantage: your jobs will be run against the most recent version of an object because they need to pull from the DB or cache.

If your jobs were run against marshaled objects, they could potentially be operating on a stale record with out-of-date information.

send_later / async

Want something like DelayedJob's send_later or the ability to use instance methods instead of just methods for jobs? See the examples/ directory for goodies.

We plan to provide first class async support in a future release.

Failure

If a job raises an exception, it is logged and handed off to the Resque::Failure module. Failures are logged either locally in Redis or using some different backend. To see exceptions while developing, see details below under Logging.

For example, Resque ships with Airbrake support. To configure it, put the following into an initialisation file or into your rake job:

``` ruby
# send errors which occur in background jobs to redis and airbrake
require 'resque/failure/multiple'
require 'resque/failure/redis'
require 'resque/failure/airbrake'
Resque::Failure::Multiple.classes = [Resque::Failure::Redis, Resque::Failure::Airbrake]
Resque::Failure.backend = Resque::Failure::Multiple
```

Keep this in mind when writing your jobs: you may want to throw exceptions you would not normally throw in order to assist debugging.

Rails example

If you are using ActiveJob here's how your job definition will look:

``` ruby
class ArchiveJob < ApplicationJob
queue_as :file_serve
def perform(repo_id, branch = 'master')
repo = Repository.find(repo_id)
repo.create_archive(branch)
end
end
```

``` ruby
class Repository
def async_create_archive(branch)
ArchiveJob.perform_later(self.id, branch)
end
end
```

It is important to run ArchiveJob.perform_later(self.id, branch) rather than Resque.enqueue(Archive, self.id, branch). Otherwise Resque will process the job without actually doing anything. Even if you put an obviously buggy line like 0/0 in the perform method, the job will still succeed.

Configuration

Redis

You may want to change the Redis host and port Resque connects to, or set various other options at startup.

Resque has a redis setter which can be given a string or a Redis object. This means if you're already using Redis in your app, Resque can re-use the existing connection.

String: Resque.redis = 'localhost:6379'

Redis: Resque.redis = $redis

For our rails app we have a config/initializers/resque.rb file where we load config/resque.yml by hand and set the Redis information appropriately.

Here's our config/resque.yml :

``` sh
development: localhost:6379
test: localhost:6379
staging: redis1.se.github.com:6379
fi: localhost:6379
production: <%= ENV['REDIS_URL'] %>
```

And our initializer:

``` ruby
rails_root = ENV['RAILS_ROOT'] || File.dirname(__FILE__) + '/../..'
rails_env = ENV['RAILS_ENV'] || 'development'
config_file = rails_root + '/config/resque.yml'
resque_config = YAML::load(ERB.new(IO.read(config_file)).result)
Resque.redis = resque_config[rails_env]
```

Easy peasy! Why not just use RAILS_ROOT and RAILS_ENV ? Because this way we can tell our Sinatra app about the config file:

``` sh
$ RAILS_ENV=production resque-web rails_root/config/initializers/resque.rb
```

Now everyone is on the same page.

Also, you could disable jobs queueing by setting 'inline' attribute. For example, if you want to run all jobs in the same process for cucumber, try:

``` ruby
Resque.inline = ENV['RAILS_ENV'] == "cucumber"
```

Logging

Workers support basic logging to STDOUT.

You can control the logging threshold using Resque.logger.level :

``` ruby
# config/initializers/resque.rb
Resque.logger.level = Logger::DEBUG
```

If you want Resque to log to a file, in Rails do:

``` ruby
# config/initializers/resque.rb
Resque.logger = Logger.new(Rails.root.join('log', "#{Rails.env}_resque.log"))
```

Namespaces

If you're running multiple, separate instances of Resque you may want to namespace the keyspaces so they do not overlap. This is not unlike the approach taken by many memcached clients.

This feature is provided by the redis-namespace