Changes to WSGI to Support WebSockets

In my earlier post, I referred to some minor changes that I had to make to the wsgi server to seamlessly support a websocket client.  I think these changes are generic and simple enough that they, or something similar in spirit, should make it into the WSGI standard someday.  Being able to support websockets from within WSGI applications is just too powerful of a use case to ignore.

Get the Socket Out of the Environment

The first change I made was to add a get_socket() method to the file-like object that appears in 'wsgi.input'.  This method returns the underlying socket object directly, which is then used in the WebSocket object to handle message transfer.
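Here is a minimal sketch of the idea in modern Python, using only the standard library. SocketInput and echo_once_app are hypothetical names, not eventlet.wsgi's classes. The point is only that the application reaches through environ['wsgi.input'] to the raw socket and then talks on it directly.

```python
import socket


class SocketInput(object):
    """Hypothetical stand-in for the file-like object in 'wsgi.input'."""

    def __init__(self, sock):
        self._sock = sock

    def read(self, size=-1):
        # Simplified: a real server would honour Content-Length here.
        return self._sock.recv(size if size > 0 else 65536)

    def get_socket(self):
        # Hand back the underlying socket so the app can take it over.
        return self._sock


def echo_once_app(environ, start_response):
    """Toy application that talks on the raw socket instead of returning a body."""
    sock = environ['wsgi.input'].get_socket()
    sock.sendall(sock.recv(1024))
    return []


if __name__ == '__main__':
    server_side, client_side = socket.socketpair()
    client_side.sendall(b'ping')
    echo_once_app({'wsgi.input': SocketInput(server_side)}, None)
    print(client_side.recv(1024))  # b'ping'
```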

One reason the get_socket() approach might not be ideal is buffering.  It happens that eventlet.wsgi is implemented such that, when it calls the WSGI application, the very next byte to be read off the socket is the first byte of the first websocket message.  It seems plausible that other WSGI servers might not have this property; a server that reads ahead into its own buffer while parsing the request headers could swallow the start of that first message.
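A small standard-library experiment shows the hazard. It assumes a server that parses headers through a buffered file object (sock.makefile('rb')), and it frames the first message the way early websocket drafts did (a 0x00 byte, the payload, a 0xFF byte). The constants and helper names are made up for the demo.

```python
import socket

# Early-draft websocket text frame: 0x00, UTF-8 payload, 0xFF.
FRAME = b'\x00hello\xff'
REQUEST = (b'GET /data HTTP/1.1\r\n'
           b'Host: localhost:7000\r\n'
           b'Upgrade: WebSocket\r\n'
           b'Connection: Upgrade\r\n'
           b'\r\n')


def read_headers_buffered(sock):
    """Consume request headers the way a buffering server might."""
    reader = sock.makefile('rb')
    while True:
        line = reader.readline()
        if line in (b'\r\n', b''):  # blank line ends the headers
            break
    return reader


def bytes_left_on_socket(sock):
    """Return whatever is still unread on the raw socket, without blocking."""
    sock.setblocking(False)
    try:
        return sock.recv(1024)
    except BlockingIOError:
        return b''
    finally:
        sock.setblocking(True)


if __name__ == '__main__':
    server_side, client_side = socket.socketpair()
    # The client sends its handshake and first message in one burst.
    client_side.sendall(REQUEST + FRAME)
    reader = read_headers_buffered(server_side)
    print(bytes_left_on_socket(server_side))  # b'': the raw socket looks empty
    print(reader.read1(1024))                 # b'\x00hello\xff': it was buffered
```

The buffered reader pulls the frame off the wire while looking for the end of the headers, so a handler that grabbed the raw socket at that point would wait for a message that has already arrived.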

Cancel Post-Application Processing

The second change is that when eventlet.wsgi sees that the application returned a special flag value ALREADY_HANDLED, it aborts all post-application processing and skips straight to closing the socket.  Normally it would write the headers and return value to the socket, but since the application has already handled that stuff, we don’t want the WSGI server throwing junk on the line (or complaining that the app didn’t call start_response).
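The control flow can be sketched like this. It is a toy, not eventlet.wsgi's code: ALREADY_HANDLED here is a local stand-in object, `write` stands in for the client socket, and the sketch assumes start_response is called before the application returns (so no generator apps).

```python
# Local stand-in for the server's special flag value.
ALREADY_HANDLED = object()


def run_app(app, environ, write):
    """Toy per-request step of a WSGI server.

    `write` stands in for sending bytes to the client. Returns True if the
    server wrote the response, False if it skipped post-app processing.
    """
    state = {}

    def start_response(status, headers, exc_info=None):
        state['status'] = status
        state['headers'] = headers
        return write

    result = app(environ, start_response)
    if result is ALREADY_HANDLED:
        # The app already handled the socket: write nothing, skip the
        # start_response check, and let the caller close the connection.
        return False
    if 'status' not in state:
        raise RuntimeError('application did not call start_response')
    write(('HTTP/1.1 %s\r\n' % state['status']).encode('latin-1'))
    for name, value in state['headers']:
        write(('%s: %s\r\n' % (name, value)).encode('latin-1'))
    write(b'\r\n')
    for chunk in result:
        write(chunk)
    return True
```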

The downside of the ALREADY_HANDLED approach is that most middleware would have to be changed (slightly) to support it.  Raising a special exception seems to have the same problem.  But then again, changing the behavior of middleware is kinda the point.
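Here is roughly what that slight change looks like for a body-rewriting middleware. Both the middleware and ALREADY_HANDLED are hypothetical. In a real deployment the middleware would have to import the very same flag object the server compares against, since the check is by identity.

```python
# Hypothetical stand-in; must be the same object the server checks for.
ALREADY_HANDLED = object()


def uppercase_middleware(app):
    """Toy middleware that upper-cases every chunk of the response body."""
    def wrapper(environ, start_response):
        result = app(environ, start_response)
        if result is ALREADY_HANDLED:
            # The one-line change: pass the flag through untouched.
            return result
        return [chunk.upper() for chunk in result]
    return wrapper
```

Without the `is` check, the list comprehension would try to iterate over the flag object and fail with a TypeError.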

So, those are my two changes.  I’m certain that there is a way to do this that is even more seamless with the intent of the WSGI standard, but these changes required three lines of code to implement, and work today.  :)

Nice Words

Ted Dziuba has some nice things to say about Eventlet.  Thanks, Ted!  The rest of his blog is pretty entertaining to read, while you’re over there.  :)

Scalable, WSGI-compatible Websockets

I wanted to mess around with websockets, and create something cool and Eventlet-y for using them.  This was my first experience using websockets, and I found these sites helpful in learning about it.

One of the design patterns that I observed was that people were running their websocket server as a separate process on a separate port, because there are just enough differences between the websocket protocol and HTTP that it’s difficult to get them into a normal web server.  That’s crap; if websockets are going to replace xmlhttprequest, it has to be just as easy to write the server part of web sockets as it is to write a web page.

So here’s a websocket implementation that works inside (a slight modification to) WSGI.  It requires the latest development version of Eventlet (see sidebar to the right for instructions on getting that). It’s very very rough at this point, but it’s powerful enough that it seemed worth sharing this initial version.

Now, here’s the code for a simple echo server:

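def handle(ws):
    # Echo every message back until the browser closes the connection.
    while True:
        m = ws.wait()      # block until the next message arrives
        if m is None:      # None means the browser closed the socket
            break
        ws.send(m)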

It shows the basics in a nutshell, which are incredibly simple:

  • call wait() to receive a message from the browser
  • call send() to send a message to the browser
  • wait() returns None when the browser closes the socket
  • return from the function to close the socket from the server side
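To make the wait()/send() contract concrete, here is a hypothetical MiniWebSocket built on a plain socket. It assumes the early draft framing that browsers spoke at the time (a 0x00 byte, UTF-8 text, a 0xFF byte) and skips the handshake entirely. It is not the Eventlet implementation, but the echo handler above would run against it unchanged.

```python
import socket


class MiniWebSocket(object):
    """Hypothetical sketch of wait()/send() over an already-upgraded socket."""

    def __init__(self, sock):
        self.sock = sock
        self._buf = b''

    def send(self, message):
        self.sock.sendall(b'\x00' + message.encode('utf-8') + b'\xff')

    def wait(self):
        """Return the next message, or None once the peer has closed."""
        while True:
            # 0xFF never occurs in UTF-8, so it safely marks a frame's end.
            end = self._buf.find(b'\xff')
            if end != -1:
                frame, self._buf = self._buf[:end], self._buf[end + 1:]
                if not frame.startswith(b'\x00'):
                    raise ValueError('unsupported frame type')
                return frame[1:].decode('utf-8')
            data = self.sock.recv(4096)
            if not data:
                return None  # peer closed the connection
            self._buf += data
```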

You’re free to spawn off new greenthreads to do complex stuff with reading from and writing to the ws object concurrently; it will just work.

Here’s how you set up the server:

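from eventlet.green import socket
from eventlet import wsgi
# handle() is the echo function above; WebSocketWSGI is the wrapper class
# from the example code
listener = socket.socket()
listener.bind(('localhost', 7000))
listener.listen(500)
# The second argument is the allowed websocket ORIGIN
wsgi.server(listener, WebSocketWSGI(handle, 'http://localhost:7000'))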

That’s it! Most of that is socket creation boilerplate; the WebSocketWSGI class handles the work of converting an incoming WSGI request to a websocket call. The “http://localhost:7000” is the websocket ORIGIN field, which needs to be hardcoded to some degree in order to prevent cross-site attacks, where a page served from some other site opens a websocket to your server.
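A sketch of the checks such a wrapper might perform, assuming the early draft (hixie-75 style) handshake: confirm the request is an upgrade, compare the Origin header against the configured value, then reply with a 101 response naming the origin and the ws:// location. The function names are hypothetical, and exact-string header matching is a simplification.

```python
def is_websocket_request(environ, expected_origin):
    """True for an upgrade request coming from the one allowed origin."""
    return (environ.get('HTTP_UPGRADE') == 'WebSocket'
            and environ.get('HTTP_CONNECTION') == 'Upgrade'
            and environ.get('HTTP_ORIGIN') == expected_origin)


def handshake_response(environ, origin):
    """Build an early-draft server handshake as raw bytes."""
    location = 'ws://%s%s' % (environ['HTTP_HOST'], environ['PATH_INFO'])
    lines = [
        'HTTP/1.1 101 Web Socket Protocol Handshake',
        'Upgrade: WebSocket',
        'Connection: Upgrade',
        'WebSocket-Origin: %s' % origin,
        'WebSocket-Location: %s' % location,
        '',  # blank line ends the headers
        '',
    ]
    return '\r\n'.join(lines).encode('ascii')
```

A request whose Origin is some other site fails the first check, so the wrapper can refuse it before any messages flow.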

So, that’s how the code looks, but you want to run something right away. In your copy of the eventlet tree, run this command:

PYTHONPATH=. python examples/websocket.py

Now, fire up Google Chrome and navigate to http://localhost:7000.  It should show you a page that looks like this:

Google Chrome showing a graph of random values between 0 and 1

It’s a graph that fills in dynamically based on values that it gets from the websocket.  The cool thing about this is that the HTML page and the websocket are both served from the same port, and the same WSGI application.  It simply dispatches to the websocket handler like any other URL-path-based dispatcher.  It’s flashy, self-contained, and scalable, and all it took was ~30 lines of code.
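The dispatching itself is ordinary WSGI. A minimal sketch, assuming the websocket lives at /data and every other path gets the HTML page (both are assumptions for illustration; make_dispatcher is a made-up name):

```python
def make_dispatcher(websocket_app, page_html):
    """Route /data to the websocket app and everything else to the page."""
    def dispatch(environ, start_response):
        if environ.get('PATH_INFO') == '/data':
            # e.g. a WebSocketWSGI instance wrapping the handler
            return websocket_app(environ, start_response)
        body = page_html.encode('utf-8')
        start_response('200 OK', [('Content-Type', 'text/html'),
                                  ('Content-Length', str(len(body)))])
        return [body]
    return dispatch
```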

Take a look at the code!  Have fun with it!  I’ll make a post shortly about the slight modifications I had to make to eventlet.wsgi to get this working seamlessly.

The Beauty of Eventlet

Here’s a nice little article about using node.js to implement a port forwarder.  The two Python examples he cited were butt-ugly (actually, he linked to one butt-ugly example twice accidentally).  Surely we can make Python look better!

from eventlet.green import socket
import eventlet

def callback():
    print "called back"

def forward(source, dest, cb=lambda: None):
    # Copy bytes from source to dest until source hits EOF,
    # then call cb() to signal that this direction is finished.
    while True:
        d = source.recv(1024)
        if d == '':
            cb()
            break
        dest.sendall(d)

listener = socket.socket()
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(('localhost', 7000))
listener.listen(500)
while True:
    client, addr = listener.accept()
    # Open a matching connection to the local SSH server for each client
    server = socket.create_connection(('localhost', 22))
    # One green thread per direction, both reusing forward()
    eventlet.spawn_n(forward, client, server, callback)
    eventlet.spawn_n(forward, server, client)

To me that seems a little bit more readable than the node.js version, and it’s a little bit shorter, as well.  I’m particularly happy about how the forward function is used bidirectionally, so there’s no duplication of that logic.  Thanks to Eventlet, it’s just as scalable as node.js; you could connect thousands of clients to this thing.
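For comparison, the same bidirectional forward() can be driven by ordinary OS threads from the standard library. This is a swapped-in technique (one thread per direction instead of one green thread), so it sketches the structure, not Eventlet’s scalability. It also adds a half-close with shutdown(SHUT_WR) so that EOF propagates to the other side, which the version above leaves out.

```python
import socket
import threading


def forward(source, dest, on_eof=None):
    """Copy bytes from source to dest until source reaches EOF."""
    while True:
        data = source.recv(1024)
        if not data:
            break
        dest.sendall(data)
    try:
        # Tell the other side that no more data is coming this way.
        dest.shutdown(socket.SHUT_WR)
    except OSError:
        pass
    if on_eof is not None:
        on_eof()


def splice(a, b):
    """Run forward() in both directions, one daemon thread per direction."""
    threads = [threading.Thread(target=forward, args=(a, b), daemon=True),
               threading.Thread(target=forward, args=(b, a), daemon=True)]
    for t in threads:
        t.start()
    return threads
```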

Multiple Concurrent Connections with py-amqplib and Eventlet

AMQP is a nice and efficient standard for message queueing applications.  This is a pretty cool little sector of applications that has been growing lately; there’s a huge demand for the sort of fire-and-forget application model that MQ-style software supports.  Last year we evaluated a bunch of them.

Anyhow, our favorite Python client is py-amqplib.  It’s small, it’s fast-starting, and its interface is as clean as any AMQP client can be.  One use case I ran into recently was that I needed to consume queues from two separate hosts at the same time.  Both were meant to handle high volumes of traffic, so I wanted it to be as efficient as possible.

Enter Eventlet!  I think this illustrates how convenient it is to use Eventlet for this sort of use case.  There are about three lines of Eventlet-specific code in this, and they instantly transform this “non-asynchronous” library into an event-driven, asynchronous, yet still easy-to-use one.  Don’t bother switching to a different library, just use Eventlet with the library you like.  It amazes me every time.

My code, let me show you it.

from eventlet import patcher
# import_patched loads amqplib with green versions of socket and friends
amqp = patcher.import_patched('amqplib.client_0_8')
import eventlet

EXCHANGE = 'example'

def main_callback(msg):
    msg.channel.basic_ack(msg.delivery_tag)
    print "Got message", msg.routing_key, msg.body

def connect_to_host(hostname, keys):
    print "Connecting to", hostname
    conn = amqp.Connection(hostname, userid='guest', password='guest', ssl=False, insist=True)
    ch = conn.channel()
    ch.access_request('/data', active=True, read=True)
    ch.exchange_declare(EXCHANGE, 'topic', auto_delete=True, durable=False)
    qname, _, _ = ch.queue_declare(hostname + '_test_queue', auto_delete=True, durable=False)
    for key in keys:
        ch.queue_bind(qname, EXCHANGE, key)
    ch.basic_consume(qname, callback=main_callback)
    # Wait for messages until the consumer is cancelled
    while ch.callbacks:
        ch.wait()

if __name__ == '__main__':
    # Consume from host A in a green thread and from host B in the main one
    eventlet.spawn(connect_to_host, 'host_a.example.com', ['key_a'])
    connect_to_host('host_b.example.com', ['key_b'])

This code was written using the development version of Eventlet, which you can get yourself using the instructions in the sidebar on the right.  If you’re curious about that patcher module, check out the patcher documentation.

Converting the Convenience Functions

[Update: Based on feedback from Sergey and others that this conversion process is too wordy, we decided to reinstate the convenience methods but with a better API.  See it here.]

The original Eventlet API came packed with socket convenience methods that did common socket tasks in a single line.  These were pretty convenient and got a lot of use, but unfortunately had a number of problems that caused us to decide to remove them.  The most basic reason is that they needlessly expanded the Eventlet API and confused people about whether the “true” way was the convenience method, or the socket module.

Now they’re deprecated, and we’re in the process of migrating applications and tests onto the real APIs.  Here’s some quick transition help.  (Note that I only tested these by verifying that they didn’t give interpreter errors when run; I believe that they work as intended, but verifying them more thoroughly would have meant this post never got written.)

tcp_listener

This convenience function creates a listening socket bound to a specified address.  You can replace code that looks like this:

  from eventlet import api
  sock = api.tcp_listener(('localhost', 9010))

with this:

  from eventlet.green import socket
  sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  sock.setsockopt(socket.SOL_SOCKET,socket.SO_REUSEADDR, 1)
  sock.bind(('localhost', 9010))
  sock.listen(50)

Yes, it’s longer, and it’s uglier, but it does have the advantage of being perfectly congruent with the standard Python socket library.
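If a project has many call sites, one option is to keep a tiny helper of its own. This is a hypothetical sketch; the socket_module parameter is there so you could pass eventlet.green.socket (which mirrors the stdlib module’s names) instead of the standard socket module.

```python
import socket


def tcp_listener(address, backlog=50, socket_module=socket):
    """Hypothetical local replacement for the old one-liner."""
    sock = socket_module.socket(socket_module.AF_INET, socket_module.SOCK_STREAM)
    sock.setsockopt(socket_module.SOL_SOCKET, socket_module.SO_REUSEADDR, 1)
    sock.bind(address)
    sock.listen(backlog)
    return sock
```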

connect_tcp

The most common use case ever, connecting to a server, is accomplished with this function.  Here’s the convenience code:

  from eventlet import api
  sock = api.connect_tcp(('eventlet.net', 80))

If you’re running Python 2.6 or later, you’re in luck, because you can use a standard function that’s just as convenient:

  from eventlet.green import socket
  sock = socket.create_connection(('eventlet.net', 80))

If you’re using Python 2.5 or earlier, it’s a little less convenient, but not too bad:

  from eventlet.green import socket
  sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  sock.connect(('eventlet.net', 80))
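The two-line fallback above only handles IPv4. The sketch below approximates what create_connection() does internally: resolve the host with getaddrinfo() and try each returned address in turn. It is written in modern syntax (except ... as), so it will not run unmodified on 2.5, and the name connect is made up.

```python
import socket


def connect(address, timeout=None):
    """Try every address getaddrinfo() returns until one connects."""
    host, port = address
    last_error = None
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            host, port, 0, socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        try:
            if timeout is not None:
                sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except socket.error as exc:
            # Remember the failure and move on to the next address.
            last_error = exc
            sock.close()
    if last_error is not None:
        raise last_error
    raise socket.error('getaddrinfo returned no addresses')
```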

ssl_listener

This one’s a bit of a doozy, because one of the conveniences provided by this function is that the caller doesn’t have to know whether they’re using PyOpenSSL or 2.6’s ssl module.  As we deprecate it, the caller now has to know which of those two options they’re using.  Here’s the old code:

  from eventlet import api
  sock = api.ssl_listener(('localhost', 443), '/tmp/server.crt',
                          '/tmp/server.key')

And the new, using Python 2.6:

  from eventlet.green import socket, ssl
  sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  sock.setsockopt(socket.SOL_SOCKET,socket.SO_REUSEADDR, 1)
  sock.bind(('localhost', 443))
  sock.listen(50)
  sock = ssl.wrap_socket(sock,
    keyfile='/tmp/server.key',
    certfile='/tmp/server.crt',
    server_side=True)

Using PyOpenSSL (hoo boy, is this library wordy):

  from eventlet.green import socket
  from eventlet.green.OpenSSL import SSL
  sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  sock.setsockopt(socket.SOL_SOCKET,socket.SO_REUSEADDR, 1)
  sock.bind(('localhost', 443))
  sock.listen(50)
  context = SSL.Context(SSL.SSLv23_METHOD)
  context.use_certificate_file('/tmp/server.crt')
  context.use_privatekey_file('/tmp/server.key')
  context.set_verify(SSL.VERIFY_NONE, lambda *x: True)
  sock = SSL.Connection(context, sock)
  sock.set_accept_state()

That’s all of the convenience methods, deconstructed!  Hopefully you can see how not having the equivalent of all this logic in Eventlet makes it easier to maintain and support.

Dependency Graph Fun

Check it out!  I was thinking about circular dependencies, and Eventlet, and how I don’t really want the two of them to be mentioned in the same sentence ever again.  I found a great little set of scripts on the internet for generating module dependency graphs, and applied them to Eventlet.  Here’s the result:

Module dependency graph of Eventlet 0.9.4

I’m happy to say that nearly all of the circular-looking dependencies are between modules that are already slated for removal and/or refactoring, such as api, proc, coros, pool, and util.  This graph is going to get even cleaner when it hits 1.0.

Compare this to the dependency graph of Eventlet 0.8.11:

Module dependency graph of Eventlet 0.8.11

Whoa!  That api module is getting ridiculous amounts of attention and circularity.  Glad we’ve moved on.
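Spotting the cycles in a graph like these can also be automated. Here is a sketch using Tarjan’s strongly connected components algorithm over a {module: [imported modules]} dict. The input format and the toy graph in the usage line are assumptions, and building the dict from real source code is left out.

```python
def find_cycles(graph):
    """Return each import cycle in graph as a sorted list of module names."""
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    cycles = []
    counter = [0]

    def visit(node):
        index[node] = lowlink[node] = counter[0]
        counter[0] += 1
        stack.append(node)
        on_stack.add(node)
        for dep in graph.get(node, ()):
            if dep not in index:
                visit(dep)
                lowlink[node] = min(lowlink[node], lowlink[dep])
            elif dep in on_stack:
                lowlink[node] = min(lowlink[node], index[dep])
        if lowlink[node] == index[node]:
            # node is the root of a strongly connected component
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            # Several modules, or one importing itself, means a cycle.
            if len(component) > 1 or node in graph.get(node, ()):
                cycles.append(sorted(component))

    for node in graph:
        if node not in index:
            visit(node)
    return cycles


if __name__ == '__main__':
    toy = {'a': ['b'], 'b': ['c', 'a'], 'c': [], 'd': ['d']}
    print(find_cycles(toy))  # [['a', 'b'], ['d']]
```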

Just as an implementation note if you’re tempted to try this on your own project, I had trouble getting the py2depgraph script to run over the unit tests, so I wrote a little file called import_all.py that did nothing but import every module in eventlet, and ran the following command:

PYTHONPATH=. python2.5 py2depgraph.py import_all.py | \
  python2.5 depgraph2dot.py | \
  dot -T png -o depgraph.png

It didn’t work with Python 2.6 for me for some reason. This is my lazyweb request: someone make a dependency graph generator that is more robust. Maybe it could become a part of the Ohloh code analysis system.
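The import_all.py trick mentioned above can be generalized with pkgutil. This sketch walks a package and imports every submodule, collecting failures instead of stopping at the first one. It is a guess at the spirit of that file, not its actual contents, and import_all is a hypothetical name.

```python
import importlib
import pkgutil


def import_all(package_name):
    """Import a package and all of its submodules.

    Returns (names that imported, {name: exception} for ones that failed).
    """
    package = importlib.import_module(package_name)
    imported = [package_name]
    failed = {}
    for _, name, _ in pkgutil.walk_packages(package.__path__,
                                            package_name + '.',
                                            onerror=lambda name: None):
        try:
            importlib.import_module(name)
            imported.append(name)
        except Exception as exc:  # e.g. a missing optional dependency
            failed[name] = exc
    return imported, failed
```

Pointed at eventlet, this would be called as import_all('eventlet').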

Roadmap to 1.0

You can’t get to where you’re going without a roadmap! The #eventlet-dwellers spent some time coming up with a fairly detailed roadmap that ends with a 1.0 release.

I want to put this out here as a starting point — I expect to prune some stuff or add some stuff in as we go along. I’m interested in feedback from those of you developing using Eventlet — anything that’s burning you up to have in there, or anything that’ll be useless to you and therefore you’d prefer dropped? I’m particularly curious about the twisted and stackless support.

0.9.5

  • support psycopg in db_pool
  • smart patcher that does the right patching when importing without needing to understand plumbing of patched module
  • patcher.monkey_patch() method replacing util.wrap_*
  • monkeypatch threading support
  • find a new home for api.named
  • move api_tests into greenthread_tests and hub_tests, or wherever’s appropriate
  • import timeout class from gevent, replace exc_after and with_timeout()
  • replace call_after with spawn_after; this is so that users don’t see the Timer class; use kill() instead of cancel() (simplification observed in gevent)
  • eventlet.green.os with patched read() and write() etc
  • move crap from wrap_pipes_with_coroutine_pipe into green.os
  • eventlet.green.subprocess instead of eventlet.processes
  • improve patching docs, explaining more about patcher and why you’d use eventlet.green
  • better documentation on greenpiles
  • docs for (moved) api.named
  • deprecate api.py completely
  • deprecate util.py completely (possibly, leave it as a “utility module” containing named and, uh, some other stuff)
  • deprecate saranwrap
  • performance improvements

0.9.6

  • stability release
  • more example apps: proxy, and feed parser
  • convert eventlet’s usage of deprecated apis
  • convert tests’ usage of deprecated apis

1.0

  • either maintain or remove support.stackless
  • either maintain or remove twistedr
  • either maintain or remove twistedutils
  • extended Pool semantics that handle expiration and parameterized
  • refactored db_pool module that emulates db-api 2.0 modules
  • nonblocking stdin/stdout (if possible)
  • better docs for backdoor, explaining what it’s for
  • continued performance improvements
  • remove deprecated stuff

The “performance improvements” are deliberately vague, because a lot of little things add up to give a performance boost.  For example, Eventlet 0.9.3 contained some subtle refactorings of the greenio module and event loops that resulted in a modest overall performance gain (about 10% in my sample app).  It’s a continual process!

0.9.4 out

A stability release, as promised! I also ended up deprecating coros.Queue and coros.Channel, which is a little outside the purview of a stability release, but it is wafer-thin, so it’s ok, right?

Here’s the changelog:

  • Deprecated coros.Queue and coros.Channel (use queue.Queue instead)
  • Added putting and getting methods to queue.Queue.
  • Added eventlet.green.Queue which is a greened clone of stdlib Queue, along with running the stdlib tests.
  • Changed __init__.py so that the version number is readable even if greenlet’s not installed.
  • Bugfixes in wsgi, greenpool

0.9.4 plans

I’m planning on releasing a stability release, 0.9.4, this Thursday. Please help find bugs before then! That would be lovely.

As a note, I normally would remove deprecated stuff in 0.9.5, but I think that there are enough changes in 0.9.3 that I’ll hold off on removal until 0.9.7 or 1.0, whichever comes first. So that should give everyone (including me) some more time to migrate. :-)
