2012-06-08

High-performance tuning for Ruby (REE) garbage collector

This post is going to be short :) It might be very useful for those who use REE (Ruby Enterprise Edition).

The default REE garbage collector settings are not well tuned. Add these environment variables:

RUBY_GC_MALLOC_LIMIT=50000000
RUBY_HEAP_MIN_SLOTS=500000
RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
RUBY_HEAP_SLOTS_INCREMENT=250000
(these values are given in the documentation as Twitter's example).
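The GC reads these variables when the interpreter starts, so they have to be in the environment before your workers boot. Here is a minimal sketch for checking that from inside a running process; the constant and method names are made up for this example.

```ruby
# Values copied from the list above.
RECOMMENDED_GC_ENV = {
  "RUBY_GC_MALLOC_LIMIT"          => "50000000",
  "RUBY_HEAP_MIN_SLOTS"           => "500000",
  "RUBY_HEAP_SLOTS_GROWTH_FACTOR" => "1",
  "RUBY_HEAP_SLOTS_INCREMENT"     => "250000"
}

# Returns the names of the variables that are missing or set to another value.
# (Hypothetical helper, not part of REE or Passenger.)
def gc_env_mismatches(env = ENV, expected = RECOMMENDED_GC_ENV)
  expected.keys.reject { |name| env[name] == expected[name] }
end

missing = gc_env_mismatches
if missing.empty?
  puts "All GC variables are set as recommended"
else
  puts "Not set (or different): #{missing.sort.join(', ')}"
end
```

You could call it from an initializer and log the result, so a forgotten variable shows up right after a deploy.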

The result is very cool. With 5000-7000 requests per minute:
1. Load average went down from 6-7 to 2.5, which is really cool
2. Average response time went down from 100ms to 70ms
3. CPU usage was halved

But memory usage went up (along with memory leaks). To protect your web application from that, gracefully restart your workers every X requests. We use Passenger, so here is a config example:


PassengerHighPerformance on
PassengerPoolIdleTime 0
PassengerMaxRequests 2000
PassengerStatThrottleRate 2
RailsFrameworkSpawnerIdleTime 600
RailsAppSpawnerIdleTime 600
PassengerUseGlobalQueue on
PassengerMaxPoolSize 24

So, each Passenger worker is restarted after every 2000 requests. The memory graph for 30 minutes was shown in this image:

[image: memory usage graph over 30 minutes]

With 1 small change, 1 server can handle ~30% more traffic. We've even removed a few servers from the cloud :)
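As a rough sanity check on PassengerMaxRequests, you can estimate how often a single worker gets recycled. This sketch assumes the 5000-7000 requests per minute all hit one server and are spread evenly over the 24 workers in the pool, which is a simplification; the method name is made up.

```ruby
# Estimated minutes between restarts of one worker, assuming an even spread
# of requests across the pool (an assumption, real traffic is lumpier).
def minutes_between_restarts(requests_per_minute, pool_size, max_requests)
  per_worker_rate = requests_per_minute.to_f / pool_size
  max_requests / per_worker_rate
end

[5000, 7000].each do |rpm|
  minutes = minutes_between_restarts(rpm, 24, 2000)
  puts "#{rpm} req/min: each worker restarts roughly every #{minutes.round(1)} minutes"
end
```

If the interval comes out too short for your app (a Rails worker takes a while to boot), raise PassengerMaxRequests; if memory still grows too much between restarts, lower it.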

2012-03-13

Slow number helpers? - Write your own for performance boost

Imagine you need to show hundreds or thousands of formatted numbers on a page with no pagination. You would probably reach for the helper methods number_with_precision and number_with_delimiter.

My customers asked me to display a table with hundreds of rows and ~20 numbers in each (some are hidden in tool-tips, but still have to be rendered).

Let me show you how slow these helpers are. Let's prepare an Array of 1000 arrays with 100 floats each (1000 rows, 100 columns), and print it in a view like this:

<table class="very-small-font">
  <% @data.each do |row| %>
      <tr>
        <% row.each do |cell| %>
            <td>
              <%= number_with_precision cell, :precision => 3 %>
              <br/>
              <%= number_with_delimiter cell %>
            </td>
        <% end %>
      </tr>  
  <% end %>
</table>


<%= "I'm rendered in #{Time.now.to_f - @start.to_f}" %>


It takes 28-34 sec on my laptop (with i3).
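For reference, the test data can be prepared with something like the sketch below. The method name and the value range are my assumptions for this example; the view only needs @data (the rows) and @start (the time the action began).

```ruby
# Builds rows x columns random floats (the 0..1_000_000 range is arbitrary).
def build_sample_data(rows = 1000, columns = 100)
  Array.new(rows) { Array.new(columns) { rand * 1_000_000 } }
end

# In a controller action this would look like:
#   @start = Time.now
#   @data  = build_sample_data
data = build_sample_data
puts "#{data.size} rows x #{data.first.size} columns"
```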

And now, let's implement our own methods with the same functionality for our needs (see the code here: https://github.com/alexkazeko/Performance-Tests/blob/master/app/helpers/tests_helper.rb):


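  # Lightweight replacement for number_with_delimiter.
  # Assumes a non-negative Integer, or a Float printed in plain decimal notation.
  def lite_number_with_delimiter(number)
    # Floats: delimit the integer part, then re-append the fractional digits.
    return lite_number_with_delimiter(number.to_i) + "." + (number.to_s.split(".").last) if number.is_a?(Float)
    # Insert a comma after every 3 digits of the reversed string.
    result = number.to_s.reverse.gsub(/\d\d\d/) { |m| m + "," }
    # When the length is a multiple of 3, drop the extra trailing comma.
    ((number.to_s.size) % 3 == 0) ? result.chop.reverse : result.reverse
  end


  # Lightweight replacement for number_with_precision.
  # Truncates (does not round) to :precision decimals and returns a Float.
  def lite_number_with_precision(number, options = {})
    base = 10 ** (options[:precision] || 1)
    number = number.to_f
    return "NaN" if number.nan? or not number.finite?
    (number * base).to_i.to_f / base
  end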


Page rendering now takes 7-9 seconds - 3-4 times faster.

And this code doesn't pretend to be the fastest one...
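If you need negative numbers, or rounding with a fixed number of decimals, here is one possible variant. It is only a sketch with made-up method names, not the code from the repository, and I haven't benchmarked it against the helpers above.

```ruby
# Inserts a delimiter between groups of 3 digits in the integer part.
# The sign and the fractional part are left untouched.
def delimit_number(number, delimiter = ",")
  integer, fraction = number.to_s.split(".", 2)
  integer = integer.gsub(/(\d)(?=(\d{3})+\z)/) { "#{$1}#{delimiter}" }
  fraction ? "#{integer}.#{fraction}" : integer
end

# Rounds to a fixed number of decimals and returns a String.
def round_number(number, precision = 3)
  number = number.to_f
  return "NaN" if number.nan? || number.infinite?
  format("%.#{precision}f", number)
end

puts delimit_number(-1234567)  # prints -1,234,567
puts delimit_number(1234.5)    # prints 1,234.5
puts round_number(3.14159)     # prints 3.142
```

The lookahead regex avoids the reverse/chop dance, and format pads with zeros, so 1.5 comes out as "1.500".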

I've added a simple Rails project to GitHub in case you want to try it yourself or see the code.

If you know other examples like this - share them in the comments!

2012-03-01

A little bit more about Faye performance and Linux

A few months ago I wrote about Faye's high performance, and I'm still happy with it. But Faye does better than Linux in its default configuration :) At one point we faced 2 problems:

  1. We noticed that some users couldn't connect to Node.
  2. Sometimes Faye's response time was higher than usual.

So, here are some useful changes to the default Linux settings (we are using CentOS, so some things might differ on other distros).


1. Increase the maximum number of file descriptors. I've already written about it in one of my previous posts.
The usual way to change it persistently is to edit /etc/security/limits.conf. Add these lines to the end:

 user soft nofile 256000
 user hard nofile 256000

where 'user' is the user that starts stunnel and/or faye. I actually suggest changing it for all the users you use by adding

  user soft nofile 256000
  user hard nofile 256000

  user1 soft nofile 256000
  user1 hard nofile 256000
  ....

The default value is 1024 per user (yours may differ - check it by running ulimit -n), which is definitely not enough. Stunnel opens file descriptors for every connection, and you probably have other stuff running. That means you can handle only 1024 minus other_applications_stuff connections.
Note: ulimit -n 256000 won't change it persistently. It changes it only for the current session.
Note2: after modifying /etc/security/limits.conf, simply log in again as that user.
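To double-check that the new limit really applies to a process, you can ask for it from Ruby. A small sketch (the method name is made up; Process.getrlimit is part of Ruby's standard Process module):

```ruby
# Returns the soft and hard limits on open file descriptors
# for the current process.
def nofile_limits
  soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)
  { :soft => soft, :hard => hard }
end

limits = nofile_limits
puts "open files: soft=#{limits[:soft]} hard=#{limits[:hard]}"
```

Run it as the same user, and in the same way, that you start the service: limits are inherited from the parent process, so a shell opened before the change will still show the old value.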


2. Tune Linux's network defaults by modifying /etc/sysctl.conf. Add these lines:

 net.ipv6.conf.all.disable_ipv6 = 1
 net.ipv4.tcp_synack_retries = 2
 net.ipv4.tcp_keepalive_time = 1800
 net.ipv4.tcp_tw_recycle = 0
 net.ipv4.tcp_tw_reuse = 0
 net.ipv4.tcp_fin_timeout = 15
 net.ipv4.ip_local_port_range = 15768    61000
 net.ipv4.tcp_rmem = 4096 87380 8388608
 net.ipv4.tcp_wmem = 4096 87380 8388608
 net.netfilter.nf_conntrack_max = 3097152
 net.nf_conntrack_max = 3097152
 net.core.wmem_max = 8388608
 net.core.rmem_max = 8388608
 kernel.shmmax = 107374182


You can google what each line means - I've collected them by reading forums and talking to Linux geeks.
To see your current values, run sysctl -a
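When a list like this is collected from many sources, it is easy to end up with the same key twice; the file is applied from top to bottom, so the later line wins. Below is a minimal sketch that parses such a file, reports repeated keys and compares each value with the running one under /proc/sys. The method names are made up, and it only handles simple "key = value" lines.

```ruby
# Parses sysctl.conf-style text. Returns [settings, duplicated_keys];
# when a key appears more than once, the last value is kept.
def parse_sysctl_conf(text)
  settings = {}
  duplicates = []
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?("#", ";")
    key, value = line.split("=", 2).map { |part| part.strip }
    next if value.nil?
    duplicates << key if settings.key?(key)
    settings[key] = value.split.join(" ") # normalise whitespace
  end
  [settings, duplicates.uniq]
end

# Maps a key such as net.ipv4.tcp_fin_timeout to its /proc/sys path.
def proc_path(key)
  File.join("/proc/sys", *key.split("."))
end

# Current kernel value, or nil if it cannot be read on this machine.
def running_value(key)
  File.read(proc_path(key)).split.join(" ")
rescue SystemCallError
  nil
end

conf = "/etc/sysctl.conf"
if File.readable?(conf)
  settings, duplicates = parse_sysctl_conf(File.read(conf))
  puts "Set more than once: #{duplicates.join(', ')}" unless duplicates.empty?
  settings.each do |key, wanted|
    current = running_value(key)
    puts "#{key}: file=#{wanted} running=#{current}" if current && current != wanted
  end
end
```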


After all these changes (the main ones are the higher FD limit and net.netfilter.nf_conntrack_max) our server handles 50,000 requests per minute to Faye from RoR and keeps >5k websocket connections with users. No load, no latency, good uptime.


I'd appreciate your comments and thoughts about it. If you've faced the same problems and have advice - you are welcome :)

2011-11-04

Resque vs. DJ (Delayed_job): comparing performance

This post is dedicated to guys who think delayed_job is a good solution for a high-load production application.


First, a few words for people who don't know what these are:
Both DJ and Resque are gems that run heavy operations in the background (so they don't freeze Rails). The difference is that DJ stores jobs in the DB, while Resque stores them in Redis. Both are easy to integrate. And there is one brilliant thing in Resque - the ability to split queues.


So...
One of our high-load apps processes about 2 million background tasks per day (~1500 per minute).
But initially we developed the app with DJ. When we reached about 100k-150k jobs per day (~100 per minute), we had to increase the number of workers, because some tasks work with a slow external REST API and the queue started growing. Doubling the number of workers caused a huge growth in DB load. That happened because the DJ workers started fighting for jobs. The algorithm for reserving a job is simple - DJ fetches 5 jobs and checks whether they are locked by other workers. If they are all already locked, it fetches 5 new jobs. If your application's traffic depends on the time of day, you need to change the number of jobs every hour, otherwise workers will start executing a lot of SELECT requests.
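To see why doubling the workers hurts so much, here is a tiny in-memory model of the reservation loop described above. It is not delayed_job's real code: the names are made up, and it assumes all idle workers poll at the same moment and always see the same batch of 5 unlocked jobs.

```ruby
BATCH_SIZE = 5

# Simulates idle workers competing for jobs and counts the "queries" they make.
def simulate_reservation(worker_count, job_count, batch_size = BATCH_SIZE)
  locked = Array.new(job_count, false)
  stats = { :selects => 0, :failed_locks => 0, :reserved => 0 }
  waiting = worker_count
  while waiting > 0 && locked.include?(false)
    # Every waiting worker SELECTs the same batch of unlocked jobs.
    batch = (0...job_count).reject { |i| locked[i] }.first(batch_size)
    stats[:selects] += waiting
    still_waiting = 0
    waiting.times do
      # Try to lock the jobs one by one; a job taken by a faster worker
      # costs a wasted lock attempt (a wasted UPDATE in the real thing).
      won = batch.find do |i|
        if locked[i]
          stats[:failed_locks] += 1
          false
        else
          locked[i] = true
        end
      end
      if won
        stats[:reserved] += 1
      else
        still_waiting += 1
      end
    end
    waiting = still_waiting
  end
  stats
end

[5, 10].each do |workers|
  puts "#{workers} workers: #{simulate_reservation(workers, 100).inspect}"
end
```

In this model, going from 5 to 10 workers triples the SELECTs (5 to 15) and more than quadruples the wasted lock attempts (10 to 45), while the number of reserved jobs only doubles.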


Our first step was easy - we set up 1 more DB and pointed delayed_jobs there. But that server was also heavily loaded: lots of SELECT, UPDATE and DELETE queries almost killed it.


What we knew was that we needed a solution which temporarily stores jobs in memory and does atomic push and pop. And we found it: Resque.
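To illustrate why an atomic pop removes the fight for jobs, here is a minimal sketch that uses Ruby's thread-safe Queue as an in-memory stand-in for a Redis list. It is not Resque's API, and the method name is made up.

```ruby
require "thread"

# Drains the queue with several worker threads and returns [worker_id, job] pairs.
# Queue#pop is atomic, so no two workers can ever receive the same job.
def drain_with_workers(queue, worker_count)
  processed = Queue.new
  workers = Array.new(worker_count) do |id|
    Thread.new do
      loop do
        begin
          job = queue.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break                 # queue is empty, this worker is done
        end
        processed << [id, job]
      end
    end
  end
  workers.each { |worker| worker.join }
  Array.new(processed.size) { processed.pop }
end

jobs = Queue.new
1.upto(1000) { |n| jobs << n }
results = drain_with_workers(jobs, 8)
unique = results.map { |_, job| job }.uniq.size
puts "processed #{results.size} jobs, #{unique} unique"
```

Each worker needs exactly one operation to get a job, and there is nothing to lock or re-check, no matter how many workers you add.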


We run workers on all our web servers rather than on a single one. We do it because each worker loads the Rails environment and uses quite a lot of memory. All workers are connected to one Redis instance, which uses only a few megabytes of memory. Neither the workers nor Redis load the system at all (they just use some memory, which is cheap nowadays).


In my next post I'll share Rake tasks to start/stop workers on many servers using configuration stored in a YAML file.

2011-11-02

Monitoring Faye shards with GOD

Faye is the thing you want to configure, start and forget about. Maybe update to the latest version sometimes...

That's what we did on our project. We continue developing business logic while Node runs pretty stably. Sometimes Node can crash, for example: https://github.com/jcoglan/faye/issues/106 or https://github.com/jcoglan/faye/issues/37. Yes, those issues are fixed, but new ones can appear (after an update, for example). Or what if we want to restart Faye when it's using more than X megabytes of memory? Or how do we restart all shards at once?

So, to keep Faye working 24/7 I've configured God - a very nice tool for monitoring processes.
Note: I considered using monit for this, but I chose God because of its Ruby syntax and nicer way to configure everything.


This post is mostly about monitoring Faye shards distributed across many servers/ports, but you can also use the God configs (pasted below) to monitor a single instance.

2011-11-01

Sharding for Faye using gem faye_shards

In this post I want to describe the basic features of the new gem faye_shards.

If you are not sure whether you need sharding, or why to shard, read on:
Let's say there are 3 ways to use pub/sub:
  1. Each user subscribes to their own, unique channel
  2. Users subscribe to a global channel
  3. Users subscribe to both a unique and a global channel.
If your application uses only the 2nd option and it's under high load - this gem is not for you. But it fits perfectly if your app uses the 1st option, and it's OK to use this gem with the 3rd schema. Basically, sharding for the 1st schema means "we are going to connect each user to some fixed shard, and we always know which one".
Actually this gem can be used even if you run only 1 instance of Faye - it will just provide some helper methods ;)
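The "fixed shard" idea can be sketched in a few lines. This is only an illustration and not the faye_shards API: the shard list, hosts, ports, the /faye mount path and the method names are all made up for the example.

```ruby
# Hypothetical list of Faye instances (hosts and ports are placeholders).
SHARDS = [
  { :host => "faye1.example.com", :port => 9000 },
  { :host => "faye2.example.com", :port => 9001 },
  { :host => "faye3.example.com", :port => 9002 }
]

# The same user id always maps to the same shard.
def shard_for(user_id, shards = SHARDS)
  shards[user_id % shards.size]
end

# URL the user's browser should connect to.
def faye_url_for(user_id, secure = false, shards = SHARDS)
  shard = shard_for(user_id, shards)
  scheme = secure ? "https" : "http"
  "#{scheme}://#{shard[:host]}:#{shard[:port]}/faye"
end

puts faye_url_for(7)        # prints http://faye2.example.com:9001/faye
puts faye_url_for(7, true)  # prints https://faye2.example.com:9001/faye
```

Because the mapping depends only on the user id and the shard list, the Rails app can find the right shard for a push without storing anything. Keep in mind that changing the number of shards moves most users to a different one.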

What kind of sharding it supports:
  • It doesn't matter if you use Redis or not.
  • Works with both HTTP and HTTPS. It supports cases when you run HTTP Faye with SSL offloading (using Stunnel, for example; in my previous post I described how to configure it).
  • It doesn't matter how many servers/ports you use for sharding.
Basic features it provides:
  • Helpers to get URLs for users.
  • Helpers to push data to faye.
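
For context, pushing data to Faye from Ruby without any helper usually means an HTTP POST with a JSON-encoded message. The sketch below shows that plain approach, not the gem's own helpers; the method names and the URL in the usage comment are assumptions.

```ruby
require "json"
require "net/http"
require "uri"

# Builds the JSON message Faye expects: a channel plus arbitrary data.
def faye_message(channel, data)
  { :channel => channel, :data => data }.to_json
end

# Publishes the message to the Faye server at faye_url (for example, the URL
# of the shard the user is connected to). Performs a real HTTP request.
def push_to_faye(faye_url, channel, data)
  Net::HTTP.post_form(URI.parse(faye_url), :message => faye_message(channel, data))
end

puts faye_message("/users/42", "text" => "hello")
# Usage (needs a running Faye server, so it is left commented out):
#   push_to_faye("http://localhost:9292/faye", "/users/42", "text" => "hello")
```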

2011-10-29

SSL support for high-loaded Faye (Node.js + Stunnel)

In my previous post I wrote about custom sharding of Faye. One of the limitations was the lack of SSL. Both ways of running Faye (Node.js and Thin) support SSL, but it's extremely slow.

So, what we need is SSL offloading which supports 'keep-alive' connections (otherwise Faye will do polling instead of establishing a web-socket connection).

One of the best tools for it is Stunnel. It's pretty fast - check the author's benchmarks here: http://www.stunnel.org/?page=perf. I can tell you the author didn't lie about its speed :)

Stunnel installation 

Stunnel can be simply installed from almost all popular repos:
  # yum install stunnel
or
  # apt-get install stunnel

But by default it is compiled with libwrap, which I think is not really needed for most web applications, and it slows Stunnel down. So I recommend installing it this way:
  # wget ftp://ftp.nluug.nl/pub/networking/stunnel/stunnel-4.45.tar.gz
  # tar zxf stunnel-4.45.tar.gz
  # cd stunnel-4.45
  # ./configure --disable-libwrap
  # make && make install

Stunnel configuration