Thursday, February 26, 2009

data_fabric

DataFabric provides flexible database connection switching for ActiveRecord.

We needed two features to scale our MySQL database: application-level sharding and master/slave replication. Sharding is the process of splitting a dataset across many independent databases. The split is often made by geographical region (e.g. Craigslist) or category (e.g. eBay). Replication provides a near-real-time copy of a database which can be used for fault tolerance and to reduce load on the master node. Combined, they give you a scalable database solution that can handle huge volumes without requiring huge hardware. Or: DPAYEIOB - don’t put all your eggs in one basket. :-)

Installation

  gem install data_fabric

How does it work?

You describe the topology for your database infrastructure in your model(s). Different models can use different topologies.

  class MyHugeVolumeOfDataModel < ActiveRecord::Base
    data_fabric :replicated => true, :shard_by => :city
  end

There are four supported modes of operation, depending on the options given to the data_fabric method. The plugin will look for connections in your config/database.yml with the following convention:

No connection topology: #{environment} - this is the default, as with ActiveRecord, e.g. "production"

  data_fabric :replicated => true

#{environment}_#{role} - no sharding, just replication, where role is "master" or "slave", e.g. "production_master"

  data_fabric :shard_by => :city

#{group}_#{shard}_#{environment} - sharding, no replication, e.g. "city_austin_production"

  data_fabric :replicated => true, :shard_by => :city

#{group}_#{shard}_#{environment}_#{role} - sharding with replication, e.g. "city_austin_production_master"
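
The naming convention above can be sketched as a small helper. This is only an illustration: connection_name is a hypothetical function, not part of the data_fabric API, and the plugin's real lookup code may differ. It simply shows how the four modes map to keys in config/database.yml.

```ruby
# Hypothetical helper (not part of data_fabric): builds the
# config/database.yml key implied by the naming convention.
def connection_name(environment:, group: nil, shard: nil, role: nil)
  parts = []
  parts << group << shard if group  # sharded: prefix "#{group}_#{shard}"
  parts << environment
  parts << role if role             # replicated: suffix "master" or "slave"
  parts.join("_")
end

# One call per supported mode.
puts connection_name(environment: "production")
puts connection_name(environment: "production", role: "master")
puts connection_name(environment: "production", group: :city, shard: "austin")
puts connection_name(environment: "production", group: :city, shard: "austin", role: "master")
```

Each printed name should match one of the example keys listed above, so every combination you use needs a matching entry in config/database.yml.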

When marked as replicated, all write and transactional operations for the model go to the master, whereas read operations go to the slave.
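
The routing rule can be pictured roughly as follows. This is a minimal sketch under assumptions: role_for and WRITE_OPERATIONS are hypothetical names invented for illustration, and the operation symbols are placeholders rather than the plugin's actual hooks into ActiveRecord.

```ruby
# Hypothetical sketch of role selection for a replicated model.
# The operation names are illustrative, not data_fabric internals.
WRITE_OPERATIONS = %i[insert update delete].freeze

def role_for(operation, in_transaction: false)
  # Writes, and anything inside a transaction, must see the master.
  return "master" if in_transaction || WRITE_OPERATIONS.include?(operation)
  # Plain reads can be served by the slave.
  "slave"
end

puts role_for(:select)
puts role_for(:update)
puts role_for(:select, in_transaction: true)
```

Sending reads inside a transaction to the master matters because a slave may lag behind and would not see rows the transaction has just written.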

Since sharding is an application-level concern, your application must set the shard to use based on the current request or environment. The current shard is set on a thread local variable. For example, you can set the shard in an ActionController around_filter based on the user as follows:

  class ApplicationController < ActionController::Base
    around_filter :select_shard

    private

    def select_shard(&block)
      DataFabric.activate_shard(:city => @current_user.city, &block)
    end
  end
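
To illustrate what a block-scoped, thread-local shard setting looks like, here is a minimal sketch in plain Ruby. ShardContext and its methods are hypothetical stand-ins written for this example. They are not DataFabric's implementation, and the error message is only modelled on the "Shard not set" error mentioned in the warnings.

```ruby
# Hypothetical stand-in for thread-local shard tracking; not DataFabric code.
module ShardContext
  KEY = :shard_context_active_shards

  # Sets the given shards for the duration of the block, then restores
  # whatever was active before, even if the block raises.
  def self.activate(shards)
    previous = Thread.current[KEY]
    Thread.current[KEY] = (previous || {}).merge(shards)
    yield
  ensure
    Thread.current[KEY] = previous
  end

  # Returns the active shard for a group, or raises if none is set.
  def self.active(group)
    shards = Thread.current[KEY] || {}
    shards.fetch(group) { raise ArgumentError, "Shard not set for #{group}" }
  end
end

ShardContext.activate(:city => "austin") do
  puts ShardContext.active(:city)
end
```

Because the value lives in Thread.current, another thread handling a different request does not see it, and the ensure clause clears the setting once the block finishes.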

Warnings

  • Sharded models should never be placed in the session store or you will get "Shard not set" errors when the session is persisted.
  • DataFabric does not support running with ActiveRecord’s allow_concurrency = true in AR 2.0 and 2.1. allow_concurrency is gone in AR 2.2.
http://github.com/fiveruns/data_fabric/tree/master
