DataFabric provides flexible database connection switching for ActiveRecord.
We needed two features to scale our mysql database: application-level sharding and master/slave replication. Sharding is the process of splitting a dataset across many independent databases. This often happens based on geographical region (e.g. craigslist) or category (e.g. ebay). Replication provides a near-real-time copy of a database which can be used for fault tolerance and to reduce load on the master node. Combined, you get a scalable database solution which does not require huge hardware to scale to huge volumes. Or: DPAYEIOB - don’t put all your eggs in one basket. :-)
Installation
gem install data_fabric
How does it work?
You describe the topology for your database infrastructure in your model(s). Different models can use different topologies.
class MyHugeVolumeOfDataModel < ActiveRecord::Base
data_fabric :replicated => true, :shard_by => :city
end
There are four supported modes of operation, depending on the options given to the data_fabric method. The plugin will look for connections in your config/database.yml with the following convention:
No connection topology: #{environment} - this is the default, as with ActiveRecord, e.g. "production"
data_fabric :replicated => true
#{environment}_#{role} - no sharding, just replication, where role is "master" or "slave", e.g. "production_master"
data_fabric :shard_by => :city
#{group}_#{shard}_#{environment} - sharding, no replication, e.g. "city_austin_production"
data_fabric :replicated => true, :shard_by => :city
#{group}_#{shard}_#{environment}_#{role} - sharding with replication, e.g. "city_austin_production_master"
When marked as replicated, all write and transactional operations for the model go to the master, whereas read operations go to the slave.
Since sharding is an application-level concern, your application must set the shard to use based on the current request or environment. The current shard is set on a thread local variable. For example, you can set the shard in an ActionController around_filter based on the user as follows:
class ApplicationController < ActionController::Base
around_filter :select_shard
private
def select_shard(&block)
DataFabric.activate_shard(:city => @current_user.city, &block)
end
end
Warnings
- Sharded models should never be placed in the session store or you will get "Shard not set" errors when the session is persisted.
- DataFabric does not support running with ActiveRecord’s allow_concurrency = true in AR 2.0 and 2.1. allow_concurrency is gone in AR 2.2.
No comments:
Post a Comment