Custom fields in Rails

Posted by | Development | 5 Replies.

What approach do you take when adding dynamic attributes to model objects stored in a traditional database?  Let’s say a bug model with static fields like name and species also needs to support any number of additional dynamic fields specified by different bug collectors.  There are a number of possible solutions to this common problem, so let’s consider the options.

  1. Serialize a hash ({ color: "brown", arms: 2 }) into a column at save, and de-serialize at load.  This serialization would be easy from a configuration standpoint, but would scale poorly and would be rubbish for searching on key values.  Rejected.
  2. Integrate a NoSQL datastore like redis.  A NoSQL datastore would offer speed and a task-appropriate API at the expense of integrating, configuring, and maintaining a new storage component for a minor feature.  Reasonable, but not appropriate in the context.
  3. Add a series of “CustomField#{n}” columns to the bugs table.  No, just no.  Don’t make me cry.
  4. Dynamically add columns to the bugs table (or to per-collector dynamic tables) when new custom fields are defined.  These columns would allow strict type validation at the DB layer, but would limit the total number of columns and would be difficult to maintain with standard ORMs.  Decidedly too much trouble.
  5. Design a standard relational mapping from the bugs table to a custom field table with key and value columns.  This approach is widely used as it fits a standard RDBMS schema.  Its only obvious downside is increasing the number of table rows and objects that need to be managed and read.
  6. Use PostgreSQL’s hstore extension, which provides a key/value hash column type.  If PostgreSQL is already your RDBMS or is an easy switch, using hstore essentially mixes in a NoSQL approach with minimal extra configuration.

Let’s consider the custom field table (cftable) and PostgreSQL hstore (hstore) approaches in more detail, in the context of an application running on Ruby on Rails.

Configuration

Assume that PostgreSQL is already configured (or watch the RailsCast on migrating to PostgreSQL) with the contrib extensions downloaded (part of some postgres homebrews, or sudo apt-get install postgresql-contrib-9.0 on Ubuntu).

For cftable, there’s no additional configuration.  We’re just using standard database tables.

For hstore, we need the activerecord-postgres-hstore gem added to the Gemfile:

gem 'activerecord-postgres-hstore', github: 'engageis/activerecord-postgres-hstore'

after which we run bundler to install the gem:

bundle install

We then create and run a migration that adds hstore support to the PostgreSQL server:

rails g hstore:setup
rake db:migrate

which creates a simple migration that looks like this:

class SetupHstore < ActiveRecord::Migration
  def self.up
    execute "CREATE EXTENSION IF NOT EXISTS hstore"
  end

  def self.down
    execute "DROP EXTENSION IF EXISTS hstore"
  end
end

Schema

To compare the two approaches in code, we’ll create two bug tables, crickets and horseflies, for custom fields and hstores respectively. Each will have a name column, and each will have a set of custom fields collectively called parts.

For cftable, we’ll create a custom field table (duh) with a polymorphic association to any other table that wants to store custom fields.  The migration with GIN indexes for full-text search looks like:

class CreateCustomFields < ActiveRecord::Migration
   def up
      create_table :custom_fields do |t|
         t.text :key
         t.text :value
         t.references :custom_fieldable, :polymorphic => true
         t.timestamps
      end
      execute "CREATE INDEX custom_fields_gin_key ON custom_fields (key)"
      execute "CREATE INDEX custom_fields_gin_value ON custom_fields (value)"
   end

   def down
      drop_table :custom_fields
      execute "DROP INDEX custom_fields_gin_key"
      execute "DROP INDEX custom_fields_gin_value"
   end
end

and the table structure looks like:

Table "public.custom_fields"
        Column         |            Type       
-----------------------+----------------------------
 id                    | integer                    
 key                   | text                       
 value                 | text                       
 custom_fieldable_id   | integer                    
 custom_fieldable_type | character varying(255)
 created_at            | timestamp without time zone
 updated_at            | timestamp without time zone

The custom_fieldable_* columns support the polymorphic association, so that any other table row can be linked to a custom field value.

For hstore, we’ll add a parts column of type hstore to the horseflies table.  The migration, again with a GIN index, looks like:

class CreateHorseflies < ActiveRecord::Migration
   def up
      create_table :horseflies do |t|
         t.text :name
         t.hstore :parts
         t.timestamps
      end
      execute "CREATE INDEX horseflies_gin_parts ON horseflies USING GIN(parts)"
   end

   def down
      drop_table :horseflies
      execute "DROP INDEX horseflies_gin_parts"
   end
end

and the table structure looks like:

Table "public.custom_fields"
        Column         |            Type            
-----------------------+----------------------------
 id                    | integer                    
 key                   | text                       
 value                 | text                       
 custom_fieldable_id   | integer                    
 custom_fieldable_type | character varying(255)     
 created_at            | timestamp without time zone
 updated_at            | timestamp without time zone

Model

We’ll need two model classes for the cftable approach, one for the cricket records and another for the custom field records.

The CustomField model class just has an accessible key and value, and a child relationship to any other custom_fieldable model.

class CustomField < ActiveRecord::Base
   attr_accessible :key, :value
   belongs_to :custom_fieldable, :polymorphic => true
end

The Cricket model class has a parts association linked to custom fields.  It saves its associated custom fields whenever a cricket record is saved, and it destroys them when the cricket is accidentally stepped on.

class Cricket < ActiveRecord::Base
   attr_accessible :name
   has_many :parts, :as => :custom_fieldable, :autosave => true,
    :class_name => 'CustomField', :dependent => :destroy
end

A cricket’s array of parts can be accessed like:

cricket.parts.detect{ |p| p.key == 'arms' }.value
=> '2'

The Horsefly model class is naturally simpler, with just a serializable parts attribute (note that the serialization method is obviously distinct from Ruby’s standard hash serialization, which wouldn’t result in data accessible by the RDBMS).

class Horsefly < ActiveRecord::Base
   attr_accessible :name, :parts
   # Necessary for serializing parts as an hstore
   serialize :parts, ActiveRecord::Coders::Hstore
end

A horsefly’s hash of parts can be accessed like:

horsefly.parts['arms']
=> '2'

 Getting and setting values

Let’s encapsulate getting and setting the bug parts a little, so users of the models don’t need to directly access the parts attribute.  To that end, we’ll add helpers to each of the model classes.

For both approaches, getting and setting parts should look like:

bug.part :arms, '2'
=> '2'
bug.part :arm
=> '2'
bug.part :thumb
=> nil # bugs don't have thumbs, silly

The helper methods could be written in a generic module for reuse in other models with custom fields, but for now we’ll add them to the Cricket and Horsefly model classes.

For the Cricket model, the helper method needs to find or create a custom field record with the appropriate key:

# Find a part in memory by key.
# The speed of this method when there are large numbers of custom fields
# could be increased by using a SQL lookup unless parts_changed?
def part_record key
   parts.detect { |p| p.key == key }
end

# Set or get a part.
#   o.part :arm, 2
#   o.part :arm
def part(*args)
   key, value = args
   key = key && key.to_s
   if args.size == 1
      p = part_record(key) and p.value
   elsif args.size == 2
      raise ArgumentError, "invalid key #{key.inspect}" unless key
      # Find or create a custom field object with the appropriate key
      p = part_record(key) || self.parts.new(key: key)
      p.value = value
   else raise ArgumentError, "wrong number of arguments (#{args.size} for 1 or 2)"
   end
end

For the Horsefly model, all the helper method is really doing is making sure the parts attribute is a hash (rather than nil) and marking the horsefly object as dirty after a key value changes (since Rails’ ORM won’t notice the record change otherwise).

# Set or get a part.
#   o.part :arm, 2
#   o.part :arm
def part(*args)
   key, value = args
   key = key && key.to_s
   if args.size == 1
      parts && parts[key]
   elsif args.size == 2
      raise ArgumentError, "invalid key #{key.inspect}" unless key
      parts_will_change!
      self.parts = (parts || {}).merge(key => value)
      self.parts[key]
   else raise ArgumentError, "wrong number of arguments (#{args.size} for 1 or 2)"
   end
end

At this point, comparisons of getting and setting cricket and horsefly parts aren’t very interesting:

cricket.part :arms, '4'
=> '4'
cricket.part :arms
=> '4'
horsefly.part :arms, '4'
=> '4'
horsefly.part :arms
=> '4'

Searching

So let’s move on to searching for bugs with certain keys and values, where the code for the two approaches starts to further diverge.

Searching for crickets based on the existence or value of a key uses standard SQL joins (note that because we’re referencing custom_fields as parts, the join and where methods take different table aliases):

# Find crickets with any arms
Cricket.joins(:parts).where(custom_fields: {key: 'arms'})

# Find crickets with 2 arms
Cricket.joins(:parts).where(custom_fields: {key: 'arms', value: '2'})

# Find crickets with a partial text search
Cricket.joins(:parts)
 .where("custom_fields.key = :key AND custom_fields.value LIKE :value",
        key: 'middle_name', value: '%calvin%')

# Delete all arms values for all crickets
CustomField.find_by_custom_fieldable_type_and_key('Cricket', 'arms')
 .destroy

Searching for horseflies uses custom hstore functions and operators which require custom SQL expressions:

# Find horseflies with any arms
Horsefly.where("parts ? :key", key: 'arms')

# Find horseflies with 2 arms
Horsefly.where("parts @> (:key => :value)", key: 'arms', value: '2')

# Find horseflies with a partial text search
Horsefly.where("parts -> :key LIKE :value",
               key: 'middle_name', value: "%hamlet%")

# Delete all arms values for all horseflies
Horsefly.delete_key(:parts, :arms)

Wrapping up

From a development standpoint, I can’t say that either approach is categorically better.  I prefer the organizational structure that hstore columns provide by keeping the custom fields inside the related record.  In another post, I’ll look at performance considerations of these two approaches.

For more on getting started using hstore in Rails: