Kequc.com
Possibly the only thing more difficult than saying Kequc out loud is trying to remember how to spell it.

A week ago I shared an article with you about referral tracking and linkbacks. It was a very simple implementation of the concept, and it quickly became necessary to make it more robust.

Since I am restructuring the way we are going to store a lot of this data, I needed to build a simple Rake task that would clear the existing Referral model instances from the database. Now I am free to more or less start over.

Below is the new Referral model.

models/referral.rb (ruby)
class Referral
  include Mongoid::Document

  field :domain
  field :addresses, type: Array, default: []
  field :weight, type: Integer, default: 0

  index :domain
  embedded_in :referrable, polymorphic: true

  after_save :calculate_weight

  # Address sending the most unique visitors, or nil if there are none yet
  def best_address
    best = self.addresses.max_by { |a| a['weight'] }
    best && best['address']
  end

  protected

  def calculate_weight
    # Weight scoring based on number of unique visitors across all addresses
    self.reload
    ips = self.addresses.flat_map { |a| a['visitors'] }.map { |v| v['ip'] }
    self.set(:weight, ips.compact.uniq.length)
  end
end
[example of new referral format] (plain)
  # domain: "boogle.com", weight: 2, addresses:
  #   address: "http://www.boogle.com/page/", weight: 2, visitors:
  #     ip: "nn:nn:n:nnn", created_at: Date
  #     ip: "nnn:nnn:n:nnn", created_at: Date
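To make the nested format concrete, here is the same example written as plain Ruby hashes, with the referral-level weight recomputed the way calculate_weight does it: the number of distinct visitor IPs across every address on the domain. This is a standalone sketch; the IP strings and the referral_weight helper are made up for illustration, and string keys are used to match the `a['weight']` style lookups in the model.

```ruby
# Hypothetical example data in the shape described above.
referral = {
  'domain'    => 'boogle.com',
  'weight'    => 0,
  'addresses' => [
    {
      'address'  => 'http://www.boogle.com/page/',
      'weight'   => 2,
      'visitors' => [
        { 'ip' => '10.0.0.1', 'created_at' => Time.utc(2012, 4, 1) },
        { 'ip' => '10.0.0.2', 'created_at' => Time.utc(2012, 4, 2) }
      ]
    }
  ]
}

# Mirrors calculate_weight: count distinct, non-nil IPs across all addresses.
def referral_weight(referral)
  referral['addresses']
    .flat_map { |address| address['visitors'] }
    .map { |visitor| visitor['ip'] }
    .compact
    .uniq
    .length
end

referral['weight'] = referral_weight(referral)
puts referral['weight'] # prints 2
```

Because the count is taken over all addresses, the same visitor arriving through two different addresses on one domain only adds 1 to the referral's weight.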

You can see that I’ve set the polymorphic option to true for embedded_in.

This is because I’ve put the corresponding embeds_many :referrals call in a reusable module, which I’ve stored in a directory within my models directory. With it I can now call include HasVisitors at the top of my Post model, which is where I currently want to use its methods.

If I want to later, I’ll easily be able to use these same methods on other models.
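ActiveSupport::Concern's `included do ... end` block is evaluated in the context of whichever class includes the module, which is what lets `embeds_many` run on Post (or any other model). The dependency-free sketch below shows the same mechanism using Ruby's built-in `Module#included` hook. `FakeDocument`, its `embeds_many` and `relations` methods, `HasVisitorsSketch`, and the bare `Post` and `Page` classes are all hypothetical stand-ins so it runs without Mongoid or ActiveSupport.

```ruby
# Hypothetical stand-in for Mongoid::Document: records declared relations.
module FakeDocument
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def relations
      @relations ||= {}
    end

    def embeds_many(name, options = {})
      relations[name] = options
    end
  end
end

# Plain-Ruby equivalent of `included do ... end` from ActiveSupport::Concern:
# the block runs with the including class as self.
module HasVisitorsSketch
  def self.included(base)
    base.class_eval do
      embeds_many :referrals, as: :referrable
    end
  end
end

class Post
  include FakeDocument      # must come first so embeds_many is defined
  include HasVisitorsSketch
end

class Page
  include FakeDocument
  include HasVisitorsSketch
end

p Post.relations
p Page.relations
```

Each class that includes the module gets its own copy of the relation, which is what makes reusing the module on another model a one-line change.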

models/post.rb (ruby)
class Post
  include Mongoid::Document
  include Mongoid::Timestamps
  include HasVisitors

  # [etc..]
end
models/include/has_visitors.rb (ruby)
module HasVisitors
  extend ActiveSupport::Concern

  included do
    embeds_many :referrals, as: :referrable
  end

  def add_visitor(request, referrer=nil)
    referrer ||= request.referrer
    return if referrer.blank?
    uri = uri_from_address(referrer)
    uri_domain = domain_from_uri(uri)
    if uri_domain and domain_from_address(request.url) != uri_domain
      # Referrer is external website
      # Skip search engines; patterns are unanchored, so they match anywhere in the domain
      ignore = [
        /(www\.)?google.*/,
        /([^.]+\.)?search\.yahoo.*/,
        /search\.bing.*/,
        /(www\.)?baidu.*/,
        /([^.]+\.)?rambler\.ru/,
        /(www\.)?yandex\.ru/,
        /search\.msn.*/,
        /(www\.)?aol.*/,
        /(www\.)?altavista.*/,
        /(www\.)?feedster.*/,
        /search\.lycos.*/,
        /(www\.)?alltheweb.*/
      ]
      return if ignore.any? { |pattern| pattern =~ uri_domain }
      visitor = {ip: request.ip, created_at: Time.now}
      referral = self.referrals.find_or_initialize_by(domain: uri_domain)
      address = referral.addresses.find { |a| a['address'] == uri.to_s }
      if address
        # Address already exists: a second or later visitor from it
        address['visitors'].push(visitor)
        weight = address['visitors'].collect { |x| x['ip'] }.uniq.compact.length
        address['weight'] = weight
      else
        # First visitor from this address
        referral.addresses.push({address: uri.to_s, visitors: [visitor], weight: 1})
      end
      referral.save
    end
  end

  # Referrals with at least two unique visitors, highest weight first
  def weighted_referrals
    self.referrals.reject { |x| x['weight'] < 2 }.sort_by { |x| x['weight'] }.reverse
  end

  protected

  def uri_from_address(address)
    URI.parse(URI.encode(address)).normalize
  end

  def domain_from_address(address)
    domain_from_uri(uri_from_address(address))
  end

  def domain_from_uri(uri)
    uri.host ? uri.host.gsub(/^www\./, '') : nil
  end
end

I chose to keep has_visitors.rb near my models, in the models directory, because the only place I’ll be using it is in my other models. I named the directory include, but it can be named anything you like. I know there are a few developers out there who do something similar, and they often prefer the directory to be named concerns. It’s not particularly important what you name the directory; Padrino will load it automatically either way.

I filter out any domains that come from search engines such as Google, because it’s not important to link back to Google search results. My focus with this feature is to link back primarily to similar websites that are linking to me.
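The filtering step can be tried on its own. This sketch is a variation rather than the exact code above: the patterns are anchored with `\A` and `\z` and have their dots escaped, so a domain that merely contains the word "google" is not filtered out. `SEARCH_PATTERNS` and `search_engine?` are hypothetical names, only a few engines are listed, and the input is assumed to be a host with any leading `www.` already stripped, as domain_from_uri does.

```ruby
# Hypothetical, shortened ignore list. Anchored so only whole hosts match.
SEARCH_PATTERNS = [
  /\Agoogle\.[a-z.]+\z/,
  /\A([^.]+\.)?search\.yahoo\.[a-z.]+\z/,
  /\A(search\.)?bing\.com\z/,
  /\Ayandex\.ru\z/
].freeze

# True when the (www-stripped) host belongs to a known search engine.
def search_engine?(domain)
  SEARCH_PATTERNS.any? { |pattern| pattern.match?(domain) }
end

search_engine?('google.co.uk')        # => true
search_engine?('uk.search.yahoo.com') # => true
search_engine?('googlefan.example')   # => false
```

Anchoring trades a little coverage for fewer false positives; whether that matters depends on how many legitimate referrers have a search engine's name inside their domain.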

There is quite a lot going on here in the way Hashes and Arrays are modified. I tried to make the code as clear as possible.

It is designed this way because I wanted each domain to show up only once as a referral per Post. That Referral object then contains all of the different addresses on that domain which have been detected linking to my Post. Each address is scored separately by the number of unique visitors (distinct IP addresses) it has sent me.
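Stripped of Mongoid, the bookkeeping in add_visitor amounts to one entry per domain, one entry per address under it, and a list of visitors per address whose weight is the count of distinct IPs. The sketch below reproduces that with plain hashes so the structure can be tried in irb. `record_visit` is a hypothetical helper, `referrals` is a Hash keyed by domain instead of an embedded collection, and URL parsing is left out (the domain is passed in directly).

```ruby
# Hypothetical in-memory version of the add_visitor bookkeeping.
# referrals: { domain => { 'domain' => ..., 'addresses' => [...] } }
def record_visit(referrals, domain, address, ip, at = Time.now)
  referral = referrals[domain] ||= { 'domain' => domain, 'addresses' => [] }
  visitor  = { 'ip' => ip, 'created_at' => at }

  entry = referral['addresses'].find { |a| a['address'] == address }
  if entry
    # Second or later visit from this address: append, then re-score.
    entry['visitors'] << visitor
    entry['weight'] = entry['visitors'].map { |v| v['ip'] }.compact.uniq.length
  else
    # First visit from this address.
    referral['addresses'] << { 'address' => address, 'visitors' => [visitor], 'weight' => 1 }
  end
  referral
end

referrals = {}
record_visit(referrals, 'boogle.com', 'http://www.boogle.com/page/', '10.0.0.1')
record_visit(referrals, 'boogle.com', 'http://www.boogle.com/page/', '10.0.0.1')
record_visit(referrals, 'boogle.com', 'http://www.boogle.com/page/?ref=nav', '10.0.0.2')

referrals.keys                                               # => ["boogle.com"]
referrals['boogle.com']['addresses'].map { |a| a['weight'] } # => [1, 1]
```

The repeat visit from 10.0.0.1 is stored but does not raise that address's weight, since only distinct IPs count.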

Keeping it organised this way comes from considering that one page on one domain may be linking to me through several subtly different URLs. A complicated set of query-string parameters, for instance, shouldn’t make one page count as several different pages that link to me. It should count as one page that has several viable addresses to link back to.
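Here is how two differently decorated links to the same page end up under one domain but as separate addresses. The sketch follows the same steps as uri_from_address and domain_from_uri (parse, normalize, strip a leading `www.`) but skips `URI.encode`, which was removed in Ruby 3.0, so it assumes the referrer strings are already valid URIs. `domain_of` is a hypothetical helper name and the links are invented.

```ruby
require 'uri'

# Same steps as uri_from_address + domain_from_uri, minus URI.encode.
# normalize lowercases the scheme and host, so "Boogle.com" == "boogle.com".
def domain_of(address)
  host = URI.parse(address).normalize.host
  host && host.sub(/\Awww\./, '')
end

links = [
  'http://www.Boogle.com/page/?utm_source=feed&utm_medium=rss',
  'http://boogle.com/page/?ref=sidebar'
]

links.map { |link| domain_of(link) } # => ["boogle.com", "boogle.com"]
links.uniq.length                    # => 2
```

Both links collapse to one domain, so they become two addresses inside a single Referral rather than two referrals.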

The best_address method in my Referral model, shown earlier, figures out which of the addresses from that domain is sending me the most traffic. That makes it easy to provide a link back to whichever one that is.
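The selection itself is a single `max_by` over the addresses' weights. The sketch below runs it on plain hashes shaped like the stored data; the addresses and weights are invented, and this version returns nil for a referral with no addresses rather than raising.

```ruby
# Pick the address with the highest weight (most unique visitors).
def best_address(addresses)
  best = addresses.max_by { |a| a['weight'] }
  best && best['address']
end

addresses = [
  { 'address' => 'http://boogle.com/page/?ref=sidebar', 'weight' => 1 },
  { 'address' => 'http://boogle.com/page/',             'weight' => 5 },
  { 'address' => 'http://boogle.com/page/?utm=rss',     'weight' => 3 }
]

best_address(addresses) # => "http://boogle.com/page/"
best_address([])        # => nil
```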

app/views/posts/_post.haml (haml)
  -# [etc..]
  - referrals = post.weighted_referrals
  - unless referrals.empty?
    .referrals
      - referrals.each do |referral|
        = link_to referral.domain, referral.best_address, :class => "referral"

New visitors are tracked from the show action in the posts controller. Controllers in Padrino have access to Sinatra’s request object, which is passed to the add_visitor method in our HasVisitors module.
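add_visitor only reads three things from the request: `referrer`, `url` and `ip`, all of which Rack::Request provides and Sinatra's request object inherits. That means the external-referrer check can be exercised without a server by passing any object with those methods. In this sketch `FakeRequest` is a hypothetical Struct, `host_without_www` and `external_referrer_domain` are hypothetical helpers that reproduce just the check from add_visitor, and the post URL and slug are made up.

```ruby
require 'uri'

# Hypothetical stand-in exposing the three methods add_visitor uses.
FakeRequest = Struct.new(:referrer, :url, :ip)

# Host with any leading "www." removed, or nil for relative addresses.
def host_without_www(address)
  host = URI.parse(address).normalize.host
  host && host.sub(/\Awww\./, '')
end

# Referrer's domain when it is an external site, otherwise nil.
def external_referrer_domain(request)
  referrer = request.referrer
  return nil if referrer.nil? || referrer.empty?
  domain = host_without_www(referrer)
  return nil if domain.nil? || domain == host_without_www(request.url)
  domain
end

post_url = 'http://kequc.com/2012/04/08/some-post/'
external = FakeRequest.new('http://www.boogle.com/page/', post_url, '10.0.0.1')
internal = FakeRequest.new('http://www.kequc.com/', post_url, '10.0.0.2')

external_referrer_domain(external) # => "boogle.com"
external_referrer_domain(internal) # => nil
```

Internal navigation (a visitor clicking from the home page to a post) is therefore never recorded as a referral.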

app/controllers/posts.rb (ruby)
  get :show, :map => "/:year/:month/:day/:slug/" do
    @post.add_visitor(request)
    render 'posts/show'
  end

Those are all the changes that I made.

There is quite a lot to this code, so I feel the next logical step would be to put all of it into a gem to make it easier to use. That is something I may do in the future.

I’ll update this article with a big orange banner at the top if I do that.

Apr 8, 2012

Kequc.com is the personal website of Nathan Lunde-Berry and is centred mainly around the Ruby programming language.

This website is meant as a portfolio: a place for my thoughts and to show my work.
