A week ago I shared an article about referral tracking and linkbacks. It was a very simple implementation of the concept, and it quickly became necessary to make it more robust.
Since I am restructuring the way a lot of this data is stored, I first built a simple Rake task to clear the existing Referral documents from the database. Now I am free to more or less start over.
Below is the new Referral model.
```ruby
class Referral
  include Mongoid::Document

  field :domain
  field :addresses, type: Array, default: []
  field :weight, type: Integer, default: 0

  index :domain

  embedded_in :referrable, polymorphic: true

  after_save :calculate_weight

  # The address on this domain that has sent the most unique visitors
  # (nil if no address has been recorded yet).
  def best_address
    best = addresses.max_by { |a| a['weight'] }
    best && best['address']
  end

  protected

  # Weight scoring based on the number of unique visitor IPs
  # across every address on this domain.
  def calculate_weight
    reload
    ips = addresses.flat_map { |a| a['visitors'] }.map { |v| v['ip'] }
    set(:weight, ips.compact.uniq.length)
  end
end

# Shape of a stored referral:
# domain: "boogle.com", weight: 2, addresses:
#   address: "http://www.boogle.com/page/", weight: 2, visitors:
#     ip: "nn:nn:n:nnn", created_at: Date
#     ip: "nnn:nnn:n:nnn", created_at: Date
```
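To make the domain weight concrete, here is a plain-Ruby sketch, without Mongoid, of the calculation calculate_weight performs on that structure. It assumes the addresses are hashes with string keys, as they come back from MongoDB after a reload; the domain, addresses and IPs are invented sample data.

```ruby
# A referral as it might look after a reload from MongoDB (string keys).
# Sample data only: the addresses and IPs are made up.
referral = {
  'domain' => 'boogle.com',
  'addresses' => [
    { 'address' => 'http://www.boogle.com/page/',
      'visitors' => [{ 'ip' => '10.0.0.1' }, { 'ip' => '10.0.0.2' }] },
    { 'address' => 'http://www.boogle.com/page/?ref=feed',
      'visitors' => [{ 'ip' => '10.0.0.1' }] }
  ]
}

# Same steps as calculate_weight: collect every visitor IP across all
# addresses, drop nils and duplicates, and count what is left.
def domain_weight(referral)
  referral['addresses']
    .flat_map { |a| a['visitors'] }
    .map { |v| v['ip'] }
    .compact
    .uniq
    .length
end

puts domain_weight(referral) # prints 2
```

Because an IP seen on two address variants is only counted once, a domain's weight can be lower than the sum of its addresses' weights.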
Note that I’ve passed polymorphic: true to embedded_in.
This is because the corresponding embeds_many :referrals call lives in a reusable module, which I’ve stored in a subdirectory of my models directory. I can now add include HasVisitors at the top of my Post model, which is currently the only place I want to use these methods.
Later on, I’ll be able to use the same methods on other models just as easily.
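ActiveSupport::Concern's included do ... end block runs in the context of whichever class includes the module, which is what lets one module add embeds_many to any model. A rough plain-Ruby equivalent uses the self.included hook. In this sketch FakeDocument and its embeds_many are hypothetical stand-ins for Mongoid that only record their arguments, so the example runs without Mongoid; the real Concern also handles dependencies between concerns, which is not shown.

```ruby
# Hypothetical stand-in for Mongoid::Document: embeds_many here only
# records its arguments instead of defining a real embedded relation.
module FakeDocument
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def relations
      @relations ||= []
    end

    def embeds_many(name, options = {})
      relations << [name, options]
    end
  end
end

# Roughly what `included do ... end` does for HasVisitors:
# call the class-level macro on the including class.
module HasVisitorsSketch
  def self.included(base)
    base.embeds_many :referrals, as: :referrable
  end

  # Instance methods are mixed in as usual.
  def tracks_visitors?
    true
  end
end

class Post
  include FakeDocument
  include HasVisitorsSketch
end

class Page
  include FakeDocument
  include HasVisitorsSketch
end
```

Each including class gets its own referrals relation, which is why the Referral side has to be polymorphic.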
```ruby
class Post
  include Mongoid::Document
  include Mongoid::Timestamps
  include HasVisitors

  [etc..]
end
```
```ruby
module HasVisitors
  extend ActiveSupport::Concern

  # Search engines: referrals from these are not worth linking back to.
  IGNORED_DOMAINS = [
    /(www\.)?google.*/, /([^\.]+.)?search\.yahoo.*/, /search\.bing.*/,
    /(www\.)?baidu.*/, /([^\.]+.)?rambler.ru/, /(www\.)?yandex.ru/,
    /search\.msn.*/, /(www\.)?aol.*/, /(www\.)?altavista.*/,
    /(www\.)?feedster.*/, /search\.lycos.*/, /(www\.)?alltheweb.*/
  ]

  included do
    embeds_many :referrals, as: :referrable
  end

  def add_visitor(request, referrer = nil)
    referrer ||= request.referrer
    return if referrer.blank?

    uri = uri_from_address(referrer)
    uri_domain = domain_from_uri(uri)

    # Only record external websites, never links between my own pages
    return unless uri_domain && domain_from_address(request.url) != uri_domain
    return if IGNORED_DOMAINS.any? { |pattern| pattern =~ uri_domain }

    # String keys, to match what MongoDB hands back after a reload
    visitor = { 'ip' => request.ip, 'created_at' => Time.now }
    referral = referrals.find_or_initialize_by(domain: uri_domain)
    address = referral.addresses.find { |a| a['address'] == uri.to_s }

    if address
      # Second or later visitor from this address
      address['visitors'].push(visitor)
      address['weight'] = address['visitors'].map { |v| v['ip'] }.compact.uniq.length
    else
      # First visitor from this address
      referral.addresses.push('address' => uri.to_s, 'visitors' => [visitor], 'weight' => 1)
    end

    referral.save
  end

  # Referrals with at least two unique visitors, heaviest first
  def weighted_referrals
    referrals.reject { |r| r.weight < 2 }.sort_by(&:weight).reverse
  end

  protected

  # URI.encode exists up to Ruby 2.7; it was removed in Ruby 3.0.
  def uri_from_address(address)
    URI.parse(URI.encode(address)).normalize
  rescue URI::InvalidURIError
    nil
  end

  def domain_from_address(address)
    domain_from_uri(uri_from_address(address))
  end

  def domain_from_uri(uri)
    uri && uri.host ? uri.host.sub(/\Awww\./, '') : nil
  end
end
```
I chose to keep has_visitors.rb in the models directory, because my other models are the only place I’ll be using it. I named the subdirectory include, but it can be named anything you like; developers who do something similar often prefer to call it concerns. The name isn't particularly important, because Padrino will load the directory automatically.
I filter out referrals from search engines such as Google, because there is no value in linking back to search results. The focus of this feature is to link back to similar websites that are linking to me.
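The filter can be pulled out into a standalone predicate. This sketch uses a short, hypothetical subset of the patterns and, as a variation of my own, anchors them with \A. The unanchored patterns in add_visitor match a search engine's name anywhere in the domain, so /(www\.)?google.*/ would also ignore a blog hosted at something like googleblog.example.com; the anchored versions only match domains that start with the engine's name.

```ruby
# Hypothetical subset of the ignore list, anchored at the start of the domain.
SEARCH_ENGINE_PATTERNS = [
  /\Agoogle\./,                  # google.com, google.co.uk, ...
  /\A([^.]+\.)?search\.yahoo\./, # search.yahoo.com, uk.search.yahoo.com
  /\Abaidu\./,
  /\Ayandex\.ru\z/
].freeze

# Expects a domain with any leading "www." already stripped,
# as domain_from_uri does.
def search_engine?(domain)
  SEARCH_ENGINE_PATTERNS.any? { |pattern| pattern.match?(domain) }
end

search_engine?('google.co.uk')           # => true
search_engine?('googleblog.example.com') # => false
```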
There is quite a lot of Hash and Array manipulation going on here; I've tried to keep the code as clear as possible.
It is designed this way because I want each domain to appear only once as a referral per Post. That Referral then holds every address on the domain that has been detected linking to my Post. Each address is scored separately by the number of unique visitors it has sent me, while the Referral's own weight counts unique visitors across all of its addresses.
I organised it this way because a single page on a domain may link to me under several slightly different URLs. A link carrying a complicated set of query string parameters, for instance, shouldn't count as a separate page linking to me; it is one page with several viable addresses to link back to.
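The bookkeeping in add_visitor can be sketched with plain hashes and arrays, which makes the one-referral-per-domain layout easier to see. record_visit is a hypothetical, simplified function: it takes the referrals array directly instead of a Mongoid relation, stores the raw URL rather than a normalized URI, and skips the search-engine check and persistence.

```ruby
require 'uri'

# Strip a leading "www." so www.example.com and example.com share a referral.
def domain_of(url)
  host = URI.parse(url).host
  host && host.sub(/\Awww\./, '')
end

# One referral per domain, one entry per distinct address, and each
# address weighted by its number of unique visitor IPs.
def record_visit(referrals, referrer_url, ip, at = Time.now)
  domain = domain_of(referrer_url)
  return referrals unless domain

  referral = referrals.find { |r| r['domain'] == domain }
  unless referral
    referral = { 'domain' => domain, 'addresses' => [] }
    referrals << referral
  end

  address = referral['addresses'].find { |a| a['address'] == referrer_url }
  unless address
    address = { 'address' => referrer_url, 'visitors' => [], 'weight' => 0 }
    referral['addresses'] << address
  end

  address['visitors'] << { 'ip' => ip, 'created_at' => at }
  address['weight'] = address['visitors'].map { |v| v['ip'] }.compact.uniq.length
  referrals
end

referrals = []
record_visit(referrals, 'http://www.example.com/post?utm_source=feed', '10.0.0.1')
record_visit(referrals, 'http://example.com/post', '10.0.0.2')
record_visit(referrals, 'http://example.com/post', '10.0.0.2')
```

Both URL variants end up under a single example.com referral as two addresses, and the repeat visit from 10.0.0.2 adds a visitor without raising that address's weight.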
The best_address method in the Referral model works out which of a domain's addresses is sending me the most traffic, measured in unique visitors. That is the address I link back to.
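Picking that address comes down to a max_by over the address weights. A plain-Ruby sketch over hashes, with a guard for a referral that has no addresses yet; the sample data is invented.

```ruby
# Return the address with the highest weight, or nil when there are none.
def best_address(addresses)
  best = addresses.max_by { |a| a['weight'] }
  best && best['address']
end

addresses = [
  { 'address' => 'http://example.com/post?utm_source=feed', 'weight' => 1 },
  { 'address' => 'http://example.com/post', 'weight' => 3 }
]

puts best_address(addresses) # prints http://example.com/post
```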
```haml
[etc..]
- unless post.weighted_referrals.empty?
  .referrals
    - post.weighted_referrals.each do |referral|
      = link_to referral.domain, referral.best_address, :class => "referral"
```
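The view only shows what weighted_referrals returns: referrals with a weight of at least 2, heaviest first, so a domain that has sent a single unique visitor never appears. The same filter-and-sort as a plain-Ruby sketch over hashes (sample data invented):

```ruby
# Keep referrals with at least two unique visitors, heaviest first.
def weighted_referrals(referrals)
  referrals.reject { |r| r['weight'] < 2 }.sort_by { |r| r['weight'] }.reverse
end

referrals = [
  { 'domain' => 'a.example', 'weight' => 1 },
  { 'domain' => 'b.example', 'weight' => 5 },
  { 'domain' => 'c.example', 'weight' => 2 }
]

p weighted_referrals(referrals).map { |r| r['domain'] } # ["b.example", "c.example"]
```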
New visitors are tracked in the show action of the posts controller. Padrino controllers have access to Sinatra's request object, which is passed to the add_visitor method from our HasVisitors module.
```ruby
get :show, :map => "/:year/:month/:day/:slug/" do
  @post.add_visitor(request)
  render 'posts/show'
end
```
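add_visitor only records a referrer whose domain differs from the domain of the requested page, so clicks between my own posts are ignored. That check can be exercised outside Padrino with a hypothetical Struct standing in for Sinatra's request object; only url and referrer are modelled, and external_referrer? is an illustrative name rather than a method from the module.

```ruby
require 'uri'

# Hypothetical stand-in for Sinatra's request; only the two fields used here.
FakeRequest = Struct.new(:url, :referrer)

# Mirrors domain_from_address: host without a leading "www.", or nil.
def domain_from_address(address)
  host = URI.parse(address).host
  host && host.sub(/\Awww\./, '')
end

# True when the referrer is an external site rather than this blog.
def external_referrer?(request)
  return false if request.referrer.nil? || request.referrer.empty?

  referrer_domain = domain_from_address(request.referrer)
  !referrer_domain.nil? && referrer_domain != domain_from_address(request.url)
end

post_url = 'http://myblog.example/2012/06/01/some-post/'
internal = FakeRequest.new(post_url, 'http://www.myblog.example/')
external = FakeRequest.new(post_url, 'http://friend.example/links')
```

Here internal is rejected because www.myblog.example and myblog.example reduce to the same domain, while external would be recorded.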
Those are all the changes I made.
There is quite a lot to this code, so the next logical step would be to package it as a gem to make it easier to reuse. That is something I may do in the future.
I’ll update this article with a big orange banner at the top if I do that.

