Rails and SQL

Paul A. Jungwirth

PDX.rb

May 2015

Plan

Inspections Schema

restaurants --< inspections --< violations
-----------     -----------     ----------
id              id              id
name            restaurant_id   inspection_id
                score           name
                inspected_at
r.name       i.score  i.inspected_at   v.name
------       -------  --------------   ------
Bob's Diner       77      2015-01-07   Rats
Bob's Diner       71      2015-01-15   Rats
Bob's Diner       71      2015-01-15   Zombies
Bob's Diner       75      2015-03-15   Bats
Joe's Place 
Crystal Palace   100      2015-01-07

Courtesy of Robb Schecter at eaternet.io.

SQL: Easy Stuff

SELECT  id, name
FROM    inspections
WHERE   score > 80
ORDER BY inspected_at DESC
SELECT  date_trunc('month', inspected_at) AS m,
        MAX(score) AS winner
FROM    inspections
GROUP BY m
HAVING  MAX(score) > 90
ORDER BY m

ActiveRecord: Easy Stuff

@inspections = Inspection.where("score > 80").
                          order(inspected_at: :desc)

@violations = @inspection.violations.where(name: "Rats")

puts @inspection.violations.where(name: "Rats").to_sql

ActiveRecord where

Inspection.where(score: 90)
Inspection.where("score > ?", 90)
Inspection.where("inspections.score > ?", 90)

Restaurant.where(name: "This is safe")
Restaurant.where("name = ?", "Still safe")
Restaurant.where("name = '#{oops}'")

SQL joins: INNER vs OUTER

SELECT  *
FROM    restaurants r
INNER JOIN inspections i
ON      i.restaurant_id = r.id
SELECT  *
FROM    restaurants r
LEFT OUTER JOIN inspections i
ON      i.restaurant_id = r.id

ON vs WHERE: inner join

Equivalent with inner joins:

SELECT  *
FROM    restaurants r
INNER JOIN inspections i
ON      i.restaurant_id = r.id
AND     i.score > 80

equals:

SELECT  *
FROM    restaurants r
INNER JOIN inspections i
ON      i.restaurant_id = r.id
WHERE   i.score > 80

ON vs WHERE: outer join

Different with outer joins!

SELECT  *
FROM    restaurants r
LEFT OUTER JOIN inspections i
ON      i.restaurant_id = r.id
AND     i.score > 80

returns more rows than:

SELECT  *
FROM    restaurants r
LEFT OUTER JOIN inspections i
ON      i.restaurant_id = r.id
WHERE   i.score > 80
  1. Apply the ON fitlers.
  2. Add NULL rows where necessary.
  3. Apply the WHERE filters.

Too many OUTER JOINs

SELECT  r.name,
        AVG(i.score)
FROM    restaurants r
LEFT OUTER JOIN inspections i
ON      i.restaurant_id = r.id
LEFT OUTER JOIN violations v
ON      v.inspection_id = i.id
GROUP BY r.id

ActiveRecord joins

joins gives an inner join:

@good_restaurants = Restaurant.joins(:inspections).
                               where("inspections.score > ?", 80)
- @good_restaurants.each do |r|
  %li= r.name + ": " + r.inspections.first.score

ActiveRecord includes

includes gives a LEFT OUTER JOIN (in effect):

@restaurants = Restaurant.includes(:inspections)
                       # .where("inspections.score > ?", 80)

might produce:

SELECT  *
FROM    restaurants
LEFT OUTER JOIN inspections
ON      inspections.restaurant_id = restaurants.id
WHERE   

Extra attributes with select

class Restaurant
  def inspections_with_violation_counts
    inspections.
      select("inspections.*").
      select("COUNT(violations.id) AS violations_count").
      joins(<<-EOQ).
        LEFT OUTER JOIN violations
        ON violations.inspection_id = inspections.id
      EOQ
      group("inspections.id")
  end 
end

Raw SQL to Models

Inspection.find_by_sql([<<-EOQ, 1.2])
  SELECT  *,
          score * ? AS curved_score
  FROM    inspections
EOQ

Raw SQL

connection.select_rows("SELECT a, b FROM bar")
# [["1","2"], ["4","5"]]

connection.select_all("SELECT a, b FROM bar")
# #<ActiveRecord::Result:0x007ff34710a5f8>

connection.select_all("SELECT a, b, c FROM bar").to_a
# [{"a"=>"1", "b"=>"2"}, {"a"=>"4","b"=>"5"}]

Raw SQL 2

connection.select_values("SELECT id FROM inspections")
# ["1", "4"]

connection.select_value("SELECT MIN(id) FROM inspections")
# ["1"]

connection.select_one("SELECT MIN(id) AS id FROM inspections")
# {"id"=>"1"}

Raw SQL quoting

connection.select_rows <<-EOQ
  SELECT  id, name
  FROM    inspections
  WHERE   name = '#{connection.quote_string(foo)}'
  OR      id = #{bar.to_i}
EOQ

Simple scope

class Inspection

  scope :passing, where("score > 80")

  scope :between, -> {|start_day, end_day|
    where("inspected_at BETWEEN ? AND ?", start_day, end_day)
  }

end

Correlated Subqueries

All restaurants with no inspections yet:

SELECT  *
FROM    restaurants r
WHERE   NOT EXISTS (SELECT  1
                    FROM    inspections i
                    WHERE   i.restaurant_id = r.id)

Subqueries with scope

scope :without_inspection, -> {
  where(<<-EOQ)
    NOT EXISTS (SELECT  1
                FROM    inspections i
                WHERE   i.restaurant_id = restaurants.id)
  EOQ
}

Merging scopes

class Inspection
  scope :perfect, -> { where(score: 100) }
end

class Restaurant
  scope :with_a_perfect_score, -> {
    joins(:inspections).merge(Inspection.perfect)
  }
end

Time Series with naive GROUP BY

Count the inspections in each month of 2015:

SELECT  EXTRACT('MONTH' FROM inspected_at) m,
        COUNT(*)
FROM    inspections
WHERE   EXTRACT('YEAR' FROM inspected_at) = 2015
GROUP BY m
ORDER BY m

Time Series with generate_series

SELECT  s.m + 1 AS month,
        COUNT(id)
FROM    generate_series(0, 11) s(m)
LEFT OUTER JOIN inspections
ON      inspected_at
        BETWEEN '2015-01-01'::date + (s.m || ' MONTHS')::interval
        AND     '2015-01-01'::date + ((s.m + 1) || ' MONTHS')::interval
                                   - INTERVAL '1 DAY'
GROUP BY s.m
ORDER BY s.m

generate_series in from

class Restaurant
  def inspections_per_month(year)
    Inspection.
      select("s.m + 1 AS inspection_month").
      select("COUNT(inspections.id) AS inspections_count").
      from("generate_series(0, 11) s(m)").
      joins(<<-EOQ).
        LEFT OUTER JOIN inspections
        ON  inspections.inspected_at
            BETWEEN '#{year.to_i}-01-01'::date
                    + (s.m || ' MONTHS')::interval
            AND     '#{year.to_i}-01-01'::date
                    + ((s.m + 1) || ' MONTHS')::interval
                    - INTERVAL '1 DAY'
        AND inspections.restaurant_id = #{id.to_i}
      EOQ
      group("inspection_month").
      order("inspection_month")
  end
end

generate_series CTE

CTE: Common Table Expression

WITH t(month) AS (
  SELECT  '2015-01-01'::date + (s.m || ' MONTHS')::interval
  FROM    generate_series(0, 11) s(m)
)
SELECT  t.month,
        COUNT(id)
FROM    t
LEFT OUTER JOIN inspections
ON      inspected_at BETWEEN month AND month + INTERVAL '1 MONTH' - INTERVAL '1 DAY'
GROUP BY month
ORDER BY month

generate_series in with

Inspection.
  with("t(month)" => <<-EOQ).
    SELECT  '2015-01-01'::date + (s.m || ' MONTHS')::interval
    FROM    generate_series(0, 11) s(m)
  EOQ
  select("t.month").
  select("COUNT(inspections.id) AS inspections_count").
  from("t").
  joins(<<-EOQ).
    LEFT OUTER JOIN inspections
    ON  inspections.inspected_at BETWEEN t.month
                                 AND t.month + INTERVAL '1 MONTH'
                                             - INTERVAL '1 DAY'
    AND inspections.restaurant_id = #{id.to_i}
  EOQ
  group("t.month").
  order("t.month")

Score of most recent inspection

Grouping and aggregates

Define a FIRST aggregate function:

def up
  # https://wiki.postgresql.org/wiki/First/last_%28aggregate%29

  connection.execute <<-EOQ
    CREATE OR REPLACE FUNCTION public.first_agg ( anyelement, anyelement )
    RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT AS $$
    SELECT $1;
    $$;
  EOQ

  connection.execute <<-EOQ
    CREATE AGGREGATE public.first (
      sfunc    = public.first_agg,
      basetype = anyelement,
      stype    = anyelement
    );
  EOQ
end

Grouping and aggregates

scope :with_latest_score, -> {
  select("restaurants.*").
    select(<<-EOQ).
      FIRST(
        inspections.score
        ORDER BY inspections.inspected_at DESC
      ) AS latest_score
    EOQ
    joins(<<-EOQ).
      LEFT OUTER JOIN inspections
      ON inspections.restaurant_id = restaurants.id
    EOQ
    group("restaurants.id")
}

Latest score with Lateral Joins

SELECT  r.name,
        latest.inspected_at,
        latest.score
FROM    restaurants r
LEFT OUTER JOIN LATERAL (
  SELECT  *
  FROM    inspections i
  WHERE   i.restaurant_id = r.id
  ORDER BY i.inspected_at DESC
  LIMIT 1
) latest
ON true
ORDER BY latest.score DESC NULLS LAST

Lateral Joins in Rails

Restaurant.
  select(:name).
  select("latest.inspected_at AS latest_inspected_at").
  select("latest.score AS latest_score").
  joins(<<-EOQ).
    LEFT OUTER JOIN LATERAL (
      SELECT  *
      FROM    inspections i
      WHERE   i.restaurant_id = restaurants.id
      ORDER BY i.inspected_at DESC
      LIMIT 1
    ) latest
    ON true
  EOQ
  order("latest.score DESC NULLS LAST")

Window functions

Compare each inspection score to that restaurant's average score:

SELECT  i.inspected_at,
        r.name,
        i.score,
        AVG(i.score) OVER (PARTITION BY i.restaurant_id)
FROM    inspections i
INNER JOIN restaurants r
ON      r.id = i.restaurant_id
ORDER BY i.inspected_at, r.name

Latest score by Window functions

WITH latest AS (
  SELECT  restaurant_id,
          rank() OVER (
            PARTITION BY restaurant_id
            ORDER BY inspected_at DESC
          ) AS rank,
          first_value(score) OVER (
            PARTITION BY restaurant_id
            ORDER BY inspected_at DESC
          ) AS score
  FROM    inspections
)
SELECT  r.name,
        latest.score
FROM    restaurants r
LEFT OUTER JOIN latest
ON      latest.restaurant_id = r.id
AND     latest.rank = 1
ORDER BY latest.score

More

Thank You!

Read More:

This talk:

[contents]

deck.rb presentation

/

#