Lambda Blues: List Comprehensions And Beyond

From Simon Peyton-Jones web site I find a link to a paper discussing extensions to list comprehensions to make them more like SQL select statements.

To make use of these we need to do two things: firstly warn the GHCi that I will be using the language extensions:

:set -XTransformListComp

And then import some extra functionality it will need:

:module +GHC.Exts

As an example we have a table of people with the Name (String), Dept (String) and Salary (Int).
To see the contents of this table on the database we issue a basic SQL select command:

select
*
from
employees

And the results are

Name Dept Salary
----------------------------
Simon MS 80
Erik MS 100
Phil Ed 40
Gordon Ed 45
Paul Yale 60
Polly MS 45
Sue Ed 90
Jill Yale 100
Jackie MS 95

And how we would replicate this in Haskell code?

Start by setting out a couple of type synonyms to make the database fields clearer:

type Name = String
type Dept = String
type Salary = Int

And here is our database table as a Haskell list of tuples:

employees :: [(Name, Dept, Salary)]
employees = [("Simon", "MS", 80)
,("Erik", "MS", 100)
,("Phil", "Ed", 40)
,("Gordon", "Ed", 45)
,("Paul", "Yale", 60)
,("Polly", "MS", 45)
,("Sue", "Ed", 90)
,("Jill", "Yale", 100)
,("Jackie", "MS", 95)]

In Haskell the query to select all the records comes out as a simple list comprehension:

selectAll = [ (name, dept, salary) |
(name, dept, salary) <- employees ]

The result is so:

*Main> selectAll
[("Simon","MS",80),("Erik","MS",100),("Phil","Ed",40),("Gordon","Ed",45),("Paul","Yale",60),("Polly","MS",45),("Sue","Ed",90),("Jill","Yale",100),("Jackie","MS",95)]

So far so normal. For our first enhancement we want to be able to specify an order for the output records. For example let us see the records in order of increasing salary. This is simple enough with an "order by" clause in SQL:

select
name, salary
from
employees
order by salary

name salary
Phil 40
Gordon 45
Polly 45
Paul 60
Simon 80
Sue 90
Jackie 95
Jill 100
Erik 100

To do the same in a list comprehension we add an extra use for the keyword then, which
here lets us introduce the function sortWith into the expression:

selectWithOrder = [ (name, salary) |
(name, dept, salary) <- employees,
then sortWith by salary ]

The sortWith function sorts a list of elements using the user supplied function to project something out of each element

*Main> :t sortWith
sortWith :: Ord b => (a -> b) -> [a] -> [a]

So here we are sorting by the salary field of the record:

*Main> selectWithOrder
[("Phil",40),("Gordon",45),("Polly",45),("Paul",60),
("Simon",80),("Sue",90),("Jackie",95),("Erik",100),
("Jill",100)]

But nobody wants to see who is getting the lowest salary. In SQL we add the keyword desc to see the results sorted into descending order:

select
name, salary
from
employees
order by salary desc

name salary
Erik 100
Jill 100
Jackie 95
Sue 90
Simon 80
Paul 60
Polly 45
Gordon 45
Phil 40

To do the same in our list comprehension we use the newtype Down like this:

selectWithReverseOrder =
[ (name, salary) |
(name, dept, salary) <- employees,
then sortWith by Down salary ]

Down has the ingenious function of inverting the sort order:

newtype Down a = Down a deriving (Eq, Show, Read)

instance Ord a => Ord (Down a) where
compare (Down x) (Down y) = y `compare` x

So if you compare Down x with Down y you get the reverse of what you'd get if you compare x with y. So our comprehension now comes up with the top salary first:

*Main> selectWithReverseOrder
[("Erik",100),("Jill",100),("Jackie",95),("Sue",90),
("Simon",80),("Paul",60),("Gordon",45),("Polly",45),("Phil",40)]

We can also create selections with group functions such as sum: for example we need to list the departments and their total salary spends. In SQL we group by a field and apply functions such as sum():

select
dept, sum(salary) as [total salary]
from
employees
group by
dept

Which looks like this:

dept total salary
Ed 175
MS 320
Yale 160

The corresponding list comprehension is this:

selectByGroup = [ (the dept, sum salary) |
(name, dept, salary) <- employees,
then group by dept using groupWith ]

*Main> selectByGroup
[("Ed",175),("MS",320),("Yale",160)]

There are several new things happening here.

Here we have a group by clause added to the list comprehension. This does something dodgy to the fields: what we are returning is not individual dept and salary fields but lists of fields, so we add the the function and the sum function to reduce these back to a single record.

The function the is a new one: it checks first that all the elements of the list are identical and then returns just that duplicated element. (It bears no resemblance to the Common Lisp operator of the same name, I checked)

*Main> :t the
the :: Eq a => [a] -> a

The function groupWith uses the function that you supply, with type (a -> b), to projects out an element (the one of type b) from each list element in your list - - it uses the b-type item to sort the input list and then to form groups by equality on these projected elements.

*Main> :t groupWith
groupWith :: Ord b => (a -> b) -> [a] -> [[a]]

But I just want to see the top one... the department with the largest spend...

In SQL I can opt to see the top so many of the results, like this:

select top 1
dept, sum(salary) as [total salary]
from
employees
group by
dept
order by sum(salary) desc

Which comes out as:

dept total salary
MS 320

In our newly empowered list comprehensions we can use the then keyword this time to bring in the standard function take, to take just the first record from the result set:

selectMaxGroup = [ (the dept, sum salary) |
(name, dept, salary) <- employees,
then group by dept using groupWith,
then sortWith by Down salary,
then take 1 ]

*Main> selectMaxGroup
[("MS",320)]

Lambda Blues

Monday, 3 August 2015

List Comprehensions And Beyond

No comments:

Post a Comment