From
Simon Peyton-Jones web site I find a link to a
paper discussing extensions to list comprehensions to make them more like SQL select statements.
To make use of these we need to do two things: firstly warn the GHCi that I will be using the language extensions:
:set -XTransformListComp
And then import some extra functionality it will need:
:module +GHC.Exts
As an example we have a table of people with the Name (String), Dept (String) and Salary (Int).
To see the contents of this table on the database we issue a basic SQL select command:
select
*
from
employees
And the results are
Name Dept Salary
----------------------------
Simon MS 80
Erik MS 100
Phil Ed 40
Gordon Ed 45
Paul Yale 60
Polly MS 45
Sue Ed 90
Jill Yale 100
Jackie MS 95
And how we would replicate this in Haskell code?
Start by setting out a couple of type synonyms to make the database fields clearer:
type Name = String
type Dept = String
type Salary = Int
And here is our database table as a Haskell list of tuples:
employees :: [(Name, Dept, Salary)]
employees = [("Simon", "MS", 80)
,("Erik", "MS", 100)
,("Phil", "Ed", 40)
,("Gordon", "Ed", 45)
,("Paul", "Yale", 60)
,("Polly", "MS", 45)
,("Sue", "Ed", 90)
,("Jill", "Yale", 100)
,("Jackie", "MS", 95)]
In Haskell the query to select all the records comes out as a simple list comprehension:
selectAll = [ (name, dept, salary) |
(name, dept, salary) <- employees ]
The result is so:
*Main> selectAll
[("Simon","MS",80),("Erik","MS",100),("Phil","Ed",40),("Gordon","Ed",45),("Paul","Yale",60),("Polly","MS",45),("Sue","Ed",90),("Jill","Yale",100),("Jackie","MS",95)]
So far so normal. For our first enhancement we want to be able to specify an
order for the output records. For example let us see the records in order of increasing salary. This is simple enough with an "order by" clause in SQL:
select
name, salary
from
employees
order by salary
name salary
Phil 40
Gordon 45
Polly 45
Paul 60
Simon 80
Sue 90
Jackie 95
Jill 100
Erik 100
To do the same in a list comprehension we add an extra use for the keyword
then, which
here lets us introduce the function
sortWith into the expression:
selectWithOrder = [ (name, salary) |
(name, dept, salary) <- employees,
then sortWith by salary ]
The
sortWith function sorts a list of elements using the user supplied function to project something out of each element
*Main> :t sortWith
sortWith :: Ord b => (a -> b) -> [a] -> [a]
So here we are sorting by the salary field of the record:
*Main> selectWithOrder
[("Phil",40),("Gordon",45),("Polly",45),("Paul",60),
("Simon",80),("Sue",90),("Jackie",95),("Erik",100),
("Jill",100)]
But nobody wants to see who is getting the lowest salary. In SQL we add the keyword
desc to see the results sorted into descending order:
select
name, salary
from
employees
order by salary desc
name salary
Erik 100
Jill 100
Jackie 95
Sue 90
Simon 80
Paul 60
Polly 45
Gordon 45
Phil 40
To do the same in our list comprehension we use the newtype
Down like this:
selectWithReverseOrder =
[ (name, salary) |
(name, dept, salary) <- employees,
then sortWith by Down salary ]
Down has the ingenious function of inverting the sort order:
newtype Down a = Down a deriving (Eq, Show, Read)
instance Ord a => Ord (Down a) where
compare (Down x) (Down y) = y `compare` x
So if you compare
Down x with
Down y you get the reverse of what you'd get if you compare
x with
y. So our comprehension now comes up with the top salary first:
*Main> selectWithReverseOrder
[("Erik",100),("Jill",100),("Jackie",95),("Sue",90),
("Simon",80),("Paul",60),("Gordon",45),("Polly",45),("Phil",40)]
We can also create selections with group functions such as sum: for example we need to list the departments and their total salary spends. In SQL we
group by a field and apply functions such as sum():
select
dept, sum(salary) as [total salary]
from
employees
group by
dept
Which looks like this:
dept total salary
Ed 175
MS 320
Yale 160
The corresponding list comprehension is this:
selectByGroup = [ (the dept, sum salary) |
(name, dept, salary) <- employees,
then group by dept using groupWith ]
*Main> selectByGroup
[("Ed",175),("MS",320),("Yale",160)]
There are several new things happening here.
Here we have a
group by clause added to the list comprehension. This does something dodgy to the fields: what we are returning is not individual dept and salary fields but lists of fields, so we add the
the function and the
sum function to reduce these back to a single record.
The function
the is a new one: it checks first that all the elements of the list are identical and then returns just that duplicated element. (It bears no resemblance to the Common Lisp operator of the same name, I checked)
*Main> :t the
the :: Eq a => [a] -> a
The function
groupWith uses the function that you supply, with type (a -> b), to projects out an element (the one of type b) from each list element in your list - - it uses the b-type item to sort the input list and then to form groups by equality on these projected elements.
*Main> :t groupWith
groupWith :: Ord b => (a -> b) -> [a] -> [[a]]
But I just want to see the top one... the department with the largest spend...
In SQL I can opt to see the
top so many of the results, like this:
select top 1
dept, sum(salary) as [total salary]
from
employees
group by
dept
order by sum(salary) desc
Which comes out as:
dept total salary
MS 320
In our newly empowered list comprehensions we can use the then keyword this time to bring in the standard function take, to take just the first record from the result set:
selectMaxGroup = [ (the dept, sum salary) |
(name, dept, salary) <- employees,
then group by dept using groupWith,
then sortWith by Down salary,
then take 1 ]
*Main> selectMaxGroup
[("MS",320)]