(Simplest solution is at the end, which also **returns #2 alone** ....)

I have seen many tricks in SELECT queries (Most using joins - Fine.), but now if I must name one keyword to be the most powerful yet unappreciated keyword, it has to be

1) CASE.

2) The combined power of case and group-by is even more impressive.

3) Yet more unthinkable is the combination of case/group/self-join.

4) correlated subquery (slow?) in SELECT, FROM, WHERE and HAVING. http://www.informit.com/articles/article.aspx?p=26856 explains "In addition to subqueries in the WHERE and HAVING clauses, the ANSI standard allows a subquery in the FROM clause, and some relational database management systems (RDBMSs) permit a subquery in the SELECT clause."

Key to understanding case_group_self-join is the intermediate work table. Mind you, intermediate table is the key to _every_ tricky self-join. P30 [[ transact-sql cookbook]] among many chapters, has a wonderful solution to find the top 5 values in a table without a 5-way self-join. Based on that, here's a simplified but full example, showing several distinct solutions. These solutions can be adapted to find 5th largest value in a table, too.

--- setting up the data ----

drop table public.students

create table public.students(

student varchar(44),

score decimal(4,1),

primary key (student)

)

insert into students values( 'Andrew', 15.6)

insert into students values( 'Becky', 13)

insert into students values( 'Chuck', 12.2)

insert into students values( 'Dan', 25.6)

insert into students values( 'Eric', 15.6)

insert into students values( 'Fred', 5.6)

insert into students values( 'Greg', 5.6)

select * from students

Solution 1

----- top 3 scores

-- intermediate table

select * from students h right join students L

on h.score > L.score

select L.student, 'is lower than', count(h.student), 'competitors'

from students h right join students L

on h.score > L.score

group by L.student

having count(L.student) < 3

----- lowest 4 scores

-- intermediate table

select * from students h left join students L

on h.score > L.score

select h.student, 'defeats', count(L.student), 'competitors'

from students h left join students L

on h.score > L.score

group by h.student

having count(L.student) < 4

---- Solution 2, using case

-- intermediate table

select *, (case when h.score > L.score then 1 else 0 end)

from students h left join -- inner join ok since cartesian

students L on 1=1

select h.student, 'defeats', sum(case when h.score > L.score then 1 else 0 end), 'competitors'

from students h left join -- inner join ok since cartesian

students L on 1=1

group by h.student

having sum(case when h.score > L.score then 1 else 0 end) < 4

--same solution tested on http://sqlzoo.net/howto/source/z.dir/tip915069/sqlserver

select h.name, h.area, 'is smaller than', sum(case when h.area < L.area then 1 else 0 end), 'countries'

from cia h, cia L

group by h.name, h.area

having sum(case when h.area < L.area then 1 else 0 end) < 4

----- (concise) Solution 3, using correlated sub-select, P295 [[sql hacks]] -- without

select * from students o where (select count(1) from students where score < o.score) < 4

--Same solution tested in http://sqlzoo.net/howto/source/z.dir/tip915069/sqlserver:

select * from cia o where (select count(1) from cia where area > o.area) < 4 -- shows top 4

...and (select count(1) from cia where area > o.area) > =3 -- returns #4 alone

...and (select count(1) from cia where area > o.area) > =2 -- returns #4 #3 exactly

--This technique shows its power when you want top 2 in each continent, without

select * from cia o where (select count(1) from cia where area > o.area and region=o.region) < 2

order by o.region, o.area

I feel for a student/practitioner, it pays to think in terms of CASE. This strikingly simple solution can be rewritten using (messy) CASE.

## No comments:

Post a Comment