This might be simple, but no matter how I try, I cannot get it working.
I want to select a sample, say 3 rows, of a table: SELECT TOP 3 * FROM table - that was easy.
Now say this table has a criteria column, an integer, with around 50 unique values. The whole table has 3 mill rows, so each criteria is seen between "a lot" and maybe only once for some others - but that should not be the issue here.
I'd like to get 3 sample rows FOR EACH value in the criteria column - or only one or two if only so many matches exist.
How could this be done? The table does have a unique primary key, int, if this is needed for the operation. Any ideas?
To select random samples take advantage of (1) ROW_NUMBER and (2) NEWID() function; for example:
declare @.example table
( rid integer,
criteria tinyint,
sampleValue smallint,
filler char(200)
)insert into @.example
select iter,
1 + 49.9999 * dbo.rand(),
1200 * dbo.rand(),
'Record # ' + convert(varchar(6), iter)
from small_iterator (nolock)select criteria,
rid,
seq,
sampleValue
from ( select criteria,
rid,
row_number ()
over ( partition by criteria
order by newid()
)
as seq,
sampleValue
from @.example
) x
where seq <= 3
order by criteria, seq-- criteria rid seq sampleValue
-- -- -- -- --
-- 1 2242 1 197
-- 1 10438 2 204
-- 1 9782 3 881
-- ...
-- 50 15673 2 455
-- 50 5825 3 482
The SMALL_ITERATOR and DBO.RAND() functions used to create the sample data can be found here:
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1330536&SiteID=1
No comments:
Post a Comment