What is the main purpose of using CROSS APPLY?
I have read (vaguely, through posts on the Internet) that cross apply can be more efficient when selecting over large data sets if you are partitioning. (Paging comes to mind)
I also know that CROSS APPLY doesn't require a UDF as the right-table.
In most INNER JOIN queries (one-to-many relationships), I could rewrite them to use CROSS APPLY, but they always give me equivalent execution plans. 
Can anyone give me a good example of when CROSS APPLY makes a difference in those cases where INNER JOIN will work as well?
Edit:
Here's a trivial example, where the execution plans are exactly the same.  (Show me one where they differ and where cross apply is faster/more efficient)
create table Company (
    companyId int identity(1,1)
,   companyName varchar(100)
,   zipcode varchar(10) 
,   constraint PK_Company primary key (companyId)
)
GO
create table Person (
    personId int identity(1,1)
,   personName varchar(100)
,   companyId int
,   constraint FK_Person_CompanyId foreign key (companyId) references dbo.Company(companyId)
,   constraint PK_Person primary key (personId)
)
GO
insert Company
select 'ABC Company', '19808' union
select 'XYZ Company', '08534' union
select '123 Company', '10016'
insert Person
select 'Alan', 1 union
select 'Bobby', 1 union
select 'Chris', 1 union
select 'Xavier', 2 union
select 'Yoshi', 2 union
select 'Zambrano', 2 union
select 'Player 1', 3 union
select 'Player 2', 3 union
select 'Player 3', 3 
/* using CROSS APPLY */
select *
from Person p
cross apply (
    select *
    from Company c
    where p.companyid = c.companyId
) Czip
/* the equivalent query using INNER JOIN */
select *
from Person p
inner join Company c on p.companyid = c.companyId
Answer
Can anyone give me a good example of when CROSS APPLY makes a difference in those cases where INNER JOIN will work as well?
See the article in my blog for detailed performance comparison:
CROSS APPLY works better on things that have no simple JOIN condition.
This one selects 3 last records from t2 for each record from t1:
SELECT  t1.*, t2o.*
FROM    t1
CROSS APPLY
        (
        SELECT  TOP 3 *
        FROM    t2
        WHERE   t2.t1_id = t1.id
        ORDER BY
                t2.rank DESC
        ) t2o
It cannot be easily formulated with an INNER JOIN condition.
You could probably do something like that using CTE's and window function:
WITH    t2o AS
        (
        SELECT  t2.*, ROW_NUMBER() OVER (PARTITION BY t1_id ORDER BY rank) AS rn
        FROM    t2
        )
SELECT  t1.*, t2o.*
FROM    t1
INNER JOIN
        t2o
ON      t2o.t1_id = t1.id
        AND t2o.rn <= 3
, but this is less readable and probably less efficient.
Update:
Just checked.
master is a table of about 20,000,000 records with a PRIMARY KEY on id.
This query:
WITH    q AS
        (
        SELECT  *, ROW_NUMBER() OVER (ORDER BY id) AS rn
        FROM    master
        ),
        t AS 
        (
        SELECT  1 AS id
        UNION ALL
        SELECT  2
        )
SELECT  *
FROM    t
JOIN    q
ON      q.rn <= t.id
runs for almost 30 seconds, while this one:
WITH    t AS 
        (
        SELECT  1 AS id
        UNION ALL
        SELECT  2
        )
SELECT  *
FROM    t
CROSS APPLY
        (
        SELECT  TOP (t.id) m.*
        FROM    master m
        ORDER BY
                id
        ) q
is instant.
 
No comments:
Post a Comment