mẹo hay Top List Top Left join one

Ms access kết hợp select top 1 join distinct năm 2024

I’ve quietly resolved performance issues by re-writing slow queries to avoid DISTINCT. Often, the DISTINCT is there only to serve as a “join-fixer,” and I can explain what that means using an example.

Let’s say we have the following grossly simplified schema, representing customers, products, and product categories:

CREATE TABLE dbo.Customers

(

CustomerID int NOT NULL,

Name nvarchar(255) NOT NULL,

CONSTRAINT PK_Customers PRIMARY KEY (CustomerID)

);

CREATE TABLE dbo.Categories

(

CategoryID int NOT NULL,

Name nvarchar(255) NOT NULL,

CONSTRAINT PK_Categories PRIMARY KEY (CategoryID),

CONSTRAINT UQ_Categories UNIQUE (Name)

);

CREATE TABLE dbo.Products

(

ProductID int NOT NULL,

CategoryID int NOT NULL,

Name nvarchar(255) NOT NULL,

CONSTRAINT PK_Products PRIMARY KEY (ProductID)

);

And then we have tables for orders and order details:

CREATE TABLE dbo.Orders

(

OrderID int NOT NULL,

CustomerID int NOT NULL,

OrderDate date,

OrderTotal decimal(12,2),

CONSTRAINT PK_Orders PRIMARY KEY (OrderID)

);

CREATE TABLE dbo.OrderDetails

(

OrderID int NOT NULL,

LineItemID int NOT NULL,

ProductID int NOT NULL,

Quantity int NOT NULL,

CONSTRAINT PK_OrderDetails PRIMARY KEY (OrderID, LineItemID),

INDEX IX_OrderDetails_OrderID_ProductID (OrderID, ProductID)

);

And some sample data:

INSERT dbo.Customers (CustomerID, Name)

VALUES (1,N'Aaron'), (2,N'Bob');

INSERT dbo.Categories (CategoryID, Name)

VALUES(1,N'Beauty'), (2,N'Grocery');

INSERT dbo.Products (ProductID, CategoryID, Name)

VALUES (1,1,N'Lipstick'), (2,1,N'Mascara'), (3,2,N'Strawberries');

INSERT dbo.Orders (OrderID, CustomerID, OrderDate, OrderTotal)

VALUES (1,1,getdate(),32.50), (2,2,getdate(),47.05);

INSERT dbo.OrderDetails (OrderID, LineItemID, ProductID, Quantity)

VALUES (1,1,1,5), (2,1,3,10);

Marketing says we want to send an e-mail or give a discount code to all the customers who have ordered a product from the beauty category. The initial attempt at a query for this might be something like this:

SELECT c.CustomerID, c.Name

FROM dbo.Customers AS c

INNER JOIN dbo.Orders AS o

ON c.CustomerID \= o.CustomerID

INNER JOIN dbo.OrderDetails AS od

ON o.OrderID \= od.OrderID

INNER JOIN dbo.Products AS p

ON od.ProductID \= p.ProductID

INNER JOIN dbo.Categories AS cat

ON p.CategoryID \= cat.CategoryID

WHERE cat.Name \= N'Beauty';

The plan doesn’t look so bad (yet):

And in local or test data, the output might look right, since we may have inserted a single row into OrderDetails to match our criteria (and to make our tests pass). But what if I have ordered two products from the beauty category (in the same order, or across multiple orders)?

INSERT dbo.OrderDetails (OrderID, LineItemID, ProductID, Quantity)

VALUES(1,2,2,1);

Now the query returns that customer twice! We certainly don’t want to send them two e-mails, or issue multiple discount codes to the same customer. And the plan, on its own, can’t really provide any obvious clues that there are duplicate rows:

But you sure will notice if you inspect the results, or an end user will notice if you unleash this in production. The quick fix tends to be: slap a big ol’ DISTINCT on there which, indeed, fixes the symptom by eliminating duplicates:

SELECT DISTINCT c.CustomerID, c.Name

---

FROM dbo.Customers AS c

INNER JOIN dbo.Orders AS o

ON c.CustomerID \= o.CustomerID

INNER JOIN dbo.OrderDetails AS od

ON o.OrderID \= od.OrderID

INNER JOIN dbo.Products AS p

ON od.ProductID \= p.ProductID

INNER JOIN dbo.Categories AS cat

ON p.CategoryID \= cat.CategoryID

WHERE cat.Name \= N'Beauty';

But at what cost? A distinct sort, that’s what!

If I’m testing changes to this query in my local environment, and maybe just testing the output and that it returned the data quickly, I might miss clues in the plan and be pretty satisfied that adding DISTINCT fixed the issue without impacting performance.

This will only get worse with more data.

And while we could spend a lot of time tuning indexes on all the involved tables to make that sort hurt less, this multi-table join is always going to produce rows you never ultimately need. Think about SQL Server’s job: yes, it needs to return correct results, but it also should do that in the most efficient way possible. Reading all the data (and then sorting it), only to throw away some or most of it, is very wasteful.

Can we express the query without DISTINCT?

When I know I need to “join” to tables but only care about existence of rows and not any of the output from those tables, I turn to EXISTS. I also try to eliminate looking up values that I know are going to be the same on every row. In this case, I don’t need to join to Categories every time if CategoryID is effectively a constant.

One way to express this same query, ensuring no duplicate customers and, hopefully, reducing the cost of sorting:

DECLARE @CategoryID int;

SELECT @CategoryID \= CategoryID

FROM dbo.Categories WHERE Name \= N'Beauty';

SELECT c.CustomerID, c.Name

FROM dbo.Customers AS c

WHERE EXISTS

(

SELECT 1 FROM dbo.OrderDetails AS od

INNER JOIN dbo.Orders AS o

ON od.OrderID \= o.OrderID

INNER JOIN dbo.Products AS p

ON od.ProductID \= p.ProductID

WHERE o.CustomerID \= c.CustomerID

AND p.CategoryID \= @CategoryID

);

There’s a simple, additional index seek against Categories, of course, but the plan for the overall query has been made drastically more efficient (we’re down to 2 scans and 2 seeks)

Another way to express the same query is to force Orders to be scanned later:

DECLARE @CategoryID int;

SELECT @CategoryID \= CategoryID

FROM dbo.Categories WHERE Name \= N'Beauty';

SELECT c.CustomerID, c.Name

FROM dbo.Customers AS c

WHERE EXISTS

(

SELECT 1

FROM dbo.Orders AS o

WHERE o.CustomerID \= c.CustomerID

AND EXISTS

(

SELECT 1 FROM dbo.OrderDetails AS od

INNER JOIN dbo.Products AS p

ON od.ProductID \= p.ProductID

WHERE od.OrderID \= o.OrderID

AND p.CategoryID \= @CategoryID

)

);

This can be beneficial if you have more Orders than DISTINCT`1 (I certainly hope that’s the case). Notice in the plan that `Orders is scanned later, hopefully after many irrelevant orders have been filtered out.

Conclusion

DISTINCT is often hiding flaws in the underlying logic, and it can really pay off to explore other ways to write your queries without it. There was another interesting use case I wrote about a few years ago that showed how changing DISTINCT to `DISTINCT`5 – even though it carries the same semantics and produces the same results – can help SQL Server filter out duplicates earlier and have a serious impact on performance.

Ms access kết hợp select top 1 join distinct năm 2024

This will only get worse with more data.

Can we express the query without DISTINCT?

Conclusion

Bài Viết Liên Quan

Bài tập hình học không gian lớp 9 toanmath năm 2024

Bác hồ với nhân dân thanh hóa mp3 năm 2024

Lỗi wifi trên win 10 cant connect to this network năm 2024

Cách viết lại so sánh câu kép năm 2024

Bài tập phương trình bậc 2 có đáp án năm 2024

Bài 32 sgk toán 7 tập 2 trng 40 năm 2024

Lấy dữ liệu từ mysql ra php bị lỗi font năm 2024

Chất nào không phải là oxi hóa khử năm 2024

Số nhà 121 chợ đầu mối đông hưng thanh hóa năm 2024

Địa chỉ văn phòng nhà xe văn minh năm 2024

Toplist

Top 10 thuyết trình về tái chế chai nhựa

Top 9 app đếm giờ học trên máy tính

Top 8 phim đam mỹ h

Top 8 bài hát về nước cho trẻ mầm non

Top 8 phần mềm chiếu màn hình điện thoại lên tường

Top 9 viết đoạn văn bằng tiếng trung về du lịch 2022

Top 10 bảng giá hình xăm mini

Top 9 hình xăm hồ ly 9 đuôi hợp với tuổi nào

Top 8 bảng chữ cái tiếng nhật hiragana và katakana 2022

Bài mới nhất

Khac phục lỗi bị ngược khi chụp ảnh năm 2024

Công thức quan trọng trong hóa học thcs năm 2024

Bài tập về hàm số bậc nhất lơ p 9 năm 2024

Bình trung tam hải núi thành quảng nam năm 2024

Bài tập tốt cho bà bầu 3 thang cuoi năm 2024

Bài văn tả em gái của em lớp 5 năm 2024

Lỗi pin nhanh hết xạc nhanh đầy trên iphone năm 2024

Lỗi chạy quá tốc độ xe ô tô năm 2024

Top tien dao tre hay nhat hien nay fo4 năm 2024

Bảng giá cước vận chuyển hàng hóa của bưu điện năm 2024

Chủ đề