Selecting unique records based on date of effect, ending on date of discontinue - sql-server

I have an interesting conundrum and I am using SQL Server 2012 or SQL Server 2016 (T-SQL obviously). I have a list of products, each with their own UPC code. These products have a discontinue date and the UPC code gets recycled to a new product after the discontinue date. So let's say I have the following in the Item_UPCs table:
Item Key | Item Desc | UPC | UPC Discontinue Date
123456 | Shovel | 0009595959 | 2018-04-01
123456 | Shovel | 0007878787 | NULL
234567 | Rake | 0009595959 | NULL
As you can see, I have a UPC that gets recycled to a new product. Unfortunately, I don't have an effective date for the item UPC table, but I do in an items table for when an item was added to the system. But let's ignore that.
Here's what I want to do:
For every inventory record up to the discontinue date, show the unique UPC associated with that date. An inventory record consists of the "Inventory Date", the "Purchase Cost", the "Purchase Quantity", the "Item Description", and the "Item UPC".
Once the discontinue date is over with (e.g.: it's the next day), start showing only the UPC that is in effect.
Make sure that no duplicate data exists and the UPCs are truly being "attached" to each row per whatever the date is in the query.
Here is an example of the inventory details table:
Inv_Key | Trans_Date | Item_Key | Purch_Qty | Purch_Cost
123 | 2018-05-12 | 123456 | 12.00 | 24.00
108 | 2018-03-22 | 123456 | 8.00 | 16.00
167 | 2018-07-03 | 234567 | 12.00 | 12.00
An example query:
SELECT DISTINCT
s.SiteID
,id.Item_Key
,iu.Item_Desc
,iu.Item_Department
,iu.Item_Category
,iu.Item_Subcategory
,iu.UPC
,iu.UPC_Discontinue_Date
,id.Trans_Date
,id.Purch_Cost
,id.Purch_Qty
FROM Inventory_Details id
INNER JOIN Item_UPCs iu ON iu.Item_Key = id.Item_Key
INNER JOIN Sites s ON s.Site_Key = id.Site_Key
The real query I have is far too long to post here. It has three CTEs and the resultant query. This is simply a mockup. Here is an example result set:
Site_ID | Item_Key | Item_Desc | Item_Department | Item_Category | UPC | UPC_Discontinue Date | Trans_Date | Purch_Cost | Purch_Qty
2457 | 123456 | Shovel | Digging Tools | Shovels | 0009595959 | 2018-04-01 | 2018-03-22 | 16.00 | 8.00
2457 | 123456 | Shovel | Digging Tools | Shovels | 0007878787 | NULL | 2018-03-22 | 16.00 | 8.00
2457 | 234567 | Rake | Garden Tools | Rakes | 0009595959 | NULL | 2018-07-03 | 12.00 | 12.00
2457 | 123456 | Shovel | Digging Tools | Shovels | 0007878787 | NULL | 2018-05-12 | 24.00 | 12.00
Do any of you know how I can "assign" a UPC to a specific range of dates in my query and then "assign" an updated UPC to the item for every effective date thereafter?
Many thanks!

Given your current Item_UPCs table, you can derive effective start dates from the discontinue dates using the LAG analytic function:
WITH Effective_UPCs AS (
    SELECT [Item_Key]
         , [Item_Desc]
         , [UPC]
         , COALESCE(
               LAG([UPC_Discontinue_Date])
                   OVER (PARTITION BY [Item_Key]
                         ORDER BY COALESCE([UPC_Discontinue_Date], DATEFROMPARTS(9999, 12, 31))),
               LAG([UPC_Discontinue_Date])
                   OVER (PARTITION BY [UPC]
                         ORDER BY COALESCE([UPC_Discontinue_Date], DATEFROMPARTS(9999, 12, 31)))
           ) AS [UPC_Start_Date]
         , [UPC_Discontinue_Date]
    FROM Item_UPCs i
)
SELECT * FROM Effective_UPCs;
Which yields the following Results:
| Item_Key | Item_Desc | UPC | UPC_Start_Date | UPC_Discontinue_Date |
|----------|-----------|------------|----------------|----------------------|
| 123456 | Shovel | 0007878787 | 2018-04-01 | (null) |
| 123456 | Shovel | 0009595959 | (null) | 2018-04-01 |
| 234567 | Rake | 0009595959 | 2018-04-01 | (null) |
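Each interval produced by this CTE can be open at either end: a NULL start date means the UPC has been in effect for as far back as the data goes, and a NULL discontinue date means it is still in effect. To use this in your query, keep the WITH clause in front of your SELECT, reference the Effective_UPCs CTE in place of the Item_UPCs table (adding to the CTE's select list any other Item_UPCs columns the query needs, such as Item_Department), and add two predicates that take the effective dates into account: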
SELECT DISTINCT
s.SiteID
,id.Item_Key
,iu.Item_Desc
,iu.Item_Department
,iu.Item_Category
,iu.Item_Subcategory
,iu.UPC
,iu.UPC_Discontinue_Date
,id.Trans_Date
,id.Purch_Cost
,id.Purch_Qty
FROM Inventory_Details id
INNER JOIN Effective_UPCs iu
ON iu.Item_Key = id.Item_Key
and (iu.UPC_Start_Date is null or iu.UPC_Start_Date < id.Trans_Date)
and (iu.UPC_Discontinue_Date is null or id.Trans_Date <= iu.UPC_Discontinue_Date)
INNER JOIN Sites s ON s.Site_Key = id.Site_Key
Note that the above query uses a half-open range (UPC_Start_Date < Trans_Date <= UPC_Discontinue_Date, rather than <= for both inequalities). This prevents a transaction that occurs exactly on the discontinue date from matching both the old and the new UPC record. If transactions that occur exactly on the discontinue date should match the new record and not the old, simply swap the two inequalities:
and (iu.UPC_Start_Date is null or iu.UPC_Start_Date <= id.Trans_Date)
and (iu.UPC_Discontinue_Date is null or id.Trans_Date < iu.UPC_Discontinue_Date)
instead of
and (iu.UPC_Start_Date is null or iu.UPC_Start_Date < id.Trans_Date)
and (iu.UPC_Discontinue_Date is null or id.Trans_Date <= iu.UPC_Discontinue_Date)
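If you want to try the effective-date join without touching your real data, here is a minimal sketch that runs it against the sample rows from the question. It is only a stand-in: it uses Python's built-in sqlite3 module rather than SQL Server (this assumes SQLite 3.25 or later for window functions), stores dates as ISO text, and replaces DATEFROMPARTS(9999,12,31) with the literal '9999-12-31'. The Sites table and the department/category columns are left out.

```python
import sqlite3

# In-memory SQLite stand-in for the SQL Server tables (assumption, not the real schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Item_UPCs (Item_Key INTEGER, Item_Desc TEXT, UPC TEXT,
                        UPC_Discontinue_Date TEXT);
INSERT INTO Item_UPCs VALUES
  (123456, 'Shovel', '0009595959', '2018-04-01'),
  (123456, 'Shovel', '0007878787', NULL),
  (234567, 'Rake',   '0009595959', NULL);
CREATE TABLE Inventory_Details (Inv_Key INTEGER, Trans_Date TEXT, Item_Key INTEGER,
                                Purch_Qty REAL, Purch_Cost REAL);
INSERT INTO Inventory_Details VALUES
  (123, '2018-05-12', 123456, 12.0, 24.0),
  (108, '2018-03-22', 123456,  8.0, 16.0),
  (167, '2018-07-03', 234567, 12.0, 12.0);
""")

query = """
WITH Effective_UPCs AS (
  SELECT Item_Key, Item_Desc, UPC,
         COALESCE(
           LAG(UPC_Discontinue_Date) OVER (PARTITION BY Item_Key
               ORDER BY COALESCE(UPC_Discontinue_Date, '9999-12-31')),
           LAG(UPC_Discontinue_Date) OVER (PARTITION BY UPC
               ORDER BY COALESCE(UPC_Discontinue_Date, '9999-12-31'))
         ) AS UPC_Start_Date,
         UPC_Discontinue_Date
  FROM Item_UPCs
)
SELECT id.Inv_Key, id.Trans_Date, iu.Item_Desc, iu.UPC
FROM Inventory_Details id
JOIN Effective_UPCs iu
  ON iu.Item_Key = id.Item_Key
 AND (iu.UPC_Start_Date IS NULL OR iu.UPC_Start_Date < id.Trans_Date)
 AND (iu.UPC_Discontinue_Date IS NULL OR id.Trans_Date <= iu.UPC_Discontinue_Date)
ORDER BY id.Trans_Date
"""

# Each inventory row should come back exactly once, with the UPC in effect on its date.
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the sample data, the 2018-03-22 shovel purchase picks up the old UPC 0009595959, the 2018-05-12 purchase picks up 0007878787, and the rake keeps 0009595959, so the duplicate shovel row from the question disappears.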

Related

T-SQL Select one row for multiple groups from one table

I have the following table:
NUMBER | DATE | VALUE_1 | VALUE_2
145789 | 2016-10-01 | A | Carrot
145789 | 2016-10-03 | B | Apple
145789 | 2016-10-14 | C | Banana
748596 | 2016-10-07 | Mango | Watermelon
748596 | 2016-10-19 | Pear | Strawberry
748596 | 2016-10-30 | Orange | Avocado
I want to select the first record for each number (the record with the minimum date).
How can I have a result like this?
NUMBER | DATE | VALUE_A | VALUE_B
145789 | 2016-10-01 | A | Carrot
748596 | 2016-10-07 | Mango | Watermelon
Very simple. Use ROW_NUMBER() for this, as below. The inner query numbers the rows within each NUMBER group in date order, and the outer WHERE clause keeps only row 1 of each group, which is the record with the minimum date.
SELECT [NUMBER], [DATE], [VALUE_1], [VALUE_2]
FROM
(
SELECT *,ROW_NUMBER() OVER(PARTITION BY NUMBER ORDER BY DATE ASC) RNO
FROM TABLE1)A
WHERE RNO=1
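To check the result against the sample table, here is a small sketch of the same query run in SQLite through Python's sqlite3 module. This is an assumption for the sake of a runnable example (SQLite 3.25+ is needed for ROW_NUMBER); identifiers are double-quoted instead of bracketed.

```python
import sqlite3

# In-memory copy of the sample table from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TABLE1 ("NUMBER" INTEGER, "DATE" TEXT, VALUE_1 TEXT, VALUE_2 TEXT);
INSERT INTO TABLE1 VALUES
  (145789, '2016-10-01', 'A',      'Carrot'),
  (145789, '2016-10-03', 'B',      'Apple'),
  (145789, '2016-10-14', 'C',      'Banana'),
  (748596, '2016-10-07', 'Mango',  'Watermelon'),
  (748596, '2016-10-19', 'Pear',   'Strawberry'),
  (748596, '2016-10-30', 'Orange', 'Avocado');
""")

# Number rows per NUMBER in date order, then keep only the first of each group.
first_rows = conn.execute("""
SELECT "NUMBER", "DATE", VALUE_1, VALUE_2
FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY "NUMBER" ORDER BY "DATE" ASC) AS RNO
  FROM TABLE1
) A
WHERE RNO = 1
ORDER BY "NUMBER"
""").fetchall()

for row in first_rows:
    print(row)
```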

Delete partial duplicate rows - sql

I have some troubles with deleting partial duplicate rows
The structure is like this:
+-----+--------+--+-----------+--+------+
| id | userid | | location | | week |
+-----+--------+--+-----------+--+------+
| 1 | 001 | | amsterdam | | 11 |
| 2 | 001 | | amsterdam | | 23 |
| 3 | 002 | | berlin | | 28 |
| 4 | 002 | | berlin | | 22 |
| 5 | 003 | | paris | | 19 |
| 6 | 003 | | paris | | 35 |
+-----+--------+--+-----------+--+------+
I only need to keep one row from each userid, it doesn't matter which week number it has.
Thanks,
Maxcim
This should work across most databases:
DELETE
FROM yourTable
WHERE id <> (SELECT MIN(t.id)
FROM yourTable t
WHERE t.userid = yourTable.userid)
This query would delete from each userid group all records except for the record having the lowest id for that group. I assume that id is a unique column.
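Here is a quick way to try that delete on the sample rows, sketched with Python's sqlite3 module as a stand-in database (an assumption; your own engine may differ). Note that the outer column in the subquery is qualified with the table name: an unqualified userid inside the subquery would refer to the inner alias t, and the comparison would then always be true.

```python
import sqlite3

# In-memory copy of the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourTable (id INTEGER PRIMARY KEY, userid TEXT, location TEXT, week INTEGER);
INSERT INTO yourTable VALUES
  (1, '001', 'amsterdam', 11), (2, '001', 'amsterdam', 23),
  (3, '002', 'berlin',    28), (4, '002', 'berlin',    22),
  (5, '003', 'paris',     19), (6, '003', 'paris',     35);
""")

# Keep the lowest id per userid; delete every other row of that user.
conn.execute("""
DELETE FROM yourTable
WHERE id <> (SELECT MIN(t.id)
             FROM yourTable t
             WHERE t.userid = yourTable.userid)
""")

remaining = conn.execute("SELECT id, userid FROM yourTable ORDER BY id").fetchall()
print(remaining)
```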
Another method you can try:
We number the rows within each UserID/Location group with ROW_NUMBER(), and then delete only the rows numbered higher than 1, keeping the original one.
BEGIN TRANSACTION
-- Number the rows within each UserID/Location group; id identifies each row
SELECT id, UserID, Location,
RN = ROW_NUMBER() OVER (PARTITION BY UserID, Location ORDER BY id)
INTO #test1
FROM dbo.MyTbl
-- Join on id so that only the extra rows (RN > 1) are deleted
DELETE t
FROM dbo.MyTbl t
INNER JOIN #test1 d
ON d.id = t.id
WHERE d.RN > 1
IF @@ERROR <> 0 GOTO Errlbl
COMMIT TRANSACTION
RETURN
Errlbl:
ROLLBACK TRANSACTION
GO
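For comparison, here is a rough sketch of the same ROW_NUMBER() idea without a temp table, run in SQLite through Python's sqlite3 module (an assumption made so the example is runnable; it needs SQLite 3.25+). The extra row with id 7 is hypothetical and is there only to show that partitioning by UserID and Location keeps one row per user per location, not one row per user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTbl (id INTEGER PRIMARY KEY, UserID TEXT, Location TEXT, Week INTEGER);
INSERT INTO MyTbl VALUES
  (1, '001', 'amsterdam', 11), (2, '001', 'amsterdam', 23),
  (3, '002', 'berlin',    28), (4, '002', 'berlin',    22),
  (5, '003', 'paris',     19), (6, '003', 'paris',     35),
  (7, '001', 'berlin',    40);  -- hypothetical row: same user, different location
""")

# "with conn" commits on success and rolls back if the statement raises,
# which plays the role of the explicit transaction handling above.
with conn:
    conn.execute("""
    DELETE FROM MyTbl
    WHERE id IN (
      SELECT id FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY UserID, Location ORDER BY id) AS RN
        FROM MyTbl
      )
      WHERE RN > 1
    )
    """)

kept_ids = [r[0] for r in conn.execute("SELECT id FROM MyTbl ORDER BY id")]
print(kept_ids)
```

If you really want a single row per userid regardless of location, partition by UserID alone.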

Hive Contiguous Date Ranges

I am using Hive and I would like to take a table with a historical list of customers, subscription events, and subscription types and summarize by contiguous runs of subscription types for each customer.
Example Input (db.cust_hist):
customer_id | eff_dt | exp_dt | sub_cd | sub_type
---------------------------------------------------------
1 | 02/01/2015 | 03/01/2015 | active | A
1 | 03/01/2015 | 04/01/2015 | active | A
1 | 03/15/2015 | 12/31/9999 | cancel | A
1 | 04/01/2015 | 05/01/2015 | active | A
1 | 05/01/2015 | 06/01/2015 | active | A
1 | 02/01/2015 | 03/01/2015 | active | B
1 | 03/01/2015 | 04/01/2015 | active | B
The sub_cd in this case refers to the type of event that is effective over the date range for that row. For example, the user canceled their A subscription type on 3/15 and resumed on 4/01.
The output I'm trying to get looks like this (db.cust_snapshot):
customer_id | eff_dt | exp_dt | sub_type
------------------------------------------------
1 | 02/01/2015 | 03/15/2015 | A
1 | 04/01/2015 | 06/01/2015 | A
1 | 02/01/2015 | 04/01/2015 | B
and reflects the gap in coverage.
From what I have read in this link from BetterAtOracle (specific to SQL) which does a very good job of laying things out, I need to use row numbers and a lagging window, but I can't seem to apply it to my situation in Hive (perhaps because of the 12/31/9999 notation/subscription code?)
I tried:
SELECT customer_id
, eff_dt
, exp_dt
, sub_cd
, sub_type
, CASE WHEN DATEDIFF(TO_DATE(eff_dt), TO_DATE(lag(exp_dt) OVER (PARTITION BY customer_id, sub_type ORDER BY eff_dt))) <= 1 THEN NULL
ELSE row_number() OVER (PARTITION BY customer_id, sub_type ORDER BY eff_dt)
END as grp
FROM db.cust_hist
ORDER BY TO_DATE(eff_dt)
As you can see, I haven't applied the subscription event code. This sort of gets me there as I can start to see different groups based on subscription type, but I feel like I'm stuck from here on out.
Any help or pointers would be greatly appreciated. Before this task, I never understood the true power of ranks, rows, lag, and other window functions!
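To pin down the merging rule before translating it into Hive window functions, it may help to write it out procedurally. The following is a plain-Python sketch, not a Hive query, and it rests on one assumed reading of the data: a cancel row cuts the current run of coverage at its eff_dt, and active rows extend a run only when they start on or before the run's current end. The function name summarize is made up for illustration.

```python
from datetime import datetime
from itertools import groupby

def parse(text):
    """Parse the MM/DD/YYYY strings used in db.cust_hist."""
    return datetime.strptime(text, "%m/%d/%Y").date()

# Sample rows: (customer_id, eff_dt, exp_dt, sub_cd, sub_type)
cust_hist = [
    (1, "02/01/2015", "03/01/2015", "active", "A"),
    (1, "03/01/2015", "04/01/2015", "active", "A"),
    (1, "03/15/2015", "12/31/9999", "cancel", "A"),
    (1, "04/01/2015", "05/01/2015", "active", "A"),
    (1, "05/01/2015", "06/01/2015", "active", "A"),
    (1, "02/01/2015", "03/01/2015", "active", "B"),
    (1, "03/01/2015", "04/01/2015", "active", "B"),
]

def summarize(rows):
    # Sort by customer, subscription type, then effective date.
    events = sorted(
        (cust, sub_type, parse(eff), parse(exp), sub_cd)
        for cust, eff, exp, sub_cd, sub_type in rows
    )
    runs = []
    for (cust, sub_type), group in groupby(events, key=lambda e: e[:2]):
        run = None  # [start, end] of the contiguous run being built
        for _, _, eff, exp, sub_cd in group:
            if sub_cd == "cancel":
                if run is not None:
                    run[1] = min(run[1], eff)  # coverage stops at the cancel date
                    runs.append((cust, run[0], run[1], sub_type))
                    run = None
            elif run is not None and eff <= run[1]:
                run[1] = max(run[1], exp)      # contiguous: extend the run
            else:
                if run is not None:
                    runs.append((cust, run[0], run[1], sub_type))
                run = [eff, exp]               # gap: start a new run
        if run is not None:
            runs.append((cust, run[0], run[1], sub_type))
    return [(c, s.strftime("%m/%d/%Y"), e.strftime("%m/%d/%Y"), t)
            for c, s, e, t in runs]

snapshot = summarize(cust_hist)
for row in snapshot:
    print(row)
```

On the sample input this reproduces the three rows of db.cust_snapshot shown above. It also parses the dates explicitly with an MM/DD/YYYY pattern, which is worth checking in the Hive attempt as well.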

Add columns to query based on previous queries [duplicate]

This question already has an answer here:
SQL Server: Examples of PIVOTing String data
I have two tables:
+-----------+
| Customer |
+-----------+
| ID | Name |
+----+------+
| 1 | Jack |
+----+------+
| 2 | John |
+----+------+
+----------------------------------------+
| Bill |
+----------------------------------------+
| ID | Customer_ID | date | amount |
+----+-------------+------------+--------+
| 1 | 1 | 01.01.2015 | 10$ |
+----+-------------+------------+--------+
| 2 | 1 | 01.01.2014 | 20$ |
+----+-------------+------------+--------+
| 3 | 2 | 01.01.2015 | 5$ |
+----+-------------+------------+--------+
| 4 | 2 | 01.02.2015 | 50$ |
+----+-------------+------------+--------+
| 5 | 2 | 01.01.2014 | 15$ |
+----+-------------+------------+--------+
I need to know the sum of all the bills a customer got in a year.
That's pretty easy:
SELECT
SUM(Bill.amount), Customer.Name
FROM
Customer
INNER JOIN
Bill ON Customer.ID = Bill.Customer_ID
WHERE
Bill.date BETWEEN '20150101' AND '20151231'
GROUP BY
Customer.Name
The difficult part is that I need to display the results of that query for multiple years in a single table like this:
+-------------------------------------------+
| sales to customer |
+-------------------------------------------+
| Customer_ID | Customer_Name | 2015 | 2014 |
+-------------+---------------+------+------+
| 1 | jack | 10$ | 20$ |
+-------------+---------------+------+------+
| 2 | john | 55$ | 15$ |
+-------------+---------------+------+------+
I'm using SQL Server 2005.
I'm very grateful for every answer.
Sincerely,
Andahari
As stated you need to use a PIVOT in order to achieve the results you are looking for, like this:
Select Customer_ID, Customer_Name, [2015], [2014]
from
(select Customer_ID, Name Customer_Name, YEAR(b.[date]) Yr, amount
from Bill b
inner join Customer c on c.ID = b.Customer_ID
) as src
PIVOT
(SUM(amount) for Yr in ([2015],[2014])) as pvt
Use a CASE expression to sum only the values that fall in each time period. Example:
SELECT sum(case when Bill.date BETWEEN '20150101' AND '20151231' then Bill.amount else 0 end) as [2015],
Customer.Name
FROM Customer INNER JOIN Bill ON Customer.ID = Bill.Customer_ID
GROUP BY Customer.Name
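To see both years side by side, here is a small sketch of the CASE approach extended to two columns, run on the sample data in SQLite through Python's sqlite3 module. This is an assumption made for a runnable example, not SQL Server 2005: amounts are stored as plain integers without the $ sign, and dates as ISO text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (ID INTEGER, Name TEXT);
INSERT INTO Customer VALUES (1, 'Jack'), (2, 'John');
CREATE TABLE Bill (ID INTEGER, Customer_ID INTEGER, "date" TEXT, amount INTEGER);
INSERT INTO Bill VALUES
  (1, 1, '2015-01-01', 10),
  (2, 1, '2014-01-01', 20),
  (3, 2, '2015-01-01',  5),
  (4, 2, '2015-02-01', 50),
  (5, 2, '2014-01-01', 15);
""")

# One conditional SUM per year column; bills outside the year contribute 0.
sales = conn.execute("""
SELECT c.ID, c.Name,
       SUM(CASE WHEN b."date" BETWEEN '2015-01-01' AND '2015-12-31'
                THEN b.amount ELSE 0 END) AS "2015",
       SUM(CASE WHEN b."date" BETWEEN '2014-01-01' AND '2014-12-31'
                THEN b.amount ELSE 0 END) AS "2014"
FROM Customer c
INNER JOIN Bill b ON c.ID = b.Customer_ID
GROUP BY c.ID, c.Name
ORDER BY c.ID
""").fetchall()

for row in sales:
    print(row)
```

Each extra year needs one more SUM(CASE ...) column, so like PIVOT this works when the list of years is known in advance.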

SQL Server - tricky query to update contact id using ROW_NUMBER to reference index value

A previous developer used an index rather than the actual contactID to reference which of the associated contacts is the primary contact. The index works well when the app gets the contacts and sets the primary contact in the list on the page, but try joining for a report! Not easy; so I want to update the main table with the actual contact ID to make for a simple join and to avoid this mess.
In this particular case, I need to update tblInquiry with the claimantContactID and agentContactID. Those two fields I just created and defaulted to 0. However, the challenge is to use the claimantContactIndex and agentContactIndex values from tblInquiry, to get the respective nth row from tblContacts. The index is 0 based, so if the index value is 2, then get the ID of the 3rd contact, for example.
Also, claimantContactIndex and agentContactIndex can either be NULL or some number. If NULL, then assume the first contact (index 0).
I will also add that the contacts index cannot have an order by on it because the application relies upon the natural order when getting the contacts list (there is no order by in the stored procedure), and selects then the index accordingly.
DB Platform: SQL Server 2008 R2 Express Edition.
I have the following table structure:
tblInquiry
id | claimantID | agentID | claimantContactIndex | agentContactIndex | claimantContactID | agentContactID
--------------------------------
1 | 1001 | 2001 | 2 | 0 | 0 | 0
2 | 1002 | NULL | 0 | NULL | 0 | 0
tblClaimant
id | name | address | phone | email
--------------------------------
1001 | Widgets Inc. | 123 W. Main | 5550000 | widgets@here.com
1002 | Thingies LLC. | 456 W. Main | 5551111 | thingies@here.com
tblAgent
id | name | address | phone | email
--------------------------------
2001 | Simon Bros. | 789 W. Main | 5552222 | simon@here.com
tblContacts
id | claimantID | agentID | fn | ln | phone | email
--------------------------------
3001 | 1001 | NULL | John | Doe | 5553333 | john@here.com
3002 | 1001 | NULL | Fred | Flynn | 5554444 | fred@here.com
3003 | 1001 | NULL | Mike | Brown | 55555555 | mike@here.com
3004 | 1001 | NULL | Susan | Pierce | 5556666 | susan@here.com
3005 | NULL | 2001 | Jeff | Bridges | 5557777 | jeff@here.com
3006 | NULL | 2001 | Karry | Sinclair | 5558888 | Karry@here.com
3007 | NULL | 2001 | Steve | Green | 5559999 | steve@here.com
3008 | NULL | 2001 | Peter | White | 5550001 | peter@here.com
Update:
I have worked out the select part of this solution and I can now get the correct claimant contact info using ROW_NUMBER() and a JOIN. I will add more to get correct agent contact info. I also handled the case where an index is NULL. And ultimately I will work this out to update the inquiry table now that I have the right contactID.
SELECT
i.id inquiryID, i.claimantContactIndex, i.agentContactIndex, i.claimantContactID, i.agentContactID
,r.id contactID, r.claimantID, r.agentID
,r.*
FROM
(
SELECT ROW_NUMBER()
OVER (Partition by con.claimantid Order by (SELECT NULL)) AS RowNumber, *
FROM tblContacts con
) r
INNER JOIN
tblInquiry i on i.claimantid = r.claimantid and ((isnull(i.claimantContactIndex, 0) + 1 = r.RowNumber ))
WHERE
i.id in (1, 2, 3, 4, 5)
ORDER BY
i.id
You could do something like:
Using ideas from here:
https://msdn.microsoft.com/en-us/library/ms186734.aspx
SELECT
ROW_NUMBER() OVER (Order by Id) AS RowNumber,
claimantID, agentID, (etc...)
FROM
tblContacts
This gives you an index-based result set. I'd drop it into a temp table and select from that where RowNumber equals whatever index you want (remember that ROW_NUMBER() starts at 1, while your index starts at 0).
This issue was resolved by doing the following:
As I posted above, I used ROW_NUMBER() with ORDER BY (SELECT NULL), along with ISNULL to handle NULL index values, to get the correct contacts.
I selected the results into a temp table.
I then updated the inquiry table by joining it to the temp table.
I dropped the temp table.
I had to do this in two passes, once for claimants and a second time for agents.
Thx @EricH for pointing me in the right direction.
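For anyone who wants to see the two passes end to end, here is a rough sketch on the sample data, run in SQLite through Python's sqlite3 module rather than SQL Server 2008 R2 (an assumption, for the sake of a runnable example). One loud caveat: it orders the row numbers by contact id, which is only a guess at the "natural order" the application sees; the real fix used ORDER BY (SELECT NULL). The temp table name ranked and the column name ownerID are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblInquiry (id INTEGER, claimantID INTEGER, agentID INTEGER,
                         claimantContactIndex INTEGER, agentContactIndex INTEGER,
                         claimantContactID INTEGER, agentContactID INTEGER);
INSERT INTO tblInquiry VALUES
  (1, 1001, 2001, 2, 0,    0, 0),
  (2, 1002, NULL, 0, NULL, 0, 0);
CREATE TABLE tblContacts (id INTEGER, claimantID INTEGER, agentID INTEGER,
                          fn TEXT, ln TEXT);
INSERT INTO tblContacts VALUES
  (3001, 1001, NULL, 'John',  'Doe'),
  (3002, 1001, NULL, 'Fred',  'Flynn'),
  (3003, 1001, NULL, 'Mike',  'Brown'),
  (3004, 1001, NULL, 'Susan', 'Pierce'),
  (3005, NULL, 2001, 'Jeff',  'Bridges'),
  (3006, NULL, 2001, 'Karry', 'Sinclair'),
  (3007, NULL, 2001, 'Steve', 'Green'),
  (3008, NULL, 2001, 'Peter', 'White');
""")

# Two passes, one per role: rank the contacts, copy the matching id, drop the temp table.
for role in ("claimant", "agent"):
    conn.executescript(f"""
    CREATE TEMP TABLE ranked AS
      SELECT id AS contactID, {role}ID AS ownerID,
             -- ORDER BY id is an assumption standing in for the natural order
             ROW_NUMBER() OVER (PARTITION BY {role}ID ORDER BY id) AS RowNumber
      FROM tblContacts
      WHERE {role}ID IS NOT NULL;

    UPDATE tblInquiry
    SET {role}ContactID = COALESCE(
          (SELECT r.contactID
           FROM ranked r
           WHERE r.ownerID = tblInquiry.{role}ID
             -- 0-based index (NULL treated as 0) mapped to 1-based row number
             AND r.RowNumber = COALESCE(tblInquiry.{role}ContactIndex, 0) + 1),
          {role}ContactID);  -- no matching contact: keep the existing value

    DROP TABLE ranked;
    """)

updated = conn.execute(
    "SELECT id, claimantContactID, agentContactID FROM tblInquiry ORDER BY id"
).fetchall()
print(updated)
```

Inquiry 1 ends up pointing at the third claimant contact and the first agent contact; inquiry 2 keeps its default 0 values because the sample tblContacts has no rows for claimant 1002.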
