Home Ignoring Duplicate Records SQL
Reply: 4

Ignoring Duplicate Records SQL

squib1996
1#
squib1996 Published in 2017-11-13 14:28:30Z

In need of some help :)

So I have a table of records with the following columns:

Key (PK, FK, int) DT (smalldatetime) Value (real)

The DT is a datetime for every half hour of the day with an associated value

E.g.

Key       DT                       VALUE
1000      2010-01-01 08:00:00      80
1000      2010-01-01 08:30:00      75
1000      2010-01-01 09:00:00      100

I have a Query that finds the max value every 24 hour period and its associated time however, on one day the max value occurs twice and hence duplicates the date which is causing processing issues. I have tried using rownumber() which works but I can't use a calculated column in my where clause? Currently I have:

SELECT       cast(T1.DT as date) as 'Date',Cast(T1.DT as time(0)) as 'HH', ROW_NUMBER() over (PARTITION BY  cast(DT as date) ORDER BY DT) AS 'RowNumber'
FROM        TABLE_1 AS T1
INNER JOIN  (
                SELECT CAST([DT] as date) as 'DATE'
                ,       MAX([VALUE]) as 'MAX_HH'
                FROM    TABLE_1
                WHERE   DT > '6-nov-2016' and [KEY] = '1000'
                GROUP BY CAST([DT] as date)
            ) AS MAX_DT
        ON  MAX_DT.[DATE] = CAST(T1.[DT] as date)
        AND T1.VALUE = MAX_DT.MAX_HH
WHERE       DT > '6-nov-2016' and [KEY] = '1000'
ORDER BY DT

This results in

Key       DT               VALUE       HH
1000      2010-01-01       80          07:00:00
1000      2010-02-01       100         17:30:00
1000      2010-02-01       100         18:00:00

I need to remove the duplicate date (I Have no preference which HH it takes)

I think I've explained that terribly, let me know if it makes no sense and i'll try and re write

Any ideas?

Daniel E.
2#
Daniel E. Reply to 2017-11-13 14:59:19Z

Can you try this the new code is in ** **:

 SELECT       cast(T1.DT as date) as 'Date', ** MIN(Cast(T1.DT as time(0))) as 'HH' **
    FROM        TABLE_1 AS T1
    INNER JOIN  (
                    SELECT CAST([DT] as date) as 'DATE'
                    ,       MAX([VALUE]) as 'MAX_HH'
                FROM    TABLE_1
                WHERE   DT > '6-nov-2016' and [KEY] = '1000'
                GROUP BY CAST([DT] as date)
            ) AS MAX_DT
        ON  MAX_DT.[DATE] = CAST(T1.[DT] as date)
        AND T1.VALUE = MAX_DT.MAX_HH
WHERE       DT > '6-nov-2016' and [KEY] = '1000'

here put the group by

GROUP BY cast(T1.DT as date)
ORDER BY DT
DatNoHand
3#
DatNoHand Reply to 2017-11-13 14:32:14Z

With SQL you can use SELECT DISTINCT,

The SELECT DISTINCT statement is used to return only distinct (different) values.

Inside a table, a column often contains many duplicate values; and sometimes you only want to list the different (distinct) values.

The SELECT DISTINCT statement is used to return only distinct (different) values.

aljassi
4#
aljassi Reply to 2017-11-13 15:45:57Z

i would do something like this i didnt try it but i think it s correct.

SELECT  cast(T1.DT as date) as 'Date',Cast(T1.DT as time(0)) as 'HH', VALUE 
FROM TABLE_1 T1      
       WHERE [DT] IN (       
       --select the max date from Table_1 for each day
            SELECT MAX([DT]) max_date FROM TABLE_1
            WHERE (CAST([DT] as date) ,value) IN 
            (
             SELECT CAST([DT] as date) as 'CAST_DATE'
              ,MAX([VALUE]) as 'MAX_HH'
              FROM    TABLE_1
              WHERE   DT > '6-nov-2016' and [KEY] = '1000'
             GROUP BY CAST([DT] as date
            )group by [DT]
           )
 WHERE       DT > '6-nov-2016' and [KEY] = '1000'
Joel Coehoorn
5#
Joel Coehoorn Reply to 2017-11-13 16:03:20Z

Change the JOIN to an APPLY.

The APPLY operation will allow you to limit the connected relation to just one result for each source relation.

SELECT v.[Key], cast(v.DT As Date) as "Date", v.[Value], cast(v.DT as Time(0)) as "HH"
FROM
(   -- First a projection to get just the exact dates you want
    SELECT DISTINCT [Key], CAST(DT as DATE) as DT 
    FROM Table_1 
    WHERE [Key] = '1000' AMD DT > '20161106'
) dates
CROSS APPLY (
    -- Then use APPLY rather than JOIN to find just the exact one record you need for each date
    SELECT TOP 1 * 
    FROM Table_1 
    WHERE [Key] = dates.[Key] AND cast(DT as DATE) = dates.DT ORDER BY [Value] DESC
) v

A final note: Both this query and your sample query in the question will include values from Nov 6, 2016. The query says > 2016-11-05 with an exlusive inequality, but the original was still comparing using full DateTime values, meaning there is a implied 0 as a time component. So 12:01 AM on Nov 6 is still greater than 12:00:00.001 AM on Nov 6. If you want to exclude all Nov 6 dates from the query, you either need to change this to use a time value at the end of the date, or cast to date before making that > comparison.

You need to login account before you can post.

About| Privacy statement| Terms of Service| Advertising| Contact us| Help| Sitemap|
Processed in 0.327203 second(s) , Gzip On .

© 2016 Powered by mzan.com design MATCHINFO