Home mysql query efficiency pulling distinct records from similar groups
Reply: 1

mysql query efficiency pulling distinct records from similar groups

Brian Powell
1#
Brian Powell Published in 2018-01-12 21:08:29Z

I'm building a table that shows instances where more than 1 record contains the same btc, but for different customer_names, also showing me the lowest instance of cost for each customer.

This query works - but it's very inefficient and it takes over a minute to run on an 80,000 row table - so I feel I must be doing something wrong.

select btc,customer_name,min(cost) from table where table.btc in
 (select btc from table group by 1 having count(distinct customer_name) > 1) 
 group by 1,2

This outputs a table like this:

+---------+---------------+---------+
|   btc   | customer_name |  cost   |
+---------+---------------+---------+
| asd32   | Sony          | 1.45863 |
| asd32   | Nintendo      | 1.84839 |
| bf33940 | Sony          | 2.49188 |
| bf33940 | Nintendo      | 2.49188 |
| a43c3f  | Sony          | 2.84142 |
| a43c3f  | Nintendo      | 2.45    |
| a43c3f  | Sega          | 2.689   |
+---------+---------------+---------+

I'd like to take this a step further and NOT include any result where the cost for both customer_name fields is the same, (so - removing btc bf33940 from the above table since Sony and Nintendo have the same cost)

I'd also like to know if there is a more efficient way to do what I'm doing.

Table Structure

+------------------+--------------+------+-----+---------+
|      field       |     type     | null | key | default |
+------------------+--------------+------+-----+---------+
| btc              | varchar(100) | NO   | MUL | NULL    |
| mpn              | varchar(100) | YES  |     | NULL    |
| supplier         | varchar(100) | YES  |     | NULL    |
| invoice          | varchar(100) | YES  |     | NULL    |
| invoice_date     | datetime     | YES  |     | NULL    |
| qtr              | varchar(5)   | YES  |     | NULL    |
| qty              | double(10,0) | YES  |     | NULL    |
| resale           | double(15,5) | YES  |     | NULL    |
| ext_resale       | double(15,5) | YES  |     | NULL    |
| cost             | double(15,5) | YES  |     | NULL    |
| ext_cost         | double(15,5) | YES  |     | NULL    |
| gpp              | double(15,5) | YES  |     | NULL    |
| project          | varchar(100) | YES  |     | NULL    |
| team             | double(15,5) | YES  |     | NULL    |
| build_type       | varchar(50)  | YES  |     | NULL    |
| customer_name    | varchar(100) | YES  |     | NULL    |
| customer_address | varchar(100) | YES  |     | NULL    |
| customer_type    | varchar(100) | YES  |     | NULL    |
| customer_group   | varchar(100) | YES  |     | NULL    |
| sps              | varchar(100) | YES  |     | NULL    |
| fps              | varchar(100) | YES  |     | NULL    |
| gps              | varchar(100) | YES  |     | NULL    |
| hps              | varchar(100) | YES  |     | NULL    |
+------------------+--------------+------+-----+---------+

sample CSV file here: https://ufile.io/os0as

rlanvin
2#
rlanvin Reply to 2018-01-12 22:04:59Z

You could try to replace your where...in by a join though it's hard to say how much more efficient it'll be without testing.

Something like this:

select t1.btc, customer_name, min(cost)
from xxx t1
join (
  select btc
  from xxx
  group by btc
  having count(*) > 1
) t2 on t1.btc = t2.btc
group by t1.btc, t1.customer_name

For your second question, you can further group by btc and cost to remove duplicates:

select t3.btc, group_concat(t3.customer_name), min_cost
from (
   select t1.btc, t1.customer_name, min(cost) as min_cost
   from xxx t1
   join (
      select btc
      from xxx
      group by btc
      having count(distinct customer_name) > 1
   ) t2 on t1.btc = t2.btc
) t3
group by t1.btc, t1.cost

Again, hard to say if that's going to work without testing, but hopefully you get the idea.

To make it faster I would make a separate table with each btc and a counter of how many customers have it, so you don't need to create a temp table with having count() > 1.

You need to login account before you can post.

About| Privacy statement| Terms of Service| Advertising| Contact us| Help| Sitemap|
Processed in 0.322719 second(s) , Gzip On .

© 2016 Powered by mzan.com design MATCHINFO