Home Efficiently selecting rows where a row in a related table exists
Reply: 0

Efficiently selecting rows where a row in a related table exists

user15465
1#
user15465 Published in April 20, 2018, 10:48 am

I've got a recurring pattern in a system that I'm currently working on, where, for example, I need to select all the users that have orders under a list of possible companies. Or needing to select users if a record of this user being flagged exists.

My users table contains 430,825 records, so this shouldn't really be that difficult to deal with. Right now I'm close, I have a query that get's that .047s execution time that I look for, but if I add one more piece to this, it gets real slow.

Here's my current query, the fast one:

select`UserID`
from`users`
where(`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
    or`UserID`in(select*
        from(select`UserID`
            from`invoices`
            where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
            and`__Active`=1)`a`)
    or`UserID`in(select*
        from(select`UserID`
            from`quoterequests`
            where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
            and`__Active`=1)`a`))
and(`UserID`in(select*
        from(select`UserID`
            from`userassociations`
            where`_Email`='brian@yeet.com'
            and`__Active`=1)`a`))
and(`UserID`in(select*
        from(select`UserID`
            from`usercustomerflags`
            where`CustomerFlagID`in(10,27,17,1,2,3,4,5,6)
            and`__Active`=1)`a`)
    or not exists(select 1 
        from`usercustomerflags`
        where`__Active`=1 
        and`users`.`UserID`=`UserID`))
and`Deleted`=0 
order by`DateTimeAdded`desc 
limit 50;

(The extra select*from(...) is because of this https://stackoverflow.com/a/1434712/728236)

In the middle there, I'm pulling users by further by email address, while checking other related tables for emails that might be related to this user. Like, the next piece searches users including their CC addresses when quotes are sent out to customers.

select`UserID`
from`users`
where(`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
    or`UserID`in(select*
        from(select`UserID`
            from`invoices`
            where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
            and`__Active`=1)`a`)
    or`UserID`in(select*
        from(select`UserID`
            from`quoterequests`
            where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
            and`__Active`=1)`a`))
and(`UserID`in(select*
        from(select`UserID`
            from`userassociations`
            where`_Email`='brian@yeet.com'
            and`__Active`=1)`a`)
    or`UserID`in(select*
        from(select`UserID`
            from`userquotesemails`
            where`Email`='brian@yeet.com'
            and`__Active`=1)`a`))
and(`UserID`in(select*
        from(select`UserID`
            from`usercustomerflags`
            where`CustomerFlagID`in(10,27,17,1,2,3,4,5,6)
            and`__Active`=1)`a`)
    or not exists(select 1 
        from`usercustomerflags`
        where`__Active`=1 
        and`users`.`UserID`=`UserID`))
and`Deleted`=0 
order by`DateTimeAdded`desc 
limit 50;

I've added the alternate table to search for emails, but now the query takes 3.016s, which is way, way slower. It seems odd that, as I was building this query, that last part seemed to be the tipping point in performance here, what would be the cause of this?

The first and second explains, respectively

+----+--------------------+-------------------+--+----------------+---------------------------------------------------------------------------------------------+------------------------------+------+-----------------------+---+-------+---------------------------------+
|  1 | PRIMARY            | <subquery6>       |  | ALL            |                                                                                             |                              |      |                       |   | 0.00  | Using temporary; Using filesort |
|  1 | PRIMARY            | users             |  | eq_ref         | PRIMARY,UserID_UNIQUE,fk_users_1_idx,users_Customers                                        | PRIMARY                      |  144 | <subquery6>.UserID    | 1 | 50.00 | Using where                     |
|  6 | MATERIALIZED       | userassociations  |  | ref            | userassociations_UserID,userassociations__Email                                             | userassociations__Email      | 1026 | const                 | 3 | 10.00 | Using where                     |
| 10 | DEPENDENT SUBQUERY | usercustomerflags |  | ref            | usercustomerflags_UserID_idx                                                                | usercustomerflags_UserID_idx |  144 | sterling.users.UserID | 1 | 10.00 | Using where                     |
|  8 | DEPENDENT SUBQUERY | usercustomerflags |  | index_subquery | usercustomerflags_CustomerFlagID_idx,usercustomerflags_UserID_idx                           | usercustomerflags_UserID_idx |  144 | func                  | 1 | 4.95  | Using where                     |
|  4 | DEPENDENT SUBQUERY | quoterequests     |  | index_subquery | quoterequests_CompanyID,quoterequests_UserID,quoterequests__Latest,quoterequests_UserQuotes | quoterequests__Latest        |  145 | func                  | 2 | 5.00  | Using where                     |
|  2 | DEPENDENT SUBQUERY | invoices          |  | index_subquery | Invoice_UserID_idx,Invoice_CompanyID_idx,invoices_SampleRequests,invoices_LateOrdersBubble  | Invoice_UserID_idx           |  145 | func                  | 1 | 3.33  | Using where                     |
+----+--------------------+-------------------+--+----------------+---------------------------------------------------------------------------------------------+------------------------------+------+-----------------------+---+-------+---------------------------------+

+----+--------------------+-------------------+--+-----+---------------------------------------------------------------------------------------------+--------------------------------+------+-----------------------+--------+--------+-------------+
|  1 | PRIMARY            | users             |  | ref | fk_users_1_idx,users_Customers                                                              | users_Customers                |    4 | const                 | 227515 | 100.00 | Using where |
| 12 | DEPENDENT SUBQUERY | usercustomerflags |  | ref | usercustomerflags_UserID_idx                                                                | usercustomerflags_UserID_idx   |  144 | sterling.users.UserID |      1 | 10.00  | Using where |
| 10 | SUBQUERY           | usercustomerflags |  | ALL | usercustomerflags_CustomerFlagID_idx,usercustomerflags_UserID_idx                           |                                |      |                       |   3509 | 4.94   | Using where |
|  8 | SUBQUERY           | userquotesemails  |  | ref | userquotesemails_Email__Active,userquotesemails_UserID                                      | userquotesemails_Email__Active | 1027 | const,const           |      1 | 100.00 |             |
|  6 | SUBQUERY           | userassociations  |  | ref | userassociations_UserID,userassociations__Email                                             | userassociations__Email        | 1026 | const                 |      3 | 10.00  | Using where |
|  4 | SUBQUERY           | quoterequests     |  | ref | quoterequests_CompanyID,quoterequests_UserID,quoterequests__Latest,quoterequests_UserQuotes | quoterequests_CompanyID        |  144 | const                 |  16702 | 10.00  | Using where |
|  2 | SUBQUERY           | invoices          |  | ref | Invoice_UserID_idx,Invoice_CompanyID_idx,invoices_SampleRequests,invoices_LateOrdersBubble  | Invoice_CompanyID_idx          |  144 | const                 |  17678 | 10.00  | Using where |
+----+--------------------+-------------------+--+-----+---------------------------------------------------------------------------------------------+--------------------------------+------+-----------------------+--------+--------+-------------+

Also, I've tried using joins, e.g. joining the invoices table, etc. but then I get the issue of having duplicate user rows per each invoice or quoterequest the joins receive, and grouping/distinct & ordering the resulting data became immensely slow, in the minutes.

I've also tried just the "exists" version of the first query, as suggesting by the doc https://dev.mysql.com/doc/refman/5.7/en/subquery-optimization-with-exists.html like so

select`UserID`
from`users`
where(`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
    or exists(select 1 
        from`invoices`
        where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
        and`__Active`=1 
        and`users`.`UserID`=`UserID`)
    or exists(select 1 
        from`quoterequests`
        where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
        and`__Active`=1 
        and`users`.`UserID`=`UserID`))
and(exists(select 1 
        from`userassociations`
        where`_Email`='brian@yeet.com'
        and`__Active`=1 
        and`users`.`UserID`=`UserID`))
and(exists(select 1 
        from`usercustomerflags`
        where`CustomerFlagID`in(10,27,17,1,2,3,4,5,6)
        and`__Active`=1 
        and`users`.`UserID`=`UserID`)
    or not exists(select 1 
        from`usercustomerflags`
        where`__Active`=1 
        and`users`.`UserID`=`UserID`))
and`Deleted`=0 
order by`DateTimeAdded`desc 
limit 50;

But that gets me up to 5.516s, so that definitely doesn't seem like the right direction to make.

What is the most efficient way to select data the way that I'm trying? Or do I need to restructure some of my tables to get the performance I'm looking for?


I've isolated the smallest sub-problem and bottle neck I think I have. Here's my lighter query

select`users`.`UserID`,`users`.`_Customer`
from`users`
left join`userassociations`on`userassociations`.`UserID`=`users`.`UserID`
and`userassociations`.`__Active`=1 
where(`users`.`Email`='brian@stumpyinc.com'
    or`userassociations`.`_Email`='brian@stumpyinc.com')
and`users`.`Deleted`=0 
order by`users`.`DateTimeAdded`desc 
limit 50;

And the explain

+---+--------+------------------+--+-----+--------------------------------------------------------+-------------------------+-----+-----------------------+--------+--------+-------------+
| 1 | SIMPLE | users            |  | ref | users_getemail_INDEX,unify_email_INDEX,users_Customers | users_Customers         |   4 | const                 | 221463 | 100.00 | Using where |
| 1 | SIMPLE | userassociations |  | ref | userassociations_UserID                                | userassociations_UserID | 144 | sterling.users.UserID |      1 | 100.00 | Using where |
+---+--------+------------------+--+-----+--------------------------------------------------------+-------------------------+-----+-----------------------+--------+--------+-------------+

This query takes about 1.5 seconds to execute


CREATE TABLE `users` (
  `UserID` char(36) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
    ...
  `Email` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
    ...
  `DateTimeAdded` datetime DEFAULT NULL,
    ...
  `Deleted` int(1) NOT NULL DEFAULT '0',
    ...
  `_LatestInvoiceDateTimeAdded` datetime DEFAULT NULL,
  `_InvoiceCount` int(11) NOT NULL DEFAULT '0',
  `_Customer` varchar(512) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
    ...
  PRIMARY KEY (`UserID`),
  UNIQUE KEY `UserID_UNIQUE` (`UserID`),
    ...
  KEY `users_getemail_INDEX` (`Email`(191),`_InvoiceCount`,`_LatestInvoiceDateTimeAdded`,`DateTimeAdded`),
  KEY `unify_email_INDEX` (`Email`(191),`UserID`),
    ...
  KEY `users_Customers` (`Deleted`,`DateTimeAdded`),
    ...
  KEY `users_DateTimeAdded` (`DateTimeAdded`,`UserID`),
  FULLTEXT KEY `users_FULLTEXT__Customer` (`_Customer`),
    ...
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;


CREATE TABLE `userassociations` (
   `UserAssociationID` binary(16) NOT NULL,
   `UserID` char(36) COLLATE utf8mb4_unicode_ci NOT NULL,
   `AssociatedUserID` char(36) COLLATE utf8mb4_unicode_ci NOT NULL,
   `_Email` varchar(256) COLLATE utf8mb4_unicode_ci NOT NULL,
   `__UserID` char(36) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
   `__Active` tinyint(1) NOT NULL DEFAULT '1',
   `__Added` timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
   `__Updated` timestamp(6) NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP(6),
   PRIMARY KEY (`UserAssociationID`),
   KEY `userassociations_UserID` (`UserID`),
   KEY `userassociations_AssociatedUserID` (`AssociatedUserID`),
   KEY `userassociations___UserID` (`__UserID`),
   KEY `userassociations__Email` (`_Email`),
   CONSTRAINT `userassociations_AssociatedUserID` FOREIGN KEY (`AssociatedUserID`) REFERENCES `users` (`UserID`) ON DELETE NO ACTION ON UPDATE NO ACTION,
   CONSTRAINT `userassociations_UserID` FOREIGN KEY (`UserID`) REFERENCES `users` (`UserID`) ON DELETE NO ACTION ON UPDATE NO ACTION,
   CONSTRAINT `userassociations___UserID` FOREIGN KEY (`__UserID`) REFERENCES `users` (`UserID`) ON DELETE NO ACTION ON UPDATE NO ACTION
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

Hmm... So it seemed like it was working, but I've found a pair of tables that it doesn't seem to be as efficient with, which is my users and invoices tables.

I've have these indexes:

users:    INDEX(`CompanyID`, `Deleted`, `DateTimeAdded`) 
invoices: INDEX(`UserID`, `__Active`) 
invoices: INDEX(`CompanyID`)
users:    INDEX(`UserID`, `Deleted`)

and the query

select`users`.`UserID`,`users`.`DateTimeAdded`
from`users`
join`invoices`on`invoices`.`UserID`=`users`.`UserID`
and`invoices`.`__Active`=1 
where`invoices`.`CompanyID`='3e55c8b4-d8b6-11e4-b38f-b8ca3a83b4c8'
and`users`.`Deleted`=0 
order by`DateTimeAdded`desc 
limit 200;

This query alone takes .3 seconds, which feels slow to me, like it isn't making the best use of the indexes, especially because users only has 430,997 rows and invoices only has 194,180, and this looks like it should be a pretty simple query.

EDIT: Actually it's much worse than that, if the CompanyID given only includes ~4 rows, then this query takes 3.5 seconds

+---+--------+----------+--+-----+------------------------------------------------+-----------------------+-----+--------------------------------+------+--------+----------------------------------------------+
| 1 | SIMPLE | invoices |  | ref | Invoice_CompanyID_idx,invoices_UserID___Active | Invoice_CompanyID_idx | 144 | const                          | 7750 | 10.00  | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | users    |  | ref | users_UserID_Deleted                           | users_UserID_Deleted  | 148 | sterling.invoices.UserID,const |    1 | 100.00 |                                              |
+---+--------+----------+--+-----+------------------------------------------------+-----------------------+-----+--------------------------------+------+--------+----------------------------------------------+
You need to login account before you can post.

About| Privacy statement| Terms of Service| Advertising| Contact us| Help| Sitemap|
Processed in 0.392783 second(s) , Gzip On .

© 2016 Powered by mzan.com design MATCHINFO