Welcome Guest, Not a member yet? Register   Sign In
Efficiency of specific queries
#1

I've got a project I'm working on where due to NDA I can't reveal what I'm doing, but I can generalize in hopes of determining if my queries are the most efficient for a particular purpose. If you look at the following query, it's doing a couple of things:

1. By grouping, I am finding people that have more than one shipping address.
2. By ordering, I am finding datetime values in a "last_called" field that are the oldest possible, but they have to be at least 5 days old.

Code:
SELECT
 id
FROM
 (SELECT
   p.id,
   p.last_called,
   COUNT(a.addr_id) AS cnt
 FROM
   people p
   JOIN addresses a
     ON p.id = a.person_id
 WHERE p.last_called <= DATE_SUB(NOW(), INTERVAL 5 DAY)
   AND a.type = "shipping"
 GROUP BY p.id
 HAVING COUNT(a.addr_id) > 1) foo
ORDER BY last_called ASC
LIMIT 10

This works as expected. I'm getting exactly the results I expect, and that is a bunch of people IDs. I take those people IDs and plug them into this query:


Code:
SELECT
 *
FROM
 people p
 LEFT JOIN addresses a
   ON p.id = a.person_id
WHERE p.id IN (...)

I need this second query, because along with the people's shipping addresses, I also need their billing address.

I think one mistake I am making is that I am fetching the result of the first query, and using PHP to implode the people IDs. I should just let MySQL do all the work, but when I try to put the first query in the second query, I get an error:

Error Code: 1235
This version of MariaDB doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'


This is that attempted query:

Code:
SELECT
 *
FROM
 people p
 LEFT JOIN addresses a
   ON p.id = a.person_id
WHERE p.id IN (SELECT
   id
 FROM
   (SELECT
     p.id,
     p.last_called,
     COUNT(a.addr_id) AS cnt
   FROM
     people p
     JOIN addresses a
       ON p.id = a.person_id
   WHERE p.last_called <= DATE_SUB(NOW(), INTERVAL 5 DAY)
     AND a.type = "shipping"
   GROUP BY p.id
   HAVING COUNT(a.addr_id) > 1) foo
 ORDER BY last_called ASC
 LIMIT 10
)

I understand the error / limitation, so I'm not asking about that, but would still like to do something to make this as efficient as possible.

The people table has about 50K records. The addresses table has about 250K rows.

When I need to change the limit in the first query to something like 300, these queries take about 10 seconds. I'm wondering if anyone has any ideas that will allow me to shave a few seconds off and still get the same results.
Reply
#2

Two things that might help.

1) Do you have an index on all fields that end up in your joins, like a.person_id in your first query?
2) COUNT is notoriously slow, and I've found it's more of an estimate when you get into large numbers anyway. You might try refactoring that first query into 2 queries so you no longer have to do a COUNT in there. I've had drastic improvements doing that, even when it's against they way we normally might try to minimize the number of queries.
Reply
#3

(03-30-2017, 10:41 PM)kilishan Wrote: Two things that might help.

1) Do you have an index on all fields that end up in your joins, like a.person_id in your first query?
2) COUNT is notoriously slow, and I've found it's more of an estimate when you get into large numbers anyway. You might try refactoring that first query into 2 queries so you no longer have to do a COUNT in there. I've had drastic improvements doing that, even when it's against they way we normally might try to minimize the number of queries.

Yes, the proper fields are indexed. As it turns out, after breaking down everything that happens before and after these queries, I found the reason why things were so slow.

After the queries run, I'm connecting to another database on a remote server, and inserting/updating as needed. Then I mark the local rows with a timestamp. All of that was in a transaction, but it turned out I needed two separate transactions, one for the remote database, and one for the local. After applying that second transaction, the entire process runs almost instantaneously, even when selecting 500 records for export. It actually runs so fast I thought something might be wrong Smile

Thanks for your help. I didn't realize count was supposedly so slow.
Reply
#4

Glad you found it!

I never had an issue with count until working on a site with a database of 8+ million records and breaking one of the queries down away from count (which was also giving inaccurate results at that size) into 2 separate queries had the same effect on performance that you just saw on yours.
Reply




Theme © iAndrew 2016 - Forum software by © MyBB