Welcome Guest, Not a member yet? Register   Sign In
Performance hit with limit query on large dataset
#1

(This post was last modified: 10-21-2015, 11:43 AM by futurewebs.)

Hi,

Ive been working on optimizing queries today and found a number of slow queries when fetching paged results from tables with lots of rows. Ive found a way to vastly improve things but Im having problems working my findings into my existing active record queries. Heres an example of what Im doing.

This query takes over 2 seconds

Code:
$limit = 10;
$offset = 9950;

$this->db->select('id, name, address, phone');
$this->db->limit($limit, $offset);
$this->db->order_by('name','desc');
$query = $this->db->get('customers');

$result = $query->result();

this query takes less then 10% of the time

Code:
SELECT id, name, address, phone
FROM customers
INNER JOIN (
SELECT id
FROM customers
ORDER BY name DESC
LIMIT 10 OFFSET 9950)
AS my_results USING(id);


This is the thread ive taken this info from
http://www.iheavy.com/2013/06/19/3-ways-...-in-mysql/

I have indexes set up correctly and have run the raw queries on a live database to get the results

My question is now would I incorporate this method into my queries written CI using active record ?
Reply
#2

PHP Code:
$query $this->db->query('
  SELECT id, name, address, phone
  FROM customers
  INNER JOIN (
  SELECT id
  FROM customers
  ORDER BY name
  LIMIT 10 OFFSET 9950)
  AS my_results USING(id);
'
); 
Reply
#3

In your code block, your SQL does not include DESC in the order by statement, which could impact not only the results, but also the speed of the query. If you did forget the DESC when testing the speed of your SQL, you may want to re-test before going through the hassle of building it in query builder.

While you will likely have issues attempting to use a sub-query in query builder, you would start by building the internal query:

PHP Code:
$inner_query $this->db->select('id')
 
   ->from('customers')
 
   ->order_by('name''desc')
 
   ->limit($limit$offset)
 
   ->get_compiled_select(); 

Then you would build your outer query, joining the inner query:

PHP Code:
$outer_query $this->db->select(array('id''name''address''phone'))
 
   ->from('customers')
 
   ->join($inner_query ' as my_results''my_results.id = customers.id''inner')
 
   ->get(); 

More than likely, though, query builder will mangle the inner query by attempting to escape something it shouldn't, so you'll have to use db->query(), in which case you could go back to the original SQL instead of using the get_compiled_select() to build the inner query. In that case, I would recommend using query bindings to place the $limit and $offset values into the SQL.
Reply
#4

thanks for the replies guys, the code I provided is not my actual code but a representation of what im doing. Ive edited my post to include the order direction.

@skunkbad, I know about this->db->query(). The idea was to use active record to include the inner join / select

@mwhitney, this looks a bit more promising. I set of down this path but had not noticed get_compiled_select(). Ill give this a go and report back.
Reply
#5

Not having much luck with this. maybe Im tackling this from the wrong angle.

The problems are due using limit and order by on a large table. say 100,000 rows. if I want to show 25 per page then the first few pages are fine but when the query uses something like LIMIT 900000, 25 we end up reading 900,000 rows and then ditching all but 10 of them which causes the drop in performance.

Is this something you have come across in the past and how would you deal with performance issues with paginating large datasets
Reply
#6

Do you have an index on your name column?

The MySQL documentation goes into a lot of detail on optimizing queries:
- Order By Optimization: https://dev.mysql.com/doc/refman/5.7/en/...ation.html
- Limit Optimization: http://dev.mysql.com/doc/refman/5.7/en/l...ation.html

Most of the time, unless I've really messed up the query itself (or done something which creates multiple queries which could otherwise be done in fewer queries), I can figure it out with some EXPLAIN calls and just modify or create an index or two.

The most important thing, though, is to make sure you know exactly what SQL is being generated by CI. Sometimes (especially when I'm working with complicated models), it can be relatively easy to slip something extra into a query or inadvertently create an extra query. If possible, make sure the query is what's really wasting your time, too. I can see myself accidentally setting up a loop that's executed 900,000 times when it only needs to be executed 25 times.
Reply
#7

There's literally no way to trick the query builder into producing that exact query you want, and you wouldn't gain anything from that anyway ... just use query().
Reply




Theme © iAndrew 2016 - Forum software by © MyBB