• 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Most efficient way to list 'popular' items on very high load website

[eluser]James Spibey[/eluser]

I'm looking for some feedback on the most efficient way to determine what the most popular articles/items are for a given time period. The way this is currently implemented on our website is causing us issues because I think we are getting into a position where our database is locking whilst trying to retrieve the list of popular items.

For simplicity's sake, let's say we have a table where we record pagehits with the appropriate article id. We are currently using the following query to get the 5 most popular articles for the last 7 days

SELECT `id`, `title`, `body`
FROM (`articles`)
WHERE articles.active = 1
ORDER BY(SELECT COUNT(*) FROM pageviews pv WHERE pv.page_type = 2 AND pv.page_id = articles.id AND from_unixtime(pv.datecreated) > date_add(curdate(), interval -7 day))

This query is used on our homepage but is cached so it only runs every 60 seconds. However, under certain circumstances, it can lock the database entirely which I believe is because the pageviews table is receiving a large amount of inserts whilst the query is trying to run.

Is there a better way of doing this kind of thing?




I dunno what database you're using, but it should be possible to solve the locking problem by changing the transaction isolation level to allow a read without locking the writes (which can lead to inconsistent results because of dirty reads but in your case you don't care).

Google for READ UNCOMMITTED and your database manager.


[eluser]Daniel Eriksson[/eluser]
Another possible solution is to combine updates to the pagehits table by (for example) storing pagehit information in your session and writing a custom session handler that on close() flushes the pagehit data to the pagehits table.

(This is only useful if you typically get multiple pageviews per session.)

[eluser]James Spibey[/eluser]
Thanks to you both for interesting suggestions. Regarding the READ UNCOMMITED suggestion, I am using MySql (MYISAM engine) so this is a possibility, however as this command is session-wide rather than being a query hint I have read that this can add a couple of seconds to the query. It's a pity there isn't something like the WITH(NOLOCK) hint which Sql Server uses.

[eluser]Sean Murphy[/eluser]
First off, I would get rid of that subselect. Try a join instead. Something like:
SELECT COUNT(pv.id) AS views, `id`, `title`, `body`
FROM `articles`
LEFT JOIN `pageviews` AS pv ON (pv.page_id = articles.id)
WHERE articles.active = 1 AND pv.page_type = 2 AND from_unixtime(pv.datecreated) > date_add(curdate(), interval -7 day)
GROUP BY pv.id

Second, that query will never be cached as long as it uses the CURDATE() function. MySQL SELECT Entries Before NOW()

Third, as far as I know, even if it was being cached that query won't benefit much from the MySQL query cache because it gets invalidated with every page view.

Fourth, setting the transaction-isolation level to READ UNCOMMITTED will not work for MyISAM tables since they don't support transactions.

[eluser]James Spibey[/eluser]
Actually using a sub-query is about 400% quicker than using a join on our systems (both hitting indexes etc) because the join requires a temporary table whereas the sub-query does not. The caching is done in the application rather than the database so again nothing to worry about there. Thanks for the tip on on READ UNCOMMITTED though, i've only really looked at the INNODB engine very briefly, would it give us any benefits which would make it worth moving? My understanding was that you lose performance going to INNODB from MYISAM but you gain the ability to run transactions.

[eluser]Sean Murphy[/eluser]
It's surprising that you can't get the join to out perform a sub-query considering how inefficient sub-queries currently are in MySQL.

If you're having problems with table locks then InnoDB might be a worthwhile move since it allows row-level locking. Yes, MyISAM generally gives the best read performance, but I'd test InnoDB--you might find that you get better performance.

Also, a very common practice for this type of situation is to use summary tables. Basically you run your "intensive" query once a week, day, hour, whatever, to generate the contents of your summary table. Then your app just uses a simple query to pull rows from the summary table.

Hope that helps.

I use (and recommend) the "summary" table approach, updated every hour maybe.

Regarding JOIN's vs sub-queries, i have to agree that the former are slower on MySQL (using MyISAM), maybe you won't notice it with just one query, but try usinf multiple JOINS vs multiple sub-queries and you will spot the difference.

Digg   Delicious   Reddit   Facebook   Twitter   StumbleUpon  

  Theme © 2014 iAndrew  
Powered By MyBB, © 2002-2020 MyBB Group.