Welcome Guest, Not a member yet? Register   Sign In
SQL [Advanced, I think...]
#1

[eluser]Lewis Cowles[/eluser]
Hi so this is more about optimizing my SQL, than actual codeigniter, unless someone knows of a way to do this using codeigniter's in-built libraries and helpers more efficiently. It is worth noting that real-time data or as near as possible is a goal of the application I am working on so I think caching may produce undesirable effects.

I am running an SQL query to get the soonest appointment within a date range, that is also restricted by the distance between the postcode of the searcher and the postcode of the venue.

Code:
"SELECT postcodes.postcode, postcodes.id, available_venues.*, address_details.*, people.*, ( 3958.7558 * ACOS( COS( RADIANS( '%s' ) ) * COS( RADIANS( postcodes.latitude ) ) * COS( RADIANS( postcodes.longitude ) - RADIANS( '%s' ) ) + SIN( RADIANS( '%s' ) ) * SIN( RADIANS( postcodes.latitude ) ) ) ) AS distance FROM postcodes, people, available_venues, address_details HAVING distance < '%s' AND available_venues.availability > 0 AND available_venues.address_id = address_details.id AND address_details.postcode_id = postcodes.id AND available_venues.expert_id = people.id AND available_venues.date BETWEEN '%s' AND '%s' ORDER BY available_venues.date ASC, distance ASC LIMIT %s , %s"

so the above is my current query, I am currently using MyISAM, although as the relationships are defined in software, I suppose I should really use InnoDB? As you can probably tell i'm no db wizzard, please help lol
#2

[eluser]NotDior[/eluser]
Have you already used 'EXPLAIN SELECT....." on your query to see what it's using to search and the path it takes? I found that to help a bit on some of my queries and by putting in some indexes it helped out things quite a bit.
#3

[eluser]Aken[/eluser]
http://www.arubin.org/files/geo_search.pdf
#4

[eluser]Lewis Cowles[/eluser]
Hi, thanks for the reply,

I have used explain, but I am a little confused to it's output, each of the tables has an index on id and some have other indexes, but it states no indexes used... I'm unsure what to do about that

Basically I'm wondering if I should carry out two queries at this stage, one for venues within the dates and then a second on the distance, as I believe, the "as distance" calculation adds a lot of overhead to the query on 1.8million postcodes.

Furthermore I do not believe there will be more than 1.8million active available venues within sensible date ranges, and there are 1.8million postcodes in my database, so these are getting checked every time, so I think, that there would be less work if I made two queries, but I am unsure of how to use the available venue id's in the postcode distance check when from another query.

This is just basically that I need a real "know it all" to confirm that my "common sense" approach has a basis in the world of SQL
#5

[eluser]Lewis Cowles[/eluser]
Hey Aken, if i wasn't happily married a would kiss you, thanks, amazing! Just testing now...

[EDIT]
The Stored Procedure is not nearly as fast as in the example, even when the example is copied directly and I am only seeing nominal gains...
#6

[eluser]Aken[/eluser]
I'm no SQL expert, unfortunately. Just sharing something that seemed relevant. Google search for "Haversine formula php" led me to that document - maybe poke around like I did to find things that help you.

Also not sure if it will help much or not, but maybe you could do you math for finding the circle coordinates in PHP rather than MySQL.
#7

[eluser]Lewis Cowles[/eluser]
thanks again aken, the reason I'm using mysql is because it's compiled and as sql is much simpler than php, I feel the performance will be better in mysql, also sending 1.8million postcodes to php may not be such a bright idea... and I have every postcode in the uk
#8

[eluser]Aken[/eluser]
Ah, yeah, I didn't realize the math was still in the SELECT portion, I thought it was calculating a WHERE value.
#9

[eluser]Lewis Cowles[/eluser]
Just to keep you all updated I have heavily optimized the query by passing pre-computed values for a rectangle point check (for broad-phase check) using xmin,xmax,ymin&ymax; on the latitude and longitude, it does require having a valid postcode to search from but it has returned three results in 0.006seconds, for the broad phase check and 0.037 seconds for the full radial search within that result set, I'm still working on it, but i'll post the code here when I'm done so others can learn from it, Googles example for postcode search is terrible!
#10

[eluser]Lewis Cowles[/eluser]
Right, I have carried out some tests and posted times for the queries for a modified version of the broad phase test as well as a much higher performance version of the narrow-phase which is now the only check, tested for 300,000 dummy records

Broad Phase New
Code:
/*
* new plan for sql search to make faster is to search within dates and lattitude & longitude within xmin,xmax and ymin,ymax (rectangle point collision) precomputed by php
*/

// 0.0005 secs to query (1)
// 0.0007 secs to query (2)
// 0.0376 secs to query (3)
// 0.0166 secs to query (4)
// 0.0008 secs to query (5)
// 0.0008 secs to query (6)
// 0.0026 secs to query (7)
$query1 = sprintf("SELECT available_venues.id AS 'venue_id', available_venues.availability, available_venues.address_id, available_venues.date, address_details.id AS 'address_id', postcodes.id AS 'postcode_id', postcodes.postcode FROM available_venues, address_details, postcodes WHERE available_venues.date BETWEEN '%s' AND '%s' AND address_details.id = available_venues.address_id AND address_details.postcode_id = postcodes.id AND available_venues.availability > 0 AND postcodes.latitude BETWEEN %f and %f AND postcodes.longitude BETWEEN %f and %f",
mysql_real_escape_string($datefrom),
mysql_real_escape_string($dateto),
$postcodes->latitude + ($this->change_in_latitude($radius)*-1),
$postcodes->latitude + $this->change_in_latitude($radius),
$postcodes->longitude + ($this->change_in_longitude($postcodes->latitude,$radius)*-1),
$postcodes->longitude + $this->change_in_longitude($postcodes->latitude,$radius)
);

OLDER NARROW-PHASE CHECK, modified for faster searching (I remembered AS slows queries nominally and checks with AND drop out immediately upon false VALUES)
Code:
// 0.0278 secs to query
// 0.0008 secs to query (2)
// 0.0075 secs to query (3)
// 0.0008 secs to query (4)
// 0.0008 secs to query (5)
// 0.0025 secs to query (6)
// 0.0038 secs to query (7)
$query1 = sprintf("SELECT available_venues.id AS 'venue_id', available_venues.availability, available_venues.address_id, available_venues.date, address_details.id AS 'address_id', postcodes.id AS 'postcode_id', postcodes.postcode FROM available_venues, address_details, postcodes WHERE ( 3958.7558 * ACOS( COS( RADIANS( '%s' ) ) * COS( RADIANS( postcodes.latitude ) ) * COS( RADIANS( postcodes.longitude ) - RADIANS( '%s' ) ) + SIN( RADIANS( '%s' ) ) * SIN( RADIANS( postcodes.latitude ) ) ) ) < %f AND available_venues.date BETWEEN '%s' AND '%s' AND address_details.id = available_venues.address_id AND address_details.postcode_id = postcodes.id AND available_venues.availability > 0",
mysql_real_escape_string($postcodes->latitude),
mysql_real_escape_string($postcodes->longitude),
mysql_real_escape_string($postcodes->latitude),
mysql_real_escape_string($radius),
mysql_real_escape_string($datefrom),
mysql_real_escape_string($dateto)
);

In the future this will be implemented using a single select with active-record in CI, also as a note I think the performance differences are nominal, however the rectangular check is less accurate!

thanks for all your help guys!




Theme © iAndrew 2016 - Forum software by © MyBB