I'm working on a web scraper that turns web pages containing rows of data (like a page of search results) into CSV files for use in Excel. Of course, every site the scraper supports needs its own custom code. Here is how I've arranged it. It works fine, but I'd like feedback on the OO implementation, because I'm not sure it follows best practices.
Here's the controller code. Given the database row ID of a search, it retrieves the search profile (search name, search URL, etc.), loads the matching library file, calls that library's scrape() method, and then sends the resulting CSV to the browser as a download.
PHP Code:
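// controller Searches.php
public function execute($search_id)
{
    $search = $this->searches_model->get_search($search_id);
    if ($this->input->server('REQUEST_METHOD') == 'POST')
    {
        $this->load->helper('download');   // force_download() lives in the download helper
        $this->load->library($search['site_class']);
        // CodeIgniter attaches the loaded library to the controller under its lowercased class name.
        // Put the property name in a variable first: PHP 7+ parses $this->$search['site_class']
        // as ($this->$search)['site_class'], not as a dynamic property name.
        $object = strtolower($search['site_class']);
        $output = $this->$object->scrape($search);
        force_download($search['name'] . '.csv', $output);
    }
}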
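The controller's "pick a class by name, then call scrape() on an instance" dispatch can also be sketched in plain PHP. This minimal sketch assumes a hypothetical whitelist, `SITE_CLASSES`, that maps each `site_class` value from the database to a class, so an unexpected value in the searches table raises an exception instead of loading an arbitrary class. The `Scraper` interface, `Site_example` and `make_site()` are hypothetical names, not part of CodeIgniter.

```php
<?php
// Hypothetical contract every site class fulfils.
interface Scraper
{
    public function scrape(array $search): string;
}

// Hypothetical stand-in for a real site library such as Site_craigslist.
final class Site_example implements Scraper
{
    public function scrape(array $search): string
    {
        // A real implementation would fetch and parse $search['url'].
        return "name,price\n" . $search['name'] . ",0\n";
    }
}

// Hypothetical whitelist: site_class column value => class that handles that site.
const SITE_CLASSES = [
    'site_example' => Site_example::class,
];

// Build a scraper object for a search profile, refusing unknown site classes.
function make_site(array $search): Scraper
{
    $key = strtolower($search['site_class']);
    if (!array_key_exists($key, SITE_CLASSES)) {
        throw new InvalidArgumentException("Unknown site class: {$search['site_class']}");
    }
    $class = SITE_CLASSES[$key];
    return new $class();   // a real instance, so no static calls are needed
}

$search = ['name' => 'bikes', 'site_class' => 'site_example', 'url' => 'http://example.com'];
$csv = make_site($search)->scrape($search);
echo $csv;
```

In the CodeIgniter controller, the same whitelist check could sit just before `$this->load->library()`.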
Then there is the abstract parent class:
PHP Code:
// application/libraries/Site.php
abstract class Site {
    private static $curl_options = array(
        // bunch of CURLOPT options
    );

    // Declared public because the controller calls scrape() from outside the class.
    abstract public function scrape($search);

    public function get_page($url)
    {
        // ...
    }

    public function clean_field($field)
    {
        // ...
    }
}
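One common alternative is to let the parent class own the whole pipeline and have each child supply only the site-specific parsing (the template method pattern). This is a minimal sketch, not the original design: `parse_rows()` and `to_csv()` are hypothetical method names, `clean_field()` is given an assumed whitespace-collapsing rule, and the demo child `Site_demo` overrides `get_page()` with fixed HTML so it runs without network access. The CSV is written with PHP's built-in `fputcsv()` into a `php://temp` stream.

```php
<?php
// Sketch of a template-method base class (hypothetical method names).
abstract class Site
{
    // The one public entry point the controller calls. It is final so every
    // site goes through the same fetch -> parse -> CSV steps.
    final public function scrape(array $search): string
    {
        $html = $this->get_page($search['url']);
        $rows = $this->parse_rows($html);   // the only site-specific step
        return $this->to_csv($rows);
    }

    // Each child turns one page of HTML into rows of raw field strings.
    abstract protected function parse_rows(string $html): array;

    // Shared fetching code; the common CURLOPT options would be set here.
    protected function get_page(string $url): string
    {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $html = curl_exec($ch);
        return is_string($html) ? $html : '';
    }

    // Assumed cleaning rule: collapse runs of whitespace to one space and trim.
    public function clean_field(string $field): string
    {
        return trim(preg_replace('/\s+/', ' ', $field));
    }

    // Write cleaned rows to an in-memory stream with fputcsv() and return the text.
    private function to_csv(array $rows): string
    {
        $fh = fopen('php://temp', 'r+');
        foreach ($rows as $row) {
            // Separator, enclosure and escape character passed explicitly.
            fputcsv($fh, array_map([$this, 'clean_field'], $row), ',', '"', '\\');
        }
        rewind($fh);
        $csv = stream_get_contents($fh);
        fclose($fh);
        return $csv;
    }
}

// Hypothetical child for the demo: get_page() returns fixed HTML so the
// example runs without network access or the cURL extension.
final class Site_demo extends Site
{
    protected function get_page(string $url): string
    {
        return '<li>  Bike  |250</li><li>Lamp|15</li>';
    }

    protected function parse_rows(string $html): array
    {
        // A regex keeps the demo short; a real site class would more likely use DOMDocument.
        preg_match_all('~<li>(.*?)\|(.*?)</li>~s', $html, $matches, PREG_SET_ORDER);
        $rows = [];
        foreach ($matches as $m) {
            $rows[] = [$m[1], $m[2]];
        }
        return $rows;
    }
}

echo (new Site_demo())->scrape(['url' => 'http://example.com/search']);
```

With this split, adding a new site means implementing parse_rows() only; fetching, field cleaning and CSV writing stay in one place.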
And one child class per site; here is the one for Craigslist:
PHP Code:
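// application/libraries/Site_craigslist.php
// require_once with __DIR__ loads the parent exactly once, from this directory;
// a plain include would cause a fatal class-redeclaration error if a second site library were loaded.
require_once __DIR__ . '/Site.php';

class Site_craigslist extends Site {
    const SITE = 'http://sfbay.craigslist.com';
    const SITE_CODE = 'CL';

    public function scrape($search)
    {
        // ...
    }
}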
In the controller code, $search['site_class'] contains the string 'site_craigslist', so Site_craigslist.php is loaded, and it in turn loads Site.php, the parent class. It works, but I'm not instantiating anything, and I have to use self:: to access the parent class methods. I've read that you generally shouldn't use classes statically, so I suspect the above is not best practice. Any input or suggestions for a better structure and approach?
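On the self:: question: `$this->load->library()` does create an object. CodeIgniter instantiates the class and attaches it to the controller, which is why the controller can call `->scrape()` on it. Inside that object, inherited methods can be called as `$this->get_page()` rather than `self::get_page()`, and a constant defined in the child can be read from the parent with `static::SITE` (late static binding), whereas `self::SITE` in the parent would look only at `Site` itself. A minimal self-contained sketch; `base_url()` is a hypothetical helper and `clean_field()` is simplified to `trim()` for illustration.

```php
<?php
abstract class Site
{
    // Hypothetical helper: static:: resolves against the class of the actual
    // object, so the parent can read a constant that only the child defines.
    public function base_url(): string
    {
        return static::SITE;
    }

    // Simplified stand-in for the real field cleaning.
    public function clean_field(string $field): string
    {
        return trim($field);
    }

    abstract public function scrape(array $search): string;
}

class Site_craigslist extends Site
{
    const SITE = 'http://sfbay.craigslist.com';
    const SITE_CODE = 'CL';

    public function scrape(array $search): string
    {
        // Inherited instance methods are called through $this, not self::.
        return $this->clean_field('  ' . self::SITE_CODE . ' ') . ',' . $this->base_url();
    }
}

$site = new Site_craigslist();   // roughly what $this->load->library() does for you
echo $site->scrape([]), "\n";    // prints: CL,http://sfbay.craigslist.com
```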
Hey, don't work without a PHP debugger. Several free IDEs have one built in; two examples are NetBeans and CodeLobster. Without a debugger, it's like driving with a blindfold on: you are going to crash!