Run controller method in background

[eluser]Unknown[/eluser]
I'll start with what my program does. The controller's index function takes an array of URLs and keywords and stores them in the DB. The crawlLink method then takes all the keywords and URLs. Each URL is searched for all the keywords, and the sublinks of each URL are extracted, stored in the DB, and searched for the keywords as well. Keywords are searched for in each link using the search method, and the sublinks are extracted from the URLs using the extract_links function. Both search and extract_links call a function named get_web_page, which fetches the complete content of a page using cURL. search uses it once to get the page content in which the keywords are looked up, and extract_links uses it to extract links with valid page content.

Now crawlLink calls the search function twice: once to find keywords in the domain links and a second time to find keywords in the sublinks. Together with the call inside extract_links, get_web_page therefore runs three times. It takes approximately 5 minutes to get the contents of around 150 links, and since this happens three times, the processing takes about 15 minutes. During that time nothing else can be done. So I want to run this process in the background and show its status while it is processing. extract_links and get_web_page are included in the controller using include_once.
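
To make the flow above concrete, here is a minimal sketch of it. The real crawlLink, search and extract_links are not shown in this post, so the function names ending in _sketch, the canned pages and the counting $fetch closure (a stand-in for get_web_page) are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch of the crawl flow described above; not the actual controller code.

// Stand-in for get_web_page(): serves canned pages and counts how often it is called.
$fetch_count = 0;
$pages = array(
    'http://a.test/'    => '<a href="http://a.test/sub">sub</a> php crawler',
    'http://a.test/sub' => 'nothing but php here',
);
$fetch = function ($url) use (&$fetch_count, $pages) {
    $fetch_count++;
    return isset($pages[$url]) ? $pages[$url] : '';
};

// Return the keywords that occur in the page at $url.
function search_sketch($url, array $keywords, $fetch)
{
    $content = $fetch($url);
    $found = array();
    foreach ($keywords as $keyword) {
        if (stripos($content, $keyword) !== false) {
            $found[] = $keyword;
        }
    }
    return $found;
}

// Return the absolute href values found in the page at $url.
function extract_links_sketch($url, $fetch)
{
    preg_match_all('/href="(http[^"]+)"/i', $fetch($url), $matches);
    return array_unique($matches[1]);
}

// Search each URL, extract its sublinks, then search each sublink.
function crawl_link_sketch(array $urls, array $keywords, $fetch)
{
    $hits = array();
    foreach ($urls as $url) {
        $hits[$url] = search_sketch($url, $keywords, $fetch);             // fetch 1
        foreach (extract_links_sketch($url, $fetch) as $sublink) {        // fetch 2
            $hits[$sublink] = search_sketch($sublink, $keywords, $fetch); // fetch 3
        }
    }
    return $hits;
}

$hits = crawl_link_sketch(array('http://a.test/'), array('php', 'crawler'), $fetch);
```

In this sketch the domain page is downloaded twice, once by the search step and once by the link extraction step, before the sublink is downloaded. Whether the real code repeats the download in the same way is an assumption based on the description.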

The get_web_page function is as follows:

Code:
function get_web_page( $url )
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true,     // return the page as a string
        CURLOPT_HEADER         => false,    // don't include headers in the output
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_ENCODING       => "",       // accept all supported encodings
        CURLOPT_USERAGENT      => "spider", // who am i
        CURLOPT_AUTOREFERER    => true,     // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,      // connect timeout, in seconds
        CURLOPT_TIMEOUT        => 120,      // timeout for the whole transfer, in seconds
        CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
    );

    $ch      = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );

    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}
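
Since get_web_page returns the curl_getinfo array with errno, errmsg and content added, a caller could check the result before searching it. The helper name page_is_usable and the rule "only HTTP 200 counts as success" are assumptions made for this sketch, not part of my code. The two sample arrays are canned values shaped like the return value, so the example runs without a network.

```php
<?php
// Hypothetical helper: decide whether a get_web_page() result is worth processing.
function page_is_usable(array $result)
{
    // errno is 0 when cURL reported no transport error.
    if ($result['errno'] !== 0) {
        return false;
    }
    // http_code is one of the keys returned by curl_getinfo();
    // treating only 200 as success is an assumption of this sketch.
    if (!isset($result['http_code']) || (int) $result['http_code'] !== 200) {
        return false;
    }
    // curl_exec() returns false on failure, so require a non-empty string.
    return is_string($result['content']) && $result['content'] !== '';
}

// Canned results shaped like the array get_web_page() returns.
$ok = array(
    'errno' => 0, 'errmsg' => '', 'http_code' => 200, 'content' => '<html></html>',
);
$timeout = array(
    'errno' => 28, 'errmsg' => 'Operation timed out', 'http_code' => 0, 'content' => false,
);

$ok_usable      = page_is_usable($ok);
$timeout_usable = page_is_usable($timeout);
```
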
A single input of URLs and keywords from the user can be considered a task. Once a task is started, it runs in the background, and meanwhile another task can be defined and started. Each task will have a status such as "To Do", "In Progress", "Pending", "Done", etc. The Simple Task Board by Oscar Dias shows tasks exactly the way I want them displayed.
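
A task with these statuses could be modelled roughly as follows. Only the four status names come from the description above. The array layout, the function names and the map of allowed transitions are assumptions made for this sketch.

```php
<?php
// Assumed transition map; only the four status names come from the post.
function allowed_transitions()
{
    return array(
        'To Do'       => array('In Progress'),
        'In Progress' => array('Pending', 'Done'),
        'Pending'     => array('In Progress'),
        'Done'        => array(),
    );
}

// Create a task for one submission of URLs and keywords.
function new_task(array $urls, array $keywords)
{
    return array('urls' => $urls, 'keywords' => $keywords, 'status' => 'To Do');
}

// Return a copy of $task moved to $status, or throw if the move is not allowed.
function move_task(array $task, $status)
{
    $map = allowed_transitions();
    if (!in_array($status, $map[$task['status']], true)) {
        throw new InvalidArgumentException(
            "Cannot move task from '{$task['status']}' to '$status'"
        );
    }
    $task['status'] = $status;
    return $task;
}

$task = new_task(array('http://example.com/'), array('php'));
$task = move_task($task, 'In Progress');
$task = move_task($task, 'Done');
```

In practice the status would live in a DB column, so that a status page can read it while the background process updates it.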

I have read about so many ways to run a function in the background that I am now unsure which approach to adopt. I read about exec, pcntl_fork, Gearman and others, but all of them need the CLI, which I don't want to use. I tried installing Gearman with Cygwin but got stuck because the Gearman installation cannot find libevent. I installed libevent separately, but it still doesn't work. And since Gearman needs the CLI anyway, I dropped it. I don't want to use cron either. I just want to know which approach would be best in my scenario.

I am using PHP 5.3.8 | CodeIgniter 2.1.3 | Apache 2.2.21 | MySQL 5.5.16 | Windows 7 64-bit

EDIT:
I need to run
Code:
localhost/codeigniter/index.php/controller/method
in the background. The controller method has sub-methods, which should also run in the background. Any life-saving suggestions are most welcome!
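
For reference, one non-CLI technique that fits this requirement is a "fire and forget" HTTP request: a page opens a socket to the controller URL, sends the request and closes the connection without waiting for the response. The sketch below is only an illustration under assumptions: the function names are made up, the URL is assumed to be plain http, and the target method is assumed to call ignore_user_abort(true) and set_time_limit(0) at its start so that it keeps running after the socket is closed. Whether this behaves reliably on a given Apache setup would need testing.

```php
<?php
// Build a raw HTTP/1.1 GET request for $url (hypothetical helper).
function build_background_request($url)
{
    $parts = parse_url($url);
    $path  = isset($parts['path']) ? $parts['path'] : '/';
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query'];
    }
    $host = $parts['host'] . (isset($parts['port']) ? ':' . $parts['port'] : '');

    return "GET $path HTTP/1.1\r\n"
         . "Host: $host\r\n"
         . "Connection: Close\r\n\r\n";
}

// Send the request and return without reading the response (hypothetical helper).
// Returns false if the connection could not be opened.
// Assumes the target method calls ignore_user_abort(true) and set_time_limit(0).
function fire_and_forget($url, $timeout = 5)
{
    $parts = parse_url($url);
    $port  = isset($parts['port']) ? $parts['port'] : 80;

    $fp = @fsockopen($parts['host'], $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return false;
    }
    fwrite($fp, build_background_request($url));
    fclose($fp);
    return true;
}

$request = build_background_request(
    'http://localhost/codeigniter/index.php/controller/method'
);

// Usage (not executed here because it needs a running server):
// fire_and_forget('http://localhost/codeigniter/index.php/controller/method');
```

The request builder is kept separate from the socket code so it can be checked without a server. Sub-methods called from the controller method run inside the same request, so they would continue in the background along with it.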



