CodeIgniter Forums
How to file_get_contents a gzip page ?



How to file_get_contents a gzip page ? - El Forum - 02-24-2009

[eluser]Benjamin David[/eluser]
Hi!

I'm trying to fetch a page that is served gzip-compressed. When I call file_get_contents on the page, the result is still compressed, and I can't find a way to get the original, uncompressed content.

I've tried the gzdeflate and gzuncompress PHP 5 functions and they didn't work. I haven't tried the gzdecode function because it comes with PHP 6 and my host doesn't have it yet.
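
A likely reason those calls fail: PHP's zlib functions come in pairs, and each pair expects a different container. gzcompress/gzuncompress handle zlib data (RFC 1950), gzdeflate/gzinflate handle a raw DEFLATE stream (RFC 1951), and gzencode/gzdecode handle gzip (RFC 1952), which is what an HTTP server sends for a gzip-encoded page. The sketch below illustrates the difference on a made-up payload; the variable names are illustrative only, and the header-stripping shortcut assumes a gzip header with no optional fields.

```php
<?php
$text = 'example payload';

$gzip = gzencode($text);    // gzip container (RFC 1952)
$zlib = gzcompress($text);  // zlib container (RFC 1950)
$raw  = gzdeflate($text);   // raw DEFLATE stream (RFC 1951)

// A gzip payload starts with the magic bytes 1f 8b.
$gzipMagic = bin2hex(substr($gzip, 0, 2));

// gzuncompress() expects zlib data, so it rejects a gzip payload
// (the @ only hides the warning it raises).
$wrongTool = @gzuncompress($gzip);

// Each decoder works on the output of its own encoder.
$zlibOk = gzuncompress($zlib) === $text;
$rawOk  = gzinflate($raw) === $text;

// Assumption: the gzip header has no optional fields (true for gzencode()
// output), so dropping the 10-byte header and the 8-byte footer leaves
// a raw DEFLATE stream that gzinflate() accepts.
$stripped = gzinflate(substr($gzip, 10, -8));

var_dump($gzipMagic, $wrongTool, $zlibOk, $rawOk, $stripped === $text);
```

The shortcut on the last step breaks as soon as the server adds a file name, comment or extra field to the header, which is why a full header parser is safer.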

The page I'm trying to get content from is here:

http://ax.phobos.apple.com.edgesuite.net/WebObjects/MZStoreServices.woa/wa/wsSearch?term=jack+johnson&limit=1

It's killing me, because I can make it work in JavaScript with Ajax requests (the browser automatically decompresses the whole thing), but not with PHP. Too bad!

Thanks for your help!


How to file_get_contents a gzip page ? - El Forum - 02-24-2009

[eluser]xwero[/eluser]
How do you read the content now?


How to file_get_contents a gzip page ? - El Forum - 02-24-2009

[eluser]pistolPete[/eluser]
This problem is described here, along with a suggested workaround:
http://bugs.php.net/bug.php?id=22967

But there are other possibilities:

Try sending an HTTP request without this header:
Code:
Accept-Encoding: gzip
You can find an example here: http://php.net/fsockopen
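
With file_get_contents the same idea can be sketched through a stream context instead of fsockopen. This is only a minimal sketch: the helper names identity_context() and looks_gzipped() are made up for illustration, and a server is free to ignore the header and send gzip anyway, so the result is still worth checking.

```php
<?php
// Hypothetical helper: builds a stream context that asks the server
// for an uncompressed ("identity") response body.
function identity_context()
{
    return stream_context_create(array(
        'http' => array(
            'method'  => 'GET',
            'header'  => "Accept-Encoding: identity\r\n",
            'timeout' => 10,
        ),
    ));
}

// Hypothetical helper: true when $data starts with the gzip magic bytes.
function looks_gzipped($data)
{
    return strncmp($data, "\x1f\x8b", 2) === 0;
}

// Usage (a network call, so left commented out; $url is a placeholder):
// $body = file_get_contents($url, false, identity_context());
// if ($body !== false && looks_gzipped($body)) {
//     // The server ignored the header; the body still needs decoding.
// }
```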

Alternatively, you could use cURL:
Code:
curl_setopt($handle, CURLOPT_ENCODING, 'gzip');
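
In context, that option could be used roughly as follows. This is a sketch that assumes the cURL extension is installed; fetch_decoded() is a made-up name, and error handling is kept to the bare minimum.

```php
<?php
// Hypothetical helper: fetch a URL and let cURL decode a gzip-encoded body.
function fetch_decoded($url)
{
    $handle = curl_init($url);
    // Return the body as a string instead of printing it.
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    // Send "Accept-Encoding: gzip" and decode the response transparently.
    curl_setopt($handle, CURLOPT_ENCODING, 'gzip');
    curl_setopt($handle, CURLOPT_TIMEOUT, 10);

    // curl_exec() returns the decoded body, or false on failure.
    return curl_exec($handle);
}

// Usage (a network call, so left commented out; $url is a placeholder):
// $body = fetch_decoded($url);
```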

Or use one of the user-contributed gzdecode() functions here: http://php.net/gzencode


How to file_get_contents a gzip page ? - El Forum - 02-24-2009

[eluser]Benjamin David[/eluser]
Thanks for your answer! It gives me hope.

I'm going to test them, and I'll hopefully report back with the final word so it can help others who are trying to use the Apple API.


How to file_get_contents a gzip page ? - El Forum - 02-24-2009

[eluser]Benjamin David[/eluser]
The user-contributed gzdecode() function you were talking about works perfectly! Thanks a lot, pistolPete!

So here's the function I used to decode a gzipped string (found here: http://php.net/gzencode):
Code:
<?php

function gzdecode($data) {
  $len = strlen($data);
  if ($len < 18 || strcmp(substr($data,0,2),"\x1f\x8b")) {
    return null;  // Not GZIP format (See RFC 1952)
  }
  $method = ord(substr($data,2,1));  // Compression method
  $flags  = ord(substr($data,3,1));  // Flags
  if (($flags & 31) != $flags) {
    // Reserved bits are set -- NOT ALLOWED by RFC 1952
    return null;
  }
  // NOTE: $mtime may be negative (PHP integer limitations)
  $mtime = unpack("V", substr($data,4,4));
  $mtime = $mtime[1];
  $xfl   = substr($data,8,1);
  $os    = substr($data,9,1);
  $headerlen = 10;
  $extralen  = 0;
  $extra     = "";
  if ($flags & 4) {
    // 2-byte length prefixed EXTRA data in header
    if ($len - $headerlen - 2 < 8) {
      return false;    // Invalid format
    }
    $extralen = unpack("v",substr($data,10,2));  // XLEN follows the 10-byte fixed header
    $extralen = $extralen[1];
    if ($len - $headerlen - 2 - $extralen < 8) {
      return false;    // Invalid format
    }
    $extra = substr($data,12,$extralen);
    $headerlen += 2 + $extralen;
  }

  $filenamelen = 0;
  $filename = "";
  if ($flags & 8) {
    // C-style string file NAME data in header
    if ($len - $headerlen - 1 < 8) {
      return false;    // Invalid format
    }
    $filenamelen = strpos(substr($data,$headerlen),chr(0));
    if ($filenamelen === false || $len - $headerlen - $filenamelen - 1 < 8) {
      return false;    // Invalid format
    }
    $filename = substr($data,$headerlen,$filenamelen);
    $headerlen += $filenamelen + 1;
  }

  $commentlen = 0;
  $comment = "";
  if ($flags & 16) {
    // C-style string COMMENT data in header
    if ($len - $headerlen - 1 < 8) {
      return false;    // Invalid format
    }
    $commentlen = strpos(substr($data,$headerlen),chr(0));
    if ($commentlen === false || $len - $headerlen - $commentlen - 1 < 8) {
      return false;    // Invalid header format
    }
    $comment = substr($data,$headerlen,$commentlen);
    $headerlen += $commentlen + 1;
  }

  $headercrc = "";
  if ($flags & 1) {
    // 2-bytes (lowest order) of CRC32 on header present
    if ($len - $headerlen - 2 < 8) {
      return false;    // Invalid format
    }
    $calccrc = crc32(substr($data,0,$headerlen)) & 0xffff;
    $headercrc = unpack("v", substr($data,$headerlen,2));
    $headercrc = $headercrc[1];
    if ($headercrc != $calccrc) {
      return false;    // Bad header CRC
    }
    $headerlen += 2;
  }

  // GZIP FOOTER - These may be negative due to PHP's integer limitations
  $datacrc = unpack("V",substr($data,-8,4));
  $datacrc = $datacrc[1];
  $isize = unpack("V",substr($data,-4));
  $isize = $isize[1];

  // Perform the decompression:
  $bodylen = $len-$headerlen-8;
  if ($bodylen < 1) {
    // This should never happen - IMPLEMENTATION BUG!
    return null;
  }
  $body = substr($data,$headerlen,$bodylen);
  $data = "";
  if ($bodylen > 0) {
    switch ($method) {
      case 8:
        // Currently the only supported compression method:
        $data = gzinflate($body);
        break;
      default:
        // Unknown compression method
        return false;
    }
  } else {
    // I'm not sure if zero-byte body content is allowed.
    // Allow it for now...  Do nothing...
  }

  // Verify decompressed size and CRC32:
  // NOTE: This may fail with large data sizes depending on how
  //       PHP's integer limitations affect strlen() since $isize
  //       may be negative for large sizes.
  if ($isize != strlen($data) || crc32($data) != $datacrc) {
    // Bad format!  Length or CRC doesn't match!
    return false;
  }
  return $data;
}

?>
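
A short usage sketch for the function above. Note that gzdecode() has been built into PHP since 5.4, so on current versions the user-land definition would clash with it and is best wrapped in an if (!function_exists('gzdecode')) guard. The wrapper name decode_if_gzipped() is made up for illustration; it calls whichever gzdecode() is available.

```php
<?php
// Hypothetical wrapper: decode only when the payload carries the gzip
// magic bytes, otherwise hand the data back untouched.
function decode_if_gzipped($data)
{
    if (strncmp($data, "\x1f\x8b", 2) !== 0) {
        return $data; // No gzip magic bytes: assume plain content
    }
    // Built in since PHP 5.4; on older versions this would resolve to a
    // user-land gzdecode() such as the one posted above.
    return gzdecode($data);
}

// Round trip on a made-up payload, so no network access is needed.
$original  = str_repeat('jack johnson ', 50);
$roundTrip = decode_if_gzipped(gzencode($original));
$untouched = decode_if_gzipped('not compressed');

var_dump($roundTrip === $original, $untouched === 'not compressed');

// With a real page it would look like this ($url is a placeholder):
// $content = decode_if_gzipped(file_get_contents($url));
```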



How to file_get_contents a gzip page ? - El Forum - 03-08-2010

[eluser]digitalpbk[/eluser]
Thanks for this information. I had a similar problem: file_get_contents returned gzipped content and I had to decompress it. See more details in "file_get_contents gzip uncompress".