How to file_get_contents a gzip page ?
#1

[eluser]Benjamin David[/eluser]
Hi !

I'm trying to get the content of a page that is served with gzip compression. When I file_get_contents the page, the result is still compressed and I can't find a way to get the original, uncompressed content.

I've tried the gzdeflate and gzuncompress PHP 5 functions and neither worked. I haven't tried the gzdecode function, as it's slated for PHP 6 and my host doesn't have it yet.

The page I'm trying to get content from is here:

http://ax.phobos.apple.com.edgesuite.net...on&limit=1

And it's killing me, because I can make it work in JavaScript with Ajax requests (the browser automatically decompresses the whole thing), but not with PHP... Too bad !

Thanks for helping !
#2

[eluser]xwero[/eluser]
How do you read the content now?
#3

[eluser]pistolPete[/eluser]
This problem is described here, along with a suggested workaround:
http://bugs.php.net/bug.php?id=22967

But there are other possibilities:

Try sending an HTTP request without this header:
Code:
Accept-Encoding: gzip
You can find an example here: http://php.net/fsockopen
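
If you'd rather stay with file_get_contents than drop down to fsockopen, a stream context can carry the request headers. This is only a minimal sketch: the helper name identity_context() and the example URL are made up for illustration, and a server is free to ignore the header and send gzip anyway.

```php
<?php
// Hypothetical helper (not from the thread): builds a stream context
// that asks the server for an unencoded (non-gzip) response body.
function identity_context() {
    return stream_context_create(array(
        'http' => array(
            'method' => 'GET',
            // "identity" means "no content encoding, please".
            'header' => "Accept-Encoding: identity\r\n",
        ),
    ));
}

// Usage (placeholder URL, assumed for illustration only):
// $body = file_get_contents('http://example.com/', false, identity_context());
```

If the response still starts with the bytes 0x1f 0x8b, the server compressed it regardless and you need to decode it on your side.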

Alternatively you could use cURL:
Code:
curl_setopt($handle, CURLOPT_ENCODING, 'gzip');
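
For context, here is roughly how that option fits into a complete request. It is a sketch under assumptions: fetch_decoded() is a made-up name, the cURL extension has to be installed, and error handling is kept to a bare minimum. With CURLOPT_ENCODING set, cURL sends the matching Accept-Encoding header and decodes the response body itself.

```php
<?php
// Hypothetical helper (not from the thread): fetches a URL and returns
// the decoded body as a string, or false on failure.
function fetch_decoded($url) {
    $handle = curl_init($url);
    if ($handle === false) {
        return false;
    }
    // Return the body as a string instead of printing it.
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    // Advertise gzip and let cURL decompress the response transparently.
    curl_setopt($handle, CURLOPT_ENCODING, 'gzip');
    return curl_exec($handle);
}

// Usage (placeholder URL, assumed for illustration only):
// $body = fetch_decoded('http://example.com/');
```

Passing an empty string instead of 'gzip' tells cURL to advertise every encoding it supports.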

Or use one of the user-contributed gzdecode() functions here: http://php.net/gzencode
#4

[eluser]Benjamin David[/eluser]
Thanks for your answer ! It's giving me hope :)

I'm gonna test them and hopefully I'll write back with the final word, so it can help others who are trying to use the Apple API.
#5

[eluser]Benjamin David[/eluser]
The user-contributed gzdecode() function you were talking about works perfectly ! Thanks a lot pistolPete !

So here's the function I used to decode a gzipped string (found here: http://php.net/gzencode) :
Code:
<?php

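// NOTE: PHP 5.4 and later ship a built-in gzdecode(). On those versions,
// rename this function or guard it with function_exists('gzdecode').
function gzdecode($data) {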
  $len = strlen($data);
  if ($len < 18 || strcmp(substr($data,0,2),"\x1f\x8b")) {
    return null;  // Not GZIP format (See RFC 1952)
  }
  $method = ord(substr($data,2,1));  // Compression method
  $flags  = ord(substr($data,3,1));  // Flags
  if (($flags & 31) != $flags) {  // parentheses needed: != binds tighter than &
    // Reserved bits are set -- NOT ALLOWED by RFC 1952
    return null;
  }
  // NOTE: $mtime may be negative (PHP integer limitations)
  $mtime = unpack("V", substr($data,4,4));
  $mtime = $mtime[1];
  $xfl   = substr($data,8,1);
  $os    = substr($data,9,1);  // OS byte follows XFL (RFC 1952)
  $headerlen = 10;
  $extralen  = 0;
  $extra     = "";
  if ($flags & 4) {
    // 2-byte length prefixed EXTRA data in header
    if ($len - $headerlen - 2 < 8) {
      return false;    // Invalid format
    }
    // XLEN sits right after the 10-byte fixed header (offsets 10-11)
    $extralen = unpack("v",substr($data,10,2));
    $extralen = $extralen[1];
    if ($len - $headerlen - 2 - $extralen < 8) {
      return false;    // Invalid format
    }
    $extra = substr($data,12,$extralen);
    $headerlen += 2 + $extralen;
  }

  $filenamelen = 0;
  $filename = "";
  if ($flags & 8) {
    // C-style string file NAME data in header
    if ($len - $headerlen - 1 < 8) {
      return false;    // Invalid format
    }
    // Search for the terminating NUL from the current end of the header
    $filenamelen = strpos(substr($data,$headerlen),chr(0));
    if ($filenamelen === false || $len - $headerlen - $filenamelen - 1 < 8) {
      return false;    // Invalid format
    }
    $filename = substr($data,$headerlen,$filenamelen);
    $headerlen += $filenamelen + 1;
  }

  $commentlen = 0;
  $comment = "";
  if ($flags & 16) {
    // C-style string COMMENT data in header
    if ($len - $headerlen - 1 < 8) {
      return false;    // Invalid format
    }
    // Search for the terminating NUL from the current end of the header
    $commentlen = strpos(substr($data,$headerlen),chr(0));
    if ($commentlen === false || $len - $headerlen - $commentlen - 1 < 8) {
      return false;    // Invalid header format
    }
    $comment = substr($data,$headerlen,$commentlen);
    $headerlen += $commentlen + 1;
  }

  $headercrc = "";
  if ($flags & 1) {
    // 2-bytes (lowest order) of CRC32 on header present
    if ($len - $headerlen - 2 < 8) {
      return false;    // Invalid format
    }
    $calccrc = crc32(substr($data,0,$headerlen)) & 0xffff;
    $headercrc = unpack("v", substr($data,$headerlen,2));
    $headercrc = $headercrc[1];
    if ($headercrc != $calccrc) {
      return false;    // Bad header CRC
    }
    $headerlen += 2;
  }

  // GZIP FOOTER - These may be negative due to PHP's integer limitations
  $datacrc = unpack("V",substr($data,-8,4));
  $datacrc = $datacrc[1];
  $isize = unpack("V",substr($data,-4));
  $isize = $isize[1];

  // Perform the decompression:
  $bodylen = $len-$headerlen-8;
  if ($bodylen < 1) {
    // This should never happen - IMPLEMENTATION BUG!
    return null;
  }
  $body = substr($data,$headerlen,$bodylen);
  $data = "";
  if ($bodylen > 0) {
    switch ($method) {
      case 8:
        // Currently the only supported compression method:
        $data = gzinflate($body);
        break;
      default:
        // Unknown compression method
        return false;
    }
  } else {
    // I'm not sure if zero-byte body content is allowed.
    // Allow it for now...  Do nothing...
  }

  // Verify decompressed size and CRC32:
  // NOTE: This may fail with large data sizes depending on how
  //       PHP's integer limitations affect strlen() since $isize
  //       may be negative for large sizes.
  if ($isize != strlen($data) || crc32($data) != $datacrc) {
    // Bad format!  Length or CRC doesn't match!
    return false;
  }
  return $data;
}

?>
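
For anyone wiring this into file_get_contents, here is a rough sketch of how a decoder can be applied only when the data really is gzip. The helper name maybe_gunzip() is made up, and the sketch assumes a gzdecode() is available (the function above, or the built-in one on PHP 5.4 and later). The round trip through gzencode() is only there so the example runs without a network connection.

```php
<?php
// Hypothetical helper (not from the thread): returns the decompressed
// data if $data looks like gzip, otherwise returns $data unchanged.
function maybe_gunzip($data) {
    // Gzip streams start with the magic bytes 0x1f 0x8b (RFC 1952) and
    // are at least 18 bytes long (10-byte header + 8-byte footer).
    if (strlen($data) < 18 || substr($data, 0, 2) !== "\x1f\x8b") {
        return $data;
    }
    $decoded = gzdecode($data);
    // Fall back to the raw data if decoding failed.
    return is_string($decoded) ? $decoded : $data;
}

// Local round trip instead of a real HTTP request:
$original = 'Hello from a gzipped page';
$packed   = gzencode($original);
var_dump(maybe_gunzip($packed) === $original);   // compressed input is decoded
var_dump(maybe_gunzip($original) === $original); // plain input passes through
```

With a real page you would pass the result of file_get_contents($url) to maybe_gunzip() instead of $packed.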
#6

[eluser]digitalpbk[/eluser]
Thanks for this information. I had a similar problem while using file_get_contents: it returned gzipped content and I had to decompress it. See more details: file_get_contents gzip uncompress



