getMimeType() returns the wrong MIME type for a docx file
#1

Hello,
When I check the MIME type of a docx file, I get the wrong type.
PHP Code:
$path = 'test.docx';
$file = new \CodeIgniter\Files\File($path);
$type = $file->getMimeType();
echo $type;

I get:
application/octet-stream
But if I use file from bash:
 file -b --mime-type 'test1.docx'
I get:
application/vnd.openxmlformats-officedocument.wordprocessingml.document
Any idea what the problem is?
#2

(This post was last modified: 03-24-2023, 04:08 PM by kenjis.)

I got: application/vnd.openxmlformats-officedocument.wordprocessingml.document

Upgrade CI4 if you are using an older version.
#3

I am using CI 4.3.2. Tested on Debian 11 (PHP 7.4) and Debian 12 (PHP 8.2).
A simple way to reproduce the problem:
  1. Open LibreOffice Writer, type the single word "test1", and save in docx format (getMimeType() works as expected).
  2. Add a picture to the document and save it again (getMimeType() no longer works as expected: I get "application/octet-stream").
An example file can be downloaded from the link: test1.docx
#4

Yes, the file is "application/octet-stream".
And the file command also returns it on my macOS.

$ file -b --mime-type test1.docx
application/octet-stream
#5

It looks like LibreOffice changes the mime type.
I tested using a native PHP function:
PHP Code:
$finfo = finfo_open(FILEINFO_MIME_TYPE); // return mime type - all mimetype extension

$filename = "E:\\test1.docx";
echo $filename . ": " . finfo_file($finfo, $filename) . "<br>";
echo $filename . ": " . mime_content_type($filename) . "<br>";

When I add an image to the doc in LibreOffice, the mime type changes to application/octet-stream.
#6

(03-27-2023, 12:46 AM)kenjis Wrote: Yes, the file is "application/octet-stream".
And the file command also returns it on my macOS.

$ file -b --mime-type test1.docx
application/octet-stream

When I try it, I get a different result from file:

file -b --mime-type test1.docx
application/vnd.openxmlformats-officedocument.wordprocessingml.document

But from CI4 I got: application/octet-stream
#7

Anyway, it is an issue in finfo_file(), not in CI4. So we cannot fix it.
Please send a bug report to the PHP Group.
#8

(03-27-2023, 01:35 AM)kenjis Wrote: Anyway, it is an issue in finfo_file(), not in CI4. So we cannot fix it.
Please send a bug report to the PHP Group.

Thanks, and sorry for posting in the wrong place.
#9

(03-27-2023, 12:46 AM)JustJohnQ Wrote: It looks like LibreOffice changes the mime type.
I tested using a native PHP function:
PHP Code:
$finfo = finfo_open(FILEINFO_MIME_TYPE); // return mime type - all mimetype extension

$filename = "E:\\test1.docx";
echo $filename . ": " . finfo_file($finfo, $filename) . "<br>";
echo $filename . ": " . mime_content_type($filename) . "<br>";

When I add an image to the doc in LibreOffice, the mime type changes to application/octet-stream.

I replaced this:
PHP Code:
$finfo = finfo_open(FILEINFO_MIME_TYPE);

with:
PHP Code:
$finfo = finfo_open(FILEINFO_MIME_TYPE, '/etc/magic');

and added the following to /etc/magic:
Code:
#------------------------------------------------------------------------------
# $File: msooxml,v 1.19 2023/03/14 19:46:15 christos Exp $
# msooxml:  file(1) magic for Microsoft Office XML
# From: Ralf Brown <[email protected]>

# .docx, .pptx, and .xlsx are XML plus other files inside a ZIP
#  archive.  The first member file is normally "[Content_Types].xml".
#  but some libreoffice generated files put this later. Perhaps skip
#  the "[Content_Types].xml" test?
# Since MSOOXML doesn't have anything like the uncompressed "mimetype"
#  file of ePub or OpenDocument, we'll have to scan for a filename
#  which can distinguish between the three types

0              name            msooxml
>0              string          word/          Microsoft Word 2007+
!:mime application/vnd.openxmlformats-officedocument.wordprocessingml.document
!:ext  docx
>0              string          ppt/            Microsoft PowerPoint 2007+
!:mime application/vnd.openxmlformats-officedocument.presentationml.presentation
!:ext  pptx
>0              string          xl/            Microsoft Excel 2007+
!:mime application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
!:ext  xlsx
>0              string          visio/          Microsoft Visio 2013+
!:mime application/vnd.ms-visio.drawing.main+xml
>0              string          AppManifest.xaml        Microsoft Silverlight Application
!:mime application/x-silverlight-app

# start by checking for ZIP local file header signature
0              string          PK\003\004
!:strength +10
# make sure the first file is correct
>0x1E          use            msooxml
>0x1E          default        x
>>0x1E          regex          \\[Content_Types\\]\\.xml|_rels/\\.rels|docProps|customXml
# skip to the second local file header
# since some documents include a 520-byte extra field following the file
# header, we need to scan for the next header
>>>(18.l+49)    search/6000    PK\003\004
# now skip to the *third* local file header; again, we need to scan due to a
# 520-byte extra field following the file header
>>>>&26        search/6000    PK\003\004
# and check the subdirectory name to determine which type of OOXML
# file we have.  Correct the mimetype with the registered ones:
# https://technet.microsoft.com/en-us/library/cc179224.aspx
>>>>>&26                use            msooxml
>>>>>&26                default        x
# OpenOffice/Libreoffice orders ZIP entry differently, so check the 4th file
>>>>>>&26      search/6000    PK\003\004
>>>>>>>&26      use            msooxml
# Some OOXML generators add an extra customXml directory. Check another file.
>>>>>>>&26      default        x
>>>>>>>>&26    search/6000    PK\003\004
>>>>>>>>>&26    use            msooxml
>>>>>>>>>&26    default        x              Microsoft OOXML
>>>>>>>&26      default        x              Microsoft OOXML
>>>>>&26        default        x              Microsoft OOXML
>>0x1E          regex          \\[trash\\]
>>>&26          search/6000    PK\003\004
>>>>&26        search/6000    PK\003\004
>>>>>&26        use            msooxml
>>>>>&26        default        x
>>>>>>&26      search/6000    PK\003\004
>>>>>>>&26      use            msooxml
>>>>>>>&26      default        x              Microsoft OOXML
>>>>>>&26      default        x              Microsoft OOXML
>>>>>&26        default        x              Microsoft OOXML

Now all Office documents are detected correctly.

I checked the CI4 framework: getMimeType() in system/Files/File.php has no option for passing a custom magic file:
PHP Code:
    /**
    * Retrieve the media type of the file. SHOULD not use information from
    * the $_FILES array, but should use other methods to more accurately
    * determine the type of file, like finfo, or mime_content_type().
    *
    * @return string The media type we determined it to be.
    */
    public function getMimeType(): string
    {
        if (! function_exists('finfo_open')) {
            return $this->originalMimeType ?? 'application/octet-stream'; // @codeCoverageIgnore
        }

        $finfo    = finfo_open(FILEINFO_MIME_TYPE);
        $mimeType = finfo_file($finfo, $this->getRealPath() ?: $this->__toString());
        finfo_close($finfo);

        return $mimeType;
    }

Maybe we could suggest adding that option?
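
Such an option could look roughly like the sketch below. It is not part of CodeIgniter: detectMimeType() and its $magicFile parameter are hypothetical names, and the only library calls used are PHP's own finfo_open() and finfo_file(). It assumes PHP can read a custom magic database such as the /etc/magic file above; when none is given, it uses the database bundled with fileinfo.

```php
<?php

/**
 * Hypothetical helper (not a CodeIgniter API): detect a file's MIME type,
 * optionally with a custom magic database.
 */
function detectMimeType(string $path, ?string $magicFile = null): string
{
    $fallback = 'application/octet-stream';

    // Without fileinfo or a real file there is nothing to inspect.
    if (! function_exists('finfo_open') || ! is_file($path)) {
        return $fallback;
    }

    // Use the custom magic database only when one is given and readable;
    // otherwise let fileinfo use its bundled database.
    $finfo = ($magicFile !== null && is_readable($magicFile))
        ? finfo_open(FILEINFO_MIME_TYPE, $magicFile)
        : finfo_open(FILEINFO_MIME_TYPE);

    // finfo_open() returns false when the database cannot be loaded.
    if ($finfo === false) {
        return $fallback;
    }

    $mimeType = finfo_file($finfo, $path);

    return $mimeType === false ? $fallback : $mimeType;
}
```

It would be called as detectMimeType('test1.docx', '/etc/magic'). What it returns for a given file still depends on the rules in the magic database.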
#10

There is a discussion suggesting this may be a bug in LibreOffice:
https://bugs.documentfoundation.org/show...?id=101317
I don't see any reason why the mime type is changed after adding an image to a document.
Allowing mime type 'application/octet-stream' sounds dangerous to me.
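
One way to avoid accepting 'application/octet-stream' blindly is to look inside the file, since a .docx is a ZIP archive. The sketch below is an alternative that nobody in this thread proposed. looksLikeDocx() is a hypothetical helper, it needs PHP's zip extension, and it assumes the main document part sits at the conventional word/document.xml path. It only checks that the expected entries exist, so it is a plausibility check and does not validate the content.

```php
<?php

/**
 * Hypothetical helper: report whether a file is a ZIP archive containing
 * the entries a Word OOXML document normally has.
 */
function looksLikeDocx(string $path): bool
{
    if (! class_exists(ZipArchive::class) || ! is_file($path)) {
        return false;
    }

    $zip = new ZipArchive();

    // open() returns true on success and an integer error code otherwise.
    if ($zip->open($path) !== true) {
        return false;
    }

    // locateName() returns the entry index (possibly 0) or false,
    // so the comparison has to be strict.
    $hasContentTypes = $zip->locateName('[Content_Types].xml') !== false;
    $hasMainDocument = $zip->locateName('word/document.xml') !== false;

    $zip->close();

    return $hasContentTypes && $hasMainDocument;
}
```

An upload handler could accept 'application/octet-stream' only when looksLikeDocx() also returns true, and reject the file otherwise.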



