Wednesday 24 October 2012

Read Word files using PHP

In this PHP tutorial we will see how to read Word documents (.doc and .docx) and show their text in the browser. Generally PHP cannot display a Word file in the browser directly, whereas PDFs can easily be embedded in HTML.

But here we will see two different methods that work well for displaying the characters from Word .doc and .docx files.

To create .docx files with PHP you can use phpdocx. If you want to create PDFs, you can use FPDF or mPDF.

I suggest you prefer method 1.

Method 1: COM object to read MS Word files. This works well with .docx and .doc. It needs PHP running on Windows with MS Word installed and the COM extension enabled.


<div style="border:2px solid #1a4572; width:720px;padding:15px">
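<?php

$filename = 'msword.docx';

// new COM() throws com_exception on failure, so "or die" would never run.
// CP_UTF8 asks COM to hand strings back as UTF-8.
try {
    $word = new COM("word.application", null, CP_UTF8);
} catch (com_exception $e) {
    die("Could not initialise MS Word object.");
}
$word->Documents->Open(realpath($filename));

// Extract content as plain text.
$content = (string) $word->ActiveDocument->Content->Text;

// Escape HTML special characters, then turn line breaks into <br />.
echo nl2br(htmlspecialchars($content));

// Close the document without saving changes, then shut Word down.
$word->ActiveDocument->Close(false);
$word->Quit();
$word = null;
?>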
</div>
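
Method 1 only runs where the COM extension is loaded, which in practice means PHP on Windows. On newer PHP versions it usually has to be switched on in php.ini (for example `extension=com_dotnet`, or `extension=php_com_dotnet.dll` on older builds). Here is a minimal sketch of a guard you could place in front of the code; the helper name `comAvailable()` is my own assumption, not part of PHP or of the code above:

```php
<?php

// Hypothetical helper: reports whether the COM class can be used in this PHP build.
function comAvailable(): bool
{
    // The COM class only exists when the com_dotnet extension is loaded.
    return extension_loaded('com_dotnet') && class_exists('COM');
}

if (!comAvailable()) {
    echo "COM is not available here; try Method 2 for .doc files instead.\n";
}
```

This only tells you that the extension is present. Word itself still has to be installed for `new COM("word.application")` to succeed.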


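Method 2: This works well with .doc. It calculates the text length from four bytes in the file header and then reads that many bytes of plain text.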

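<?php

$filename = 'msword.doc';

if (file_exists($filename))
{
    // Open in binary mode so Windows does not translate any bytes.
    if (($fh = fopen($filename, 'rb')) !== false)
    {
       // Read the first 0xA00 bytes; the text is read from right after them.
       $headers = fread($fh, 0xA00);

       // Stop here if the file is too short to contain the length bytes.
       if ($headers === false || strlen($headers) < 0x220) {
           fclose($fh);
           die("File is too short to be a .doc file.");
       }

       // Byte 0x21C, minus 1: document has from 0 to 255 characters
       $n1 = ( ord($headers[0x21C]) - 1 );

       // Byte 0x21D, minus 8, times 256: document has from 256 to 63743 characters
       $n2 = ( ( ord($headers[0x21D]) - 8 ) * 256 );

       // Byte 0x21E, times 256 * 256: document has from 63744 to 16775423 characters
       $n3 = ( ( ord($headers[0x21E]) * 256 ) * 256 );

       // Byte 0x21F, times 256 * 256 * 256: document has from 16775424 to 4294965504 characters
       $n4 = ( ( ( ord($headers[0x21F]) * 256 ) * 256 ) * 256 );

       // Total length of text in the document
       $textLength = ($n1 + $n2 + $n3 + $n4);

       // fread() needs a length greater than zero
       $extracted_plaintext = ($textLength > 0) ? fread($fh, $textLength) : '';

       // simple print of the character stream without new lines
       //echo $extracted_plaintext;

       // if you want to see your paragraphs on new lines, do this
       echo nl2br($extracted_plaintext);
       // if you need more spacing after each paragraph, wrap it in another nl2br()

       fclose($fh);
    }
}
?>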
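
The four byte calculations in Method 2 add up to one simple formula: read the four bytes at offset 0x21C as a little-endian 32-bit integer and subtract 0x801 (the 1 taken from the first byte plus 8 × 256 taken from the second). Here is a short sketch that checks this on a made-up header. The function names `lengthByteByByte()` and `lengthUnpacked()` and the sample bytes are my own assumptions for illustration, not taken from a real document:

```php
<?php

// Same arithmetic as Method 2, one byte at a time.
function lengthByteByByte(string $headers): int
{
    $n1 = ord($headers[0x21C]) - 1;
    $n2 = (ord($headers[0x21D]) - 8) * 256;
    $n3 = ord($headers[0x21E]) * 256 * 256;
    $n4 = ord($headers[0x21F]) * 256 * 256 * 256;
    return $n1 + $n2 + $n3 + $n4;
}

// Equivalent form: little-endian 32-bit value minus 0x801 (1 + 8 * 256).
function lengthUnpacked(string $headers): int
{
    $value = unpack('V', substr($headers, 0x21C, 4))[1];
    return $value - 0x801;
}

// Made-up header: 0xA00 zero bytes with 0x0B 0x09 0x00 0x00 written at 0x21C.
$headers = str_repeat("\0", 0xA00);
$headers = substr_replace($headers, "\x0B\x09\x00\x00", 0x21C, 4);

echo lengthByteByByte($headers), "\n"; // 10 + 256 = 266
echo lengthUnpacked($headers), "\n";   // 0x090B - 0x801 = 266
```

Both functions give the same result on a 64-bit PHP build. Which one you use is a matter of taste; the byte-by-byte version matches the comments in Method 2 more closely.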

9 comments:

  1. Thank you, it's working fine.

  2. Replies
    1. Which version of PHP are you using?

      If you have difficulty with COM, then try the second method.

      I think in newer versions of PHP you may have to enable it in php.ini manually.

  3. Hi,
    if I only want to use the function to read .doc and .docx, do I need to purchase phpdocx?

  4. Thanks, it works perfectly. But I want to display the text line by line, so how is that possible? Please help.

    thanks
    satyawan

    Replies
    1. You need to use the nl2br function to make line breaks.

  5. Thanks, it works successfully, but I want the same docx file to be converted to PDF.

    If I upload admit.docx, the admit.docx file has a layout and a photo,
    so I want the same docx converted to PDF as it is.

    Thanks

    Please guide me with any suggestion.

    Replies
    1. Well, this code is not meant for converting files. It is just for reading .doc(x) files. If you want to convert a FORMATTED doc file, then you can't use this code.
      You need to look at other tools like phpdocx or livedocx.

      Some links that may help you with conversion are
      1) http://www.phplivedocx.org/articles/using-livedocx-without-the-zend-framework/
      2) http://stackoverflow.com/questions/5538584/convert-word-doc-docx-and-excel-xls-xlsx-to-pdf-with-php
