Searchable PDF

Note: This content is no longer updated. For the latest content, please go to the user guide for M-Files Online. For information on the supported product versions, refer to our lifecycle policy.

M-Files can convert images imported from external file sources into searchable PDFs using optical character recognition (OCR). This makes full-text search of scanned documents possible. After conversion, you can find the PDF document by searching the actual document content.

Optical character recognition can be performed on the following file formats:
  • TIF
  • TIFF
  • JPG
  • JPEG
  • BMP
  • PNG
  • PDF
TIFF files using an alpha channel or JPEG compression are not supported.
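If you want to pre-check the contents of a source folder before enabling OCR, the format list above can be expressed as a simple extension filter. The following is a minimal sketch only: the function names `is_ocr_candidate` and `split_by_ocr_support` are hypothetical helpers, not part of any M-Files API. It checks file name extensions only, so it cannot detect TIFF files that use an alpha channel or JPEG compression.

```python
from pathlib import PurePath

# Extensions taken from the list of formats above.
# Assumption: matching by extension is case-insensitive.
OCR_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg", ".bmp", ".png", ".pdf"}


def is_ocr_candidate(filename: str) -> bool:
    """Return True if the file extension is one of the listed OCR formats."""
    return PurePath(filename).suffix.lower() in OCR_EXTENSIONS


def split_by_ocr_support(filenames):
    """Split file names into (candidates, skipped) based on extension only."""
    candidates = [name for name in filenames if is_ocr_candidate(name)]
    skipped = [name for name in filenames if not is_ocr_candidate(name)]
    return candidates, skipped
```

For example, `split_by_ocr_support(["invoice.TIF", "notes.docx"])` places `invoice.TIF` in the first list and `notes.docx` in the second.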
Note: Converting a file to a searchable PDF does not change how the document looks when it is viewed. Users still see the original scanned image. M-Files stores the text recognition results in the PDF as invisible text, which is used when the file is searched. Any text recognition inaccuracies do not affect the appearance of the scanned document when it is viewed on screen or printed.
Note: The M-Files OCR module is an M-Files add-on product available for an extra fee. It is activated with a license code. For more information, see Enabling the M-Files OCR Module and Managing Server Licenses. M-Files uses an OCR engine offered by IRIS. For purchase inquiries about the M-Files OCR module, please contact our sales team at [email protected].

Do the following steps to convert images from an external file source into searchable PDFs:

  1. Open M-Files Admin.
  2. In the left-side tree view, expand a connection to the M-Files server.
  3. Expand Document Vaults.
  4. Expand a vault.
  5. Expand Connections to External Sources.
  6. Click File Sources.
  7. On the File Sources list, double-click the file source that you want to edit.
    Result: The Connection Properties dialog opens.
  8. Click the Searchable PDF tab.
    Result: The Searchable PDF tab opens.
  9. Check the Use OCR to enable full-text search of scanned documents check box.
  10. Using the Primary language and Secondary language drop-down menus, select the primary and secondary language of the documents scanned via this external connection to improve the quality of the recognition results. The list of secondary languages only contains languages that are allowed to be used with the selected primary language.
    Although the OCR automatically recognizes all Western languages and Cyrillic character sets, specifying the languages often improves the quality of the text recognition results. In ambiguous cases, a problematic recognition result may be resolved by a language-specific factor, such as recognition of the letter 'Ä' in Finnish.
  11. Optional: Check the Use hyper-compression to reduce PDF file size check box if you want to reduce the file size of the searchable PDFs created via this connection.
  12. Optional: Check the Convert to PDF/A-1b format check box if you want the converted PDF documents to comply with the ISO standard 19005-1:2005 for long-term preservation of electronic documents.
    PDF/A-1b is a more restricted format than standard PDF, and the file size of documents converted to PDF/A is often larger than that of files converted to standard PDF. In addition, certain advanced appearance settings may be omitted when exporting to PDF/A. Use conversion to PDF/A only when it is specifically required, for example to meet requirements for long-term preservation.
  13. Click OK to close the Connection Properties dialog.
The documents scanned via this connection are converted into searchable PDFs, provided that they are in one of the supported file formats. After they have been imported or linked to M-Files, you can find them by searching for their content.
Note: Text recognition can also be performed via M-Files Desktop. For more information, refer to Scanning and Text Recognition (OCR). If you want text recognition to be available only for external sources configured in M-Files Admin, you can set this limitation in the registry settings. The registry settings can be used to set other limitations as well. For more information on registry settings, write to our customer support at [email protected].