M-Files API 23.11.13135.0
PerformOCROperation Method
VaultObjectFileOperations Object : PerformOCROperation Method
Description
Performs an optical character recognition (OCR) operation on an object file.
Syntax
Visual Basic
Public Function PerformOCROperation( _
   ByVal ObjVer As ObjVer, _
   ByVal FileVer As FileVer, _
   Optional ByVal OCROptions As OCROptions = 0, _
   Optional ByVal ZoneRecognitionMode As MFOCRZoneRecognitionMode = MFOCRZoneRecognitionModeNoZoneRecognition, _
   Optional ByVal ZoneRecognitionPages As OCRPages = 0, _
   Optional ByVal ConvertToSearchablePDF As Boolean = True _
) As OCRPageResults
Parameters
ObjVer
The object identifier for the target document file.
FileVer
The file identifier for the target document file.
OCROptions

Various parameters for the OCR operation, such as languages to be used as recognition hints.

In scripting languages, a specific null value (e.g., Nothing in VBScript) should be used to indicate the default value.

ZoneRecognitionMode
Value                                              Description
MFOCRZoneRecognitionModeAutoDetectZones            Recognize all auto-detected zones.
MFOCRZoneRecognitionModeNoZoneRecognition          No zone recognition.
MFOCRZoneRecognitionModeRecognizeSpecifiedZones    Recognize user-defined zones.
ZoneRecognitionPages

OCR zones to be recognized as typed values. Zones are organized as a collection of pages to enable zone recognition for multipage images. Valid only if ZoneRecognitionMode has the value MFOCRZoneRecognitionModeRecognizeSpecifiedZones.

In scripting languages, a specific null value (e.g., Nothing in VBScript) should be used to indicate the default value.

ConvertToSearchablePDF
If this parameter is True, the target file is converted to a searchable PDF file.
Return Type
OCR zone recognition results are organized as a collection of pages that enable result representation for multipage images. Each page contains a set of OCR zone results that represent recognized values for each requested OCR zone. The ID of each requested zone is copied to the corresponding zone result.
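
The result structure described above can be flattened into a lookup table keyed by zone ID. The following is a minimal sketch rather than part of the M-Files API: the names OcrResultHelpers and CollectZoneValues are hypothetical, and the project is assumed to reference the M-Files API interop assembly. It relies only on members that appear in the example on this page (OCRZoneResults, ID, ResultValue.DisplayValue).

```vbnet
Imports System.Collections.Generic

Module OcrResultHelpers

    ' Hypothetical helper: maps each requested zone ID to its recognized display value.
    ' Assumes zone IDs are integers and unique across pages; if the same ID
    ' occurs on several pages, the value from the last page wins.
    Function CollectZoneValues(ByVal pageResults As MFilesAPI.OCRPageResults) _
            As Dictionary(Of Integer, String)

        Dim values As New Dictionary(Of Integer, String)

        ' Walk every page result, then every zone result on that page.
        For Each pageResult As MFilesAPI.OCRPageResult In pageResults
            For Each zoneResult As MFilesAPI.OCRZoneResult In pageResult.OCRZoneResults
                values(zoneResult.ID) = zoneResult.ResultValue.DisplayValue
            Next
        Next

        Return values

    End Function

End Module
```

Because the ID of each requested zone is copied to the corresponding zone result, a caller can look up a value by the same ID it assigned when defining the zone.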
Remarks

The target object must be checked out before this method is called. The target file, however, must be available in the latest checked-in version of the object. These preconditions are often not satisfied when M-Files event handlers are executed, so calling this method directly from event handler scripts is not recommended.

You can, however, call this method from the "Run script" action of a workflow state. This is useful if you need to convert scanned documents to searchable PDF files as part of a workflow state transition. Make sure to call the 'GetFilesForModificationInEventHandler' method prior to calling 'PerformOCROperation' in the "Run script" action of a workflow state. See the code example "Converting scanned files to searchable PDF files in a workflow state action" below.

This method is available only if M-Files API is used in the server interface mode.

The OCR module for M-Files must be installed and activated.

Example
Converting scanned files to searchable PDF files in a workflow state action (VBScript)
Option Explicit

' Prepare the files of the object for modification by script.
Dim files
Set files = Vault.ObjectFileOperations.GetFilesForModificationInEventHandler( ObjVer )

' Prepare OCR options.
Dim opts
Set opts = CreateObject( "MFilesAPI.OCROptions" )
opts.PrimaryLanguage = MFOCRLanguageEnglishUS
opts.SecondaryLanguage = MFOCRLanguageFinnish

' Perform OCR on each of the convertible files.
Dim file
For Each file In files

    ' Is the file in a convertible file format? Compare in lower case
    ' so that extensions such as "TIF" or "PDF" also match.
    Dim ext
    ext = LCase( file.Extension )
    If  ext = "tif" Or _
        ext = "tiff" Or _
        ext = "jpg" Or _
        ext = "jpeg" Or _
        ext = "pdf" Then

        ' Convert this file to searchable PDF.
        Vault.ObjectFileOperations.PerformOCROperation ObjVer, file.FileVer, _
                opts, MFOCRZoneRecognitionModeNoZoneRecognition, Nothing, True

    End If

Next
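
The file-format check in the script above can also be expressed as a small reusable predicate. The following VB.NET sketch is illustrative only: OcrFileFilter and IsOcrConvertible are hypothetical names, not M-Files API members, and the extension list is simply the one used in the script.

```vbnet
Imports System
Imports System.Linq

Module OcrFileFilter

    ' Extensions treated as convertible in the script above.
    Private ReadOnly ConvertibleExtensions As String() = {"tif", "tiff", "jpg", "jpeg", "pdf"}

    ' Hypothetical helper: returns True if the extension is in the list,
    ' ignoring case and an optional leading dot.
    Function IsOcrConvertible(ByVal extension As String) As Boolean
        If String.IsNullOrEmpty(extension) Then Return False
        Return ConvertibleExtensions.Contains(extension.TrimStart("."c), _
                                              StringComparer.OrdinalIgnoreCase)
    End Function

    Sub Main()
        Console.WriteLine(IsOcrConvertible("TIF"))   ' Prints True
        Console.WriteLine(IsOcrConvertible("docx"))  ' Prints False
    End Sub

End Module
```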

Recognizing specified OCR zones and converting to a searchable PDF (VB.NET)
' Initialize the API and connect to a vault.
Dim oServerApp As MFilesAPI.MFilesServerApplication = New MFilesAPI.MFilesServerApplication
Dim oVault As MFilesAPI.Vault
' ...

' Initialize the object version.
Dim oObjectVersion As MFilesAPI.ObjectVersion
oObjectVersion = ...

' Check out the object first. We assume that initially the object is checked in.
Dim oObjectVersionCheckedOut As MFilesAPI.ObjectVersion
oObjectVersionCheckedOut = oVault.ObjectOperations.CheckOut(oObjectVersion.ObjVer.ObjID)

' Simply process all the files of the object.
Dim oObjectFiles As MFilesAPI.ObjectFiles = oVault.ObjectFileOperations.GetFiles(oObjectVersionCheckedOut.ObjVer)
For Each oObjectFile As MFilesAPI.ObjectFile In oObjectFiles

    ' Specify OCR options.
    Dim oOcrOptions As New MFilesAPI.OCROptions
    oOcrOptions.PrimaryLanguage = MFilesAPI.MFOCRLanguage.MFOCRLanguageFinnish
    oOcrOptions.SecondaryLanguage = MFilesAPI.MFOCRLanguage.MFOCRLanguageEnglishUS

    ' Specify an OCR zone to be recognized.
    Dim oOcrZone As New MFilesAPI.OCRZone
    oOcrZone.DataType = MFilesAPI.MFDataType.MFDatatypeText
    oOcrZone.DimensionUnit = MFilesAPI.MFOCRDimensionUnit.MFOCRDimensionUnitMillimeterX10
    oOcrZone.ID = 1
    oOcrZone.Left = 1650 ' This is interpreted as 165.0 mm.
    oOcrZone.Top = 220 ' This is interpreted as 22.0 mm.
    oOcrZone.Width = 300 ' This is interpreted as 30.0 mm.
    oOcrZone.Height = 100 ' This is interpreted as 10.0 mm.

    ' Construct an OCR page object and add the OCR zone to this OCR page.
    Dim oOcrPage As New MFilesAPI.OCRPage
    oOcrPage.OCRZones.Add(0, oOcrZone)

    ' Indicate that all zones contained by this OCR page are
    ' recognized on the page 1 of the source image.
    oOcrPage.PageNum = 1

    ' Construct an OCR page collection and add the OCR page to this collection.
    Dim oOcrPages As New MFilesAPI.OCRPages
    oOcrPages.Add(0, oOcrPage)

    ' Invoke the OCR operation for the target file by requesting
    ' 1) OCR zone recognition with specific OCR zones, and
    ' 2) conversion to a searchable PDF.
    Dim oOcrPageResults As MFilesAPI.OCRPageResults = _
        oVault.ObjectFileOperations.PerformOCROperation( _
        oObjectVersionCheckedOut.ObjVer, _
        oObjectFile.FileVer, _
        oOcrOptions, _
        MFilesAPI.MFOCRZoneRecognitionMode.MFOCRZoneRecognitionModeRecognizeSpecifiedZones, _
        oOcrPages, _
        True _
    )

    ' Process the OCR zone recognition results.
    For Each oOcrPageResult As MFilesAPI.OCRPageResult In oOcrPageResults
        For Each oOcrZoneResult As MFilesAPI.OCRZoneResult In oOcrPageResult.OCRZoneResults
            Console.WriteLine("ID: " & CStr(oOcrZoneResult.ID))
            Console.WriteLine("Recognized value: " & oOcrZoneResult.ResultValue.DisplayValue)
        Next
    Next

Next

' Check in the object to finalize.
oVault.ObjectOperations.CheckIn(oObjectVersionCheckedOut.ObjVer)
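
The zone coordinates in the example above are given in tenths of a millimeter (MFOCRDimensionUnitMillimeterX10), so 1650 stands for 165.0 mm. A small conversion helper can make such values easier to read. This is a sketch only: OcrZoneUnits and ToMillimeterX10 are hypothetical names and not part of the M-Files API.

```vbnet
Imports System

Module OcrZoneUnits

    ' Hypothetical helper: converts millimeters to the tenth-of-a-millimeter
    ' integer values used with MFOCRDimensionUnitMillimeterX10.
    Function ToMillimeterX10(ByVal millimeters As Double) As Integer
        Return CInt(Math.Round(millimeters * 10.0, MidpointRounding.AwayFromZero))
    End Function

    Sub Main()
        Console.WriteLine(ToMillimeterX10(165.0)) ' Prints 1650, the Left value used above.
        Console.WriteLine(ToMillimeterX10(22.0))  ' Prints 220, the Top value used above.
    End Sub

End Module
```

With such a helper, a zone could be defined as, for example, oOcrZone.Left = ToMillimeterX10(165.0) instead of the literal 1650.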


See Also

VaultObjectFileOperations Object  | VaultObjectFileOperations Members