Class TesseractTextExtractor

java.lang.Object
org.opencastproject.textextractor.tesseract.TesseractTextExtractor
All Implemented Interfaces:
TextExtractor, org.osgi.service.cm.ManagedService

public class TesseractTextExtractor extends Object implements TextExtractor, org.osgi.service.cm.ManagedService
Command-line wrapper around Tesseract's tesseract command.
  • Field Details

    • TESSERACT_BINARY_DEFAULT

      public static final String TESSERACT_BINARY_DEFAULT
      Default name of the tesseract binary
    • TESSERACT_BINARY_CONFIG_KEY

      public static final String TESSERACT_BINARY_CONFIG_KEY
      Configuration property that defines the path to the tesseract binary
    • TESSERACT_OPTS_CONFIG_KEY

      public static final String TESSERACT_OPTS_CONFIG_KEY
      Configuration property that defines additional tesseract options, such as the language or the page segmentation mode (pagesegmode) to use. The value is appended to the command line when tesseract is called.
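The options property is described as being appended to the command line when tesseract is called. The following is a minimal sketch of what such appending could look like, not the class's actual implementation. The class name TesseractCommandSketch, the method buildCommand, and the whitespace-splitting of the option string are all assumptions made for illustration; the argument order (image, output base, options) and the -l and --psm flags follow the tesseract command-line tool.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the documented API.
final class TesseractCommandSketch {

  /** Builds the argument list: binary, input image, output base, then any additional options. */
  static List<String> buildCommand(String binary, File image, File outputBase, String addOptions) {
    List<String> command = new ArrayList<>();
    command.add(binary);
    command.add(image.getPath());
    command.add(outputBase.getPath());
    // Assumption: the option string is split on whitespace and appended as-is.
    if (addOptions != null && !addOptions.trim().isEmpty()) {
      command.addAll(Arrays.asList(addOptions.trim().split("\\s+")));
    }
    return command;
  }

  public static void main(String[] args) {
    List<String> command =
        buildCommand("tesseract", new File("slide.png"), new File("slide"), "-l eng --psm 6");
    // Prints: tesseract slide.png slide -l eng --psm 6
    System.out.println(String.join(" ", command));
  }
}
```

A list built this way can be handed to java.lang.ProcessBuilder, which avoids shell quoting issues. Options that contain spaces inside a single argument would need more careful parsing than this sketch provides.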
  • Constructor Details

    • TesseractTextExtractor

      public TesseractTextExtractor()
      Creates a new tesseract command wrapper that uses the default binary.
    • TesseractTextExtractor

      public TesseractTextExtractor(String binary)
      Creates a new tesseract command wrapper that uses the given binary.
      Parameters:
      binary - the tesseract binary
  • Method Details

    • setAdditionalOptions

      public void setAdditionalOptions(String addOptions)
      Sets additional options for tesseract calls.
      Parameters:
      addOptions - the additional options
    • getAdditionalOptions

      public String getAdditionalOptions()
      Returns the additional options for tesseract.
      Returns:
      additional options
    • extract

      public List<String> extract(File image) throws TextExtractorException
      Extracts text from the image and returns it as a list of lines in the text frame.
      Specified by:
      extract in interface TextExtractor
      Parameters:
      image - the image
      Returns:
      the text
      Throws:
      TextExtractorException - if text extraction fails
    • updated

      public void updated(Dictionary properties)
      Specified by:
      updated in interface org.osgi.service.cm.ManagedService
    • activate

      public void activate(org.osgi.service.component.ComponentContext cc)
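The updated(Dictionary) callback above comes from org.osgi.service.cm.ManagedService, which passes the component its configuration properties, or null when no configuration exists. As a rough illustration of how a component might read a binary path and an options string from such a dictionary, here is a self-contained sketch. The class TesseractConfigSketch, the two key strings, and the "tesseract" default are hypothetical placeholders; the real key names and default are held by TESSERACT_BINARY_CONFIG_KEY, TESSERACT_OPTS_CONFIG_KEY, and TESSERACT_BINARY_DEFAULT, whose values this page does not show.

```java
import java.util.Dictionary;
import java.util.Hashtable;

// Hypothetical stand-in that only mirrors the shape of ManagedService.updated.
final class TesseractConfigSketch {

  // Assumed placeholder values; the documented constants may differ.
  static final String BINARY_KEY = "example.tesseract.path";
  static final String OPTS_KEY = "example.tesseract.options";
  static final String BINARY_DEFAULT = "tesseract";

  private String binary = BINARY_DEFAULT;
  private String addOptions = "";

  /** Applies configuration values; a null dictionary leaves the current settings unchanged. */
  void updated(Dictionary<String, ?> properties) {
    if (properties == null) {
      return;
    }
    Object path = properties.get(BINARY_KEY);
    if (path != null && !path.toString().trim().isEmpty()) {
      binary = path.toString().trim();
    }
    Object opts = properties.get(OPTS_KEY);
    if (opts != null) {
      addOptions = opts.toString().trim();
    }
  }

  String getBinary() {
    return binary;
  }

  String getAdditionalOptions() {
    return addOptions;
  }

  public static void main(String[] args) {
    TesseractConfigSketch sketch = new TesseractConfigSketch();
    Hashtable<String, Object> props = new Hashtable<>();
    props.put(OPTS_KEY, "-l eng");
    sketch.updated(props);
    // Prints: tesseract | -l eng
    System.out.println(sketch.getBinary() + " | " + sketch.getAdditionalOptions());
  }
}
```

Falling back to the default binary name when the path property is missing or blank lets the component work on systems where tesseract is on the PATH; whether the real class behaves this way is not stated here.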