Class TesseractTextExtractor
java.lang.Object
org.opencastproject.textextractor.tesseract.TesseractTextExtractor
- All Implemented Interfaces:
TextExtractor, org.osgi.service.cm.ManagedService
public class TesseractTextExtractor
extends Object
implements TextExtractor, org.osgi.service.cm.ManagedService
Command-line wrapper around Tesseract's tesseract command.
Field Summary
Fields (Modifier and Type, Field, Description):
- static final String TESSERACT_BINARY_CONFIG_KEY: Configuration property that defines the path to the tesseract binary.
- static final String TESSERACT_BINARY_DEFAULT: Default name of the tesseract binary.
- static final String TESSERACT_OPTS_CONFIG_KEY: Configuration property that defines additional tesseract options like the language or the pagesegmode to use.
Constructor Summary
Constructors (Constructor, Description):
- TesseractTextExtractor(): Creates a new tesseract command wrapper that will be using the default binary.
- TesseractTextExtractor(String binary): Creates a new tesseract command wrapper that will be using the given binary.
Method Summary
Methods (Modifier and Type, Method, Description):
- void activate(org.osgi.service.component.ComponentContext cc)
- extract(image): Extracts text from the image and returns it as a set of lines in the text frame.
- getAdditionalOptions(): Returns the additional options for tesseract.
- void setAdditionalOptions(String addOptions): Sets additional options for tesseract calls.
- void updated(Dictionary properties)
-
Field Details
-
TESSERACT_BINARY_DEFAULT
Default name of the tesseract binary.
-
TESSERACT_BINARY_CONFIG_KEY
Configuration property that defines the path to the tesseract binary.
-
TESSERACT_OPTS_CONFIG_KEY
Configuration property that defines additional tesseract options like the language or the pagesegmode to use. This is just appended to the command line when tesseract is called.
-
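The option string described above is appended to the tesseract command line. The following is a minimal sketch of how such a command line might be assembled, not the class's actual implementation. The helper `buildCommand`, the class name, and the file names are hypothetical; `-l` (language) and `--psm` (page segmentation mode) are options of the tesseract command-line tool in recent versions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TesseractCommandSketch {

  /**
   * Builds an argument list of the form: binary image outputBase [options...].
   * Hypothetical helper for illustration; not part of TesseractTextExtractor.
   */
  static List<String> buildCommand(String binary, String image, String outputBase, String addOptions) {
    List<String> command = new ArrayList<>(Arrays.asList(binary, image, outputBase));
    if (addOptions != null && !addOptions.trim().isEmpty()) {
      // Split on whitespace so that "-l eng --psm 3" becomes four separate arguments
      command.addAll(Arrays.asList(addOptions.trim().split("\\s+")));
    }
    return command;
  }

  public static void main(String[] args) {
    List<String> command = buildCommand("tesseract", "slide.png", "slide", "-l eng --psm 3");
    // prints: tesseract slide.png slide -l eng --psm 3
    System.out.println(String.join(" ", command));
  }
}
```

Passing the arguments as a list (for example to `ProcessBuilder`) avoids shell quoting problems, although a simple whitespace split like this one does not handle option values that themselves contain spaces.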
-
Constructor Details
-
TesseractTextExtractor
public TesseractTextExtractor()
Creates a new tesseract command wrapper that will be using the default binary.
-
TesseractTextExtractor
Creates a new tesseract command wrapper that will be using the given binary.
- Parameters:
binary - the tesseract binary
-
-
Method Details
-
setAdditionalOptions
Sets additional options for tesseract calls.
- Parameters:
addOptions - the additional options
-
getAdditionalOptions
Returns the additional options for tesseract.
- Returns:
additional options
-
extract
Extracts text from the image and returns it as a set of lines in the text frame.
- Specified by:
extract in interface TextExtractor
- Parameters:
image - the image
- Returns:
the text
- Throws:
TextExtractorException - if text extraction fails
-
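To illustrate what "a set of lines" can mean for OCR output, here is a small sketch that turns raw recognized text into a list of non-empty lines. It is an assumption about the general approach, not the code used by `extract`; the class `OcrLinesSketch` and the method `toLines` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class OcrLinesSketch {

  /** Splits raw OCR output on line breaks and keeps only non-blank, trimmed lines. */
  static List<String> toLines(String rawOutput) {
    List<String> lines = new ArrayList<>();
    // \R matches any line break sequence (\n, \r\n, \r, and others)
    for (String line : rawOutput.split("\\R")) {
      String trimmed = line.trim();
      if (!trimmed.isEmpty()) {
        lines.add(trimmed);
      }
    }
    return lines;
  }

  public static void main(String[] args) {
    // prints: [Title, Second line]
    System.out.println(toLines("Title\n\n  Second line \r\n"));
  }
}
```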
updated
- Specified by:
updated in interface org.osgi.service.cm.ManagedService
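A `ManagedService` receives its configuration as a `Dictionary`, which is `null` when no configuration exists. The sketch below shows one way a binary path and an option string might be read from such a dictionary with fallbacks. The key strings, the default value, and the class itself are hypothetical stand-ins; the real values of `TESSERACT_BINARY_CONFIG_KEY`, `TESSERACT_OPTS_CONFIG_KEY`, and `TESSERACT_BINARY_DEFAULT` are not shown on this page.

```java
import java.util.Dictionary;
import java.util.Hashtable;

public class TesseractConfigSketch {
  // Hypothetical key names and default, chosen only for this example
  static final String BINARY_KEY = "example.tesseract.path";
  static final String OPTS_KEY = "example.tesseract.options";
  static final String BINARY_DEFAULT = "tesseract";

  String binary = BINARY_DEFAULT;
  String addOptions = "";

  /** Same shape as ManagedService.updated: a null dictionary means "no configuration". */
  void updated(Dictionary<String, ?> properties) {
    binary = read(properties, BINARY_KEY, BINARY_DEFAULT);
    addOptions = read(properties, OPTS_KEY, "");
  }

  /** Returns the trimmed value for key, or the fallback if it is missing or blank. */
  static String read(Dictionary<String, ?> properties, String key, String fallback) {
    if (properties == null) {
      return fallback;
    }
    Object value = properties.get(key);
    if (value == null) {
      return fallback;
    }
    String text = value.toString().trim();
    return text.isEmpty() ? fallback : text;
  }

  public static void main(String[] args) {
    Hashtable<String, Object> properties = new Hashtable<>();
    properties.put(BINARY_KEY, "/usr/local/bin/tesseract");
    properties.put(OPTS_KEY, "-l deu");

    TesseractConfigSketch sketch = new TesseractConfigSketch();
    sketch.updated(properties);
    // prints: /usr/local/bin/tesseract -l deu
    System.out.println(sketch.binary + " " + sketch.addOptions);
  }
}
```

Falling back to defaults on a `null` dictionary keeps the component usable when the configuration is deleted at runtime.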
-
activate
public void activate(org.osgi.service.component.ComponentContext cc)
-