Development of OCR Technology

The essence of OCR is image recognition, and its principle is basically the same as that of other image recognition problems. Broadly, OCR is divided into two major steps: image processing and text recognition. For simple recognition scenarios, the first strategy to consider is naturally the simplest, brute-force approach: template matching, which compares each character image against a stored set of reference glyphs. Template matching, however, works only in very simple scenarios with fixed fonts, sizes, and layouts; for even slightly more complex scenarios it is of little practical use.
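To make the idea concrete, here is a minimal template-matching sketch in Python with NumPy. It assumes each character has already been segmented and scaled to the same size as a set of stored templates (the tiny 5×3 glyphs below are hypothetical), and it labels a glyph with the template that has the highest zero-mean normalized correlation. The same assumptions explain the method's limits: a change of font, size, or alignment breaks the pixel-by-pixel comparison.

```python
import numpy as np


def normalized_correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation between two equal-sized images."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def match_glyph(glyph: np.ndarray, templates: dict) -> str:
    """Return the label of the template most similar to the glyph."""
    return max(templates, key=lambda label: normalized_correlation(glyph, templates[label]))


# Hypothetical 5x3 binary templates (1 = ink, 0 = background).
TEMPLATES = {
    "1": np.array([[0, 1, 0], [1, 1, 0], [0, 1, 0], [0, 1, 0], [1, 1, 1]]),
    "7": np.array([[1, 1, 1], [0, 0, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0]]),
    "L": np.array([[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 1, 1]]),
}

# A "1" with one noisy pixel still matches the "1" template best.
noisy_one = TEMPLATES["1"].copy()
noisy_one[0, 2] = 1
print(match_glyph(noisy_one, TEMPLATES))
```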
 
In response to the shortcomings of traditional OCR solutions, attention turned to OCR based on deep learning. With the rise of deep learning as the dividing point, OCR frameworks built on this technology quickly broke through the bottlenecks of earlier approaches (such as text localization, binarization, and character segmentation) with an entirely new approach, and they have since been widely adopted in industry. Deep learning has given OCR technology a new lease of life.
 
In the deep learning era, text recognition frameworks have gradually become simpler. There are currently two mainstream approaches: a two-stage approach consisting of text line detection followed by text recognition, and an end-to-end approach.
 
In the two-stage approach, the main idea is to locate the text lines first and then recognize the content of each located line. Methodologically, text line detection falls into three families: methods based on text box regression, methods based on segmentation or instance segmentation, and hybrid methods combining regression and segmentation. In terms of detection capability, the field has progressed from multi-oriented rectangular boxes to polygonal text regions, and the current focus is detecting text lines of arbitrary shape. Text recognition, in turn, has progressed from detecting and recognizing individual characters to recognizing whole text sequences. Sequence recognition is currently dominated by CTC-based (Connectionist Temporal Classification) methods and attention-based methods.
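As an illustration of the CTC-based branch, the sketch below shows greedy (best-path) CTC decoding, the simplest way to turn a recognizer's per-frame output into a string; beam search is a common, more accurate alternative. It assumes the recognizer emits a `(T, C)` probability matrix with class 0 reserved for the CTC blank and class `i > 0` mapping to `alphabet[i - 1]`; these conventions are assumptions, not taken from any particular library.

```python
import numpy as np

BLANK = 0  # assumption: class index 0 is the CTC blank symbol


def ctc_greedy_decode(probs: np.ndarray, alphabet: str) -> str:
    """Best-path CTC decoding.

    probs: shape (T, C), per-frame class probabilities; class 0 is blank,
           class i > 0 corresponds to alphabet[i - 1].
    """
    best = probs.argmax(axis=1)  # most likely class for each frame
    chars = []
    prev = BLANK
    for k in best:
        # Emit a character only when the class changes and is not blank:
        # consecutive repeats collapse, and a blank separates true doubles.
        if k != prev and k != BLANK:
            chars.append(alphabet[k - 1])
        prev = k
    return "".join(chars)


# One-hot frames spelling h h e _ l l _ l o o ("_" = blank).
alphabet = "ehlo"
path = [2, 2, 1, 0, 3, 3, 0, 3, 4, 4]
probs = np.eye(len(alphabet) + 1)[path]
print(ctc_greedy_decode(probs, alphabet))  # hello
```

The blank between the two `l` runs is what lets CTC output a genuine double letter; without it, the repeated frames would collapse into a single `l`.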
 
Now consider the end-to-end approach. Although the two-stage pipeline of detection followed by recognition can handle scene text, fusing the results of the two stages still requires a good deal of hand-crafted knowledge and adds processing time. An end-to-end approach completes text line detection and text recognition simultaneously in a single model, which greatly improves real-time performance. Moreover, because the two tasks are trained jointly in the same model, each can improve the other's results.
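One common way to express such joint training (an assumption about typical end-to-end designs, not a formula given in this article) is to minimize a weighted sum of a detection loss and a recognition loss, with a balancing weight $\lambda$ chosen by the practitioner. Because both terms are differentiated with respect to the same shared parameters $\theta_s$, each task's gradient shapes the features the other task relies on:

```latex
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{det}} + \lambda\,\mathcal{L}_{\text{rec}},
\qquad
\frac{\partial \mathcal{L}_{\text{total}}}{\partial \theta_s}
  = \frac{\partial \mathcal{L}_{\text{det}}}{\partial \theta_s}
  + \lambda\,\frac{\partial \mathcal{L}_{\text{rec}}}{\partial \theta_s}
```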
 
OCR can also be divided into printed text recognition and handwritten text recognition. OCR technology began with printed text recognition, and its success laid a solid foundation for the later development of handwriting recognition.
 
The OCR software bundled with scanners on the market consists of general-purpose versions. Printed-text OCR has now reached a practical level: even for poorly printed characters, the recognition rate exceeds 95%. Because handwritten OCR technology remains limited, professional OCR products are mostly aimed at specific industries, that is, at departments that must enter large volumes of form data every day, such as postal services, taxation, customs, and statistics.
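Character-level recognition rates like the 95% figure above are often reported as one minus the character error rate (CER), where CER is the edit distance between the recognized text and the ground truth divided by the ground-truth length. This definition is an assumption; the article does not say how its figure was measured. A minimal sketch:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def char_recognition_rate(reference: str, hypothesis: str) -> float:
    """1 - CER, where CER = edit distance / number of reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return 1.0 - levenshtein(reference, hypothesis) / len(reference)


# One wrong character out of 20 gives a rate of 0.95.
print(char_recognition_rate("Invoice number 12345", "Invoice number 12845"))
```

With this definition the rate can fall below zero when the output contains many spurious insertions, so some evaluations clip it at zero.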
 
Professional OCR systems for specific industries handle relatively fixed information formats and a relatively small character set. They are often used together with dedicated input devices, which makes them fast and efficient. Such systems are widely used around the world and are fully featured. Compared with the general-purpose version, the professional version offers batch processing and better-optimized performance, and its recognition rate also differs.
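A small, fixed character set is one reason such systems can be accurate. The sketch below shows one simple way to exploit it, assuming a recognizer that outputs per-position scores over a general alphabet: characters not allowed in a given form field (for example, digits only in a postal-code field) are masked out before choosing the best class. The alphabet, the field rule, and the scores are all hypothetical, not how any particular product works.

```python
import numpy as np

# Hypothetical recognizer alphabet: digits followed by uppercase letters.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def constrained_argmax(scores: np.ndarray, allowed: str) -> str:
    """Pick the best character per position, considering only characters in `allowed`.

    scores: shape (N, len(ALPHABET)), one row of class scores per character position.
    """
    mask = np.array([c in allowed for c in ALPHABET])
    if not mask.any():
        raise ValueError("allowed character set shares no characters with ALPHABET")
    masked = np.where(mask, scores, -np.inf)  # disallowed classes can never win
    return "".join(ALPHABET[i] for i in masked.argmax(axis=1))


# Three positions where a look-alike letter narrowly beats the intended digit or vice versa.
scores = np.full((3, len(ALPHABET)), 0.01)
for pos, (best, runner_up) in enumerate([("O", "0"), ("I", "1"), ("7", "T")]):
    scores[pos, ALPHABET.index(best)] = 0.6
    scores[pos, ALPHABET.index(runner_up)] = 0.3

print(constrained_argmax(scores, ALPHABET))      # unconstrained: OI7
print(constrained_argmax(scores, "0123456789"))  # digits-only field: 017
```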