Text may be incorrect or corrupted after conversion with OCR. Understanding the limitations of the OCR process can help you assist the OCR engine in producing more accurate results. The short advice is to make sure that the input files are of high quality: large format and high resolution.

OCR results are considered good if the recognized text is 98-99% accurate (1-2% of the output is incorrect). Below are some tips that will help you achieve better OCR results.

#1 Improve the quality of the source images
One of the most significant factors is DPI (dots per inch). Preferably, scan at 600 DPI to capture as much image information as possible. With a high image resolution, the OCR engine should be able to recognize high contrasts, character borders, pixel noise, and aligned characters.

#2 Select a lossless output format when scanning
To let OCR software extract text more precisely, select a lossless file format, such as TIFF or high-quality PDF, when scanning the source file. If you scan to a TIFF without compression, no image information (roughly speaking, pixels) will be lost.

#3 Enhance the contrast of images
Contrast and density are vital factors to consider before OCR'ing an image. When using a scanner (or an image editor if there is no way to scan the document again), you can adjust gamma and contrast to get clearer output. Set the contrast high enough that the characters are clearly distinguishable.

#4 Increase the text size of the source images
The recommended text size in scanned documents is 10 points or higher. There is a minimum text size for reasonable accuracy: for the best results, try to make sure the text height is at least 20 pixels. Consider the resolution as well as the point size: OCR accuracy drops off below 10pt, and rapidly below 8pt (at a resolution of 300 DPI). At 10pt and 300 DPI, x-heights are typically about 20 pixels.
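The relationship between point size, scan resolution, and pixel height can be checked with a little arithmetic. The sketch below is only an illustration: the function names are made up for this example, and the x-height ratio of 0.48 (x-height divided by font size) is an assumed typical value chosen because it reproduces the "about 20 pixels at 10pt and 300 DPI" figure. Real fonts vary.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 of an inch

# Assumption: x-height is roughly 48% of the font size. Real fonts differ.
ASSUMED_X_HEIGHT_RATIO = 0.48


def x_height_pixels(point_size, dpi, x_height_ratio=ASSUMED_X_HEIGHT_RATIO):
    """Estimate the x-height in pixels of text scanned at the given DPI."""
    font_size_pixels = point_size / POINTS_PER_INCH * dpi
    return font_size_pixels * x_height_ratio


def min_dpi_for(point_size, target_pixels=20.0,
                x_height_ratio=ASSUMED_X_HEIGHT_RATIO):
    """Estimate the scan resolution needed to reach a target x-height."""
    return target_pixels * POINTS_PER_INCH / (point_size * x_height_ratio)


if __name__ == "__main__":
    for pt in (8, 10, 12):
        print(f"{pt}pt at 300 DPI: about {x_height_pixels(pt, 300):.1f} px x-height, "
              f"needs about {min_dpi_for(pt):.0f} DPI for 20 px")
```

Under that assumed ratio, 10pt text scanned at 300 DPI comes out at about 20 pixels, while 8pt text would need roughly 375 DPI to reach the same height, which is one way to see why smaller text benefits from a higher scan resolution.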
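For the gamma and contrast adjustment mentioned under #3, the following is a minimal pure-Python sketch of what such an adjustment does to 8-bit grayscale pixel values. It is not how any particular scanner or image editor implements it. The function names, the convention that gamma greater than 1 brightens the image, and the example thresholds are all assumptions made for illustration.

```python
def gamma_lut(gamma):
    """Build a 256-entry lookup table for out = 255 * (in / 255) ** (1 / gamma).

    Assumed convention: gamma > 1 brightens midtones, gamma < 1 darkens them.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return [round(255 * (v / 255) ** (1.0 / gamma)) for v in range(256)]


def stretch_contrast(pixels, low, high):
    """Linearly map gray levels in [low, high] to [0, 255], clipping the rest."""
    if high <= low:
        raise ValueError("high must be greater than low")
    stretched = []
    for p in pixels:
        value = round((p - low) * 255 / (high - low))
        stretched.append(min(255, max(0, value)))
    return stretched


def adjust(pixels, gamma, low, high):
    """Apply gamma correction first, then the contrast stretch."""
    lut = gamma_lut(gamma)
    return stretch_contrast([lut[p] for p in pixels], low, high)


if __name__ == "__main__":
    # Hypothetical row of pixels: dark ink (60) on a grayish page (180).
    row = [180, 180, 60, 60, 180]
    print(adjust(row, gamma=1.0, low=60, high=180))
```

With the hypothetical thresholds above, the ink is pushed to black and the page to white, which is the kind of separation that makes characters distinguishable. In practice the thresholds would be chosen by inspecting the scan.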