Information Extraction and Noise Reduction of Images in PDF Files
The development of the digital publishing industry has entered a period of rapid growth. E-books, rich media content, and fragmented information increasingly meet people's needs for mobile reading, and converting paper resources into digital form has become the norm. In converting publishing information resources across formats, publishing companies therefore face many issues, such as the traceability of content resources, the effective long-term preservation of documents, and how to keep archived copies consistent with the resources distributed on the Internet.
At present, many repositories and archives are stored in PDF format. From the international standard "Document management - Portable document format - Part 1: PDF 1.7" (ISO 32000-1), it is not difficult to see that PDF files are well suited to review, publication, and archiving. The main advantage of PDF is high-fidelity content rendering across multiple platforms with multimedia support. PDF is highly interactive, provides strong security through digital signatures, and can package both unstructured and structured data. Currently, formats such as ePub3 and Mobi fragment PDF content on the basis of XML. Fragmentation in digital publishing has opened broad room for improved services. However, during fragmentation, different conversion tools or manual indexing can easily distort the PDF content, and image information is especially prone to noise.
This article decomposes the text and images in archived PDF files and extracts element information such as the size, font, and color of the text.
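As an illustration of the kind of element information involved, the following minimal sketch uses the PyMuPDF library (our choice for illustration; the article does not name a specific toolchain) to walk the text spans of each page and print their font, size, and color. The file name is a placeholder.

```python
import fitz  # PyMuPDF

doc = fitz.open("archived.pdf")  # placeholder file name
for page in doc:
    for block in page.get_text("dict")["blocks"]:
        for line in block.get("lines", []):  # image blocks carry no "lines"
            for span in line["spans"]:
                # span["color"] is packed as an sRGB integer, e.g. 0xRRGGBB
                print(span["text"], span["font"], span["size"],
                      f'{span["color"]:06x}')
```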
1. PDF Image Preprocessing
Unlike image-processing research built on filtering algorithms such as convolution, the Fourier transform, and wavelet transforms, our work does not involve processing methods such as defogging, sharpening, or deblurring. Likewise, the image preprocessing of PDF files here does not involve information-hiding technologies such as digital watermarking. We focus on decomposing the PDF content and extracting the image information according to the image's original compression algorithm, matrix transformation, position offset, rendering method, and so on, in order to obtain the original image data in the PDF file for subsequent research and analysis.
Image preprocessing mainly extracts and analyzes the image information in the PDF file, including layer identification, color space, image size, image position, compression algorithm, and conversion algorithm. The objects in the PDF file are located through the cross-reference (Xref) table; for example, the base color space behind an image's Indexed space may be DeviceCMYK, and the decoding filters encountered are FlateDecode and DCTDecode. We then obtain the raw stream corresponding to each image, decode the image sample data according to its FlateDecode or DCTDecode filter, and form an image-data-stream filter. After an image has been preprocessed through this filter, parameters such as its spatial transformation and rendering can be obtained from the data stream.
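A minimal sketch of this preprocessing step, again assuming PyMuPDF as the toolchain: it enumerates the image XObjects through the cross-reference table, reads each raw sample stream, and decodes it according to its filter (zlib inflation for FlateDecode; a DCTDecode stream is already a complete JPEG file).

```python
import zlib

import fitz  # PyMuPDF

doc = fitz.open("archived.pdf")  # placeholder file name
for page in doc:
    # full=True reports, per image XObject: xref, ..., color space, name, filter
    for img in page.get_images(full=True):
        xref, cs_name, name, flt = img[0], img[5], img[7], img[8]
        raw = doc.xref_stream_raw(xref)      # undecoded sample stream
        if flt == "FlateDecode":
            samples = zlib.decompress(raw)   # lossless inflate
        elif flt == "DCTDecode":
            samples = raw                    # the raw bytes are a JPEG file
        else:
            samples = fitz.Pixmap(doc, xref).samples  # let MuPDF decode the rest
        print(name, cs_name, flt, len(samples))
```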
2. Image Color Space Conversion and Color Value Calculation
We adopt the CMYK color space as the color-value standard and provide unified CMYK channel values to serve management needs such as publishing and distribution. During research and development, the color spaces encountered in PDF images, such as RGB, Lab, Indexed, DeviceN, and Separation, are all converted to CMYK. After the conversion, we analyze, count, and calculate the color-value information of each image.
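The article does not spell out its conversion formulas. As one hedged example, the sketch below performs the textbook device-level RGB-to-CMYK conversion with NumPy; real PDF color spaces such as Separation and DeviceN would require their embedded tint-transform functions, and an ICC-managed conversion would give different values.

```python
import numpy as np

def rgb_to_cmyk(rgb: np.ndarray) -> np.ndarray:
    """Naive device conversion of an H x W x 3 float array in [0, 1]."""
    k = 1.0 - rgb.max(axis=-1)
    denom = np.where(k < 1.0, 1.0 - k, 1.0)  # avoid divide-by-zero on pure black
    c = (1.0 - rgb[..., 0] - k) / denom
    m = (1.0 - rgb[..., 1] - k) / denom
    y = (1.0 - rgb[..., 2] - k) / denom
    return np.stack([c, m, y, k], axis=-1)
```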
Throughout our research, the images in the PDF file remain unchanged. To keep the original archived state, we do not adjust brightness or contrast or apply image enhancement; we only calculate color values within the color space. For the statistics, we do not use frequency-domain computation but process the data directly in the spatial domain. Any noise or stain data present in the document is kept in the PDF file.
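Continuing the NumPy sketch above, direct spatial-domain statistics over the converted CMYK planes might look like the following; the particular statistics gathered are our assumption, as the article only states that they are computed in the spatial domain.

```python
import numpy as np

def channel_stats(cmyk: np.ndarray) -> dict:
    """Per-channel mean and 256-bin histogram, computed directly in the spatial domain."""
    stats = {}
    for i, ch in enumerate("CMYK"):
        vals = cmyk[..., i]
        hist, _ = np.histogram(vals, bins=256, range=(0.0, 1.0))
        stats[ch] = {"mean": float(vals.mean()), "hist": hist}
    return stats
```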
3. Noise Reduction Through a Multi-Channel Linear Filtering Algorithm
Image noise is widespread. It generally appears as extreme values, large or small, that are added to or subtracted from the true color values of image pixels, producing bright and dark dot interference that greatly reduces image quality. Converting the color space of an image can also introduce noise and stained pixels. At the same time, some original images in PDF files inevitably carry artificial marks; for example, when a few colored points are used as self-identifying marks, such image noise can seriously distort the statistics.
Generally speaking, the noise spectrum of an image lies in regions of relatively high spatial frequency. Low-pass filtering in the spatial domain is used to smooth noise, while high-pass filtering is mainly used to pick out isolated pixels or high peaks within pixel blocks. To shield the statistics and analysis from the errors these factors cause, we adopt a multi-channel threshold control algorithm in the system.
First, high-pass filtering is used to suppress the low-frequency components of the image so that its high-frequency components pass with little or no loss. Second, a multi-channel linear threshold filtering algorithm is designed for low-frequency noise. We use a linear multi-threshold test mainly because it allows fast computation and accurate judgment; moreover, the image itself need not be transformed, since we only need to recognize and extract image-related information. Based on statistics over a large number of PDF image files, and through continuous optimization and adjustment of the channel thresholds, this approach provides a workable technical guarantee for the noise-reduction and decontamination management of PDF images, as sketched below.
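The article does not publish the algorithm itself, so the following is only a minimal sketch of a per-channel linear threshold filter in the spirit described: the local-mean residual serves as a crude high-pass signal, and a pixel whose residual exceeds its channel threshold is treated as noise and replaced by the local median. The 3x3 window and the threshold values are illustrative assumptions; in the system described, the channel thresholds would be tuned iteratively from statistics over many PDF images rather than fixed.

```python
import numpy as np
from scipy.ndimage import median_filter, uniform_filter

def multichannel_threshold_denoise(
    cmyk: np.ndarray,
    thresholds: tuple = (0.25, 0.25, 0.25, 0.25),  # one per C, M, Y, K channel
) -> np.ndarray:
    """Per-channel linear threshold filter over an H x W x 4 CMYK array in [0, 1]."""
    out = cmyk.copy()
    for i, t in enumerate(thresholds):
        ch = cmyk[..., i]
        local_mean = uniform_filter(ch, size=3)  # crude low-pass estimate
        residual = ch - local_mean               # high-frequency residual
        noisy = np.abs(residual) > t             # linear threshold test
        out[..., i][noisy] = median_filter(ch, size=3)[noisy]
    return out
```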
Statistical analysis of the images in PDF files supports practical applications such as indexing related or similar images, multi-image association queries, and searching for images by image, and it provides authoritative management of image fidelity and tampering after fragmentation or transmission. Accurate data facilitates traceability and intellectual-property management, and it also gives content providers more practical management tools in the Internet era. Relevant attributes can be established around PDF content, and an in-depth management system can then be built on the content files. This also provides powerful technical tools for online reading, deep mining, and knowledge services.
We have realized the recognition and extraction of text, images, and other content, implemented noise-reduction and decontamination processing for images, and provided output of the related attribute data. Next, we can automatically index and fragment PDF files to provide services based on XML data. We will also design a self-learning threshold-optimization algorithm based on high-pass filtering, which will intelligently serve more practical applications and provide underlying technical support for digital reading and information management.