How to Optimize PDF Files for Search Engines

Most search engines can now crawl non-HTML content such as PDF files, spreadsheets and presentations. These documents can even rank higher in search results than HTML pages.

Some important points regarding PDF optimization for search results:

  • PDF files should not be locked or password protected because search engines have no way to read or index those files.

  • The title of the PDF file plays a vital role in search results, so set proper title metadata for the file. The anchor text of links pointing to the PDF file also carries weight in determining the title shown in search results.

  • Embedding too many images in a PDF file does not help. Make sure the file contains plenty of readable text. A quick check: if you can select and copy a paragraph of text, search engines will also be able to index it.

  • The language of a file doesn’t matter. Files written in any language or using a different character encoding can still be indexed.

  • Links in PDF files are treated similarly to links in HTML, so be sure to include links in your PDF files back to your own website. If someone shares your PDF file, it will link back to your site, which helps with PageRank and other indexing signals for search engines.

  • If you serve the same content on the web as both an HTML page and a PDF file, indicate the preferred one by including the preferred URL in your Sitemap or by specifying the canonical version in the HTML or in the HTTP headers of the PDF resource. Otherwise, search engines may treat one of the two versions as duplicate content, which is not good practice.
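
As a rough illustration of the last point, the canonical version can be declared in the HTTP response that delivers the PDF by sending a Link header with rel="canonical". The sketch below is a minimal example under stated assumptions, not a production setup: the URL, the file name and the function names are hypothetical, and it uses only Python's standard library.

```python
from pathlib import Path

# Hypothetical values for illustration only; replace with your own.
PREFERRED_URL = "https://www.example.com/guides/pdf-seo.html"
PDF_PATH = Path("pdf-seo.pdf")


def canonical_link_header(preferred_url: str) -> tuple[str, str]:
    """Build the HTTP header that marks preferred_url as the canonical version."""
    return ("Link", f'<{preferred_url}>; rel="canonical"')


def app(environ, start_response):
    """Minimal WSGI app that serves one PDF along with a canonical Link header."""
    body = PDF_PATH.read_bytes()
    headers = [
        ("Content-Type", "application/pdf"),
        ("Content-Length", str(len(body))),
        canonical_link_header(PREFERRED_URL),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the app can be passed to wsgiref.simple_server.make_server from the standard library. On a real site the same header would more likely be set in the web server configuration.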

Here is a video by Matt Cutts from the Google Webmaster Team answering a question about best practices for PDF optimization.