Tuesday, June 19, 2012

How to do a word count for a PDF file

17 Rejab 1433H, Thursday.

Assalamualaikum. My thesis needed a word count for what could have been its third proofread.

By the way i'm still running Ubuntu 11.10 Oneiric Ocelot. To know why i have not upgraded to Ubuntu 12.04 Precise Pangolin, read here:
My experience upgrading to 12.04 then downgrading to 11.10
(http://ubuntudigest.blogspot.com/2012/04/my-experience-upgrading-to-1204-then.html)

This post and the two previous posts were initially meant to be one very long post. It was so long, and covered at least three different things, that i didn't know what to put as the title. Hence these three short consecutive posts.

This post refines the word count method mentioned in this thread:
Word Count in PDF file?
(http://ubuntuforums.org/showthread.php?p=5024198#post5024198)



Part A: Installing PDF utilities (based on Poppler)

You will need to install PDF utilities (based on Poppler) before proceeding to Part B.

Pic 1 - PDF utilities in Ubuntu Software Center.


To install...


1. Run Ubuntu Software Center.


2. In the search field, type in the search term "pdfinfo". The search results will filter automatically. See Pic 2.

Pic 2 - Refer Step 2. Searching for PDF utilities (based on Poppler).


3. Select the PDF utilities (based on Poppler), see Pic 2. Two buttons will auto-appear.

To install, click the Install button.

To read more, click the More Info button, see Pic 2. When ready to install, click the Install button.

Once the software is installed, the Install button will turn into the Remove button.


4. If an Authenticate window like the one in Pic 3 appears and prompts for your password, key in your password in the Password field, then click the Authenticate button.

Pic 3 - Refer Step 4. Authenticating the installation.
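
If you prefer the Terminal to the Software Center, the same utilities are packaged as poppler-utils. The sketch below only checks whether pdftotext is already on the system and, if it is not, prints the command that should install it. The helper name have_tool is made up for this example.

```shell
# Report whether a command-line tool can be found on the PATH.
# (have_tool is a hypothetical helper name, not part of Poppler.)
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool pdftotext; then
    echo "pdftotext is installed"
else
    echo "pdftotext not found; try: sudo apt-get install poppler-utils"
fi
```

The install command itself will ask for your password, just like the Authenticate window in Step 4.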



Part B: Performing a word count for a PDF file

A reminder: i have no idea how accurate the count is.


5. Open the Terminal in the folder that contains the PDF file: right-click inside that folder, then select Open in Terminal from the menu. See Pic 4.

Pic 4 - Refer Step 5. Accessing the Terminal from the right-click menu.


If you do not have the Terminal shortcut in your right-click menu, you will need to run the Terminal then change the working directory. Read here how to change the directory:
UsingTheTerminal - Community Ubuntu Documentation
(https://help.ubuntu.com/community/UsingTheTerminal)
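
As a rough illustration of changing the working directory by hand, here is a small sketch. The function name go_to_pdf_folder and the folder in the example call are assumptions made for this example; use whichever folder actually holds your PDF.

```shell
# Change into the folder that holds the PDF and list the PDF files there.
# (go_to_pdf_folder is a hypothetical helper written for this post.)
go_to_pdf_folder() {
    if cd "$1" 2>/dev/null; then
        echo "Now in: $(pwd)"
        # List the PDFs; print a note instead if there are none.
        ls -- *.pdf 2>/dev/null || echo "(no PDF files here)"
    else
        echo "No such folder: $1" >&2
        return 1
    fi
}

# Example call (hypothetical folder):
# go_to_pdf_folder "$HOME/Documents"
```

Once the PDF shows up in the listing, the Terminal is in the right place for Step 6.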

To add the Terminal shortcut in the right-click menu, read here:
How to add a Terminal shortcut in the right-click menu
(http://ubuntudigest.blogspot.com/2010/04/how-to-add-terminal-shorcut-in-righ.html)


6. In the Terminal, type in the commands below, replacing nameof.pdf with the name of your PDF file. There are two variations. After typing each one, press the Enter key. A number will appear below the command line. See Pic 5.

To count every word as extracted from the PDF:
pdftotext nameof.pdf - | wc -w -

Source: Re: Word Count in pdf file? #5
(http://ubuntuforums.org/showpost.php?p=5024198&postcount=5)

To delete the periods first, so that a period standing on its own is not counted as a word:
pdftotext nameof.pdf - | tr -d '.' | wc -w -

Source: Re: Word Count in pdf file? #8
(http://ubuntuforums.org/showpost.php?p=7063843&postcount=8)
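
To see why the two commands in Step 6 can give different numbers, here is a small sketch that runs the same two pipelines on a made-up piece of text instead of a PDF. The sample text and the function names are assumptions for illustration only. wc -w counts anything separated by spaces as a word, so dots standing on their own (for example, spaced-out dots in a table of contents) get counted unless tr -d '.' deletes them first.

```shell
# Made-up sample: one contents-style line with spaced dots, one normal sentence.
sample='Introduction . . . 1
The end.'

# Count whitespace-separated words on standard input.
# The trailing tr strips the padding that some systems put before the number.
count_words() {
    wc -w | tr -d ' '
}

# Same count, but with every period deleted first.
count_words_no_dots() {
    tr -d '.' | count_words
}

echo "with dots:    $(printf '%s\n' "$sample" | count_words)"          # 7
echo "without dots: $(printf '%s\n' "$sample" | count_words_no_dots)"  # 4
```

A period attached to a word, as in "end.", does not change the count either way; only the stand-alone dots do.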


To paste into the Terminal, press: Ctrl + Shift + V
To copy from the Terminal, press: Ctrl + Shift + C

Pic 5 - Count. Count. Count.


Roughly, there are 25,109 to 25,120 words in my thesis according to Pic 5. Other methods, software, or online services will produce different numbers.
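
If you would rather get both numbers in one go, the two commands from Step 6 can be wrapped in a small function. This is only a sketch: pdf_word_counts is a name made up for this post, and it assumes pdftotext from Part A is installed.

```shell
# Print the lower and upper word counts for one PDF file.
# (pdf_word_counts is a hypothetical helper, not part of Poppler.)
pdf_word_counts() {
    [ -f "$1" ] || { echo "File not found: $1" >&2; return 1; }
    command -v pdftotext >/dev/null 2>&1 || {
        echo "pdftotext is not installed" >&2
        return 1
    }
    # Count as extracted, then again with the periods deleted.
    with_dots=$(pdftotext "$1" - | wc -w | tr -d ' ')
    without_dots=$(pdftotext "$1" - | tr -d '.' | wc -w | tr -d ' ')
    echo "Word count: between $without_dots and $with_dots"
}

# Example call:
# pdf_word_counts nameof.pdf
```

Deleting periods can only remove stand-alone dots, never add words, so the second number is never larger than the first.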

Wassalam.
