Amazon Listmania is nice.
It can drastically reduce the amount of back-breaking research involved in compiling book lists on a given topic or category. But what if you are researching multiple Amazon book lists and compiling your own list from them? Do you copy and paste each title one by one? I hope not.
Here’s the problem: what if the book lists you are interested in contain 30, 60, or 100 books and you need to export them as plain old text?
Let’s say I am researching horror novels and I have found a worthy list.
I recently had this problem and I used a solution involving curl and regular expressions:
Note: replace the Listmania URL with your own. You will need a Unix-like machine (Linux, OS X, or another *nix) with curl and sed installed.
curl -s http://www.amazon.com/Books-Hell-Best-Horror-Written/lm/JR6CF6ER4QJR | sed -n '/class="listItem">/s/.* alt="\([^"]*\).*/\1/p'
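This filters out all the useless junk from the page: curl downloads the HTML quietly (-s), and sed keeps only the lines containing class="listItem">, pulls the book title out of the alt="..." attribute on each of them, and prints it. The titles arrive ONE PER LINE in a readable plain-text format.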
This is the result:
It
Books of Blood, Vols. 1-3
The Best of H. P. Lovecraft: Bloodcurdling Tales of Horror and the Macabre
Song of Kali
The Shining
I Am Legend
The Silence of the Lambs
In the Flesh
Pet Sematary
Koko
Dark Gods
Hell House
The Haunting of Hill House
The Exorcist
Night Shift (Signet)
Something Wicked This Way Comes
The Damnation Game
The Stand: Expanded Edition: For the First Time Complete and Uncut (Signet)
The Night of the Ripper
Summer of Night (Aspect Fantasy)
Blue World
Dracula (Signet Classics)
'Salem's Lot
The Vampire Lestat (Vampire Chronicles, Book II)
Haunted
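If you want to see what the sed expression does without hitting Amazon at all, here is a small sketch that runs it against a hand-made HTML fragment. The markup is only my assumption of what a list entry looks like (a class="listItem"> marker on the same line as an image with an alt="Title" attribute); the real page may be laid out differently. The function names sample_html and extract_titles are made up for this example.

```shell
# Hypothetical fragment imitating one list entry per line; NOT actual Amazon markup.
sample_html() {
cat <<'EOF'
<html><body>
<td class="listItem"><a href="/dp/0001"><img src="a.jpg" alt="The Shining" /></a></td>
<td class="other"><img src="b.jpg" alt="Not a list entry" /></td>
<td class="listItem"><a href="/dp/0002"><img src="c.jpg" alt="Hell House" /></a></td>
</body></html>
EOF
}

# The same sed program as in the one-liner, wrapped in a function that reads stdin.
#   -n                     print nothing unless told to
#   /class="listItem">/    only act on lines containing this marker
#   s/.* alt="\([^"]*\).*/\1/p   replace the whole line with the alt text, then print it
extract_titles() {
  sed -n '/class="listItem">/s/.* alt="\([^"]*\).*/\1/p'
}

sample_html | extract_titles
```

Only the two listItem lines survive, and each is reduced to its alt text. Because the expression works line by line, it relies on each entry's marker and image sitting on the same line of HTML.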
If you use this, please send me a chocolate muffin and a bottle of brandy as a sign of appreciation. You are welcome 😉
Troubleshooting: If you find that the command above stops working after several runs, Amazon is likely blocking requests that identify themselves as curl (I mean, who wants to be scraped uninvited?). In this case, simply use curl's -A option to change its default user-agent to something else, so it looks like this:
curl -A 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:23.0) Gecko/20100101 Firefox/23.0' -s http://www.amazon.com/Books-Hell-Best-Horror-Written/lm/JR6CF6ER4QJR | sed -n '/class="listItem">/s/.* alt="\([^"]*\).*/\1/p'
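If you are pulling titles from several lists, the same pieces can be wrapped in a few shell functions. This is only a rough sketch, not something I have run against Amazon at scale: fetch_page, extract_titles, collect_titles and the DELAY variable are names invented for this example, and the one-second pause between requests is just a guess at being polite.

```shell
# User-agent string copied from the troubleshooting command above.
UA='Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:23.0) Gecko/20100101 Firefox/23.0'

# Download one page quietly, presenting a browser user-agent instead of curl's.
fetch_page() {
  curl -s -A "$UA" "$1"
}

# Same sed program as before: keep listItem lines, print only the alt text.
extract_titles() {
  sed -n '/class="listItem">/s/.* alt="\([^"]*\).*/\1/p'
}

# Print the titles from every URL given as an argument.
# sort -u removes duplicates, at the cost of losing the original list order.
collect_titles() {
  for url in "$@"; do
    fetch_page "$url" | extract_titles
    sleep "${DELAY:-1}"   # hypothetical knob: seconds to wait between requests
  done | sort -u
}

# Usage: collect_titles "http://www.amazon.com/Books-Hell-Best-Horror-Written/lm/JR6CF6ER4QJR" > books.txt
```

Add more list URLs as extra arguments to merge them into one file. If you care about keeping the order of each list, drop the sort -u and remove duplicates some other way.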