
Amazon Listmania is nice.

It can drastically reduce the amount of back-breaking research involved in compiling book lists on a given topic or category. But what if you are researching multiple Amazon book lists and compiling your own list from them? Do you copy and paste each title one by one? I hope not.

Here’s the problem: what if the book lists you are interested in contain 30, 60, or 100 books and you need to export them somewhere as plain old text?

Let’s say I am researching horror novels and I have found a worthy list.

I recently had this problem and I used a solution involving curl and regular expressions:

Note: replace the Listmania URL with your own. You will need to be on a Unix-like machine (Linux, OS X, etc.) with curl and sed installed.

curl -s http://www.amazon.com/Books-Hell-Best-Horror-Written/lm/JR6CF6ER4QJR | sed -n '/class="listItem">/s/.* alt="\([^"]*\).*/\1/p'

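This fetches the page quietly (-s), keeps only the lines containing class="listItem">, and prints the contents of the alt attribute on each of those lines. In other words, it filters out all the useless junk, parses out each book title, and delivers the titles ONE PER LINE in readable plain text.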
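To see what the sed expression does without touching Amazon at all, you can feed it a few lines of made-up markup. This is only a sketch: the HTML in sample_html is a guess at the general shape of a list row, not Amazon's real markup, and the variable names are mine.

```shell
# Hypothetical markup standing in for a Listmania page (not Amazon's real HTML).
sample_html='<td class="listItem"><a href="/dp/0000000000"><img src="a.jpg" alt="The Shining" /></a></td>
<td class="sidebar"><img src="ad.jpg" alt="Ignore me" /></td>
<td class="listItem"><a href="/dp/1111111111"><img src="b.jpg" alt="I Am Legend" /></a></td>'

# -n suppresses sed's default output. The address /class="listItem">/ selects
# only matching lines, the substitution replaces each of them with the text
# captured after alt=", and the trailing p prints the result.
titles=$(printf '%s\n' "$sample_html" | sed -n '/class="listItem">/s/.* alt="\([^"]*\).*/\1/p')

printf '%s\n' "$titles"
```

The sidebar line is dropped because it never matches the address, which is how the real command skips everything on the page that is not a list item.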

This is the result:

It
Books of Blood, Vols. 1-3
The Best of H. P. Lovecraft: Bloodcurdling Tales of Horror and the Macabre
Song of Kali
The Shining
I Am Legend
The Silence of the Lambs
In the Flesh
Pet Sematary
Koko
Dark Gods
Hell House
The Haunting of Hill House
The Exorcist
Night Shift (Signet)
Something Wicked This Way Comes
The Damnation Game
The Stand: Expanded Edition: For the First Time Complete and Uncut (Signet)
The Night of the Ripper
Summer of Night (Aspect Fantasy)
Blue World
Dracula (Signet Classics)
'Salem's Lot
The Vampire Lestat (Vampire Chronicles, Book II)
Haunted

Neat, huh?
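If you are compiling one list out of several Listmania pages, the same pipeline can be wrapped in a couple of small shell functions. This is a rough sketch rather than anything polished: extract_titles and merge_lists are names I made up, and it assumes every page uses the same listItem markup as the one above.

```shell
# Read Listmania HTML on stdin and print one title per line
# (same sed expression as the one-liner above).
extract_titles() {
  sed -n '/class="listItem">/s/.* alt="\([^"]*\).*/\1/p'
}

# Fetch every URL passed as an argument and print the combined titles,
# sorted and with duplicates removed.
merge_lists() {
  for url in "$@"; do
    curl -s "$url" | extract_titles
  done | sort -u
}
```

Call it with your own URLs and redirect the output to a file, for example: merge_lists "$first_list_url" "$second_list_url" > horror-books.txt (both variables are placeholders for URLs you supply).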

If you use this, please send me a chocolate muffin and a bottle of brandy as a sign of appreciation. You are welcome 😉

Troubleshooting:
If you find that the command above stops working after several runs, Amazon is probably blocking requests that identify themselves as curl (I mean, who wants to be scraped uninvited?). In this case, use the -A option to change curl's default user agent to something that looks like a browser:

curl -A 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:23.0) Gecko/20100101 Firefox/23.0' -s http://www.amazon.com/Books-Hell-Best-Horror-Written/lm/JR6CF6ER4QJR | sed -n '/class="listItem">/s/.* alt="\([^"]*\).*/\1/p'
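If you would rather not remember which variant to run, a small wrapper can try the plain request first and fall back to the browser user agent only when nothing comes back. Treat this as a sketch: listmania_titles and UA are names I made up, and treating empty output as "probably blocked" is my assumption, not a signal Amazon documents.

```shell
# Browser-like user agent, copied from the command above.
UA='Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:23.0) Gecko/20100101 Firefox/23.0'

# Print the titles for one Listmania URL, retrying with the browser
# user agent if the first attempt yields no titles.
listmania_titles() {
  url=$1
  filter='/class="listItem">/s/.* alt="\([^"]*\).*/\1/p'

  titles=$(curl -s "$url" | sed -n "$filter")
  if [ -z "$titles" ]; then
    # Assumption: no titles means the default curl user agent was rejected.
    titles=$(curl -s -A "$UA" "$url" | sed -n "$filter")
  fi

  if [ -z "$titles" ]; then
    echo "no titles found for $url" >&2
    return 1
  fi
  printf '%s\n' "$titles"
}
```

Usage is the same as before, just shorter: listmania_titles http://www.amazon.com/Books-Hell-Best-Horror-Written/lm/JR6CF6ER4QJR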

That’s it.