For a project that I'm working on, I want to download a set of pictures from flickr once a day. In this way, my Rails app won't need to access the flickr api - it can just grab the pics out of an images directory.

I figured that it would be an awfully easy task to download pictures from the internet using Ruby, and it is, but it was very difficult to find good examples. I googled for an hour or so, and kept coming up empty-handed. Finally, I checked in gmail where I keep mail from the Ruby Language mailing list. I found a relatively straightforward way to do this.

Let's say that you want to download a picture from flickr, and you know its URL. Here's the ruby script to download the file:

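require 'net/http'

# Connect to the host, then ask for the path portion of the image URL.
Net::HTTP.start("static.flickr.com") do |http|
  resp = http.get("/92/218926700_ecedc5fef7_o.jpg")
  # "wb" opens fun.jpg for writing in binary mode.
  open("fun.jpg", "wb") do |file|
    file.write(resp.body)
  end
end
puts "Yay!!"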

The Net::HTTP class contains the magic needed to handle this operation. It only speaks HTTP, so it won't work if you need to pull down a file from an FTP server (Ruby has a separate Net::FTP library for that). For now, we're dealing with http URLs. Strip off the "http://" portion of the URL; everything after that, up to but not including the first /, is the host name, and it goes into the start method.
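If you'd rather not strip the URL apart by hand, Ruby's standard uri library can do the splitting. This is just a small sketch of that idea, using the same image URL as the script; the variable names are mine.

```ruby
require 'uri'

# The full image URL, exactly as you would paste it into a browser.
url = "http://static.flickr.com/92/218926700_ecedc5fef7_o.jpg"
uri = URI.parse(url)

host = uri.host   # the part that goes into Net::HTTP.start
path = uri.path   # the part that goes into http.get

puts host   # prints static.flickr.com
puts path   # prints /92/218926700_ecedc5fef7_o.jpg
```

If the URL carries a query string (something after a ?), uri.request_uri returns the path and the query together, which is probably what you want to hand to get in that case.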

Now, we get the file. The rest of your image URL, starting with that first /, goes into the get method. This grabs the file from flickr.
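One thing the script doesn't do is check whether flickr actually handed back the picture. If the path is wrong, the body of a "404 Not Found" page would get written into the file instead. Here's a rough sketch of a check; ensure_success! and fetch_image are hypothetical helper names I made up for illustration, not part of Net::HTTP.

```ruby
require 'net/http'

# Hypothetical helper: hand back the response if it is a 2xx success,
# otherwise raise with the status code and message.
def ensure_success!(resp)
  return resp if resp.is_a?(Net::HTTPSuccess)
  raise "Download failed: #{resp.code} #{resp.message}"
end

# Hypothetical wrapper around the same request the script makes.
# Nothing touches the network until this method is called.
def fetch_image(host, path)
  Net::HTTP.start(host) do |http|
    ensure_success!(http.get(path))
  end
end
```

You would then call fetch_image("static.flickr.com", "/92/218926700_ecedc5fef7_o.jpg") and write the body of the result to disk as before.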

Now we're going to copy the file down to where the script is running. First, let's create the file we're going to copy the picture into. Using the open method, the first parameter is the name of the file that you're going to plop the picture into. We could have used "218926700_ecedc5fef7_o.jpg" or anything else here. The second parameter of the open method, "wb", indicates that we're opening the file for (w)riting and we're going to be writing (b)inary information. On Windows the "b" matters: without it, Ruby writes in text mode and converts line endings, which corrupts a picture. On other platforms it does no harm, so it's safest to always include it.
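To convince yourself that "wb" leaves the bytes alone, here's a small sketch that doesn't need the network at all. The byte string is a made-up stand-in for image data (it includes a carriage return and a newline, the kind of bytes text mode likes to mangle), and the file goes into a temporary directory.

```ruby
require 'tmpdir'

# Stand-in for picture data: 7 arbitrary bytes, marked as binary with .b
bytes = "\xFF\xD8\xFF\xE0\r\n\x1A".b

written = nil
Dir.mktmpdir do |dir|
  target = File.join(dir, "fun.jpg")
  # Same "wb" mode as the download script.
  File.open(target, "wb") { |file| file.write(bytes) }
  # Read the file back in binary mode to compare.
  written = File.binread(target)
end

puts written == bytes   # prints true
puts written.bytesize   # prints 7
```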

Finally, we're going to write the contents, or "body", of what we grabbed from flickr into the new file. So, this writes the binary bits of the picture into fun.jpg. Remember that because we gave fun.jpg no directory, it lands in the current working directory, which is the same directory as our ruby script only if that's where we run the script from.
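If you want to see where a bare file name like "fun.jpg" will land, or pin it next to the script no matter where you run it from, something like this sketch should do. The variable names are mine, and the fallback to Dir.pwd is there because __dir__ can be nil when code runs in irb or through eval.

```ruby
# A bare file name is resolved against the current working directory.
relative_target = File.expand_path("fun.jpg")
puts relative_target

# To always write next to the script instead, build the path from __dir__.
script_target = File.join(__dir__ || Dir.pwd, "fun.jpg")
puts script_target
```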

This same method will copy down .html files, .css files, PDFs and just about any other kind of file. In my next RubyNoob entry, I'll write about how to combine this method with a flickr api call to grab an arbitrary number of recent photos from flickr. As usual, I'm still a noob, and there are probably much better ways to do this. If you know a better way, please share in the comment section below.
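Since the steps are the same for any kind of file, they can be wrapped up in one method. This is only a sketch under my own assumptions: local_name_for and download are hypothetical names, the local file name is taken from the last part of the URL path, and "index.html" is an arbitrary fallback I picked for URLs that end in a slash.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: use the last segment of the URL path as the file name.
def local_name_for(url)
  name = File.basename(URI.parse(url).path)
  (name.empty? || name == "/") ? "index.html" : name
end

# Hypothetical helper: fetch the URL and write the body to disk in binary mode.
# Nothing is downloaded until this method is called.
def download(url, target = local_name_for(url))
  resp = Net::HTTP.get_response(URI.parse(url))
  raise "Download failed: #{resp.code} #{resp.message}" unless resp.is_a?(Net::HTTPSuccess)
  File.open(target, "wb") { |file| file.write(resp.body) }
  target
end
```

Calling download("http://static.flickr.com/92/218926700_ecedc5fef7_o.jpg") would save the picture as 218926700_ecedc5fef7_o.jpg, or you can pass "fun.jpg" as the second argument to choose the name yourself.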

4 Responses to “How to download files with a Ruby script”

  1. Mariano Kamp Says:
    What about this method using the power of open-uri:

    require 'open-uri'
    open("fun.jpg","w").write(open("http://static.flickr.com/92/218926700_ecedc5fef7_o.jpg").read)


    Here in full:

    localhost:~/Desktop mkamp$ irb
    irb(main):001:0> require 'open-uri'
    => true
    irb(main):002:0> open("fun.jpg","w").write(open("http://static.flickr.com/92/218926700_ecedc5fef7_o.jpg").read)
    => 157606
    irb(main):003:0> `ls`
    => "fun.jpg\n"
  2. TAD Says:
    Thanks Mariano,

    That's pretty cool. Ruby is so freaking flexible, and I think that's why we all love it so much. I was going to comment that the way you've written the code makes it less readable, but after looking at it a few times, I think it makes it _more_ readable. It's weird that having such a long line of code in our favorite language makes things easier to read.

    Also, for anyone running in Windows, make sure to add a "b" after the "w" in the second parameter of the open method Mariano has provided.

    Thanks!!
  3. evan Says:
    Stolen from ozmm:
    require 'rubygems'
    require 'rio'
    # open an URI and copy the content into a file
    rio('http://www.juretta.com/') > rio('juretta_index.html')
    
    Rio is at http://rio.rubyforge.org/. You could also use WWW::Mechanize if you need to be more browser-like (preserve history, store cookies, etc.).
  4. TAD Says:
    Thanks evan! I'd seen rio before, but while I figured out how to copy text around like you're showing, I couldn't figure out how to get binary information like a jpg to copy properly. It would create a file0.jpg file, but then windows didn't recognize it as a picture. I think it might have to do with it not copying the information as binary information as windows seems to require, and as the open-uri method allows.
