API-scrape images from Instagram
An image can say more than a thousand words, especially when you add a retro filter and a score of hashtags to go with it. That is a basic explanation of the functionality of Instagram, the power app which revolutionised people’s creativity when it came to documenting dietary habits… and popularised images in social media.
Instagram brought social into photography in a way other, more desktop-oriented photo sharing applications like Picasa and Flickr never managed. It is social, and users can like and comment on others’ pictures. Instagram also enhances images by reducing their photographic clarity (let’s emulate far less technologically advanced cameras by adding a filter), but then again, this adds style to the images and makes some of them look cool. However, I will let the filters and pastulation (the digital emulation of an analogue past – coined by moi, but please let me know if there is a buzzword for this and I may conform) rest for now. Let us instead focus – pun intended – on something else: the tagging of images, and how to retrieve images with a certain tag.
Adding context and indices with tags
An Instagram picture may be tagged with several hashtags: words, or concatenated words, prefixed with a hash sign (#). The great thing about hashtags is that they 1) provide a social signifier telling users that this is a tag, so users can use the tag to organise their content and create a social context in which the photo exists, e.g. #instafood (picture of a meal), #selfie (a person taking a picture of him/herself, usually together with..), #duckface (quack quack pouting) and #onedirection (popular teenage idols). Tags can be of any kind, from current affairs to more general stuff. 2) provide a token that declares something indexable for the Instagram server and other technical resources. Once the computer system knows it’s a tag, it may group the tags together, perform analysis on the tag and the users associated with it, aggregate statistics on the tag, and do other things to enhance the user experience. In our case the tagging is great, as we want to retrieve images with a given tag.
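To make the indexing idea concrete, here is a minimal sketch of how tags could be pulled out of captions and grouped. The captions, the image ids and the helper names (extract_tags, build_tag_index) are made up for illustration; this is not how Instagram does it internally.

```ruby
# Hypothetical sample data: image id => caption text (not real Instagram posts).
CAPTIONS = {
  "img1" => "Lunch time #instafood #selfie",
  "img2" => "Pouting again #duckface #Selfie",
  "img3" => "No tags here"
}

# Extract hashtags: a '#' followed by word characters, normalised to lower case.
def extract_tags(caption)
  caption.scan(/#(\w+)/).flatten.map(&:downcase)
end

# Build an index from each tag to the ids of the images carrying it.
def build_tag_index(captions)
  index = Hash.new { |hash, key| hash[key] = [] }
  captions.each do |id, caption|
    extract_tags(caption).each { |tag| index[tag] << id }
  end
  index
end

TAG_INDEX = build_tag_index(CAPTIONS)
TAG_INDEX.each do |tag, ids|
  puts "tag #{tag}: #{ids.length} image(s) #{ids.inspect}"
end
```

With an index like this, retrieving every image for a given tag is a single hash lookup, which is the property we rely on when asking the server for images by tag.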
The #InstagramTagGetterScript
Below is a script which takes the tag name as an argument and downloads the images and metadata associated with that tag. To get it to work you will need to obtain an API key from Instagram’s developer page. You put this key into the URL of the initial request sent to the server (the URL is stored in the next_url variable). We are using the tags endpoint to download the images.
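As a small sketch of that first step, the initial request URL can be assembled from the tag and the key like this. The helper name first_request_url and the placeholder token are assumptions for illustration; the endpoint path and the min_id parameter are the ones used in the script in this post.

```ruby
require 'uri'

# Base URL of the tags endpoint, as used in the script in this post.
API_BASE = "https://api.instagram.com/v1/tags"

# Build the first request URL for a tag.
# access_token is a placeholder; substitute the key from your developer account.
def first_request_url(tag, access_token)
  escaped_tag = URI.encode_www_form_component(tag)
  query = URI.encode_www_form(access_token: access_token, min_id: 1)
  "#{API_BASE}/#{escaped_tag}/media/recent?#{query}"
end

puts first_request_url("instafood", "YOUR_ACCESS_TOKEN")
```

Escaping the tag before it goes into the path keeps tags with non-ASCII characters from producing an invalid URL.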
The rough outline of the script is as follows:
First we define a class to represent each InstaEntry. The class comes with the functionality to retrieve and store the image and metadata, as well as to dump the data to disk and load it back. It holds all the variables we are interested in collecting; once an entry is instantiated, these variables are set for every field that exists for the image, and the rest are left empty.
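The dump-and-load part can be illustrated with a cut-down class. TinyEntry and round_trip below are hypothetical stand-ins made up for this sketch, with only three of the fields; the marshal_dump and marshal_load hooks are the standard ones Ruby’s Marshal module looks for.

```ruby
# TinyEntry is a hypothetical, cut-down stand-in for the InstaEntry class.
class TinyEntry
  attr_accessor :id, :username, :likes

  def initialize(id)
    @id = id
  end

  # Marshal calls this to decide what gets serialised.
  def marshal_dump
    [@id, @username, @likes]
  end

  # Marshal calls this on a freshly allocated object to restore its state.
  def marshal_load(values)
    @id, @username, @likes = values
  end
end

# Serialise an entry to a byte string and rebuild a copy from it.
def round_trip(entry)
  Marshal.load(Marshal.dump(entry))
end

entry = TinyEntry.new("42")
entry.username = "ola"   # likes is deliberately left unset
copy = round_trip(entry)
puts "id=#{copy.id} username=#{copy.username} likes=#{copy.likes.inspect}"
```

Fields that were never set simply come back as nil, so an image lacking, say, a caption does not break the dump.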
Once the class is defined, some initial parameters are set: the tag and our initial request URL. The folders in which we will store the data are also created. When everything is set up we run a loop which keeps running as long as there is more data available and we get responses with HTTP status 200 (OK). The loop instantiates an InstaEntry for each image, which then downloads the image and the metadata on the fly. The objects are retained until the program has finished executing, but all large data (i.e. the images) are written directly to disk and not kept in memory.
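The shape of that loop can be sketched without touching the network. FAKE_PAGES and collect_ids below are hypothetical: the hash plays the role of the server, with each key acting as a URL and each value mimicking the meta, data and pagination fields the script reads from a response.

```ruby
# Hypothetical stand-in for the API: each "URL" maps to one page of results.
FAKE_PAGES = {
  "page1" => {
    "meta" => { "code" => 200 },
    "data" => [{ "id" => "a" }, { "id" => "b" }],
    "pagination" => { "next_url" => "page2" }
  },
  "page2" => {
    "meta" => { "code" => 200 },
    "data" => [{ "id" => "c" }],
    "pagination" => {}   # no next_url: this is the last page
  }
}

# Follow next_url links until a page has none or the status is not 200.
def collect_ids(first_url, pages)
  ids = []
  next_url = first_url
  until next_url.nil?
    page = pages.fetch(next_url)
    break unless page["meta"]["code"] == 200
    page["data"].each { |item| ids << item["id"] }
    next_url = page["pagination"]["next_url"]
  end
  ids
end

puts collect_ids("page1", FAKE_PAGES).inspect
```

In the real script the lookup in the hash is replaced by an HTTP request, and each item becomes an InstaEntry instead of a bare id.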
Please contact me if you want to use this script, tailor it, or have any questions related to it.
#!/usr/bin/ruby
# encoding: UTF-8

require 'active_support'
require 'active_support/json'
require 'json'
require 'restclient'
require 'csv'
require 'open-uri'
require 'fileutils'

class InstaEntry
  attr_accessor :id, :username, :picture_url, :likes, :filter, :location, :type,
                :caption, :tags, :fullname, :user_id, :created_time, :link

  # Download threads shared by all entries. Initialised once here rather than
  # in initialize, where the list would be reset for every new entry.
  @@threads = []

  def initialize(id)
    @id = id
  end

  # Used by Marshal.dump to serialise the entry.
  def marshal_dump
    to_arr
  end

  # Used by Marshal.load to restore the entry.
  def marshal_load(variables)
    @id, @username, @picture_url, @likes, @filter, @location, @type, @caption,
      @tags, @fullname, @user_id, @created_time, @link = variables
  end

  def to_arr
    [@id, @username, @picture_url, @likes, @filter, @location, @type, @caption,
     @tags, @fullname, @user_id, @created_time, @link]
  end

  # Download the image in a background thread and write it to images_<tag>/.
  def self.get_image(obj, tag)
    thread = Thread.new(obj, tag) do |entry, tag_name|
      begin
        extension = entry.picture_url.match(/\.(jpe?g|gif|png)/)[1]
        File.open("images_#{tag_name}/#{entry.id}_#{entry.username}_.#{extension}", "wb") do |file|
          # URI.open comes from open-uri (Ruby 2.5 and later)
          file << URI.open(entry.picture_url).read
        end
      rescue
        puts "ERROR: #{entry.id} triggered an Exception in get_image method"
      end
    end
    @@threads << thread
  end

  # Write the metadata of one entry to md_<tag>/.
  def self.print_metadata(obj, tag)
    File.open("md_#{tag}/#{obj.id}_#{obj.username}.txt", "wb") do |file|
      file.print(obj.to_arr)
    end
  end

  # Block until all image downloads have finished.
  def self.wait_for_downloads
    @@threads.each(&:join)
  end
end # end InstaEntry class

#
# This block sets the parameters. The tag name is read from the first command-line argument.
#
raise ArgumentError, "Missing name of tag to download" if ARGV.length < 1

$tag = ARGV[0]
output = File.open("output.json", "wb")
# NB: The access token below resembles a real token, but is obfuscated. Get your own by registering a developer account at Instagram.
next_url = "https://api.instagram.com/v1/tags/#{URI.encode_www_form_component($tag)}/media/recent?access_token=51998418.d146264.e77441adc4a04399874a19b48bb91e71f&min_id=1"
puts next_url

# mkdir_p does nothing if the folder already exists
FileUtils.mkdir_p("md_#{$tag}")
FileUtils.mkdir_p("images_#{$tag}")

count = 0
instas = {}

#
# This block runs through all the subsequent pagination pages. RestClient raises an
# exception on an HTTP error status, which stops the script; otherwise the loop ends
# when the response contains no next_url.
#
begin
  response = RestClient.get(next_url)
  json = ActiveSupport::JSON.decode(response)
  pretty_json = JSON.pretty_generate(json)
  puts "Status code #{json['meta']['code']} for URL #{next_url}.. Fetching"
  next_url = json['pagination']['next_url']
  sleep 2

  # loop through the data elements
  json['data'].each do |item|
    puts item['link']
    puts item['user']['full_name']
    ie = InstaEntry.new(item['id'])
    instas[item['id']] = ie
    ie.username = item['user']['username']
    ie.picture_url = item['images']['standard_resolution']['url']
    ie.likes = item['likes']['count']
    ie.filter = item['filter']
    ie.location = item['location']
    ie.type = item['type']
    ie.caption = item['caption']['text'] unless item['caption'].nil? or item['caption']['text'].nil?
    ie.tags = item['tags']
    ie.fullname = item['user']['full_name']
    ie.user_id = item['user']['id']
    ie.created_time = item['created_time']
    ie.link = item['link']
    InstaEntry.get_image(ie, $tag)
    InstaEntry.print_metadata(ie, $tag)
  end

  count += 1
  output << pretty_json
  puts "Now checked __ #{count} __ files and __#{instas.length}__ number of instas"
  puts "*****Ending with #{count} __ files and __#{instas.length}__ number of instas****" if next_url.nil?
end while not next_url.nil?

output.close

# Let the background downloads finish before the script exits
InstaEntry.wait_for_downloads

File.open("instadump_#{$tag}", 'wb') do |f|
  f.write Marshal.dump(instas)
end

CSV.open("output_#{$tag}.csv", "wb", col_sep: "\t") do |csv|
  instas.each do |_id, entry|
    csv << entry.to_arr
  end
end
Disclaimer: Enabling you to download images associated with tags does not mean that you can do whatever you want with them. First, please refer to the Instagram guidelines to confirm that you are actually allowed to download images. Second, respect the individual users’ privacy and intellectual property rights, and do not use images in a publishing context without the user’s consent. Generally: be nice, and do good.
4 thoughts on “API-scrape images from Instagram”
Hey Is this in Python?
Hi,
no this code is written in Ruby.
Hi Ola,
Can you please tell me how can I possibly use this in real world? I have no knowledge of Ruby but I am after a code that can let me download instagram images with a unique hashtag (ie: #decemberwedding) to a local folder in my harddrive.
Thanks,
Larry
Hi Larry,
If you would like to tinker with Ruby you could rewrite the script with a new key and enter the tag as an argument to the script. Since you write you don’t know Ruby I had a look online for a GUI-tool and found this. Hopefully this may help you further: http://www.makeuseof.com/tag/instagramdownloader-download-images-instagram-user-windows/