404 Error Handling with R’s download.file()

404 = Not Found

This is a “client-side” error, indicating that you, the individual sending a request to a server, have made some kind of error. In my most recent run-in with this error, I scraped 10,000+ links in order to download image files using download.file(). Over a thousand of the links were duplicates, and over 500 of the non-duplicate links were dead, which is where I hit the 404 error.

The Lazy Way

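In the case above, I wrapped download.file() in a tryCatch() and, in the error handler, called httr::GET(link) to fetch the response, then used its status code in a conditional statement. It is lazy because every failed download triggers a second request to the same link.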

tryCatch({
     download.file("http://file.jpg", "file.jpg")
     }, error = function(e){
          # download failed: request the link again to inspect its status code
          response <- httr::GET("http://file.jpg")
          if(response$status_code == 404){
               do.something()
          } else {
               do.something.else()
          }
     })

The Better Way

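Using tryCatch(), you can capture the error message and use grepl() to search it for 404. Depending on the R version and download method, the HTTP status may be reported in an accompanying warning rather than in the error message itself, so check what e$message actually contains on your setup.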

tryCatch({
     download.file("http://file.jpg", "file.jpg")
     }, error = function(e){
          # look for the 404 status in the error message text
          if(grepl("404", e$message)){
               do.something()
          } else {
               do.something.else()
          }
     })
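
For a long list of links, the same idea can be packaged into a small helper. What follows is only a sketch under a few assumptions: try_download() and its downloader argument are names invented here for illustration, and the helper also collects warnings, since on some setups that is where the “404” text ends up. It matches on the plain string "404", just like the snippet above, so a link that happens to contain “404” in its address would be misclassified.

```r
# Hypothetical helper: label one download attempt as "ok", "404" or "error".
# `downloader` defaults to download.file(); it is a parameter only so the
# logic can be tried out with a stand-in function instead of a real request.
try_download <- function(url, destfile, downloader = utils::download.file) {
  messages <- character(0)

  outcome <- withCallingHandlers(
    tryCatch({
      downloader(url, destfile, quiet = TRUE)
      "ok"
    }, error = function(e) {
      # keep the error text so it can be searched afterwards
      messages <<- c(messages, conditionMessage(e))
      "error"
    }),
    warning = function(w) {
      # keep the warning text too, then silence the warning and carry on
      messages <<- c(messages, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )

  # a failed attempt whose messages mention 404 is treated as a dead link
  if (outcome != "ok" && any(grepl("404", messages, fixed = TRUE))) {
    "404"
  } else {
    outcome
  }
}
```

Applied to each link in turn (for example with vapply() over a character vector of links), this should give one label per link, which makes it easy to count or drop the dead ones.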

Another Possibility

Within download.file(), you can set the method argument to either "curl" or "wget". Both hand the download to an external program, and download.file() returns that program's exit status: 0 for success and non-zero if an error is encountered.
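
A rough sketch of checking that return value is below. The helper name download_succeeded() and its downloader argument are made up for illustration. It also assumes the external tool really does exit with a non-zero status on a 404, which can depend on the flags it is run with (curl, for instance, has a -f/--fail flag for this, and download.file() accepts extra command-line arguments through extra), so it is worth testing against a known dead link first.

```r
# Hypothetical helper: TRUE when the download reported status 0, else FALSE.
# `downloader` defaults to download.file(); it is a parameter only so the
# check can be tried out with a stand-in function instead of a real request.
download_succeeded <- function(url, destfile, method = "curl",
                               downloader = utils::download.file) {
  status <- tryCatch(
    # a non-zero status normally comes with a warning, which is silenced here
    suppressWarnings(downloader(url, destfile, method = method, quiet = TRUE)),
    # treat an R-level error the same way as a failed download
    error = function(e) 1L
  )
  identical(as.integer(status), 0L)
}
```

Note that the exit status only says that something went wrong, not that the cause was a 404 specifically.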
