When using Cascading to run MapReduce jobs in production, the most common exception we find in our job logs looks like this:
This exception tells us that the job failed because of map or reduce task failures, but doesn’t give us any information about the actual cause of the failures. We can get that information from task logs on the JobTracker, but that can be a hassle, especially if the job has already been flushed to the JobTracker’s history.
To save ourselves the hassle of looking up the failures, we’ve modified our LoggingFlow helper class to fetch remote task failure information and log it alongside the Cascading job failure message. Our logs are now a bit more informative:
It’s not a huge change, but it has certainly saved us time and headaches when hunting down the cause of failures. This logging is now enabled automatically when using our CascadingUtil helper class in our cascading_ext library.
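For readers who want a feel for the approach, here is a minimal sketch of the idea, not the actual LoggingFlow code. To keep it self-contained, `TaskAttempt`, `RemoteJob`, `describeFailures` and `runAndLog` are all hypothetical stand-ins we made up for illustration; they are not Cascading or Hadoop types. In a real Hadoop setup the same details could presumably come from the old `mapred` API, for example `RunningJob.getTaskCompletionEvents` and `RunningJob.getTaskDiagnostics`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative sketch only: every type here is a hypothetical stand-in,
// not part of Cascading, Hadoop, or cascading_ext.
final class TaskFailureLogging {

    /** Hypothetical view of one task attempt as reported by the cluster. */
    static final class TaskAttempt {
        final String id;
        final boolean failed;
        final List<String> diagnostics;

        TaskAttempt(String id, boolean failed, List<String> diagnostics) {
            this.id = id;
            this.failed = failed;
            this.diagnostics = diagnostics;
        }
    }

    /** Hypothetical handle on a job that was submitted to the cluster. */
    interface RemoteJob {
        String name();
        List<TaskAttempt> taskAttempts() throws Exception;
    }

    /** Builds one log line per diagnostic message of each failed attempt. */
    static List<String> describeFailures(RemoteJob job) {
        List<String> lines = new ArrayList<>();
        try {
            for (TaskAttempt attempt : job.taskAttempts()) {
                if (!attempt.failed) {
                    continue; // successful attempts carry nothing we need
                }
                for (String diagnostic : attempt.diagnostics) {
                    lines.add(job.name() + " " + attempt.id + ": " + diagnostic);
                }
            }
        } catch (Exception e) {
            // Fetching diagnostics is best effort; never mask the original failure.
            lines.add("could not fetch task failures for " + job.name() + ": " + e);
        }
        return lines;
    }

    /** Runs the flow; on failure, logs remote task failures, then rethrows. */
    static void runAndLog(Runnable flow, List<RemoteJob> jobs, Consumer<String> log) {
        try {
            flow.run();
        } catch (RuntimeException e) {
            log.accept("flow failed: " + e.getMessage());
            for (RemoteJob job : jobs) {
                for (String line : describeFailures(job)) {
                    log.accept(line);
                }
            }
            throw e;
        }
    }
}
```

The key design choice in this sketch is that the lookup is best effort: if fetching the task details itself fails, that is logged as one extra line and the original job failure is still rethrown unchanged.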