Monitor AWS by Application



At LiveRamp, we run a couple of applications in Amazon’s cloud infrastructure to handle spikes in traffic. Finding out how many instances are powering a given internal application means navigating several clunky AWS console screens that, out of the box, are not application-aware. There are a couple of things one can do, such as adding tags or building the application on a service like Heroku, but we found these solutions too heavy-handed for answering some simple questions: How many instances are currently powering our application? How many instances are powering our application within a specific auto-scaling group (ASG) or elastic load balancer (ELB)? Answering them in the console means visiting at least two different, slow-loading screens, and the number of instances might change while you move between them. The graphical console also takes time to load and gather all of the data it needs to display. There had to be an easier way, so we made one.

At LiveRamp, Ops is always at the command line doing something, so it made sense for us to write a couple of scripts using the Python boto library that print this information to the terminal. First, we must define our applications. We use a simple YAML file with the following format:

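    # top-level key is the region, then the application name
    # (the region and application names here are placeholders)
    us-east-1:
      app1:
        elb: ['app1-elb1', 'app1-elb2']
        asg: ['app1-asg']
      app2:
        elb: ['app2-elb1', 'app2-elb2']
        asg: ['app2-asg', 'app2-spot-asg']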

With this YAML file, our scripts contact only the ELBs or ASGs associated with the internal application named by a command-line option, so nobody has to remember which ELB/ASG belongs to which application.
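
As a rough sketch of that lookup, assume the YAML file has already been parsed into a nested dict keyed by region and then application (for example with PyYAML's `yaml.safe_load`). The option names `--region` and `--app`, the contents of `APP_MAP`, and the helpers `select_targets` and `parse_args` are hypothetical and only illustrate the idea; they are not taken from our scripts.

```python
import argparse

# Hypothetical stand-in for the parsed YAML file; all names are placeholders.
APP_MAP = {
    "us-east-1": {
        "app1": {"elb": ["app1-elb1", "app1-elb2"], "asg": ["app1-asg"]},
        "app2": {"elb": ["app2-elb1", "app2-elb2"],
                 "asg": ["app2-asg", "app2-spot-asg"]},
    }
}


def select_targets(app_map, region, app):
    """Return (elb_names, asg_names) for one application in one region."""
    try:
        entry = app_map[region][app]
    except KeyError:
        raise ValueError("unknown region/application: %s/%s" % (region, app))
    return entry.get("elb", []), entry.get("asg", [])


def parse_args(argv=None):
    """Parse the (assumed) command-line options; argv=None reads sys.argv."""
    parser = argparse.ArgumentParser(description="Count instances per application")
    parser.add_argument("--region", default="us-east-1")
    parser.add_argument("--app", required=True)
    return parser.parse_args(argv)


# Example run with an explicit argument list instead of the real command line.
args = parse_args(["--app", "app2"])
elbs, asgs = select_targets(APP_MAP, args.region, args.app)
print("ELBs: %s" % ", ".join(elbs))
print("ASGs: %s" % ", ".join(asgs))
```

Keeping the lookup in one small function means the rest of a script only ever sees two lists of names, whichever application was requested.
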

The main part of the script checks the ELBs and ASGs for the number of instances by querying the relevant service for each instance's state, such as ‘in service’, ‘terminating’, etc.

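import boto.ec2.autoscale
import boto.ec2.elb
import boto.exception


def get_instance_ids(conn, aws_objs):
    """Return a dict mapping each instance state to a list of instance ids."""
    ids = {}

    try:
        if isinstance(conn, boto.ec2.autoscale.AutoScaleConnection):
            # aws_objs is a list of ASG names
            asgs = conn.get_all_groups(names=aws_objs)
            for asg in asgs:
                for instance in asg.instances:
                    state = str(instance.lifecycle_state)
                    instance_id = str(instance.instance_id)
                    # start a new list for an unseen state, then append
                    ids.setdefault(state, []).append(instance_id)
        elif isinstance(conn, boto.ec2.elb.ELBConnection):
            # aws_objs is a list of ELB names
            for elb in aws_objs:
                instances = conn.describe_instance_health(elb)
                for instance in instances:
                    state = str(instance.state)
                    instance_id = str(instance.instance_id)
                    ids.setdefault(state, []).append(instance_id)
        else:
            raise NotImplementedError("Not implemented yet")
    except boto.exception.BotoServerError:
        raise RuntimeError("Not a valid request to Amazon!")

    return ids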

Finally, it is just a matter of presenting that information on the command line, sorted by state.

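
A minimal sketch of that presentation step, assuming the dict of states to instance ids returned by `get_instance_ids`. The `format_report` helper, the column layout, and the sample instance ids are made up for illustration.

```python
def format_report(name, ids_by_state):
    """Build report lines for one ELB or ASG, with states in alphabetical order."""
    lines = [name]
    for state in sorted(ids_by_state):
        instance_ids = sorted(ids_by_state[state])
        # state name, number of instances in that state, then the ids themselves
        lines.append("  %-14s %3d  %s"
                     % (state, len(instance_ids), " ".join(instance_ids)))
    return lines


# Made-up sample data in the shape returned by get_instance_ids.
sample = {
    "OutOfService": ["i-0003"],
    "InService": ["i-0002", "i-0001"],
}
for line in format_report("app1-elb1", sample):
    print(line)
```
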
Knowing how many instances are in each ASG and ELB is not enough. If an instance is in the ASG(s) but not in the ELB(s), it receives no traffic and is wasting money. We created another script that compares the instance ids in each and adds instances to, or removes them from, the ELB(s). It is as simple as comparing two sets of instance ids.

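# get the instance ids from each ELB for comparison
for elb in app_map[region][app]['elb']:
    result['elbs'][elb] = aws.get_instance_ids(conns['elb'], [elb])

# compare them; result['asgs'] is assumed to already hold the ASG instance ids
for elb in result['elbs']:
    in_service = result['elbs'][elb].get('InService', [])
    out_of_service = result['elbs'][elb].get('OutOfService', [])

    # instances in the ASG(s) that are not in service behind this ELB
    diff = set(result['asgs']) - set(in_service)
    if diff and balance:
        # the original call is missing here; boto's register_instances is assumed
        conns['elb'].register_instances(elb, list(diff))

    if out_of_service and balance:
        # TODO: need to check that these instances are not in the ASG
        #       before removing them from the ELB
        # the original call is missing here; boto's deregister_instances is assumed
        conns['elb'].deregister_instances(elb, out_of_service)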
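
The set comparison can also be pulled out into a pure function that is easy to test without touching AWS. The sketch below is not the script's actual code: `plan_rebalance` is a hypothetical helper that takes the ASG instance ids and the per-state dict for one ELB, and returns which ids to register and which to deregister. It also shows one possible way to handle the TODO above, by removing only those out-of-service instances that no longer belong to the ASG.

```python
def plan_rebalance(asg_ids, elb_states):
    """Return (to_register, to_deregister) as sorted lists of instance ids.

    asg_ids: iterable of instance ids currently in the ASG(s).
    elb_states: dict mapping an ELB state ('InService', 'OutOfService')
        to a list of instance ids, in the shape returned by get_instance_ids.
    """
    asg = set(asg_ids)
    in_service = set(elb_states.get("InService", []))
    out_of_service = set(elb_states.get("OutOfService", []))

    # Same comparison as the script: in the ASG but not in service on the ELB.
    to_register = asg - in_service
    # Out of service and no longer in the ASG, so removing them is safe.
    to_deregister = out_of_service - asg
    return sorted(to_register), sorted(to_deregister)


# Made-up instance ids for illustration.
register, deregister = plan_rebalance(
    ["i-1", "i-2", "i-3"],
    {"InService": ["i-1"], "OutOfService": ["i-2", "i-9"]},
)
print("register:", register)
print("deregister:", deregister)
```

Because the function only returns a plan, the calls that actually change the ELB can stay behind a flag such as `balance`, and the plan can be printed first as a dry run.
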

In summary, there are a number of ways to monitor your applications, and a number of companies that will build and monitor them for you. If you want to do it yourself, there are ways to do that as well. This is just one example of how to use Python to monitor your applications in the cloud efficiently from the command line and ensure that all instances in the ASG are being utilized.