Filter, Parse & Group Log Data in AWS CloudWatch Logs Insights

Say you have an application that logs stringified JSON to CloudWatch Logs, and you need to run some kind of analysis on that data. Here’s the JSON:

{
    "clientId": "abc123",
    "message": "hello"
}

Here’s how this JSON looks when it’s logged:

2020-06-25 INFO response from server: { "clientId": "abc123", "message": "hello" }

Suppose you want to get a count of messages received from each client. Let’s see how to go about building a query in CloudWatch Logs Insights that’ll give us this output:

|----------|----------|
| clientId | count(*) |
|----------|----------|
|  abc123  |     4    |
|  def456  |     3    |
|----------|----------|

First of all, CloudWatch stores every log event with a few system fields, such as @timestamp and @message. The raw log line lives in @message, so we select just that field using:

fields @message

This gives us all the log statements:

2020-06-25 INFO calling server...
2020-06-25 INFO response from server: { "clientId": "abc123", "message": "hello" }
2020-06-25 INFO calling server...
2020-06-25 INFO response from server: { "clientId": "def456", "message": "hi" }

Now, let’s filter out the unwanted log statements:

fields @message |
filter @message like 'response from server'

This leaves us with:

2020-06-25 INFO response from server: { "clientId": "abc123", "message": "hello" }
2020-06-25 INFO response from server: { "clientId": "def456", "message": "hi" }
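To make the filter step concrete, here is a minimal local sketch in Python of what a quoted-string like filter does: it keeps the events whose message contains the given substring. The LOG_LINES list and the filter_like helper are made up for illustration; CloudWatch runs the real query on its side, so treat this only as a mental model.

```python
# Local illustration only: this is not how CloudWatch executes the query.
LOG_LINES = [
    "2020-06-25 INFO calling server...",
    '2020-06-25 INFO response from server: { "clientId": "abc123", "message": "hello" }',
    "2020-06-25 INFO calling server...",
    '2020-06-25 INFO response from server: { "clientId": "def456", "message": "hi" }',
]


def filter_like(lines, needle):
    """Keep only the lines that contain `needle` (case-sensitive substring match)."""
    return [line for line in lines if needle in line]


matching = filter_like(LOG_LINES, "response from server")
for line in matching:
    print(line)
```

Running this prints only the two "response from server" lines, mirroring the filtered result above.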

Excellent! Next, we have to extract the client ID so we can group by it later on and count the number of messages in each group. Use the parse command to extract the client ID:

fields @message |
filter @message like 'response from server' |
parse @message '"clientId": "*", "message"' as clientId

The parse statement above takes the pattern in single quotes and matches it against each log message. Wherever the pattern contains the wildcard *, the text found at that position is extracted into the field named after “as”. That’s how the client ID from each log message ends up in the newly created field named clientId. The rest of the pattern is matched literally, so it has to mirror the spacing and quoting of the logged JSON.
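If the wildcard matching feels abstract, the sketch below imitates it locally in Python by turning the glob pattern into a regular expression. The glob_parse helper is hypothetical (it is not part of any AWS SDK), and it assumes each * captures as little text as possible, which may not match CloudWatch's behavior in every edge case.

```python
import re


def glob_parse(message, pattern):
    """Return a tuple with the text captured by each * in `pattern`, or None.

    Hypothetical helper for illustration only; not an AWS API.
    """
    # Escape the literal pieces so they match exactly, spaces and quotes included.
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    # Assumption: each wildcard becomes a lazy capture group.
    regex = "(.*?)".join(literal_parts)
    match = re.search(regex, message)
    return match.groups() if match else None


line = '2020-06-25 INFO response from server: { "clientId": "abc123", "message": "hello" }'
pattern = '"clientId": "*", "message"'

print(glob_parse(line, pattern))  # ('abc123',)
print(glob_parse("2020-06-25 INFO calling server...", pattern))  # None
```

The second call returns None because the literal parts of the pattern never appear in that line, which is why the spacing in the pattern matters.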

Now, to count the number of log statements for each client ID, use the stats command with the count function, grouped by clientId, as shown below:

fields @message |
filter @message like 'response from server' |
parse @message '"clientId": "*", "message"' as clientId |
stats count(*) by clientId

And voila, just like that, we have our desired output:

|----------|----------|
| clientId | count(*) |
|----------|----------|
|  abc123  |     4    |
|  def456  |     3    |
|----------|----------|
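For a quick sanity check outside the console, the whole query can be approximated locally in a few lines of Python. This is only a sketch of the filter, parse and stats steps run over the sample lines from this post, so the counts are 1 each rather than the 4 and 3 shown above. The count_by_client function is made up for illustration, and skipping lines where the pattern does not match is a choice made here, not something taken from CloudWatch.

```python
import re
from collections import Counter

# Sample lines from this post; a real log group would hold many more events.
LOG_LINES = [
    "2020-06-25 INFO calling server...",
    '2020-06-25 INFO response from server: { "clientId": "abc123", "message": "hello" }',
    "2020-06-25 INFO calling server...",
    '2020-06-25 INFO response from server: { "clientId": "def456", "message": "hi" }',
]

# Regex stand-in for the glob pattern '"clientId": "*", "message"'.
CLIENT_ID = re.compile(r'"clientId": "(.*?)", "message"')


def count_by_client(lines):
    """Approximate: filter ... | parse ... as clientId | stats count(*) by clientId."""
    counts = Counter()
    for line in lines:
        if "response from server" not in line:  # filter step
            continue
        match = CLIENT_ID.search(line)  # parse step
        if match:  # assumption: lines that fail to parse are skipped
            counts[match.group(1)] += 1  # stats count(*) by clientId
    return counts


counts = count_by_client(LOG_LINES)
for client_id, total in counts.items():
    print(client_id, total)
```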