- The docs say the logs rotate every hour but they don't say it's on the hour or what the rotating style is. The current log (
slowquery.log
) is rotated toslowquery.log.#
where # is the current hour (0-23) UTC. So at 18:00 UTC, the file slowquery.log.18 is written with the logs from 17:00 to 18:00 (almost. more later) - When asking for a portion of the log file, you can hand it an index from which to start reading and the number of lines you would like to retrieve (both optional). You get back a struct with a
LogFileData *string
containing log lines and aMarker *string
containing the index for the next segment of the log file. TheMarker
looks like#:#
where the first number is the hour and the second is the distance in to the file.
There are two places things get interesting: log rotation (aka the top of the hour) and asking for older data.
Let's say it's 17:58 UTC and you're happily tailing content, fetching additional lines every 5 seconds. You get to 18:00:02 or so and the marker switches from 18:xxxx to 19:0 and suddenly you're not getting any new data anymore. You wait and wait and ... at 18:05:00, you start getting data! but it's from 18:05:00; the data from between 18:00 and 18:05 is missing.
You take the last marker you got from your tail before data stopped flowing (with the 18: prefix) and use that marker as an index into
slowquery.log.18
. This gets you the data from between 18:00:02 and 18:05:00, then stops. Or! Once some amount of time (not necessarily the full 5 minutes) you can ask the slowquery.log
for data since the 18:offset marker and you'll get the data that's missing.Back you go to tailing the
slowquery.log
.Ok, so here's the next fun part - you can hand a marker to either the previous hour or the current hour to AWS when asking for the
slowquery.log
and it will give you data since then. If you do this after the 5 minute log rotation period has ended, you get all the intermediate data too! However, if you ask it for data where the number before the colon is not the current or previous hour, it ignores that and hands you back current data at the offset specified by the second half of the marker.You can ask for data from older files (the numbered log files) and pass them a marker, but in that case it always ignores the hour portion of the marker. It will happily return the hour portion you gave it, but it doesn't affect the data you get. In other words, when asking for data from
slowquery.log.8
passing it a marker of 12:123456
and 4:123456
are both treated as though they were 8:123456
.To poke around this for yourself, you have to have RDS configured to write the slow query log to a file. If that's the case, grab the DB instance identifier
aws rds describe-db-instances --output table --region us-east-1 | grep DBInstanceIdentifier
to list identifiers) and the log file aws rds describe-db-log-files --db-instance-identifier <ID> --output table --region <region> | sort -n
to show log files.Once you've got your instance identifier, log file, and region handy, download this go code[1] and give it a whirl:
go run tailrdslog.go --identifier myrdsinstanceidTry passing it different markers and log files
go run tailrdslog.go -i prod-mysql -l slowquery/mysql-slowquery.log.10 -m 14.123456and see what happens!
It is especially interesting to run this across an hour boundary to see what happens as the log rolls over. The script attempts to grab some of the data hiding in the previous log until everything to return to normal at 5 past the hour.
It will output a line every 5s (the default polling interval) with
- the current time
- the starting marker
- the length of the content downloaded
- the ending marker
- the first timestamp encountered in the slow query log
- the last timestamp encountered in the slow query log
- (optional) sleep duration before the next poll
Oct 7 21:35:43 starting from 22:20523279, got 37424 content, next section from 22:20560703. first time is 21:35:39, last time is 21:35:43 sleeping 5 sec... Oct 7 21:35:49 starting from 22:20560703, got 59056 content, next section from 22:20619759. first time is 21:35:43, last time is 21:35:48 sleeping 5 sec... Oct 7 21:35:54 starting from 22:20619759, got 54649 content, next section from 22:20674408. first time is 21:35:48, last time is 21:35:53 sleeping 5 sec... Oct 7 21:35:59 starting from 22:20674408, got 42533 content, next section from 22:20716941. first time is 21:35:54, last time is 21:35:59 sleeping 5 sec...One nice thing to notice - the starting marker plus the content length gives you the ending marker. \o/
[1] I know, and I'm sorry. It's terrible code. It's sort of OK because it's scratch code just used to explore the API. But I should be giving the world better examples. It doesn't check pointers aren't nil before derefenencing them. The control flow is awkward. Ifs are nested too deeply. Code is repeated. I encourage you to do none of those things when you cargo cult snippets in to your production application. :)
No comments:
Post a Comment