2016-10-10

Working with the MySQL slow query log on RDS with the AWS Go SDK

In order to let you submit RDS logs directly to Honeycomb, I've been exploring how to use the Go RDS SDK to tail the MySQL slow query log. I've discovered some fascinating things that I didn't find in the docs. (All of this is probably documented somewhere, I just failed to find it.)
  • The docs say the logs rotate every hour, but they don't say it happens on the hour or what the rotation scheme is. The current log (slowquery.log) is rotated to slowquery.log.# where # is the current hour (0-23) UTC. So at 18:00 UTC, the file slowquery.log.18 is written with the logs from 17:00 to 18:00 (almost; more on that later).
  • When asking for a portion of the log file, you can hand it a marker from which to start reading and the number of lines you would like to retrieve (both optional). You get back a struct with a LogFileData *string containing log lines and a Marker *string containing the index for the next segment of the log file. The Marker looks like #:# where the first number is an hour and the second is the offset into the file.
Asking for current data is rather straightforward. First ask for one line of the current log file with no marker. You'll get back the most recent line (which you throw away) and the current marker. Wait a bit, then ask for the current file with that marker, and you'll get all lines since that first one. Repeat.
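
Here's a minimal sketch of that loop using the v1 aws-sdk-go RDS client. The instance identifier and log file name are placeholders, region and credentials are assumed to come from your environment, and the error handling is deliberately crude:

package main

import (
    "fmt"
    "time"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/rds"
)

func main() {
    svc := rds.New(session.Must(session.NewSession()))
    instance := aws.String("myrdsinstanceid") // placeholder
    logFile := aws.String("slowquery/mysql-slowquery.log")

    // Ask for one line with no marker to learn the current position.
    resp, err := svc.DownloadDBLogFilePortion(&rds.DownloadDBLogFilePortionInput{
        DBInstanceIdentifier: instance,
        LogFileName:          logFile,
        NumberOfLines:        aws.Int64(1),
    })
    if err != nil {
        panic(err)
    }
    marker := resp.Marker // looks like "18:20523279"; throw away resp.LogFileData

    for {
        time.Sleep(5 * time.Second)
        resp, err = svc.DownloadDBLogFilePortion(&rds.DownloadDBLogFilePortionInput{
            DBInstanceIdentifier: instance,
            LogFileName:          logFile,
            Marker:               marker,
        })
        if err != nil {
            panic(err)
        }
        fmt.Print(aws.StringValue(resp.LogFileData)) // everything since the last marker
        marker = resp.Marker
    }
}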

There are two places things get interesting: log rotation (aka the top of the hour) and asking for older data.

Let's say it's 17:58 UTC and you're happily tailing content, fetching additional lines every 5 seconds. You get to 18:00:02 or so, the marker switches from 18:xxxx to 19:0, and suddenly you're not getting any new data. You wait and wait and ... at 18:05:00, you start getting data again! But it starts at 18:05:00; the data from between 18:00 and 18:05 is missing.

You take the last marker you got from your tail before data stopped flowing (the one with the 18: prefix) and use that marker as an index into slowquery.log.18. This gets you the data from between 18:00:02 and 18:05:00, then stops. Or! Once some amount of time has passed (not necessarily the full 5 minutes), you can ask slowquery.log for data since the 18:offset marker and you'll get the data that's missing.
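
As a sketch of that first option (continuing the Go example above, with the same placeholder identifier; backfill is a hypothetical helper name):

// backfill fetches the end of the just-rotated file, starting from the
// marker where the live tail stalled (e.g. "18:20716941").
func backfill(svc *rds.RDS, stalledMarker *string, hour int) (string, error) {
    resp, err := svc.DownloadDBLogFilePortion(&rds.DownloadDBLogFilePortionInput{
        DBInstanceIdentifier: aws.String("myrdsinstanceid"), // placeholder
        LogFileName:          aws.String(fmt.Sprintf("slowquery/mysql-slowquery.log.%d", hour)),
        Marker:               stalledMarker,
    })
    if err != nil {
        return "", err
    }
    return aws.StringValue(resp.LogFileData), nil
}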

Back you go to tailing the slowquery.log.

OK, so here's the next fun part: when asking for slowquery.log, you can hand AWS a marker for either the previous hour or the current hour and it will give you data since then. If you do this after the 5-minute log rotation period has ended, you get all the intermediate data too! However, if you ask for data where the number before the colon is not the current or previous hour, it ignores that and hands you back current data at the offset specified by the second half of the marker.

You can ask for data from older files (the numbered log files) and pass them a marker, but in that case it always ignores the hour portion of the marker. It will happily return the hour portion you gave it, but it doesn't affect the data you get. In other words, when asking for data from slowquery.log.8, markers of 12:123456 and 4:123456 are both treated as though they were 8:123456.

To poke around this for yourself, you have to have RDS configured to write the slow query log to a file. If that's the case, grab the DB instance identifier (aws rds describe-db-instances --output table --region us-east-1 | grep DBInstanceIdentifier lists identifiers) and the log file name (aws rds describe-db-log-files --db-instance-identifier <ID> --output table --region <region> | sort -n shows log files).

Once you've got your instance identifier, log file, and region handy, download this Go code[1] and give it a whirl:
go run tailrdslog.go --identifier myrdsinstanceid
Try passing it different markers and log files:
go run tailrdslog.go -i prod-mysql -l slowquery/mysql-slowquery.log.10 -m 14:123456
and see what happens!

It is especially interesting to run this across an hour boundary to see what happens as the log rolls over. The script attempts to grab some of the data hiding in the previous log until everything returns to normal at 5 past the hour.

It will output a line every 5s (the default polling interval) with
  • the current time
  • the starting marker
  • the length of the content downloaded
  • the ending marker
  • the first timestamp encountered in the slow query log
  • the last timestamp encountered in the slow query log
  • (optional) sleep duration before the next poll
Oct  7 21:35:43 starting from 22:20523279, got 37424 content, next section from 22:20560703. first time is 21:35:39, last time is 21:35:43 sleeping 5 sec...
Oct  7 21:35:49 starting from 22:20560703, got 59056 content, next section from 22:20619759. first time is 21:35:43, last time is 21:35:48 sleeping 5 sec...
Oct  7 21:35:54 starting from 22:20619759, got 54649 content, next section from 22:20674408. first time is 21:35:48, last time is 21:35:53 sleeping 5 sec...
Oct  7 21:35:59 starting from 22:20674408, got 42533 content, next section from 22:20716941. first time is 21:35:54, last time is 21:35:59 sleeping 5 sec...
One nice thing to notice: the starting marker plus the content length gives you the ending marker (20523279 + 37424 = 20560703). \o/

[1] I know, and I'm sorry. It's terrible code. It's sort of OK because it's scratch code just used to explore the API. But I should be giving the world better examples. It doesn't check that pointers aren't nil before dereferencing them. The control flow is awkward. Ifs are nested too deeply. Code is repeated. I encourage you to do none of those things when you cargo-cult snippets into your production application. :)

2016-04-01

Simulating network delay on OSX

As of OS X Yosemite, the tool to manipulate the networking stack changed from ipfw to pf.
There are many guides out there on the internet for both ipfw and pf, but it took me a while to sort through which ones are still applicable. Also, the most common use case is bandwidth throttling (just like I wrote about when throttling my FreeNAS uploads), but I was interested in doing some network testing.

Goal: see how my client/server app behaved with a network delay of 500ms by inserting a delay between every packet.
Tools:
  • pfctl - the utility to manipulate the PF packet filter
  • dnctl - the utility to manipulate dummynet, the part that actually inserts delay
Summary:
enable pf
sudo pfctl -E
add a temporary extra ruleset (called an "anchor") named "deeelay"
(cat /etc/pf.conf && echo "dummynet-anchor \"deeelay\"" && echo "anchor \"deeelay\"") | sudo pfctl -f -
add a rule to the deeelay set to send any traffic to my app (running on port 8081) to dummynet pipe 1
echo "dummynet in quick proto tcp from any to any port 8081 pipe 1" | sudo pfctl -a deeelay -f -
Add a rule to dummynet pipe 1 to delay every packet by 500ms
sudo dnctl pipe 1 config delay 500
View all your PF rules
sudo pfctl -sa
View your PF dummynet anchors
sudo pfctl -sd
View the rules in your PF dummynet anchor
sudo pfctl -a "deeelay" -sd
View your dummynet pipe
sudo dnctl show
When you're done, reset everything to its default state
sudo pfctl -f /etc/pf.conf
View the results:

Use tshark (terminal) or wireshark (GUI) to view your traffic and verify that there is delay inserted into the conversation.
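
If you'd rather verify from code, here's a quick Go sketch (assuming your app is listening on 127.0.0.1:8081, per the rule above) that times TCP connects; with the pipe in place, the handshake should take noticeably longer:

package main

import (
    "fmt"
    "net"
    "time"
)

func main() {
    for i := 0; i < 5; i++ {
        start := time.Now()
        conn, err := net.Dial("tcp", "127.0.0.1:8081") // the app targeted by the pf rule
        if err != nil {
            fmt.Println("dial error:", err)
            continue
        }
        fmt.Printf("connect took %v\n", time.Since(start))
        conn.Close()
    }
}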
Read the dnctl and pfctl man pages for more goodies.

2016-03-31

Log all mysql queries

For the record, to log all MySQL queries (as opposed to the slow query log, which only logs "slow" queries, even when you set the threshold to 0) on a running database (i.e. without a restart):
mysql> set global general_log=1;
Query OK, 0 rows affected (0.02 sec)

mysql> select @@global.general_log;
+----------------------+
| @@global.general_log |
+----------------------+
|                    1 |
+----------------------+
1 row in set (0.00 sec)

mysql> select @@global.general_log_file;
+--------------------------------+
| @@global.general_log_file      |
+--------------------------------+
| /usr/local/var/mysql/peach.log |
+--------------------------------+
1 row in set (0.00 sec)
then tail -f /usr/local/var/mysql/peach.log to get your stream of queries. (When you're done, set global general_log=0 to turn it back off.)

2016-03-23

Copying Playlists from iTunes to Plex Media Server on FreeNAS

I've managed to copy one playlist from iTunes to my Plex media server running on FreeNAS. It was an ordeal.

For the next time I try, here are the juicy bits:

Prep the m3u file
  • iTunes -> File -> Library -> Export Playlist
    • select 'm3u' as the file format
  • open the playlist.m3u file and replace all the ^Ms with real newlines (tr '\r' '\n' < playlist.m3u will do it)
  • remove any files that have non-ASCII characters (e.g. ü, ç, ñ, é, à, etc.)
Get the Converter
  • https://github.com/dakusan/PlexPlaylistImporter 
  • The README says it requires python 3 but I got it to run with python 2.7.8
  • I had to make some changes
    • syntax error in print("...", end="")
    • munge the path at line 104, necessary because of the chrooted Plex jail
    • add some debugging print statements
Run the converter
  • python imp.py ./playlists/playlist.m3u 90s "/mnt/bluepool/jails/plexmediaserver_1/var/db/plexdata/Plex\ Media\ Server/Plug-in\ Support/Databases/com.plexapp.plugins.library.db" 

2016-03-17

Sublime Text CLI doesn't work after upgrading to El Capitan

I suffered this problem on my new laptop: a staple of my workflow, opening files in Sublime from the command line, stopped working:
$ subl -n /tmp/foo.sh

I tracked it down to a few things:
  • it worked correctly in a new shell
  • it was only broken inside tmux
Searching yielded lots of confusing half-answers about reattach-to-user-namespace. My solution:
  • brew install reattach-to-user-namespace
  • alias subl='reattach-to-user-namespace subl'
Assuming you have your subl symlink set up and in your path,  these changes will make it work from within tmux.

2016-03-15

More reflections from a terraform n00b

There are many posts out there talking about how to use Hashicorp's Terraform, and a number of them describing some of the things people learned as they were just beginning to use the product.  It's still early enough in its life cycle that these are useful. One came a few weeks ago from Charity, and I am adding the few things I've learned as an addendum to that.

Once you have a functional Terraform environment, there are a few things that can help you level up your game. You start thinking about how to get something that can be safely modified by a larger team and how to use your revision control to safely make changes to your environment in a way that you can test.

Multiple State Files
It is not at all uncommon to have multiple state files (aka multiple terraform ... I don't have a good word for it). This is even recommended by Hashicorp in some of their Atlas documentation. Common uses are
  • multiple copies of your environment (eg production, staging, etc.)
  • federation of responsibility (each team manages their own service)
  • protection (you're not making changes to the terraform config for all resources when changing only some. This is especially true for qa/dev/prod or global config vs. specific service config). 
  • refreshing large state files can have a significant performance impact, so splitting your installation into multiple states can be beneficial as your infrastructure grows.

Reusing Modules
Modules can be reused within a single .tf file. The module "name" is the definition of the local instance of that module; the module source is what determines which module to use. This surprised me because I thought module paralleled resource, where the first argument is the type of resource; I expected module's first argument to say which module to use. Not so: the only argument to module is more akin to the second argument to resource, giving it a local name. It's totally fine (and common) to say module "foo1" {source="./foo"}, module "foo2" {source="./foo"}, changing some other variable to make the two uses of the module different.
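
Spelled out, that pattern looks something like this (the names and the instance_count variable are invented for illustration; the module at ./foo would need to declare that variable):

module "foo1" {
  source         = "./foo"
  instance_count = 2
}

module "foo2" {
  source         = "./foo"
  instance_count = 4
}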

Promoting Changes using Modules
The module source can be a github repo for external modules (terraform get will download them), and it can also be a github URL at a specific revision (of your own or remote modules) to pin a specific version of a module. You can use multiple versions of the same module within one terraform instance, allowing you to promote changes from dev to qa to prod within a single terraform state. Note, of course, that doing this means you have to run terraform get every time you update a module, whereas when using file paths to your modules, they're just symlinks, so all changes are immediate.

For example, your dev environment can have source="github.com/...?ref=master" while your staging and prod .tf files can use source="github.com/...?ref=abcd1234" (or a git tag or branch) to avoid getting all the new changes until you're ready.

Overall Impression
It's been lovely getting to play with all these AWS resources that I'm used to mucking with by hand in a structured, automatable way. Though there are some strange bits to the configuration files (you can nest quotes! it's weird.), you get used to them quickly enough. I will bet that this product goes through radical change over the next year or two (in a manner similar to how the Chef community grew a few years ago) and I look forward to seeing it (and helping it) mature.

2016-03-11

Using Google Docs as a Time Tracker

I'm doing some consulting in between full time employment. It's been forever and a day since I did hourly work, so I briefly looked around to see what nice apps there were (either desktop or iOS) to help me track my hours for billing. Unsurprisingly, I was disappointed by what I found. They're all so complicated! I guess people that do this for real have multiple clients, multiple projects, want to interface with online billing systems, and so on. All I wanted was something quick to start work, stop work, and calculate the time spent.

Google spreadsheets + forms to the rescue!

I'm mostly putting this down just so I can remember it for the future.  If it's useful to you, great! Please leave a comment. It's likely you'll have to modify it some for your own use. Enjoy!

Step 1: Set Up the Form
  • New google sheet
  • Tools -> Create a Form
  • Only one question: Start or Stop. Multiple choice, with Start and Stop as the options
    • make the question required
  • (If you have projects, add them as a second question)
  • Click the Preview button (an eye icon)
    • Copy/paste the URL to a temporary spot for later
Step 2: Do Time Calculations
Go back to the spreadsheet. You'll see a tab labeled "Form Responses 1" and one labeled "Sheet1"
  • In the Sheet1 tab, label the first column Timestamp and the second Duration
  • In cell A2, put ='Form Responses 1'!A2
  • In cell B2, put =if('Form Responses 1'!B2="Stop", 'Form Responses 1'!A2 - 'Form Responses 1'!A1, 0)
    • This formula assumes you always alternate Start and Stop, and calculates the time between start and stop
    • If tracking multiple projects, instead of =if(B2="Stop",... use =if(and(B2="Stop", C2="Proj1"),... with one column for each project.
  • Copy the formula down for the rest of each of columns A and B
  • Change column B to use Duration format: Format menu -> Number  -> Duration
Make a summary tab for easy viewing
  • Create a new tab, name it Summary
  • To get your total hours spent, use the formula =sum(Sheet1!B2:B30) (changing 30 to the final row of your list of durations)
Step 3: Make it easy to use
Load the form on your phone. Using the share icon in Safari, choose 'Add to Home Screen'. From now on, when you start or stop work, tap the icon for the form on your phone, tap Start or Stop, and submit. Submission will record the timestamp in the form, to be calculated by your spreadsheet formulas.

On your mac, open the AppleScript Editor. Create a new applescript with the contents:
open location "https://docs.google.com/blah/blah/blah"
changing the docs.google.com URL to be the URL for the form.  Save the applescript as an application and put it on your desktop. When you want to start or stop work on your mac, open the applescript (this will load the form in your default browser), check start or stop, and hit submit.

For extra fun, use Karabiner to bind a key or keycombo such as F10 to open the applescript. In private.xml:
<vkopenurldef>
    <name>KeyCode::VK_OPEN_URL_HOURS_APPLESCRIPT</name>
    <url>file:///Users/ben/Desktop/Hours.app</url>
</vkopenurldef>
<item>
    <name>F10 to Hours</name>
    <appendix>Launch hour tracker in default browser</appendix>
    <identifier>private.f10_to_hours</identifier>
    <autogen>__KeyToKey__
        KeyCode::F10,
        KeyCode::VK_OPEN_URL_HOURS_APPLESCRIPT
    </autogen>
</item>
 
Conditions and Use Instructions
  • When you start, load the form and submit with "Start"
  • When you finish, load the form and submit with "Stop"
  • You must alternate Start and Stop. If you have two starts in a row or two stops in a row, it will break.
    • If you have two Starts or Stops in a row, just delete one or insert a new row with the one that's missing.
  • If you forget to Start or Stop, you can either ignore it and add a new row later, or add a new row immediately then go and change the Timestamp to be correct after the fact.


2016-03-08

Installing CrashPlan headless on a FreeNAS

The real purpose of this post is to send google juice to the forum post that gives a working guide: https://forums.freenas.org/index.php?threads/tested-crashplan-4-5-setup.41418/. When I was searching for a guide yesterday, I got many postings from last year and before, none of which worked. This one does, though it does retain the extra-complicated bit of connecting to the server via an SSH tunnel instead of asking CrashPlan to just listen on all interfaces and connecting directly. I understand it's there because the conversation between the CP client and server is not encrypted, but honestly, on my home network I'm OK with that. My only suggestion is to give people the option, since asking CrashPlan to listen on all interfaces is way easy (change the third field of /var/lib/.ui_info from 127.0.0.1 to 0.0.0.0).

Thanks, Martin. Your guide was a huge help.

[Addendum] In order to stop my laptop (which I am only using to connect to the CrashPlan instance on the NAS) from running CrashPlan, I needed to stop the CrashPlan service on my laptop and prevent it from starting again when restarting my laptop:
  sudo launchctl unload /Library/LaunchDaemons/com.crashplan.engine.plist
  sudo launchctl disable system/com.crashplan.engine