Down 6 pounds

At the beginning of this month, I said enough is enough and forced myself back onto the slow carb diet. Slow carb is really just another fancy name for low carb, but it makes a distinction between complex and simple carbs. You also get a cheat day.

  • Rule 1: Avoid "white" carbohydrates
  • Rule 2: Eat the same few meals over and over again
  • Rule 3: Don't drink calories
  • Rule 4: Take one day off per week

I have been almost exclusively eating steak salads (from Sheetz and Ed's) and avoiding soda/sugar drinks (including diet). I have had one cheat day so far, where I got to have ice cream and tons of pasta (I had it coincide with my Nifty Noodles and Drones Day). During that day, I was surprised that I didn't gain any weight. The last time I was on this diet, I would lose a couple pounds during the week, then gain most of it back on cheat day, averaging out to a loss of one pound for the week.

I know that for long term weight loss, it's recommended to focus on 1lb a week, but I've been reading up on this, and more and more people agree that a quick drop in weight at the beginning is more motivating. After losing 6.6lbs in 2 weeks, I have to agree.

weight loss chart

Offline Social Networking

A few days ago, I read a blog post by André Staltz about "AN OFF-GRID SOCIAL NETWORK". I was immediately intrigued, and I'll tell you why: amateur radio. The bulk of all amateur radio traffic takes place as either phone (voice) or CW (morse code); textual communication is only a portion of the traffic. I have used PSK31 to send real-time messages to people who are listening right then. I have used WinLink to send email from my computer, over the radio, to a packet station in another state or another country. I am very active with APRS, sending text messages over VHF (think real-time twitter for amateur radio). Actually, thinking about how APRS beacons go out every 10 minutes, and beacon stations in general, I guess those do push more than voice does.

documenting my homelab with openDCIM


I've been wanting to document my home and homelab network for a while now. I used to keep some individual files around with a list of ip addresses and networks, and I used to have a yaml file with the network laid out. But a lot of it was out of date, and any time I made a significant change (like adding vlans or tagged interfaces), the entire format would need to change. I've also been planning to redo how my network is laid out and realized I would have to document everything before I could really work on the new scenario. But overall, these projects have been at the very bottom of the list, kind of swept under the rug.

This weekend, I got a big itch to go back and figure out how everything was laid out physically, and document it. I decided I just wanted to grab a tool, even if it wasn't the ideal do-everything tool, and start recording things. So Saturday night and early Sunday morning, I started re-researching network documentation tools. I was pretty disappointed by what I found, at least in the open source world.

Let me explain my home network setup. I have the house, my batcave/office, and a shed. These are all networked together, and each building has a managed switch in it. I have cable internet coming into the house. From the comcast router, a wire goes into a managed switch on a "public" vlan. My gateway router sits in the batcave and is also plugged into a managed switch on the same vlan. I have 4U wall-mount racks holding patch panels (and/or switches), and a 40U rack in my batcave holding the gateway router, a switch, and a couple servers. I also have some unifi access points spread around the property.

To start off, I came up with a list of what I wanted to do:

  • Enter in all of my physical devices
    • servers
    • racks
    • patch panels
    • switches
    • access points
    • routers
  • Record the connection between each device.
  • Note the native and tagged vlans on each switch port
  • Possibly record server ip addresses, virtual machines, and their vlans (native or tagged)
  • Be able to fetch my data by api, and ideally enter it programmatically
  • Be able to view a rack elevation or cable path

What I did not want to do:

  • Use a homemade file like a spreadsheet or yaml file (even though people have done wonders with making elevation spreadsheets)
  • Write an application to do this
  • Lock my data into an obscure format


Spoiler Alert: As you can guess from the title, I ultimately installed openDCIM and used that. It doesn't meet all of my needs, but I'll explain my reasoning below.

After searching, I found two broad categories that touch upon these areas: 1) asset/inventory management, and 2) network scanning and monitoring.

Network Scanning and Monitoring: Applications in this category, such as openNMS and SolarWinds NPM, work by interrogating your network and building a live map of everything connected. In reality, this is what most people, including myself, should look at. The reason network documentation gets out of date is that someone is manually entering it. By providing a live report of the network (and showing historical changes in an audit log), you will always have the most accurate information. Looking at software in this category had me re-evaluate what I wanted to do. I was looking heavily at openNMS (I've come across this product in the past). The NMS stands for Network Management System, though they have a lot of focus on monitoring, so you might assume the "M" stands for "Monitoring". openNMS looks excellent for what it does, and I will probably use it down the road for my logical network documentation. But for what I wanted to do, openNMS and applications in this category are not designed to physically lay out hardware. There was nothing in openNMS about rack elevations or cable connections that I could find. I even found people in forums looking for a way to integrate openNMS with Racktables (mentioned below).

Like I said, looking at applications in this category made me want to further separate my goals to target the physical layout. Some software can do live polling of switches to see what MAC addresses are connected to a port, and things like lldp will show friendly neighbor names. But it can't tell that there is a patch panel or unmanaged switch in between. The only way to get this information is visual inspection. I need an application that can do that.

Asset and Inventory Management: These also overlap with Configuration Management Databases. In fact, a number of IT Asset Managers also call themselves CMDBs, or vice versa. IT Asset Management is a pretty wide area. However, only a couple support concepts like rack elevations and cable path management. The two I looked at the most were RackTables and openDCIM. A third one I looked at was Ralph.

Ralph: Let me just talk about Ralph for a moment. I looked at Ralph two years ago for a larger project at work, and it was disqualified for a number of reasons specific to that project. That experience gave me a negative view of Ralph, and I didn't give it too much looking over this time around. That may have been unwise. I took a second look while writing this up. If you look at their documentation, they seem to have all the features I'm looking for here, along with an API. It's based on python/django, and their github is pretty active. I figured I owed it to Ralph to install and review the software again; a lot seems to have changed over the last two years. UPDATE: I went and played with Ralph's demo. It's very slick for adding datacenters, server rooms, racks, and devices. If I go to add a device, I can create a new device template and manufacturer on the fly. However, it has no support for cable management, and there is an open issue for this. So even though I didn't give Ralph a fair shake, it's out of the running for right now because it can't do links between interfaces.

Racktables: I've used Racktables off and on over the years. Quite frankly, it's just not nice software to work with. Data entry is difficult, it has no native api support (though some people have worked at bolting some on), and in my mind, it's a one-way system. You put the data in and that's about it; you can only visually access the data afterwards. On the plus side, it does have some IPAM and VLAN management features, so for those looking to do more than physical layout, Racktables has quite an advantage.

openDCIM: Finally, we come to openDCIM. Like Racktables, it's php/mysql based. It has a rather nice interface for creating datacenters, cabinets, and devices. It understands chassis and blade setups. It has a baked-in read-only API. These days, my philosophy on web apps is that they should build an api, and then a frontend that uses that api. But this app pre-dates the popularity of APIs, and they have been adding one on afterwards. I would have been turned off by the lack of a writeable API, but their existing html forms are basic enough that you could easily manipulate them with curl or python. If I really needed to update the data programmatically, I am sure I could do so. And being able to run a curl command and get data back in json means that I can easily integrate this with other tools down the road. Ultimately, I decided I didn't want to waste this motivation trying to seek out other tools and went with openDCIM. My goal for Sunday was to record what I could about the system.
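As a sketch of what that curl-plus-json integration could look like: the endpoint path and field names below are assumptions for illustration (check your instance's API docs), so the sample response is faked locally with that assumed shape.

```shell
# In practice the JSON would come from something like:
#   curl -s http://dcim.example.lan/api/v1/device > /tmp/devices.json
# (URL is a placeholder.) Fake a response with the assumed shape:
cat > /tmp/devices.json <<'EOF'
{"device": [{"Label": "SW02BATCAVE"}, {"Label": "FRESHDESK"}]}
EOF

# Pull out just the device labels with stdlib tooling:
python3 -c '
import json
with open("/tmp/devices.json") as f:
    data = json.load(f)
print("\n".join(d["Label"] for d in data["device"]))
'
```

From here, the same pattern feeds any downstream tool that can read a list of device names.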

Actual Usage


Installation was pretty straightforward. I did it inside a vagrant vm on my laptop. It was basically: install Apache, PHP, and MySQL, and go to town. I uploaded my vagrant config to github, so you can clone that and start your own instance right away. After installation, openDCIM provides a web-based pre-flight check and walks you through creating datacenters and cabinets. This is pretty straightforward - you just give your datacenters a name, and your cabinets a name and u-height. Once done, you have to manually delete install.php from the openDCIM directory.

For me, I called my batcave one datacenter, and my house another. Then for each room, I made that a cabinet. In my basement, I have a 4U network rack on the wall, and a rack shelf also screwed into the wall (holding my synology and my comcast router). I called my shelf a 10U rack. Then for each room that has a wall jack and any equipment (like an access point) I wanted to track, I created an imaginary rack. I'll talk more about that in a bit. Now, at work, we use rack names like R01 or NP01 or A54. At home, I used highly technical names like "basementwallrack", "tvstand", and "batcaverack". Pick a naming scheme that works for you.

enter manufacturers and templates

One of the first things I came across was that I can't just add a device to a rack. Devices are based on device templates, and templates are tied to a manufacturer. This wasn't really a surprise, because almost any sort of asset tracker I've used works the same way. This meant I had to go into Template Management, add manufacturers, and then start editing templates. This was a fun exercise, going into my emails and amazon orders, or just logging into a device, to get a model number. In a template, you can define things like power consumption, weight, network ports, and u-height. These are templates, so you won't be putting serial numbers in. I added templates for my managed switches, my ubiquiti UAP and UAP-AC-PRO, 24v poe injectors, my chromebox, a generic desktop computer, patch panels, and anything else I could think of. There was a neat looking feature where you could import templates and submit templates back, but none of the existing ones had my equipment. If you have images of the front and back of a device, you can include those to make your rack elevations look more accurate.

One quick gotcha was that when I started adding devices with network ports, I found I had to go into configuration->cabling types and add cable media types like 1000BaseTX and 1000BaseFX. For fun, I added 802.11bgn along with 802.11ac. I also added a 1000BaseTX-POE24V media type, because I have some runs that are carrying 24V POE.

One useful thing to do when making templates is to go down to the Ports section and rename them to things like "eth0". For my access points, I made "LAN" and "WLAN" ports. For the POE injector template, I put "LAN" and "POE" as port names. You can always rename ports when you create a specific device to put in a rack, but the better your template, the less work later on.

Also, the device type is important. Most of the types (servers, storage arrays, and appliances) all work the same way. Physical Infrastructure does not have any ports. Patch Panels are unique in that each port ends up having a front and rear connection.

adding devices and connecting ports

Finally, you can browse to a rack and start adding devices. When you add a device, you select from a template, add a label, and then select a u position. When you save that, you can then start connecting ports. I found it best to start with patch panels first. I have 24-port patch panels in each wall-mounted network rack. I had to get creative for the wall jacks, and I made either 1-port or 2-port RJ45 keystone jacks (RJ45KJ1 and RJ45KJ2). When connecting patch panels (or wall jacks), make sure that you connect the rear of one patch panel to the rear of another. When editing a port, you can "connect" the front and rear side of the port at the same time. So you can connect the front of the patch panel to a switchport, and the back of your patch panel port to the back of a wall jack, and then hit save.

I found that saving connections seemed straightforward, but it was easy to make mistakes. After you have entered each row and clicked save on it, you need to hit "update" to save all of your changes; if you don't save your rows before hitting update, your changes will be lost. I also found that once I linked two ports, I could no longer change the names of the ports. Once you make connections, you can see the entire path in image form, or in a text description like this: SW02BATCAVE[Port4]BATCAVE-PATCH24[Port4]BC2[BC2-2]FRESHDESK[eth0].

As for IP addresses and multiple interfaces, support was sadly lacking. I could enter a management address for a device. On the ports, there is a notes column. I could add an ip address or a vlan in there, but it's simply a free-form field.

SNMP: When adding switches, if I added an ip address, I could query the switch with snmp. On my tp-link switches, it was able to get basic system information over snmp, but it could not get a list of ports. If it had, I believe it could populate some port status information.

imaginary racks and other oddities

As I mentioned above, I had to make imaginary racks in each room. The imaginary racks were sort of a pain point for me. I get that this program was written with racks in mind. The concept of a freestanding device such as a ceiling mounted access point, a wall jack, a printer, or a desktop tower just really doesn't factor in. The idea is that you have racks, and only devices that are in a rack can be cabled.

This also impacted how I made wall jacks. A single-port wall jack had to be entered as a patch panel, 1U in height. If a device does not have a U-height, you can't add it to a rack. And if you don't add it to a rack, you can't cable it. So, in order to document the RJ45KJ1 and RJ45KJ2, I created two 10U racks in my living room: "TVSTAND" with a 1U "RJ45KJ1" and "LRCOUCH" with a 1U "RJ45KJ2". For TVSTAND, I added my tp-link unmanaged switch, my UAP-AC-PRO, and (for fun) my Chromebox. The switch connects to the front of the RJ45KJ1. The Chromebox connects to the unmanaged switch.

My access point provided another hitch. This might be a bit obsessive, but I want to record when something is using POE. So I created a Ubiquiti
24VPOEINJECTOR appliance template, which I used to create a (1U) device to place in my imaginary rack. One port (LAN) connects to the switch, while the other (POE) connects to the access point.

For my living room, since the POE injector lives with the access point, this isn't really needed. But for the access point in the hallway, the POE injection takes place in the basement: there is a 1000BaseTX run from the switch to the injector, then 1000BaseTX-POE24V to the front of the patch panel, then from the rear of the patch panel to the keystone jack in the wall, and finally up to the access point. I have a similar setup in my batcave, with a POE injector powering an external access point, and another (48V!) powering an ip phone. While POE is supposed to be safe for non-POE devices, I think it comes in handy to document which wall jacks I can expect to find power at.

Wrap Up

This about wraps up my experience. Over the course of a Sunday, I was able to get openDCIM up and running, and enter all the data that describes the physical layout of my network. I would love to be able to wire up freestanding devices in a data center, and I would like to assign ip addresses and vlans to individual interfaces. But for physical layout and inventory, it works really well. I suspect that another application like openNMS will have to track my logical network. If it can be configured to query my switches (snmp/lldp), then it would be a better live solution. Ralph might be a good system for handling this aspect as well, though that requires further investigation.

Once all the data was in, I was able to do curl commands and retrieve json. Since references to other devices were id numbers, a true fetch and sync would need to make multiple calls, retrieving related records. For visualization, openDCIM has a reports feature, including a network map. This network map is generated using the graphviz dot language, and it can be output as png or svg. On the default generated map, it's a bit difficult to trace lines from port to port. But I took the dot file, changed splines to ortho, and it came out much nicer. I think there's room for improvement here, and with some tweaking, we could make a really nice printable network diagram to hang up next to each rack.
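The splines tweak is scriptable as a one-liner. This is a sketch: the exported file name is a placeholder, and the tiny sample graph below stands in for the real report output.

```shell
# Stand-in for the dot file the network map report exports:
cat > networkmap.dot <<'EOF'
graph net { splines=line; "SW02BATCAVE" -- "FRESHDESK"; }
EOF

# Switch the edge routing to orthogonal for straighter, traceable lines:
sed -i 's/splines=[a-z]*/splines=ortho/' networkmap.dot

# Re-render with graphviz, if it is installed:
if command -v dot >/dev/null; then
  dot -Tsvg networkmap.dot -o networkmap.svg
fi
```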

Another feature that might be nice would be printable asset labels with a QR code pointing back to the openDCIM instance. With the API, I could definitely see writing a script to pull data and generate these.

I used mysqldump to back up my data, and I can run this like an application, though I plan to put this on a VM. My next goal (in this category) is to create an ansible role to install this on one of my virtual machines and give it an always-on life.
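Once it has an always-on home, that mysqldump could go on a schedule. A crontab sketch, where the database name and backup path are placeholders, not anything from the openDCIM docs:

```
# nightly at 02:30; "dcim" database name and /var/backups path are assumptions
30 2 * * * mysqldump dcim | gzip > /var/backups/dcim-$(date +\%F).sql.gz
```

Note the `\%` escaping: cron treats a bare `%` as a newline.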

Welcome to 2017

This year, I have made a few new resolutions. One of them is to write more blog posts. This post is the first step towards that. Rather than just naming off a few common resolutions, I spent a couple days working through the exercises in Alex Vermeer's 8,760 Hours: How to get the most out of next year. In my opinion, this is an excellent book. The author and I both used mind-mapping software, but this is also very doable with pencil and paper. I ended up using Mindjet MindManager (trial) for mine, but I am going to switch over to FreeMind. MindManager is (in my opinion) more expensive than I can justify. I've used FreeMind in the past and found it adequate.

You spend a good bit of time making a snapshot of "where you are now". You don't have to follow his guide exactly (I didn't; it is, in fact, a "guide"), but you start by breaking your life down into 12 areas and adding details for each area. Some areas will be hard numbers or metrics, such as how much you have saved for retirement, or how much you weigh and what you think your goal weight should be. For instance, I did 7 blog posts in 2015 and only 3 in 2016. Others will be more thought-provoking, like "what fun things have I done recently?" or describing your home life.

As you can imagine, the life snapshot is going to be a pretty personal thing, not something you'd want to share out. But it does make you think about your life as a whole. It's also nice to have some of the metrics that you can compare to when next year comes around.

Once you have built the snapshot, the second portion of the exercise is to actually set your goals. Much like making your current snapshot, Alex recommends splitting your goals into "life areas" and organizing your goals under each section. You should also give thoughts on how you might go about accomplishing these goals and add details underneath each goal. You don't have to make a full detailed plan. This "goal map" is going to be something that you can save or print out and refer to during the year. The entire thing should fit on the equivalent of one page and you would be able to see all of your goals and ideas towards accomplishing them at a glance. You can always use other specific tools to draw up a budget, record your diet, or design a sail boat as you work towards your goals.

Finally, the author recommends doing both monthly and quarterly reviews of your goals. Ask yourself if you're on track. Some goals may be more important than others. My goal to get my credit card balances back to $0 (and pay them off each month) is more important than my goal to build out 3 APRS stations and replace the 2nd garage door on our pole barn (it has been disabled/nailed shut since we moved in).

Worst Practice Lab VM Automation


I've started the process of switching my lab over from unmanaged to ansible. I've used Puppet and Salt quite extensively through work, but after a handful of false starts with the lab, I think ansible is the way to go. This is a series of what many (including myself) would consider "worst practices", but which are more along the lines of "rapid iteration". The goal here is to get something working in a short period of time, without spending hours, days, or weeks researching best practices. This is instead something someone can put together on a Sunday afternoon, in between chasing after a 3 year old.

These are a handful of manual steps, each of which could be easily automated once you determine your "starting point".

Background: When I clone a VM in proxmox, it comes up with the hostname "xenial-template". I should be able to do something like I do with cloud-init under kvm, but I haven't gotten that far under the proxmox setup. Additionally, these hosts are not in dns until they are entered into the freeipa server. Joining a client to IPA will automatically create the entry. So the first things I need to do to any VM are to set the hostname and fqdn, and then register it with IPA. My template has a user called "yourtech", which I can use to log in and configure the VM.

First, create an ansible vault password file: echo secret > ~/.vault_pass.txt. Next, create an inventory directory and set up an encrypted group_vars/all.

mkdir -p inventory/group_vars
touch inventory/group_vars/all

Add some common variables to all:

ansible_ssh_user: yourtech
ansible_ssh_pass: secret
ansible_sudo_pass: secret
freeipaclient_enroll_user: admin
freeipaclient_enroll_pass: supersecret

Then encrypt it: ansible-vault --vault-password-file=~/.vault_pass.txt encrypt inventory/group_vars/all

Generate inventory files.

With the following script, I can run ./ example If ansible fails, then I need to troubleshoot. A better approach would be to add these entries into a singular inventory file, or better yet, a database, providing a constantly updated and dynamic inventory. Put that on the later pile.
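For reference, the singular-file alternative would just collect the same generated lines in one place (hostnames and addresses here are illustrative):

```
# inventory/hosts -- one static inventory instead of per-host files
pgdb01.example.lab   ansible_host=
pgdb02.example.lab   ansible_host=
sstorm01.example.lab ansible_host=
```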

#!/usr/bin/env bash
# usage: <shortname> <ip>
# The variable definitions below fill in what the rest of the script
# expects; DOMAIN is an example value - substitute your own.
NEWNAME=$1
IP=$2
DOMAIN="example.lab"
FQDN="${NEWNAME}.${DOMAIN}"
FILENAME="inventory/${NEWNAME}"
ANSIBLE_VAULT_PASSFILE=~/.vault_pass.txt

LINE="${FQDN} ansible_host=${IP}"
echo ${LINE} > ${FILENAME}

echo "Removing any prior host keys"
ssh-keygen -R ${NEWNAME}
ssh-keygen -R ${FQDN}
ssh-keygen -R ${IP}

echo "${FILENAME} created, testing"
ansible --vault-password-file ${ANSIBLE_VAULT_PASSFILE} -i ${FILENAME} ${FQDN} -m ping -vvvv

Let's go to work.

At this point, I should have a working inventory file for a single host and I've validated that ansible can
connect. Granted, I haven't tested sudo, but in my situation, I'm pretty sure that will work. But I haven't
actually done anything with the VM. It's still just this default template.


Ansible provides a module to set the hostname, but does not modify /etc/hosts to get the FQDN resolving. As with
many things, I'm not the first to encounter this, so I found a premade role holms/ansible-fqdn.

mkdir roles
cd roles
git clone https://github.com/holms/ansible-fqdn.git fqdn

This role will read inventory_hostname for fqdn, and inventory_hostname_short for hostname. You can override
this, but these are perfect defaults based on my script above.


Once again, we're saved by the Internet. alvaroaleman/ansible-freeipa-client is an already designed role that installs the necessary freeipa packages and runs the
ipa-join commands.

# assuming still in roles
git clone https://github.com/alvaroaleman/ansible-freeipa-client.git freeipa

The values this role needs just happen to match the freeipaclient_* variables I put in my all file earlier. I think that's just amazing luck.

Make a playbook.

I call mine bootstrap.yml.

- hosts: all
  become: yes
  roles:
    - fqdn
    - freeipa


Let's run our playbook against host "pgdb02"

ansible-playbook -i inventory/pgdb02 --vault-password-file=~/.vault_pass.txt bootstrap.yml


ytjohn@corp5510l:~/projects/ytlab$ ansible-playbook -i inventory/pgdb02 --vault-password-file=~/.vault_pass.txt bootstrap.yml

PLAY ***************************************************************************

TASK [setup] *******************************************************************
ok: []

TASK [fqdn : fqdn | Configure Debian] ******************************************

TASK [fqdn : fqdn | Configure Redhat] ******************************************
skipping: []

TASK [fqdn : fqdn | Configure Linux] *******************************************
included: /home/ytjohn/projects/ytlab/roles/fqdn/tasks/linux.yml for

TASK [fqdn : Set Hostname with hostname command] *******************************
changed: []

TASK [fqdn : Re-gather facts] **************************************************
ok: []

TASK [fqdn : Build hosts file (backups will be made)] **************************
changed: []

TASK [fqdn : restart hostname] *************************************************
ok: []

TASK [fqdn : fqdn | Configure Windows] *****************************************
skipping: []

TASK [freeipa : Assert supported distribution] *********************************
ok: []

TASK [freeipa : Assert required variables] *************************************
ok: []

TASK [freeipa : Import variables] **********************************************
ok: []

TASK [freeipa : Set DNS server] ************************************************
skipping: []

TASK [freeipa : Update apt cache] **********************************************
ok: []

TASK [freeipa : Install required packages] *************************************
changed: [] => (item=[u'freeipa-client', u'dnsutils'])

TASK [freeipa : Check if host is enrolled] *************************************
ok: []

TASK [freeipa : Enroll host in domain] *****************************************
changed: []

TASK [freeipa : Include Ubuntu specific tasks] *********************************
included: /home/ytjohn/projects/ytlab/roles/freeipa/tasks/ubuntu.yml for

TASK [freeipa : Enable mkhomedir] **********************************************
changed: []

TASK [freeipa : Enable sssd sudo functionality] ********************************
changed: []

RUNNING HANDLER [freeipa : restart sssd] ***************************************
changed: []

RUNNING HANDLER [freeipa : restart ssh] ****************************************
changed: []

PLAY RECAP ********************************************************************* : ok=18 changed=8 unreachable=0 failed=0


Essentially, we created a rather basic inventory generator script, we encrypted some
credentials into a variables file using ansible-vault, and we downloaded some roles
"off the shelf" and executed them both with a single "bootstrap" playbook.

If I was doing this for work, I would first create at least one Vagrant VM and work through
an entire development cycle. I would probably rewrite these roles I downloaded to make them
more flexible and variable driven.

In case you got lost where these files go:

├── bootstrap.yml
├── inventory
│   ├── group_vars
│   │   └── all
│   ├── pgdb01
│   ├── pgdb02
│   └── sstorm01
└── roles
    ├── fqdn
    └── freeipa

Ceph getting acquainted

The key components:

  • Ceph OSDs: A Ceph OSD Daemon (Ceph OSD) stores data, handles data replication, recovery, backfilling, rebalancing, and provides some monitoring information to Ceph Monitors by checking other Ceph OSD Daemons for a heartbeat. A Ceph Storage Cluster requires at least two Ceph OSD Daemons to achieve an active + clean state when the cluster makes two copies of your data (Ceph makes 3 copies by default, but you can adjust it).
  • Monitors: A Ceph Monitor maintains maps of the cluster state, including the monitor map, the OSD map, the Placement Group (PG) map, and the CRUSH map. Ceph maintains a history (called an “epoch”) of each state change in the Ceph Monitors, Ceph OSD Daemons, and PGs. Ceph uses the Paxos algorithm, which requires a consensus among the majority of monitors in a quorum. With Paxos, the monitors cannot determine a majority for establishing a quorum with only two monitors. A majority of monitors must be counted as such: 1:1, 2:3, 3:4, 3:5, 4:6, etc. Side note: some ceph docs advise not to comingle Monitor and Ceph OSD daemons on the same host or you may encounter performance issues. But in deployment guides and the Mellanox high performance paper, they do comingle them. For all test purposes, I plan to comingle them (deploy monitors on ecs nodes) and evaluate performance under load. I am also still trying to estimate how many monitors per Ceph OSD. We'll have 480 Ceph OSDs per rack, and we'll want either 3, 5, or 7 monitors. I'm going to take a shot in the dark and go with 5.
  • RADOS GW This is what provides the S3 and SWIFT API access to Ceph file storage. You can install this on the OSD nodes (simplest) or select a handful of external VMs to run these. You would setup multiple RADOS GW nodes, and place a load balancer like haproxy or nginx/lua_proxy in front of them.
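The majority ratios above follow floor(n/2) + 1, which a quick shell loop confirms:

```shell
# Minimum monitors needed for quorum out of n total: floor(n/2) + 1
for n in 1 3 4 5 6; do
  echo "${n} monitors -> majority of $(( n / 2 + 1 ))"
done
```

This is also why two monitors are no better than one: losing either of the two breaks quorum.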

OSD Notes:

OSD Journal Location: stores a daemon's journal by default on /var/lib/ceph/osd/$cluster-$id/journal - on an ECS node, this would be an SSD, which is recommended by Ceph. However, you could point it to an SSD partition instead of a file for even faster performance.

OSD Journal Size: The expected throughput number should include the expected disk throughput (i.e., sustained data transfer rate), and network throughput. For example, a 7200 RPM disk will likely have approximately 100 MB/s. Taking the min() of the disk and network throughput should provide a reasonable expected throughput. Some users just start off with a 10GB journal size. For example:
osd journal size = 10000
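As a worked example of that min() rule (the formula is from the Ceph docs; the numbers are illustrative): with min(disk, network) = 100 MB/s and the default 5-second filestore max sync interval, you get a 1000 MB journal:

```
# osd journal size = 2 * expected_throughput * filestore_max_sync_interval
#                  = 2 * 100 MB/s * 5 s = 1000 MB
osd journal size = 1000
```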

OSDs can be removed gracefully.

Check Max Threadcount: If you have a node with a lot of OSDs, you may be hitting the default maximum number of threads (e.g., usually 32k), especially during recovery. You can increase the number of threads using sysctl to see if increasing the maximum number of threads to the maximum possible number of threads allowed (i.e., 4194303) will help. For example:
sysctl -w kernel.pid_max=4194303
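To make that setting survive a reboot, the same value can go in a sysctl config file (the file name follows the usual sysctl.d convention; it's not anything ceph-specific):

```
# /etc/sysctl.d/90-ceph-threads.conf
kernel.pid_max = 4194303
```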

Crush MAP

The "location" of each Ceph OSD is maintained in a CRUSH MAP.

The CRUSH algorithm determines how to store and retrieve data by computing data storage locations. CRUSH empowers Ceph clients to communicate with OSDs directly rather than through a centralized server or broker. With an algorithmically determined method of storing and retrieving data, Ceph avoids a single point of failure, a performance bottleneck, and a physical limit to its scalability.

CRUSH maps contain a list of OSDs, a list of ‘buckets’ for aggregating the devices into physical locations, and a list of rules that tell CRUSH how it should replicate data in a Ceph cluster’s pools. By reflecting the underlying physical organization of the installation, CRUSH can model—and thereby address—potential sources of correlated device failures. Typical sources include physical proximity, a shared power source, and a shared network. By encoding this information into the cluster map, CRUSH placement policies can separate object replicas across different failure domains while still maintaining the desired distribution. For example, to address the possibility of concurrent failures, it may be desirable to ensure that data replicas are on devices using different shelves, racks, power supplies, controllers, and/or physical locations.

The short of this is that in ceph.conf, you can define a host's location, which in turn defines the location of each Ceph OSD running on that host. A location is a collection of key-value pairs using Ceph's predefined types.

root=default row=a rack=a2 chassis=a2a host=a2a1

# types (from narrowest ascending to broadest grouping)
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

Each CRUSH type has a value; the higher the value, the less specific the grouping. So when deciding where to place chunks or replicas of an object, Ceph OSDs consult the CRUSH map to find other Ceph OSDs in other hosts, chassis, and racks. The fault-domain policies can be defined and tweaked.

Zero To Hero Guide: For Ceph Cluster Planning - high performance Ceph builds.

Ceph 1st Runthrough

Posted by ytjohn on 2017-04-30.

These are just some notes I took as I did my first run through of installing Ceph on some spare ECS hardware I had access to. Note that nobody would actually recommend doing it this way, but it was a good way for me to get started with Ceph.


Following this guide

I set this up for the first time in the lab, on these nodes:

  • ljb01.osaas.lab (admin-node)
  • rain02-r01-01.osaas.lab (mon.node1)
  • rain02-r01-03.osaas.lab (osd.0)
  • rain02-r01-04.osaas.lab (osd.1)

In a more fleshed-out setup, I would probably have a dedicated admin node (instead of the jump host), and we would start off with a layout like this:

The first caveat was that the guide tells you to configure a user ("ceph") that can sudo up, which I did. But ceph-deploy attempts to modify the ceph user, which it can't do while ceph is logged in.

[rain02-r01-03][DEBUG ] Setting system user ceph properties..

For step 6, adding OSDs, I diverged again to add disks instead of a directory.

ceph-deploy disk list rain02-r01-01
[rain02-r01-01][DEBUG ] /dev/sda :
[rain02-r01-01][DEBUG ]  /dev/sda1 other, 21686148-6449-6e6f-744e-656564454649
[rain02-r01-01][DEBUG ]  /dev/sda2 other, ext4, mounted on /boot
[rain02-r01-01][DEBUG ] /dev/sdaa other, unknown
[rain02-r01-01][DEBUG ] /dev/sdab other, unknown
[rain02-r01-01][DEBUG ] /dev/sdac other, unknown
[rain02-r01-01][DEBUG ] /dev/sdad other, unknown
[rain02-r01-01][DEBUG ] /dev/sdae other, unknown

I will set up sdaa, sdab, and sdac. Note that while I could use a separate disk partition (like an SSD) to hold the journal, we only have one SSD in the ECS hardware and it hosts the OS. So we'll let each disk maintain its own journal.

ceph-deploy disk zap rain02-r01-01:sdaa  # zap the drive
ceph-deploy disk prepare rain02-r01-01:sdaa # format the drive with xfs
ceph-deploy disk activate rain02-r01-01:/dev/sdaa1  # notice we changed to partition path
# /dev/sdaa1              5.5T   34M  5.5T   1% /var/lib/ceph/osd/ceph-0

Repeat those steps for each node and disk you want to activate. Could you imagine doing 32-48 nodes * 60 drives by hand? This seems like a job to be automated.
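As a first pass at that automation, the loop can at least be generated rather than typed. A sketch, using the lab's node names and drive letters from above — it only prints the ceph-deploy commands for review rather than executing them:

```python
from itertools import product

nodes = ["rain02-r01-01", "rain02-r01-03", "rain02-r01-04"]
drives = ["sdaa", "sdab", "sdac"]

for node, drive in product(nodes, drives):
    print(f"ceph-deploy disk zap {node}:{drive}")
    print(f"ceph-deploy disk prepare {node}:{drive}")
    # activate takes the partition path, not the raw device
    print(f"ceph-deploy disk activate {node}:/dev/{drive}1")
```

Piping the output into a shell (or swapping the prints for subprocess calls) would run it for real; at 32-48 nodes times 60 drives, you'd want something like salt or ansible driving it instead.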

I also noticed that the drives get numbered sequentially across nodes. I wonder what kind of implications that has for replacing drives or an entire node.

root@rain02-r01-01:~# df -h | grep ceph
/dev/sdaa1              5.5T   36M  5.5T   1% /var/lib/ceph/osd/ceph-0
/dev/sdab1              5.5T   36M  5.5T   1% /var/lib/ceph/osd/ceph-1
/dev/sdac1              5.5T   35M  5.5T   1% /var/lib/ceph/osd/ceph-2
root@rain02-r01-03:~# df -h | grep ceph
/dev/sdaa1              5.5T   35M  5.5T   1% /var/lib/ceph/osd/ceph-3
/dev/sdab1              5.5T   35M  5.5T   1% /var/lib/ceph/osd/ceph-4
/dev/sdac1              5.5T   34M  5.5T   1% /var/lib/ceph/osd/ceph-5
root@rain02-r01-04:~# df -h | grep ceph
/dev/sdaa1              5.5T   34M  5.5T   1% /var/lib/ceph/osd/ceph-6
/dev/sdab1              5.5T   34M  5.5T   1% /var/lib/ceph/osd/ceph-7
/dev/sdac1              5.5T   34M  5.5T   1% /var/lib/ceph/osd/ceph-8

After creating all this, I can do a ceph status.

root@ljb01:/home/ceph/rain-cluster# ceph status
    cluster 4ebe7995-6a33-42be-bd4d-20f51d02ae45
     health HEALTH_WARN
            too few PGs per OSD (14 < min 30)
     monmap e1: 1 mons at {rain02-r01-01=}
            election epoch 2, quorum 0 rain02-r01-01
     osdmap e43: 9 osds: 9 up, 9 in
            flags sortbitwise
      pgmap v78: 64 pgs, 1 pools, 0 bytes data, 0 objects
            306 MB used, 50238 GB / 50238 GB avail
                  64 active+clean

PGs are placement groups.
That page recommends that for 5-10 OSDs (I have 9), pg_num should be set to 512; mine defaulted to 64. But then the tool tells me otherwise.

root@ljb01:/home/ceph/rain-cluster# ceph osd pool get rbd pg_num
pg_num: 64
root@ljb01:/home/ceph/rain-cluster# ceph osd pool set rbd pg_num 512
Error E2BIG: specified pg_num 512 is too large (creating 448 new PGs on ~9 OSDs exceeds per-OSD max of 32)

I'll put this down as a question for later and set it to 128.
That alone doesn't get me healthy, so I learned what I really need to do is make more pools. I make a new pool, but my HEALTH_WARN has changed to reflect my mistake.

root@ljb01:/home/ceph/rain-cluster# ceph status
    cluster 4ebe7995-6a33-42be-bd4d-20f51d02ae45
     health HEALTH_WARN
            pool rbd pg_num 128 > pgp_num 64
     monmap e1: 1 mons at {rain02-r01-01=}
            election epoch 2, quorum 0 rain02-r01-01
     osdmap e48: 9 osds: 9 up, 9 in
            flags sortbitwise
      pgmap v90: 256 pgs, 2 pools, 0 bytes data, 0 objects
            311 MB used, 50238 GB / 50238 GB avail
                 256 active+clean

There is also a pgp_num to set, so I set that to 128. Now everything is happy and healthy. And I've only jumped from 306MB to 308MB used.

root@ljb01:/home/ceph/rain-cluster# ceph status
    cluster 4ebe7995-6a33-42be-bd4d-20f51d02ae45
     health HEALTH_OK
     monmap e1: 1 mons at {rain02-r01-01=}
            election epoch 2, quorum 0 rain02-r01-01
     osdmap e50: 9 osds: 9 up, 9 in
            flags sortbitwise
      pgmap v100: 256 pgs, 2 pools, 0 bytes data, 0 objects
            308 MB used, 50238 GB / 50238 GB avail
                 256 active+clean

Placing Objects

You can place objects into pools with the rados command.

root@ljb01:/home/ceph/rain-cluster# echo bogart > testfile.txt
root@ljb01:/home/ceph/rain-cluster# rados put test-object-1 testfile.txt --pool=pool2
root@ljb01:/home/ceph/rain-cluster# rados -p pool2 ls
root@ljb01:/home/ceph/rain-cluster# ceph osd map pool2 test-object-1
osdmap e59 pool 'pool2' (1) object 'test-object-1' -> pg 1.74dc35e2 (1.62) -> up ([8,5], p8) acting ([8,5], p8)

Object Storage Gateway

Ceph does not provide a quick way to install and configure object storage gateways. You essentially have to install apache, libapache2-mod-fastcgi, rados, and radosgw, and create a virtual host. While you could do this on only a portion of your OSD nodes, it seems like it would make the most sense to do it on each OSD node so that each node can be part of the pool.

Repo change:$(lsb_release -sc)-x86_64-basic/ref/master

should be:$(lsb_release -sc)-x86_64-basic/ref/master

After installing the packages, you need to start configuring.

After steps 1-5 (creating and distributing a key), you need to make a storagepool.

root@ljb01:/home/ceph/rain-cluster# ceph osd pool create storagepool1 128 128 erasure default
pool 'storagepool1' created

I created the domain "*.rain.osaas.lab" for this instance. I also had to create /var/log/radosgw before I could start the radosgw service.

After starting radosgw, I had to chown the fastcgi.sock file ownership:

chown www-data:www-data /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock

Next, you go to the admin section to create users.

root@rain02-r01-01:/var/www/html# radosgw-admin user create --uid=john --display-name="John Hogenmiller" [email protected]
{
    "user_id": "john",
    "display_name": "John Hogenmiller",
    "email": "[email protected]",
    "max_buckets": 1000,
    "keys": [
        {
            "user": "john",
            "access_key": "KH6ABIYU7P1AC34F9FVC",
            "secret_key": "OFjRqeMGH26yYX9ggxr8dTyz9KYZMLFK9W5i1ACV"
        }
    ],
    "temp_url_keys": []
}
Or specify a key like we do in other environments.

root@rain02-r01-01:/var/www/html# radosgw-admin user create --uid=cicduser1 --display-name="TC cicduser1" --access-key=cicduser1 --secret-key='5Y4pjcKhjAsmbeO347RpyaVyT6QhV8UHYc5YWaBB'
{
    "user_id": "cicduser",
    "display_name": "TC cicduser1",
    "keys": [
        {
            "user": "cicduser1",
            "access_key": "cicduser1",
            "secret_key": "5Y4pjcKhjAsmbeO347RpyaVyT6QhV8UHYc5YWaBB"
        }
    ]
}

Fun fact: you can set quotas and read/write capabilities on users. It can also report usage statistics for a given time period.

All of the CLI commands can also be performed over the API, by adding /admin/ (configurable) to the URL. You can give any S3 user admin capabilities; it's the same backend authentication for both.

I also confirmed that after installing radosgw on a second node, all user IDs and buckets were still available. Clustering confirmed.


When it comes to automating this, there are several options.

Build our own ceph-formula up into something that fully manages Ceph.

Pros:

  • It will do what we want it to.

Cons:

  • Our current ceph-formula only installs packages.
  • Lots of work involved.

Refactor the public ceph-salt formula to meet our needs.

Pros:

  • ceph-salt seems to cover most elements, including orchestration.
  • Uses a global_variables.jinja much like we use map.jinja.

Cons:

  • I'm sure we'll find something wrong with it. (big grin)
  • Maintained by 1 person.
  • Last updated over a year ago.

Use Kolla to set up Ceph.

Pros:

  • The Openstack team might be using Kolla - standardization.
  • Already well built out.
  • Puts Ceph components into Docker containers (though some might consider this a con).

Cons:

  • It's reported to work primarily on Redhat/Centos; less so on Ubuntu.
  • Uses Ansible as the underlying management - this introduces a secondary management system over ssh.
  • Is heavily opinionated based on Openstack architecture (some might say this is a pro).

Use ceph-ansible.

Pros:

  • Already well built out.
  • Highly flexible/configurable.
  • Works on Ubuntu.
  • Not opinionated.
  • Maintained by the Ceph project.
  • Large contribution base.

Cons:

  • Uses Ansible as the underlying management - this introduces a secondary management system (in addition to salt) over ssh.

postgresql hstore is easy to compare

hstore is a key=>value column type that has been around in PostgreSQL for a long time. I was looking at it for a project where I want to compare "new data" to old, so I can approve it. There is an hstore-minus-hstore operator that compares two hstore values and shows the differences.

In reality, an hstore column looks like text. It's just in a format that PostgreSQL understands.

Here, we have an existing record with some network information.

hs1=# select id, data::hstore from d1 where id = 3;
 id |                          data                          
  3 | "ip"=>"", "fqdn"=>""
(1 row)

Let's say I submitted a form with slightly changed network information. I can do a select statement to get the differences.

hs1=# select id, hstore('"ip"=>"", "fqdn"=>""')-data from d1 where id =3;
 id |             ?column?              
  3 | "fqdn"=>""
(1 row)

This works just as well if we're adding a new key.

hs1=# select id, hstore('"ip"=>"", "fqdn"=>"", "netmask"=>""')-data from d1 where id =3;
 id |                           ?column?                            
  3 | "fqdn"=>"", "netmask"=>""
(1 row)
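In Python terms, the hstore "-" operator used above behaves like a dict difference: keep only the pairs from the new data whose key/value don't match the old record. A sketch — the IPs and hostnames here are made-up placeholders, since the real values were scrubbed from the output above:

```python
def hstore_diff(new, old):
    """Pairs in `new` that are absent from or different in `old` (like hstore - hstore)."""
    return {k: v for k, v in new.items() if old.get(k) != v}

old = {"ip": "10.0.0.5", "fqdn": "a.example.com"}
new = {"ip": "10.0.0.5", "fqdn": "b.example.com", "netmask": "255.255.255.0"}
print(hstore_diff(new, old))  # the changed fqdn plus the brand-new netmask key
```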

This information could be displayed on a confirmation page. Ideally, a proposed dataset would be placed somewhere, and a page could be rendered on the fly showing any changes an approval would create within the database.

Then we can update with the newly submitted form.

hs1=# update d1 set data = data || hstore('"ip"=>"", "fqdn"=>"", "netmask"=>""') where id = 3;

hs1=# select id, data::hstore from d1 where id = 3;
 id |                                         data                                         
  3 | "ip"=>"", "fqdn"=>"", "netmask"=>""
(1 row)

Note that if I wanted to delete a key instead of just setting it to NULL, that would be a separate operation.

update d1 SET data = delete(data, 'ip') where id = 3;

Programming Uniden AMH-350 for APRS

This is a narrative post. If you want to see my Python program that calculates the diode matrix, skip to the end or click here.

I recently received this "Force Communications AMH-350" radio. Actually, it was an entire cabinet with a large power supply, an MFJ TNC2, and an old DOS PC running JNOS. These had been active in a tower shed and were turned off 3 years ago. The club wanted me to repurpose this packet system for APRS.

Once I plugged it in, the computer booted up to JNOS, but the radio and TNC did not turn on. The power supply had a plastic box on the back with a large Bussmann 30A fuse. When I pulled it out, corrosion dust leaked out. I made a trip to the hardware store and replaced it. The radio turned on, but not the TNC. On the front I found 3 smaller fuses and a note describing that "F3" ran the TNC. I pulled that fuse out and it was dead. A second trip to the hardware store got this fuse replaced. Then I plugged everything back in and turned on the power supply. Within 10 seconds, the "make it work" smoke had leaked out of the TNC2. This is probably why the F3 fuse had blown in the first place. This was disappointing, because there is new firmware for the TNC2 that makes it a decent APRS TNC, no computer needed.

I deemed the computer too old to run soundcard packet (using Direwolf as the modem), so this left me with the power supply and radio. Grounding out the PTT line and using a frequency counter, I found that "channel 2" was transmitting on 145.050 MHz. Channel 1 was not programmed at all.

A quick Google search told me that Uniden bought Force Communications and sold this radio as the Uniden AMH-350. I found 2 other people looking for how to program it (one in 1994, the other in 2004), with no responses. I found someone selling the radio's manual on eBay for $20. I offered them $10 and received the manual earlier this week.

The radio itself is programmed with a common-cathode diode matrix representing a binary value. Here is a picture of the back side of it, programmed for 145.050. The manual provides a table covering frequencies from 148 MHz to 174 MHz in 5 kHz increments. Fortunately, it also provides a formula for coming up with your own frequencies. I ran through this formula multiple times, getting results different from the book, till I realized the book was rounding some values UP or outright disregarding fractional parts. It also took a bit to wrap my head around binary "1" being disconnected (or cut) and binary "0" being connected. That felt backwards to me.
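The shape of the calculation looks something like this. To be clear, this is a simplified sketch, not the manual's actual formula: the 5 kHz channel step matches the manual's table, but the 145.0 MHz reference offset is a placeholder I made up so the example fits in 7 bits — the real formula depends on the radio's synthesizer constants (my full notebook has the version that matches the book).

```python
def diode_matrix(freq_mhz, ref_mhz=145.0, step_khz=5, bits=7):
    """Return diode states, MSB first: '1' means the diode is cut
    (disconnected), '0' means it stays connected."""
    # Channel number: offset from the reference, in 5 kHz steps.
    n = round((freq_mhz - ref_mhz) * 1000 / step_khz)
    return format(n, f"0{bits}b")

print(diode_matrix(145.050))  # -> '0001010' under these placeholder constants
```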

Eventually, though, I was able to match the book and create a chart that matched the existing programmed 145.050 frequency (both Tx and Rx, which are programmed separately). Then I wrapped the whole thing up in a set of Python functions inside an IPython notebook. You can view this on IPython's nbviewer or the direct gist.

I don't have the radio programmed yet. I feel that getting the diode matrices out of "channel 2" while keeping them usable for reprogramming is going to be difficult. I will need 7 diodes connected for each of the Tx and Rx slots, 14 total. I am attempting to program channel 1, but by the time I got to this portion I was a bit tired and making mistakes, so I called it a night. Once I get to building out the programming board, I'll post some more pictures.

rPi DPI Display, cheap.

Recently the internet noticed the Raspberry Pi could drive LCD panels using DPI. This allows very inexpensive displays to be used with basically no additional hardware.

This is not a full post, just capturing some details from someone else's blog post so I don't lose them (will my site become my next bookmark holder?).

Let’s add a dirt cheap screen to the Raspberry Pi B+


Total: $43.85

That being said, I see a Pi: 7" Display no Touchscreen 1024x600 w/ Mini Driver for $70 with no messing with HDMI.

I also recently purchased a 5" 800x480 screen with an HDMI driver board from a Chinese vendor on eBay for $42.75. It just takes a month or so to arrive.

Click to access VS-TY50-V2.pdf