Welcome to 2017

This year, I have made a few new resolutions. One of them is to write more blog posts, and this post is the first step towards that. Rather than just naming off a few common resolutions, I spent a couple of days working through the exercises in Alex Vermeer’s 8,760 Hours: How to get the most out of next year. In my opinion, this is an excellent book. The author and I both used mind-mapping software, but this is also very doable using pencil and paper. I ended up using Mindjet MindManager (trial) for mine, but I am going to switch over to FreeMind. MindManager is much more expensive than I anticipated and I can’t justify the cost. I’ve used FreeMind in the past and found it adequate.

You spend a good bit of time making a snapshot of “where you are now”. You don’t have to follow his guide exactly (I didn’t; it is, after all, a “guide”), but you start by breaking your life down into 12 areas and adding details for each area. Some areas will have hard numbers or metrics, such as how much you have saved for retirement, or how much you weigh and what you think your goal weight should be. For instance, I did 7 blog posts in 2015 and only 3 in 2016. Others will be more thought-provoking, like “what fun things have I done recently?” or describing your home life.

As you can imagine, the life snapshot is going to be a pretty personal thing, not something you’d want to share out. But it does make you think about your life as a whole. It’s also nice to have some of the metrics that you can compare to when next year comes around.

Once you have built the snapshot, the second portion of the exercise is to actually set your goals. Much like making your current snapshot, Alex recommends splitting your goals into “life areas” and organizing your goals under each section. You should also think about how you might accomplish these goals and add details underneath each goal. You don’t have to make a full detailed plan. This “goal map” is going to be something that you can save or print out and refer to during the year. The entire thing should fit on the equivalent of one page, so that you can see all of your goals and your ideas for accomplishing them at a glance. You can always use other specific tools to draw up a budget, record your diet, or design a sailboat as you work towards your goals.

Finally, the author recommends doing both monthly and quarterly reviews of your goals. Ask yourself if you’re on track. Some goals may be more important than others. My goal to get my credit card balances back to $0 (and pay them off each month) is more important than my goal to build out 3 APRS stations and replace the 2nd garage door on our pole barn (it has been disabled/nailed shut since we moved in).

Worst Practice Lab VM Automation


I’ve started the process of switching my lab over from unmanaged to ansible. I’ve used Puppet and Salt quite extensively through work, but after a handful of false starts with the lab, I think ansible is the way to go. This is a series of what
many (including myself) would consider “worst practices”, but they are more along the lines of “rapid iteration”. The goal here
is to get something working in a short period of time, without spending hours, days, or weeks researching best practices.
This is instead something someone can put together on a Sunday afternoon, in between chasing after a 3 year old.

These are a handful of manual steps, each of which could be easily automated once you determine your “starting point”.

Background: When I clone a VM in Proxmox, it comes up with the hostname “xenial-template”. I should be able to do something like I do with cloud-init under KVM, but I haven’t gotten that far under the Proxmox setup. Additionally, these hosts are not in DNS until they are entered into the FreeIPA server. Joining a client to IPA will automatically create the entry. So the first thing I need to do to any VM is to set the hostname and FQDN, and then register it with IPA. My template
has a user called “yourtech”, which I can use to log in and configure the VM.

First, create an ansible vault password file: echo secret > ~/.vault_pass.txt. Next, create an inventory directory and set up an encrypted group_vars/all.

mkdir -p inventory/group_vars
touch inventory/group_vars/all

Add some common variables to all:

ansible_ssh_user: yourtech
ansible_ssh_pass: secret
ansible_sudo_pass: secret
freeipaclient_server: dc01.lab.ytnoc.net
freeipaclient_domain: lab.ytnoc.net
freeipaclient_enroll_user: admin
freeipaclient_enroll_pass: supersecret

Then encrypt it: ansible-vault --vault-password-file=~/.vault_pass.txt encrypt inventory/group_vars/all

Generate inventory files.

With the following script, I can run ./add-new.sh example. If ansible fails, then I need to
troubleshoot. A better approach would be to add these entries into a single inventory file, or better yet,
a database, providing a constantly updated and dynamic inventory. Put that on the later pile.

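#!/usr/bin/env bash
# Assumed usage: ./add-new.sh <short-hostname> <ip>
# Assumption: the definitions below are reconstructed from the names and
# paths used elsewhere in this post; adjust them to your own lab.
NEWNAME="$1"
IP="$2"
FQDN="${NEWNAME}.lab.ytnoc.net"
FILENAME="inventory/${NEWNAME}"
ANSIBLE_VAULT_PASSFILE="${HOME}/.vault_pass.txt"

# One inventory file per host, containing a single line.
LINE="${FQDN} ansible_host=${IP}"

echo "${LINE}" > "${FILENAME}"

echo "Removing any prior host keys"
ssh-keygen -R "${NEWNAME}"
ssh-keygen -R "${FQDN}"
ssh-keygen -R "${IP}"

echo "${FILENAME} created, testing"
ansible --vault-password-file "${ANSIBLE_VAULT_PASSFILE}" -i "${FILENAME}" "${FQDN}" -m ping -vvvv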
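
The “later pile” idea of a dynamic inventory could look roughly like the sketch below. It is only an illustration: the HOSTS table, the group name “lab”, and the address are all made up, and in practice the table would come from a file or database. The one thing taken from Ansible itself is the JSON shape a dynamic inventory script prints (groups with a hosts list, plus _meta.hostvars).

```python
import json

# Hypothetical host table; a real version might read this from a database.
HOSTS = {
    "pgdb02.lab.ytnoc.net": "192.0.2.12",  # example address, not from my lab
}


def build_inventory(hosts):
    """Build the JSON structure Ansible expects from a dynamic inventory."""
    return {
        # "lab" is an arbitrary group name chosen for this sketch.
        "lab": {"hosts": sorted(hosts)},
        "_meta": {
            "hostvars": {
                fqdn: {"ansible_host": ip} for fqdn, ip in hosts.items()
            }
        },
    }


if __name__ == "__main__":
    # Ansible runs the script and reads the inventory from stdout.
    print(json.dumps(build_inventory(HOSTS), indent=2))
```

Saved as an executable file, a script like this can be passed to -i in place of the per-host inventory files.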

Let’s go to work.

At this point, I should have a working inventory file for a single host and I’ve validated that ansible can
connect. Granted, I haven’t tested sudo, but in my situation, I’m pretty sure that will work. But I haven’t
actually done anything with the VM. It’s still just this default template.


Ansible provides a module to set the hostname, but does not modify /etc/hosts to get the FQDN resolving. As with
many things, I’m not the first to encounter this, so I found a premade role holms/ansible-fqdn.

mkdir roles
cd roles
git clone https://github.com/holms/ansible-fqdn.git fqdn

This role will read inventory_hostname for fqdn, and inventory_hostname_short for hostname. You can override
this, but these are perfect defaults based on my script above.


Once again, we’re saved by the Internet. alvaroaleman/ansible-freeipa-client is an already designed role that installs the necessary freeipa packages and runs the
ipa-join commands.

# assuming still in roles
git clone https://github.com/alvaroaleman/ansible-freeipa-client.git freeipa

The values this role needs just happen to perfectly match the freeipaclient_* variables I put in my all file earlier. I
think that’s just amazing luck.

Make a playbook.

I call mine bootstrap.yml.

- hosts: all
  become: yes
  roles:
    - fqdn
    - freeipa


Let’s run our playbook against host “pgdb02”

ansible-playbook -i inventory/pgdb02 --vault-password-file=~/.vault_pass.txt bootstrap.yml


~/projects/ytlab$ ansible-playbook -i inventory/pgdb02 --vault-password-file=~/.vault_pass.txt bootstrap.yml

PLAY ***************************************************************************

TASK [setup] *******************************************************************
ok: [pgdb02.lab.ytnoc.net]

TASK [fqdn : fqdn | Configure Debian] ******************************************

TASK [fqdn : fqdn | Configure Redhat] ******************************************
skipping: [pgdb02.lab.ytnoc.net]

TASK [fqdn : fqdn | Configure Linux] *******************************************
included: /home/ytjohn/projects/ytlab/roles/fqdn/tasks/linux.yml for pgdb02.lab.ytnoc.net

TASK [fqdn : Set Hostname with hostname command] *******************************
changed: [pgdb02.lab.ytnoc.net]

TASK [fqdn : Re-gather facts] **************************************************
ok: [pgdb02.lab.ytnoc.net]

TASK [fqdn : Build hosts file (backups will be made)] **************************
changed: [pgdb02.lab.ytnoc.net]

TASK [fqdn : restart hostname] *************************************************
ok: [pgdb02.lab.ytnoc.net]

TASK [fqdn : fqdn | Configure Windows] *****************************************
skipping: [pgdb02.lab.ytnoc.net]

TASK [freeipa : Assert supported distribution] *********************************
ok: [pgdb02.lab.ytnoc.net]

TASK [freeipa : Assert required variables] *************************************
ok: [pgdb02.lab.ytnoc.net]

TASK [freeipa : Import variables] **********************************************
ok: [pgdb02.lab.ytnoc.net]

TASK [freeipa : Set DNS server] ************************************************
skipping: [pgdb02.lab.ytnoc.net]

TASK [freeipa : Update apt cache] **********************************************
ok: [pgdb02.lab.ytnoc.net]

TASK [freeipa : Install required packages] *************************************
changed: [pgdb02.lab.ytnoc.net] => (item=[u'freeipa-client', u'dnsutils'])

TASK [freeipa : Check if host is enrolled] *************************************
ok: [pgdb02.lab.ytnoc.net]

TASK [freeipa : Enroll host in domain] *****************************************
changed: [pgdb02.lab.ytnoc.net]

TASK [freeipa : Include Ubuntu specific tasks] *********************************
included: /home/ytjohn/projects/ytlab/roles/freeipa/tasks/ubuntu.yml for pgdb02.lab.ytnoc.net

TASK [freeipa : Enable mkhomedir] **********************************************
changed: [pgdb02.lab.ytnoc.net]

TASK [freeipa : Enable sssd sudo functionality] ********************************
changed: [pgdb02.lab.ytnoc.net]

RUNNING HANDLER [freeipa : restart sssd] ***************************************
changed: [pgdb02.lab.ytnoc.net]

RUNNING HANDLER [freeipa : restart ssh] ****************************************
changed: [pgdb02.lab.ytnoc.net]

PLAY RECAP *********************************************************************
pgdb02.lab.ytnoc.net : ok=18 changed=8 unreachable=0 failed=0


Essentially, we created a rather basic inventory generator script, we encrypted some
credentials into a variables file using ansible-vault, and we downloaded some roles
“off the shelf” and executed them both with a single “bootstrap” playbook.

If I were doing this for work, I would first create at least one Vagrant VM and work through
an entire development cycle. I would probably rewrite these roles I downloaded to make them
more flexible and variable driven.

In case you got lost where these files go:

├── add-new.sh
├── bootstrap.yml
├── inventory
│   ├── group_vars
│   │   └── all
│   ├── pgdb01
│   ├── pgdb02
│   └── sstorm01
└── roles
    ├── fqdn
    └── freeipa

Ceph getting acquainted

The key components:

  • Ceph OSDs: A Ceph OSD Daemon (Ceph OSD) stores data, handles data replication, recovery, backfilling, rebalancing, and provides some monitoring information to Ceph Monitors by checking other Ceph OSD Daemons for a heartbeat. A Ceph Storage Cluster requires at least two Ceph OSD Daemons to achieve an active + clean state when the cluster makes two copies of your data (Ceph makes 3 copies by default, but you can adjust it).
  • Monitors: A Ceph Monitor maintains maps of the cluster state, including the monitor map, the OSD map, the Placement Group (PG) map, and the CRUSH map. Ceph maintains a history (called an “epoch”) of each state change in the Ceph Monitors, Ceph OSD Daemons, and PGs. Ceph uses the Paxos algorithm, which requires a consensus among the majority of monitors in a quorum. With Paxos, the monitors cannot determine a majority for establishing a quorum with only two monitors. A majority of monitors must be counted as such: 1:1, 2:3, 3:4, 3:5, 4:6, etc. Side note: some Ceph docs advise not to commingle Monitor and Ceph OSD daemons on the same host or you may encounter performance issues. But in deployment guides and the Mellanox high performance paper, they do commingle them. For all test purposes, I plan to commingle them (deploy monitors on ECS nodes) and evaluate performance under load. I am also still trying to estimate how many monitors we need for a given number of Ceph OSDs. We’ll have 480 Ceph OSDs per rack, and we’ll want either 3, 5, or 7 monitors. I’m going to take a shot in the dark and go with 5.
  • RADOS GW: This is what provides the S3 and Swift API access to Ceph storage. You can install this on the OSD nodes (simplest) or select a handful of external VMs to run these. You would set up multiple RADOS GW nodes and place a load balancer like haproxy or nginx/lua_proxy in front of them.
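
The majority counts listed for the monitors (1:1, 2:3, 3:4, 3:5, 4:6) are just “more than half”. A minimal sketch of that arithmetic is below; the function names are mine, and this is only the counting rule, not anything Ceph-specific.

```python
def quorum_majority(monitors: int) -> int:
    """Smallest number of monitors that is strictly more than half."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can be down while a majority is still reachable."""
    return monitors - quorum_majority(monitors)


for n in (1, 2, 3, 4, 5, 6, 7):
    print(f"{n} monitors: majority {quorum_majority(n)}, "
          f"can lose {tolerated_failures(n)}")
```

This is also why odd counts are the usual choice: going from 3 to 4 monitors (or from 5 to 6) does not let you lose any more of them.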

OSD Notes:

OSD Journal Location: By default, Ceph stores an OSD daemon’s journal at /var/lib/ceph/osd/$cluster-$id/journal. On an ECS node, this would be on an SSD, which is what Ceph recommends. However, you could point it to an SSD partition instead of a file for even faster performance.

OSD Journal Size: The expected throughput number should include the expected disk throughput (i.e., sustained data transfer rate), and network throughput. For example, a 7200 RPM disk will likely have approximately 100 MB/s. Taking the min() of the disk and network throughput should provide a reasonable expected throughput. Some users just start off with a 10GB journal size. For example:
osd journal size = 10000
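
For sizing the journal more deliberately, the rule of thumb as I recall it from the Ceph docs is: journal size = 2 * (expected throughput * filestore max sync interval). The sketch below just does that arithmetic. The 5 second sync interval is what I believe the default to be, and the 1250 MB/s network figure is my own assumption for a 10GbE link; check both against your cluster.

```python
def journal_size_mb(disk_mb_s: float, network_mb_s: float,
                    sync_interval_s: float = 5.0) -> float:
    """Suggested 'osd journal size' in MB.

    Expected throughput is the slower of disk and network; the factor of
    two and the sync interval follow the rule of thumb quoted above.
    """
    expected_throughput = min(disk_mb_s, network_mb_s)
    return 2 * expected_throughput * sync_interval_s


# 7200 RPM disk at ~100 MB/s behind an assumed ~1250 MB/s network link.
print(journal_size_mb(100, 1250))
```

By that math, the 10GB starting value is comfortably larger than a single spinning disk needs.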

OSDs can be removed gracefully: http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual

Check Max Threadcount: If you have a node with a lot of OSDs, you may be hitting the default maximum number of threads (e.g., usually 32k), especially during recovery. You can increase the number of threads using sysctl to see if increasing the maximum number of threads to the maximum possible number of threads allowed (i.e., 4194303) will help. For example:
sysctl -w kernel.pid_max=4194303

CRUSH Map

The “location” of each Ceph OSD is maintained in a CRUSH map.

The CRUSH algorithm determines how to store and retrieve data by computing data storage locations. CRUSH empowers Ceph clients to communicate with OSDs directly rather than through a centralized server or broker. With an algorithmically determined method of storing and retrieving data, Ceph avoids a single point of failure, a performance bottleneck, and a physical limit to its scalability.

CRUSH maps contain a list of OSDs, a list of ‘buckets’ for aggregating the devices into physical locations, and a list of rules that tell CRUSH how it should replicate data in a Ceph cluster’s pools. By reflecting the underlying physical organization of the installation, CRUSH can model—and thereby address—potential sources of correlated device failures. Typical sources include physical proximity, a shared power source, and a shared network. By encoding this information into the cluster map, CRUSH placement policies can separate object replicas across different failure domains while still maintaining the desired distribution. For example, to address the possibility of concurrent failures, it may be desirable to ensure that data replicas are on devices using different shelves, racks, power supplies, controllers, and/or physical locations.

The short of this is that in ceph.conf, you can define a host’s location, which subsequently defines the location of each Ceph OSD operating on that host. A location is a collection of key=value pairs whose keys are Ceph’s predefined types.

root=default row=a rack=a2 chassis=a2a host=a2a1

# types (from narrowest ascending to broadest grouping)
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

Each CRUSH type has a value. The higher this value, the less specific the grouping is. So when deciding where to place data chunks or replicas of an object, Ceph OSDs will consult the CRUSH map to find other Ceph OSDs in other hosts, chassis, and racks. The fault domain policies can be defined and tweaked.
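
To make the location string and type ordering concrete, here is a small sketch that parses a location and finds the narrowest bucket two hosts have in common. It is not how CRUSH itself works internally; the function names and the second host’s location are invented for illustration.

```python
# CRUSH types from the list above, narrowest (index 0) to broadest.
CRUSH_TYPES = ["osd", "host", "chassis", "rack", "row", "pdu",
               "pod", "room", "datacenter", "region", "root"]


def parse_location(location: str) -> dict:
    """Turn 'root=default row=a ...' into {'root': 'default', 'row': 'a', ...}."""
    return dict(pair.split("=", 1) for pair in location.split())


def narrowest_shared(a: dict, b: dict):
    """Narrowest bucket type whose value is the same in both locations."""
    for crush_type in CRUSH_TYPES:
        if crush_type in a and a.get(crush_type) == b.get(crush_type):
            return crush_type
    return None


host1 = parse_location("root=default row=a rack=a2 chassis=a2a host=a2a1")
# Hypothetical second host in a different rack of the same row.
host2 = parse_location("root=default row=a rack=a3 chassis=a3a host=a3a1")

print(narrowest_shared(host1, host2))
```

Two replicas placed on these hosts would survive losing a rack, but not losing the whole row, which is the kind of question the fault domain rules are meant to answer.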

Zero To Hero Guide : : For CEPH CLUSTER PLANNING

https://www.mellanox.com/related-docs/whitepapers/WP_Deploying_Ceph_over_High_Performance_Networks.pdf – High performance ceph builds.

Ceph 1st Runthrough

author: ytjohn
post_date: 2017-04-30 21:42:52

These are just some notes I took as I did my first run through on installing Ceph on some spare ECS Hardware I had access to. Note that currently, no one would actually recommend doing this, but it was a good way for me to get started with Ceph.


I followed this guide: http://docs.ceph.com/docs/hammer/start/quick-ceph-deploy/

I set this up the first time in the lab, with these nodes:

  • ljb01.osaas.lab (admin-node)
  • rain02-r01-01.osaas.lab (mon.node1)
  • rain02-r01-03.osaas.lab (osd.0)
  • rain02-r01-04.osaas.lab (osd.1)

In a more fleshed out setup, I would probably have a dedicated admin node (instead of the jump), and we would start off with the layout like this:

The first ‘caveat’ was that it tells you to configure a user (“ceph”) that can sudo up, which I did. But ceph-deploy attempts to modify the ceph user, which it can’t do while ceph is logged in.

[rain02-r01-03][DEBUG ] Setting system user ceph properties..

For step 6, adding OSDs, I diverged again to add disks instead of a directory. http://docs.ceph.com/docs/hammer/rados/deployment/ceph-deploy-osd/

ceph-deploy disk list rain02-r01-01
[rain02-r01-01][DEBUG ] /dev/sda :
[rain02-r01-01][DEBUG ]  /dev/sda1 other, 21686148-6449-6e6f-744e-656564454649
[rain02-r01-01][DEBUG ]  /dev/sda2 other, ext4, mounted on /boot
[rain02-r01-01][DEBUG ] /dev/sdaa other, unknown
[rain02-r01-01][DEBUG ] /dev/sdab other, unknown
[rain02-r01-01][DEBUG ] /dev/sdac other, unknown
[rain02-r01-01][DEBUG ] /dev/sdad other, unknown
[rain02-r01-01][DEBUG ] /dev/sdae other, unknown

I will set up sdaa, sdab, and sdac. Note that while I could use a separate disk partition (like an SSD) to maintain the journal, we only have one SSD in ECS hardware and it hosts the OS. So we’ll let each disk maintain its own journal.

ceph-deploy disk zap rain02-r01-01:sdaa  # zap the drive
ceph-deploy disk prepare rain02-r01-01:sdaa # format the drive with xfs
ceph-deploy disk activate rain02-r01-01:/dev/sdaa1  # notice we changed to partition path
# /dev/sdaa1              5.5T   34M  5.5T   1% /var/lib/ceph/osd/ceph-0

Repeat those steps for each node and disk you want to activate. Could you imagine doing 32-48 nodes * 60 drives by hand? This seems like a job to be automated.
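
As a first step towards automating that, a script can at least generate the command list. This sketch only prints the same three ceph-deploy commands I ran by hand, for every node and disk; it does not run anything, and the node and disk lists are placeholders to fill in.

```python
def osd_commands(node: str, disks: list) -> list:
    """Return the zap/prepare/activate commands for each disk on one node."""
    commands = []
    for disk in disks:
        commands.append(f"ceph-deploy disk zap {node}:{disk}")
        commands.append(f"ceph-deploy disk prepare {node}:{disk}")
        # Activation takes the partition path, as noted above.
        commands.append(f"ceph-deploy disk activate {node}:/dev/{disk}1")
    return commands


# Placeholder lists; a real run would cover every node and drive.
NODES = ["rain02-r01-01"]
DISKS = ["sdaa", "sdab", "sdac"]

for node in NODES:
    for command in osd_commands(node, DISKS):
        print(command)
```

Printing first and piping to a shell later is a cheap way to review what would be wiped before anything is actually zapped.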

I also noticed that the drives get numbered sequentially across nodes. I wonder what kind of implications that has for replacing drives or an entire node.

~# df -h | grep ceph
/dev/sdaa1              5.5T   36M  5.5T   1% /var/lib/ceph/osd/ceph-0
/dev/sdab1              5.5T   36M  5.5T   1% /var/lib/ceph/osd/ceph-1
/dev/sdac1              5.5T   35M  5.5T   1% /var/lib/ceph/osd/ceph-2
~# df -h | grep ceph
/dev/sdaa1              5.5T   35M  5.5T   1% /var/lib/ceph/osd/ceph-3
/dev/sdab1              5.5T   35M  5.5T   1% /var/lib/ceph/osd/ceph-4
/dev/sdac1              5.5T   34M  5.5T   1% /var/lib/ceph/osd/ceph-5
~# df -h | grep ceph
/dev/sdaa1              5.5T   34M  5.5T   1% /var/lib/ceph/osd/ceph-6
/dev/sdab1              5.5T   34M  5.5T   1% /var/lib/ceph/osd/ceph-7
/dev/sdac1              5.5T   34M  5.5T   1% /var/lib/ceph/osd/ceph-8

After creating all this, I can do a ceph status.

/home/ceph/rain-cluster# ceph status
    cluster 4ebe7995-6a33-42be-bd4d-20f51d02ae45
     health HEALTH_WARN
            too few PGs per OSD (14 < min 30)
     monmap e1: 1 mons at {rain02-r01-01=}
            election epoch 2, quorum 0 rain02-r01-01
     osdmap e43: 9 osds: 9 up, 9 in
            flags sortbitwise
      pgmap v78: 64 pgs, 1 pools, 0 bytes data, 0 objects
            306 MB used, 50238 GB / 50238 GB avail
                  64 active+clean

PGs are placement groups. http://docs.ceph.com/docs/master/rados/operations/placement-groups/
That page recommends that for 5-10 OSDs (I have 9), we set pg_num to 512. I defaulted to 64. But then the tool tells me otherwise.

/home/ceph/rain-cluster# ceph osd pool get rbd pg_num
pg_num: 64
/home/ceph/rain-cluster# ceph osd pool set rbd pg_num 512
Error E2BIG: specified pg_num 512 is too large (creating 448 new PGs on ~9 OSDs exceeds per-OSD max of 32)

I’ll put this down as a question for later and set it to 128.
This does nothing, so I learned what I really need to do is make more pools. I make a new pool, but my HEALTH_WARN has changed to reflect my mistake.

/home/ceph/rain-cluster# ceph status
    cluster 4ebe7995-6a33-42be-bd4d-20f51d02ae45
     health HEALTH_WARN
            pool rbd pg_num 128 > pgp_num 64
     monmap e1: 1 mons at {rain02-r01-01=}
            election epoch 2, quorum 0 rain02-r01-01
     osdmap e48: 9 osds: 9 up, 9 in
            flags sortbitwise
      pgmap v90: 256 pgs, 2 pools, 0 bytes data, 0 objects
            311 MB used, 50238 GB / 50238 GB avail
                 256 active+clean

There is also a pgp_num to set, so I set that to 128. Now everything is happy and healthy. And I’ve only jumped from 306MB to 308MB used.

/home/ceph/rain-cluster# ceph status
    cluster 4ebe7995-6a33-42be-bd4d-20f51d02ae45
     health HEALTH_OK
     monmap e1: 1 mons at {rain02-r01-01=}
            election epoch 2, quorum 0 rain02-r01-01
     osdmap e50: 9 osds: 9 up, 9 in
            flags sortbitwise
      pgmap v100: 256 pgs, 2 pools, 0 bytes data, 0 objects
            308 MB used, 50238 GB / 50238 GB avail
                 256 active+clean
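
Looking back at the two PG messages above, the numbers can be reproduced with simple arithmetic. This is my reading of the messages, not Ceph’s code: I am assuming a replica count of 2 (the object mapping later in this post shows two OSDs per PG) and taking the “per-OSD max of 32” straight from the error text.

```python
def pgs_per_osd(pg_num: int, replicas: int, osds: int) -> int:
    """Average number of PG copies each OSD ends up holding."""
    return pg_num * replicas // osds


def max_pg_num_in_one_step(current_pg_num: int, osds: int,
                           per_osd_max: int = 32) -> int:
    """Largest pg_num a single 'pool set' accepts, per the error message."""
    return current_pg_num + osds * per_osd_max


# "too few PGs per OSD (14 < min 30)" with 64 PGs on 9 OSDs
print(pgs_per_osd(64, 2, 9))

# Going from 64 to 512 creates 448 new PGs, but only 9 * 32 are allowed.
print(512 - 64, max_pg_num_in_one_step(64, 9))
```

So 128 was accepted because it is under the one-step limit, and 512 would presumably have to be reached in more than one increase.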

Placing Objects

You can place objects into pools with the rados command.

/home/ceph/rain-cluster# echo bogart > testfile.txt
/home/ceph/rain-cluster# rados put test-object-1 testfile.txt --pool=pool2
/home/ceph/rain-cluster# rados -p pool2 ls
/home/ceph/rain-cluster# ceph osd map pool2 test-object-1
osdmap e59 pool 'pool2' (1) object 'test-object-1' -> pg 1.74dc35e2 (1.62) -> up ([8,5], p8) acting ([8,5], p8)
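
The “1.74dc35e2 (1.62)” part of that output is the object’s hash folded down to a placement group number. The sketch below reproduces that step using the “stable mod” that Ceph applies to the hash. I am starting from the hash value shown in the output rather than computing it from the object name, and I am assuming pool2 has 128 PGs (the 256 total minus the 128 in rbd).

```python
def ceph_stable_mod(x: int, b: int, bmask: int) -> int:
    """Fold hash x into the range [0, b), where bmask is the next 2^n - 1 >= b - 1."""
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)


def pg_for_hash(pool_id: int, obj_hash: int, pg_num: int) -> str:
    """Return the PG id in the 'pool.pg' hex form that ceph prints."""
    bmask = (1 << (pg_num - 1).bit_length()) - 1
    return f"{pool_id}.{ceph_stable_mod(obj_hash, pg_num, bmask):x}"


# Hash taken from the 'ceph osd map' output above; pg_num=128 is assumed.
print(pg_for_hash(1, 0x74DC35E2, 128))
```

From there, CRUSH maps the placement group (not the individual object) onto OSDs, which is where the [8,5] acting set comes from.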

Object Storage Gateway

Ceph does not provide a quick way to install and configure object storage gateways. You essentially have to install apache, libapache2-mod-fastcgi, rados, radosgw, and create a virtualhost. While you could do this on only a portion of your OSD nodes, it seems like it would make most sense to do it on each OSD node so that each node can be part of the pool.


Repo change:

http://gitbuilder.ceph.com/apache2-deb-$(lsb_release -sc)-x86_64-basic/ref/master

should be:

http://gitbuilder.ceph.com/ceph-deb-$(lsb_release -sc)-x86_64-basic/ref/master

After installing the packages, you need to start configuring. http://docs.ceph.com/docs/hammer/radosgw/config/

After steps 1-5 (creating and distributing a key), you need to make a storagepool.

/home/ceph/rain-cluster# ceph osd pool create storagepool1 128 128 erasure default
pool 'storagepool1' created

I created the domain “*.rain.osaas.lab” for this instance. I also had to create /var/log/radosgw before I could start the radosgw service.

After starting radosgw, I had to change the ownership of the fastcgi.sock file:

chown www-data:www-data /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock

Next, you go to the admin section to create users.

/var/www/html# radosgw-admin user create --uid=john --display-name="John Hogenmiller" [email protected]
{
    "user_id": "john",
    "display_name": "John Hogenmiller",
    "email": "[email protected]",
    "max_buckets": 1000,
    "keys": [
        {
            "user": "john",
            "access_key": "KH6ABIYU7P1AC34F9FVC",
            "secret_key": "OFjRqeMGH26yYX9ggxr8dTyz9KYZMLFK9W5i1ACV"
        }
    ],
    "temp_url_keys": []
}

Or specify a key like we do in other environments.

/var/www/html# radosgw-admin user create --uid=cicduser1 --display-name="TC cicduser1" --access-key=cicduser1 --secret-key='5Y4pjcKhjAsmbeO347RpyaVyT6QhV8UHYc5YWaBB'
{
    "user_id": "cicduser1",
    "display_name": "TC cicduser1",
    "keys": [
        {
            "user": "cicduser1",
            "access_key": "cicduser1",
            "secret_key": "5Y4pjcKhjAsmbeO347RpyaVyT6QhV8UHYc5YWaBB"
        }
    ]
}

Fun fact: You can set quotas and read/write capabilities on users. It also can do usage statistics for a given time period.

All of the CLI commands are also available over an API: http://docs.ceph.com/docs/hammer/radosgw/adminops/. For this, you just add /admin/
(configurable) to the URL. You can give any S3 user admin capabilities. It’s the same backend authentication for both.

I also confirmed that after installing radosgw on a second node, all user IDs and buckets were still available. Clustering confirmed.


When it comes to automating this, there are several options.

Build our own ceph-formula up into something that fully manages ceph.

Pros:

  • It will do what we want it to.

Cons:

  • Our current ceph-formula only installs packages.
  • Lots of work involved

Refactor public ceph-salt formula to meet our needs.

Pros:

  • ceph-salt seems to cover most elements, including orchestration
  • uses a global_variables.jinja much like we use map.jinja

Cons:

  • I’m sure we’ll find something wrong with it. (big grin)
  • maintained by 1 person
  • last updated over a year ago

Use Kolla to set up Ceph:

Pros:

  • Openstack team might be using Kolla – standardization
  • Already well built out
  • Puts ceph components into docker containers (though some might consider this a con)

Cons:

  • It’s reported that it works primarily on Redhat/Centos; less so on Ubuntu
  • Uses ansible as underlying management – this introduces a secondary management system over ssh
  • Is heavily opinionated based on Openstack architecture (some might say this is a pro)

Use Ansible-Ceph:

Pros:

  • Already well built out
  • Highly flexible/configurable
  • Works on Ubuntu
  • Not opinionated
  • maintained by ceph project
  • large contribution base

Cons:

  • Uses ansible as underlying management – this introduces a secondary management system (in addition to salt) over ssh

postgresql hstore is easy to compare

hstore is an optional key=>value column type that’s been around in postgresql for a long time. I was looking at it for a project where I want to compare “new data” to old, so I can approve it. There is an hstore - hstore operator that compares two hstore collections and shows the differences.

In reality, an hstore column looks like text. It’s just in a format that postgresql understands.

Here, we have an existing record with some network information.

hs1=# select id, data::hstore from d1 where id = 3;
 id |                          data                          
  3 | "ip"=>"", "fqdn"=>"hollaback.example.com"
(1 row)

Let’s say I submitted a form with slightly changed network information. I can do a select statement to get the differences.

hs1=# select id, hstore('"ip"=>"", "fqdn"=>"hollaback01.example.com"')-data from d1 where id =3;
 id |             ?column?              
  3 | "fqdn"=>"hollaback01.example.com"
(1 row)

This works just as well if we’re adding a new key.

hs1=# select id, hstore('"ip"=>"", "fqdn"=>"hollaback01.example.com", "netmask"=>""')-data from d1 where id =3;
 id |                           ?column?                            
  3 | "fqdn"=>"hollaback01.example.com", "netmask"=>""
(1 row)

This information could be displayed on a confirmation page. Ideally, a proposed dataset would be placed somewhere, and a page could be rendered on the fly showing any changes an approval would create within the database.
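
If the confirmation page were rendered in application code rather than SQL, the same comparison is easy to mimic with dictionaries. This is only a sketch of the hstore - hstore behavior as I understand it (drop every pair from the left side that also appears, key and value, on the right); the function name and the address values are placeholders.

```python
_MISSING = object()  # sentinel so a key that is absent never compares equal


def hstore_minus(left: dict, right: dict) -> dict:
    """Pairs from `left` that are not present, key and value, in `right`."""
    return {k: v for k, v in left.items() if right.get(k, _MISSING) != v}


# Placeholder values; the stored row and the submitted form from above.
stored = {"ip": "192.0.2.10", "fqdn": "hollaback.example.com"}
submitted = {"ip": "192.0.2.10", "fqdn": "hollaback01.example.com",
             "netmask": "255.255.255.0"}

print(hstore_minus(submitted, stored))
```

Only the changed fqdn and the new netmask come back, which is exactly what an approver needs to see.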

Then we can update with the newly submitted form.

hs1=# update d1 set data = data || hstore('"ip"=>"", "fqdn"=>"hollaback01.example.com", "netmask"=>""') where id = 3;

hs1=# select id, data::hstore from d1 where id = 3;
 id |                                         data                                         
  3 | "ip"=>"", "fqdn"=>"hollaback01.example.com", "netmask"=>""
(1 row)

Note that if I wanted to delete a key instead of just setting it to NULL, that would be a separate operation.

update d1 SET data = delete(data, 'ip') where id = 3;


Programming Uniden AMH-350 for APRS

This is a narrative post. If you want to see my python program that calculates out the diode matrix, skip to the end or click here.

I recently received this “Force Communications AMH-350” radio. Actually, it was an entire cabinet with a large power supply, an MFJ TNC2 TNC, and an old DOS PC running JNOS. These had been active in a tower shed and were turned off 3 years ago. The club wanted me to repurpose this packet system for APRS.

Once I plugged it in, the computer booted up to JNOS, but the radio and TNC did not turn on. The power supply had a plastic box on the back with a larger bussman 30A fuse. When I pulled it out, corrosion dust leaked out. I made a trip to the hardware store and replaced it. The radio turned on but not the TNC. On the front I found 3 smaller fuses and a note describing that “F3” ran the TNC. Pulled that fuse out and it was dead. A second trip to the hardware store got this fuse replaced. Then I plugged everything back in and turned on the power supply. Within 10 seconds, the “make it work smoke” had leaked out of the TNC2. This is probably why the F3 fuse had blown in the first place. This was disappointing, because there is new firmware for the TNC2 that makes it a decent APRS TNC, no computer needed.

I deemed the computer too old to run soundcard packet (using direwolf as my driver), so this left me with the power supply and radio. Grounding out the PTT line and using a frequency counter showed me that “channel 2” was transmitting on 145.050. Channel 1 was not programmed at all.

A quick google search told me that Uniden bought Force Communications and sold this radio as a Uniden AMH-350. I found 2 other people looking for how to program it (one in 1994, and the other in 2004) with no response. I found someone selling the radio’s manual on ebay for $20. I offered them $10 and received the manual earlier this week.

The radio itself is programmed with a common cathode diode matrix, representing a binary value. Here is a picture of the back side of one programmed for 145.050. The manual provides a table covering frequencies from 148 MHz to 174 MHz in 5 kHz increments. Fortunately, it provides a formula for coming up with your own frequencies. I ran through this formula multiple times, getting different results from the book, until I realized the book was rounding some values up or outright disregarding fractional parts. It also took a bit to wrap my head around binary “1” being disconnected (or cut) and binary “0” being connected. That felt backwards to me.

Eventually though, I was able to match the book, create a chart that matched the existing programmed 145.050 frequency (both Tx and Rx, which are programmed separately). Then, I wrapped the whole thing up in a set of python functions inside an ipython notebook. You can view this on ipython’s nbviewer or the direct gist.

I don’t have the radio programmed yet. I feel that getting the diode matrices out of “channel 2” in a state where they are still useful for programming is going to be difficult. I will need 7 diodes connected for each Tx and Rx slot, 14 total. I am attempting to program channel 1. By the time I got to this portion, I was a bit tired and making mistakes, so I called it a night. Once I get to building out the programming board, I’ll post some more pictures.

rPI DPI Display, cheap.

Recently the internet noticed the Raspberry Pi could drive LCD panels using DPI. This allows very inexpensive displays to be used with basically no additional hardware.

This is not a full post, just capturing some details from someone else’s blog post so I don’t lose them (will my site become my next bookmark holder?).

Let’s add a dirt cheap screen to the Raspberry Pi B+


Total: $43.85

That being said, I see a Pi: 7″ Display no Touchscreen 1024×600 w/ Mini Driver for $70 with no messing with HDMI.

I also recently purchased a 5″ 800×480 screen with an HDMI driver from a Chinese vendor on eBay for $42.75. It just takes a month or so to arrive.


Enter the Matrix

I used to be a big proponent of XMPP. However, over the years my enthusiasm for it has waned. I’m not the only one. Essentially, these days if your chat service is not done over HTTP(S), and if it doesn’t have persistence, your chat service is now legacy. Yes, I still enjoy IRC, and I think it’s great for ephemeral communications. But in this multi-device, mobile world, it’s hard to use IRC as a daily driver for my friends and coworkers.

Several months ago, I started looking into chat systems again for a different reason than most: amateur radio. There’s this thing in amateur radio called Broadband-Hamnet, which is a wireless mesh network. It’s not the first mesh system out there, but it has a really good initiative behind it. The idea is that all nodes are configured to use the same SSID and the network is self-configuring. If I stand up a node here at my house, someone else, having never spoken to me before, could deploy a node within range of mine and the two would connect. They would be able to see my node and any services I offer, and use them. DNS and service advertising are built in.

I wanted to come up with some “generic” mesh nodes, each with a connected server (a Raspberry Pi). The idea was that you could grab a couple of these boxes, deploy them in the field, and operators would be able to share files, chat, and even video. The big catch was that you never knew what systems would be online at any given time.

I looked into standing up an IRC server with a web front end. The problem was that no historical messages would be synchronized during a netjoin. There are a number of P2P chat systems, though most of these require some sort of “bootstrap” system. Even worse, for an amateur radio system under FCC regulation, most of these are focused on encryption. Tox.im would be a good choice, except it would violate the rule against obscuring messages in FCC Part 97, which governs the Amateur Radio Service.

I even started conceiving of a system based on the idea of a pub/sub message queue, except using JSON over HTTP. Nodes would subscribe to a channel, and any message posted to that channel would get propagated to all the subscribing nodes. Using Twisted, I could also create gateways for standard IRC or XMPP clients.
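To make the pub/sub idea concrete, here is a rough sketch of what I had in mind. It is only an illustration: the `Broker` class, the message fields, and the channel name are hypothetical, and an in-memory callback list stands in for the HTTP transport between nodes.

```python
import json
from collections import defaultdict


class Broker:
    """In-memory stand-in for the HTTP transport between nodes."""

    def __init__(self):
        # channel name -> list of subscriber callbacks
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        """Register a node's callback for one channel."""
        self.subscribers[channel].append(callback)

    def publish(self, channel, sender, body):
        """Serialize the message as JSON and hand it to every subscriber."""
        payload = json.dumps({"channel": channel, "sender": sender, "body": body})
        for callback in self.subscribers[channel]:
            callback(payload)
        return payload


# Two hypothetical nodes, each keeping a simple inbox of decoded messages.
inbox_a, inbox_b = [], []
broker = Broker()
broker.subscribe("#field", lambda p: inbox_a.append(json.loads(p)))
broker.subscribe("#field", lambda p: inbox_b.append(json.loads(p)))
broker.publish("#field", "node-1", "node up")
```

Note that this only pushes messages to nodes that are subscribed at the moment of publishing, which is exactly the weakness a mesh with nodes coming and going would run into.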

Well, fortunately for me (and you), a group went out and did just that, only much, much better than anything I could have put together. Matrix.org has put together a federated chat specification. The concept is really simple: JSON over HTTP(S). They have a reference implementation called Synapse that is written with Twisted. People run Synapse homeservers and join channels. A channel is shared between all homeservers that subscribe to it, and all channel events are propagated until consistency is achieved. This means that if a homeserver joins the channel late, or goes away for a while, it will eventually achieve a complete history of all message events within the channel.
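The "late joiner eventually gets the full history" property can be illustrated with a toy model. This is not how Matrix or Synapse actually implement it; it is a minimal sketch that assumes every event has a globally unique ID and that servers simply copy over whatever events they are missing. The `HomeServer` class and its methods are hypothetical names.

```python
class HomeServer:
    """Toy homeserver holding per-channel events keyed by event ID."""

    def __init__(self, name):
        self.name = name
        self.events = {}  # channel -> {event_id: event dict}
        self._counter = 0

    def post(self, channel, body):
        """Create a local event with an ID unique to this server."""
        self._counter += 1
        event_id = "{}:{}".format(self.name, self._counter)
        event = {"id": event_id, "sender": self.name, "body": body}
        self.events.setdefault(channel, {})[event_id] = event
        return event

    def sync_from(self, other, channel):
        """Copy any events for the channel that this server is missing."""
        mine = self.events.setdefault(channel, {})
        theirs = other.events.get(channel, {})
        missing = {eid: ev for eid, ev in theirs.items() if eid not in mine}
        mine.update(missing)
        return len(missing)


def history(server, channel):
    """Return the sorted event IDs a server knows for a channel."""
    return sorted(server.events.get(channel, {}))


a, b, late = HomeServer("a"), HomeServer("b"), HomeServer("late")
a.post("#field", "node up")
b.post("#field", "copy that")
a.sync_from(b, "#field")
b.sync_from(a, "#field")
late.sync_from(a, "#field")  # joins afterwards and still gets everything
print(history(late, "#field") == history(a, "#field"))
```

Because merging is just a union keyed by event ID, syncing in any order, or syncing twice, ends in the same set of events.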

If you run your server on the default port of either 8008 for HTTP or 8448 for HTTPS, the only DNS record you need is an A record. If you use another port, like 443, then you add a DNS SRV record stating the host and port (just like with XMPP).
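As a rough illustration of what that SRV record looks like, here is a small helper that formats one in zone-file style. The `_matrix._tcp` service label is what the Matrix federation documentation used as far as I know, but check the current docs before relying on it. The function name, the TTL, priority, and weight defaults, and the example hostnames are my own placeholders.

```python
def matrix_srv_record(domain, target, port, ttl=3600, priority=10, weight=0):
    """Format a zone-file style SRV record pointing at a homeserver.

    domain: the server name users see (e.g. the part after the colon in an ID)
    target: the host that actually runs the homeserver
    port:   the port the homeserver listens on
    The ttl, priority, and weight defaults are placeholder values.
    """
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    return "_matrix._tcp.{}. {} IN SRV {} {} {} {}.".format(
        domain, ttl, priority, weight, port, target
    )


# Hypothetical names; substitute your own domain and host.
print(matrix_srv_record("example.org", "matrix.example.org", 443))
```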

While the project still has a few rough edges, it is definitely usable today. The most stable implementation is on matrix.org but you can also join my homeserver at matrix.ytnoc.net.

Everyday Superadmin

Every morning I wake up bleary eyed
Another victim of aggressive deadlines
I don’t think this is supportable
It’s not good at all

I’m just your average ordinary everyday super admin
Trying to save the build but never really done
I’m just your average ordinary everyday super admin
Nothing more than that, that’s all I really am

Just an all day job, there’s so much left to do
It’s kinda hard when everything fails on you
Try to make it look easy, gonna make it look good
Like anybody would

I’m just your average ordinary everyday super admin
Trying to save the build but never really done
I’m just your average ordinary everyday super admin
Nothing more than that, that’s all I really am

I’m just like everybody else
After all the hype it’s hard to tell
I keep my game face on so well

‘Cause I’m just your average, ordinary, everyday super admin
I’m trying to save the build
I’m just your average ordinary everyday super admin

I’m trying to find my token key
And no one knows it’s really me
It’s really me, it’s really me
Oh, yeah

I’m just your average ordinary everyday super admin
Trying to save the build but never really done
I’m just your average ordinary everyday super admin
Nothing more than that, that’s all I really am

I’m just your average ordinary everyday super admin
I’m trying to save the build
I’m just your average ordinary everyday super admin
Yeah, yeah

I’m just your average ordinary everyday super admin
Trying to save the build but never really done
I’m just your average ordinary everyday super admin
Nothing more than that, that’s all I really am

I’m just your average ordinary everyday super admin

All about git pull

Because you know
I’m all about git pull
’bout git pull, no copies
I’m all about git pull
’bout git pull, no copies
I’m all about git pull
’bout git pull, no copies
I’m all about git pull
’bout git pull

Yeah, it’s pretty clear, I ain’t no guru
But I can break it, break it
Like I always do.
And now I got that pull request, that we all want.
With all the right code in all the right places.

I see that manual push with rsync
We know that shit ain’t right
It caused my script to stop.
If you got fixes fixes, just check ’em in
So every line of code is peer reviewed
from the bottom to the top

Yeah, my coworker he told me don’t worry about the style
He says “We just need a little more testing to make it right.”
You know I won’t trust the logs to /dev/null
So if that’s where you put them then you better use syslog.

Because you know
I’m all about git pull
’bout git pull, no copies
I’m all about git pull
’bout git pull, no copies
I’m all about git pull
’bout git pull, no copies
I’m all about git pull
’bout git pull