How to unit-test code that interacts with a database

I got some interesting comments on my previous article about unit testing Maatkit, including echoes of my own conversion to the unit-testing religion. One objection to unit testing I hear a lot is that it’s impossible to test code that talks to a database. “It’s too hard,” they say. “Oh, it’s easy to test a module that calculates a square root, but a database? Way too much work!”

Is it really impossible or even hard?

I disagree. In a previous article I mentioned that The Rimm-Kaufman Group, my former employer, has a comprehensive unit-test suite. When I say comprehensive I mean it: database interaction is fully tested, too. I know because I was heavily involved in building it. Even extremely complex things, like big reports generated from lots of data, are tested. And believe me, sharding the databases would have been much harder without complete code coverage. It’s really not that complicated to unit-test against a database, and it’s so worth it. Here are some hints about how you can do this.

There are many ways to do it, but I’ll just describe the basics of the system I helped build. The test suite (“smoke”) has several moving parts, and one of them sets a magic environment variable. From then on, any code that connects to a database server gets back a different database connection from the create_me_a_connection() function, because the database connection abstraction library respects that environment variable. It’s really pretty simple for the most part: instead of calling DBI->connect(…) directly, you call this function, which is a thin wrapper that hands back a connection object.

This wrapper is itself unit-tested thoroughly, too. This ensures that when some code is being run from a test, it cannot (I mean cannot!) connect to a production database, and vice versa. There are some conventions about production and test servers that make sure the abstraction library can tell for sure. If there’s any confusion, of course, it will die in a non-recoverable way. Safety first.
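Here is a minimal Python sketch of such a wrapper. It is not the RKG library. The environment variable name APP_TEST_MODE, the test_ database-name prefix, and the UnsafeConnectionError exception are all hypothetical. A named in-memory SQLite database stands in for a real server so that the example runs anywhere.

```python
import os
import sqlite3

# Hypothetical names: the article doesn't say what the real variable or convention is.
TEST_MODE_VAR = "APP_TEST_MODE"
TEST_PREFIX = "test_"  # convention: every test database name starts with this


class UnsafeConnectionError(RuntimeError):
    """Raised whenever the wrapper can't prove a connection is safe."""


def running_under_test(environ=os.environ):
    value = environ.get(TEST_MODE_VAR, "")
    if value == "":
        return False
    if value == "1":
        return True
    # Any other value is ambiguous, so refuse instead of guessing.
    raise UnsafeConnectionError(f"unrecognized {TEST_MODE_VAR}={value!r}")


def resolve_target(name, environ=os.environ):
    """Map a requested database name to the one this process is allowed to use."""
    under_test = running_under_test(environ)
    target = name
    if under_test and not name.startswith(TEST_PREFIX):
        target = TEST_PREFIX + name  # tests transparently get their own database
    # Final check: test mode must end up on a test database, production must not.
    if under_test != target.startswith(TEST_PREFIX):
        raise UnsafeConnectionError(f"refusing {target!r} (under_test={under_test})")
    return target


def create_me_a_connection(name, environ=os.environ):
    target = resolve_target(name, environ)
    # Stand-in for DBI->connect(...): a named, shared in-memory SQLite database.
    return sqlite3.connect(f"file:{target}?mode=memory&cache=shared", uri=True)
```

Taking environ as a parameter is only a convenience. It lets the wrapper itself be unit-tested without touching the real process environment.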

Building a good development environment

Just as each developer has their own copy of the code from version control, each developer has their own private database server running on the dev machine. A few simple conventions make this possible, such as using the Unix user ID plus a constant as the port number. It’s really quite easy. The private database server is a slightly modified version of Giuseppe Maxia’s MySQL Sandbox tool. It can be torn down and set up afresh as desired. It is wiped clean and refilled at the start of every test with a small, tightly focused dataset, carefully chosen to represent the conditions the code is supposed to work with. (Each test has its own dataset.)
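The following rough Python sketch shows two of those conventions. The first is a per-developer port computed from the Unix user ID. The second is a reset that empties the database and reloads one test's dataset. The base port of 10000 is a made-up constant. SQLite stands in for the MySQL Sandbox instance, which is why the table listing reads sqlite_master.

```python
import os
import sqlite3

# Hypothetical base; the article only says "Unix user ID plus a constant".
SANDBOX_PORT_BASE = 10000


def sandbox_port(uid=None, base=SANDBOX_PORT_BASE):
    """Give each developer a distinct, predictable port for their private server."""
    if uid is None:
        uid = os.getuid()  # Unix only
    port = base + uid
    if not 0 < port <= 65535:
        raise ValueError(f"port {port} out of range; pick a different base")
    return port


def reset_database(conn, schema_sql, fixture_sql):
    """Drop every table, then recreate the schema and load one test's dataset."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    for table in tables:
        conn.execute(f'DROP TABLE "{table}"')
    conn.executescript(schema_sql)
    conn.executescript(fixture_sql)
    conn.commit()
```

Calling reset_database at the top of each test means leftovers from a previous run can never leak into the next one.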

If this sounds like a system that can’t work at a large scale, rest assured that it does. What makes it work is secret sauce that I won’t reveal in this post. (It’s my past employer, after all, and I can’t go revealing everything about them, can I?) You just have to be smart about it. When a database is central to your business, you either figure out how to get this right, or you pay the consequences in lost time and poor code quality.

The other developers there and I (another secret: it’s a small team; small teams build great things) built several quick utilities to help develop unit tests against a database. Some utilities extract the minimal dataset a test needs and dump it into a file that the test can load. Others migrate schemas and update the tests to match the schema changes. And so on, and so on. This is possible because of careful planning for testability, and really smart things like super-consistent, sensible naming conventions for database objects. (Ruby on Rails owes a lot of its success to simple things like this, too. Conventions are really powerful.) Maybe I’ll write about the database naming conventions some other time. I have to credit Alan Rimm-Kaufman a lot for designing them. It was a stroke of genius.
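The RKG utilities themselves aren't public. What follows is only a hypothetical Python sketch of the first kind of utility. It pulls just the rows a test needs, chosen by one WHERE clause per table, and writes them out as INSERT statements that the test can load. SQLite is used for illustration, and the names dump_subset and sql_literal are invented.

```python
import sqlite3


def sql_literal(value):
    """Render a value as a SQL literal. Handles NULL, numbers and text only."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + str(value).replace("'", "''") + "'"


def dump_subset(conn, selections):
    """Return INSERT statements for only the rows a test needs.

    selections maps a table name to a WHERE clause. Both are trusted input,
    because they are interpolated directly into the SQL.
    """
    statements = []
    for table, where in selections.items():
        cursor = conn.execute(f"SELECT * FROM {table} WHERE {where}")
        columns = ", ".join(d[0] for d in cursor.description)
        for row in cursor.fetchall():
            values = ", ".join(sql_literal(v) for v in row)
            statements.append(f"INSERT INTO {table} ({columns}) VALUES ({values});")
    return "\n".join(statements)
```

You would save the returned text to a file next to the test. The test then loads it after recreating the schema.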

Things to avoid

There are several things I do not recommend doing when you unit-test code that talks to a database. I’ll just mention a couple:

  • Don’t mock anything! In general I think mocking is the devil. Most of the mock objects I’ve ever seen reflected a propensity to test an implementation instead of a behavior, which is also the devil. Write all your tests against a test instance of something real, and do not mock up a database to test against. It is a rabbit hole that you will not emerge from easily.
  • Never let a test connect to a production database. Never, ever. Worlds of hurt will follow. Not only are you risking your production data, but what about the risk to your code? You’re testing against things that will almost certainly change and break your tests; and you’re possibly polluting your live data with testing data and/or changing live data from the tests.
  • I also recommend developing unit tests for your current database functionality if you’re thinking about changing it much. Don’t like MySQL’s lax error handling? Plan to set the SQL_MODE to something stricter? Dive into that database abstraction library and make your tests run in strict mode first by setting SQL_MODE on every new connection that’s created when running inside a test; fix all the breakage in the test suite; feel sure that your code isn’t going to break in production. That was easy!
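Here is a sketch of that last suggestion as a hypothetical Python connection helper. It runs setup statements on every new connection, but only under test. The driver_connect parameter is whatever DB-API connect function you use, for example pymysql.connect. The SET SESSION sql_mode statement is MySQL syntax, and STRICT_ALL_TABLES is just one possible choice of stricter mode.

```python
# Statements run on every new connection while under test (MySQL syntax).
TEST_SESSION_SETUP = ("SET SESSION sql_mode = 'STRICT_ALL_TABLES'",)


def connect_with_session_setup(driver_connect, *args, under_test=False,
                               session_setup=TEST_SESSION_SETUP, **kwargs):
    """Open a connection; in test mode, tighten the session before handing it back."""
    conn = driver_connect(*args, **kwargs)
    if under_test:
        cursor = conn.cursor()
        try:
            for statement in session_setup:
                cursor.execute(statement)
        finally:
            cursor.close()
    return conn
```

Production connections are untouched. When the test suite runs clean in strict mode, you can flip the same setting on for production with some confidence.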

Summary

Once your creative juices get flowing, you’ll see tons of places your unit test suite can help you out.

If you’re in the Oracle or SQL Server world, or any other world where you can’t just set up and discard database instances at will due to licensing problems, you’re going to have to be a little more inventive. But you can still do it. (Don’t you wish you’d chosen Freedom?) And unit tests are just as beneficial for apps based on Oracle as they are for MySQL.

Have fun! Go forth and test some more!

Microsoft gets their way with so-called XML standard

It has all played itself out according to Microsoft’s wishes. They have railroaded through a so-called standard for document representation, gotten it rubber-stamped by so-called standards bodies, and fought their way past all the objections of sensible people and companies. In the process, lots of developing nations have been steamrolled, too. Shame, shame, shame.

How Maatkit benefits from test-driven development

Over in Maatkit-land, Daniel Nichter and I practice test-first programming, AKA test-driven development. That is, we write tests for each new feature, or to catch regressions for each bug we fix. And, crucially, we write the tests before we write the code.* The tests should fail at first. That failure shows the tests really do check for the new behavior, so when they pass later, it means the new code works. If we don’t first write a failing test case, then our code lacks a very important guarantee: “if you break this code, then the test case will tell you so.” (A test that doesn’t fail when the code fails isn’t worth writing.)

Most of the time when I do this, I write a test, it fails because I haven’t written any code yet, and I then go do some kind of clean-room coding. Then I run the test and it’s busted, and I have to go back to the code and figure out why, and after a few more tries I get it working. And then it feels great. (That’s the other thing about test-first coding. It’s really satisfying, like cooking the perfect dinner, arranging the plates beautifully and then eating.)

This time I wanted to write a pure-Perl implementation of CRC32, and embed it in mk-table-checksum. We try really hard never to rely on external modules, even modules that ought to be distributed with Perl itself. That keeps Maatkit as portable as possible and makes sure there is no installation hell. You can generally just get and run the Maatkit tools with no installation. So I referred to an existing CRC32 implementation, in Digest::Crc32. I wrote a test by referring to the value I got from MySQL’s built-in CRC32:

mysql> select crc32('hello world');
+----------------------+
| crc32('hello world') |
+----------------------+
|            222957957 | 
+----------------------+
1 row in set (0.00 sec)

Here’s the test:

is($c->crc32('hello world'), 222957957, 'CRC32 of hello world');

CRC32 is CRC32, so my code had better agree with a working implementation. Then I wrote the code, which is a refactoring of the math in the Digest::Crc32 module mentioned above. And then I ran the test, and it Just Passed with no further ado. w00t! This is pretty much a historic first for me! At first I thought I’d screwed something up in the test, but I checked again. This is like getting a hole-in-one for me :-) So I just thought I’d share it with you. It feels awesome.
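For readers who don't read Perl, here is the same test-first idea sketched in Python. This is not the Maatkit code. It is the standard bit-at-a-time CRC32, with the reflected polynomial 0xEDB88320 and an initial value and final XOR of 0xFFFFFFFF. It is checked against the value MySQL reported for 'hello world'.

```python
def crc32(data: bytes) -> int:
    """Bitwise CRC32 with the same parameters as zlib and MySQL's CRC32()."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; if the bit shifted out was 1, fold in the polynomial.
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


# Written first, and failing until crc32() existed:
assert crc32(b"hello world") == 222957957, "CRC32 of hello world"
```

A lookup-table version is faster. The bitwise loop is shown because it's short and easy to check against the test.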

If you’re not doing test-first coding, you ought to give it a try. If you are conscientious about writing tests first, your code will always be easy to test. If you don’t, you write untestable code. Then it’s tough or impossible to ever get tests on it, and you spend the rest of your life wasting time on stupid bugs and slow, fearful development, never knowing what else you are breaking with your “fixes.”

Test-driven development is one reason The Rimm-Kaufman Group’s in-house bidding system blows away their competition. (RKG is my previous employer.) The comprehensive unit-test suite lets you know right away if you’ve broken something. That keeps the code clean and makes it possible to be extremely productive. I remember once when one of my co-workers there implemented a major feature in a very short time. It was also incredibly helpful when sharding the databases (anyone ever done this without a test suite? Would you like to share about how much of your systems broke during sharding? It was almost a non-event at RKG). The people I worked with before I joined RKG looked at me like an alien when I tried to explain that this was possible.

If you’re thinking that your code is not “that kind of code,” that “only certain kinds of code lend themselves to unit tests,” then stop. I’ve heard this before, and you’re wrong. It’s only “untestable” because you didn’t write tests first. Write tests first, and your code — all of it! — will be “that kind of code” that is testable. It’s hard. No one says it’s not; good programming is much harder than sloppy programming. But it’s well worth it.

Converting untested, untestable code into tested code is not so much fun, though. And in my experience you’ll rarely be rewarded for it, and your coworkers will not appreciate you raising the bar for them. Maybe you need a new job. I hear RKG is hiring. Did I mention that their codebase is built from the ground up on unit tests?

* OK, we’re not perfectly disciplined about this, but we’re pretty good about it.

How to emulate the TYPEOF() function in MySQL

Want to know the type of an arbitrary expression in MySQL? Someday in the far far future in version 7.1, you might be able to with the TYPEOF() function.

For now you can try this:

CREATE TEMPORARY TABLE typeof AS SELECT [expression] AS col;

For example, let’s see what the type of CRC32 is.

mysql> CREATE TEMPORARY TABLE typeof AS SELECT CRC32('hello world') AS col;
mysql> DESCRIBE typeof;
+-------+------------------+------+-----+---------+-------+
| Field | Type             | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+-------+
| col   | int(10) unsigned | NO   |     | 0       |       | 
+-------+------------------+------+-----+---------+-------+

This is one possible way to programmatically determine the type of an expression — even an arbitrarily complex one.
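To do this programmatically from a client, you could wrap the two statements in a small helper like the hypothetical Python function below. It works with any DB-API cursor connected to MySQL. It drops the temporary table afterwards so that the name can be reused. The expression is pasted straight into the SQL, so only pass trusted text.

```python
def mysql_typeof(cursor, expression):
    """Report the column type MySQL infers for `expression` (a trusted SQL fragment)."""
    cursor.execute(f"CREATE TEMPORARY TABLE typeof AS SELECT {expression} AS col")
    try:
        cursor.execute("DESCRIBE typeof")
        row = cursor.fetchone()
        return row[1]  # the second DESCRIBE column is Type
    finally:
        cursor.execute("DROP TEMPORARY TABLE typeof")
```

With the CRC32 example above, this would return the Type string DESCRIBE shows, int(10) unsigned.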

Not beautiful, but it might get the job done. Other ideas?

Anyone want to help build RPMs of Maatkit?

Dear LazyWeb, I want to use my Ubuntu laptop (on amd64 BTW) to build an RPM of Maatkit that will work on all RPM-based distros. Is it possible? Or are there enough differences between the RPM-based distros that I can’t do it? Mind you, the finished RPM ought to just have some man pages and Perl scripts, so I don’t think it will be platform- or distro-specific. But I am just not an expert on it.

The second question is, what do I need to put into my Makefile to do this? My ‘make all’ currently builds a .zip, a .tar.gz, and a .deb package — what needs to change to make that include .rpm?

Someone who is willing to help create .spec files, etc, etc will be immediately given commit rights to Maatkit’s SVN repository!

Maatkit version 2152 released

Download Maatkit

Maatkit version 2152 is ready for download. This release is also known as the “is this project really alive?” release. We thought we should delay until MySQL released a new Community Server version. Just kidding — it has nothing to do with that.

This release is also very significant in that it’s the first one that has large code contributions by someone other than myself. As you may know, Percona (my employer) has hired the very talented Daniel Nichter, author of mysqlreport and other goodies, to help with Maatkit. So far it is a match made in heaven, and Daniel did most of the coding for this release.

This is also our first release since Ask helped me move the project (thank you Ask!) to Google Code. That means you finally get a decent interface for entering issues, etc, etc. The only thing remaining on Sourceforge at this point is the online documentation, which I will probably move to maatkit.org soon. But more importantly, it means the developers have a decent interface for issues, etc etc. Sourceforge is just a bloody nightmare — their site keeps getting harder and harder to use, both as a developer and as a user. It had gotten to the point where simply adding the files to the site for download would take me hours. I tried to automate it, in true Perl fashion, but their make-a-release forms resisted my every effort. I cannot say what a relief it is to have usable project hosting that gets out of my way and lets me work. A double thanks to Ask for pushing me over the edge on this — it had been on my mind a long time. And thanks to Google, too, for a great project management interface.

Also note that the Sourceforge forums and mailing lists are dead. Google Groups is the preferred replacement.

Keep reporting those bugs and feature requests!

As you might expect, the changelog for such a long release cycle is, er, large. There’s a lot of new stuff here. I’d like to highlight the new features in mk-parallel-dump and mk-parallel-restore — which I just used to reduce a job that would have taken weeks down to mere days — and a lot of new code in mk-table-sync, as well as the up-and-coming mk-audit, which is in release-early/often mode.

Changelog for mk-archiver:

2008-08-11: version 1.0.10

   * Files downloaded directly from SVN crashed due to version information.
   * Added more information to --statistics and changed --whyquit slightly.

Changelog for mk-audit:

2008-08-11: version 0.9.1

   * Files downloaded directly from SVN crashed due to version information.
   * Added useful functionality.

Changelog for mk-deadlock-logger:

2008-08-11: version 1.0.11

   * Files downloaded directly from SVN crashed due to version information.

Changelog for mk-duplicate-key-checker:

2008-08-11: version 1.1.7

   * Files downloaded directly from SVN crashed due to version information.
   * Full-text indexes were not treated specially (issue 10).

Changelog for mk-fifo-split:

2008-08-11: version 1.0.1

   * Files downloaded directly from SVN crashed due to version information.
   * Added --offset option.
   * --statistics didn't calculate lines/sec properly.
   * Removed --sleep; EOF doesn't mean anything to a non-terminal.

Changelog for mk-find:

2008-08-11: version 0.9.12

   * Files downloaded directly from SVN crashed due to version information.
   * Added --exec_dsn so you can execute SQL on a different server.

Changelog for mk-heartbeat:

2008-08-11: version 1.0.10

   * Files downloaded directly from SVN crashed due to version information.

Changelog for mk-parallel-dump:

2008-08-11: version 1.0.9

   * Files downloaded directly from SVN crashed due to version information.
   * Added --progress option.
   * CHANGE MASTER TO in 00_master_data.sql used the I/O thread position.
   * Added features to permit resuming of dumps.
   * --age without --sets did the opposite of what it should (issue 7).
   * --stopslave died after complaining the slave was not running.

Changelog for mk-parallel-restore:

2008-08-11: version 1.0.8

   * Files downloaded directly from SVN crashed due to version information.
   * Added --progress option.

Changelog for mk-query-profiler:

2008-08-11: version 1.1.11

   * Files downloaded directly from SVN crashed due to version information.

Changelog for mk-show-grants:

2008-08-11: version 1.0.11

   * Files downloaded directly from SVN crashed due to version information.
   * Anonymous users were not permitted (issue 28).

Changelog for mk-slave-delay:

2008-08-11: version 1.0.8

   * Files downloaded directly from SVN crashed due to version information.

Changelog for mk-slave-find:

2008-08-11: version 1.0.2

   * Files downloaded directly from SVN crashed due to version information.

Changelog for mk-slave-move:

2008-08-11: version 0.9.2

   * The -m option was not recognized as an alias for --timeout.
   * Files downloaded directly from SVN crashed due to version information.

Changelog for mk-slave-prefetch:

2008-08-11: version 1.0.3

   * Files downloaded directly from SVN crashed due to version information.
   * Added the --numprefix option for use in sharded data stores.
   * The Rotate binary log event type was not handled.

Changelog for mk-slave-restart:

2008-08-11: version 1.0.8

   * Files downloaded directly from SVN crashed due to version information.

Changelog for mk-table-checksum:

2008-08-11: version 1.1.28

   * Files downloaded directly from SVN crashed due to version information.

Changelog for mk-table-sync:

2008-08-11: version 1.0.8

   * Files downloaded directly from SVN crashed due to version information.
   * --synctomaster did not abort when unable to discover the master.
   * An error waiting for the master to catch up caused other tables to fail.
   * Added --bufferinmysql to help make GroupBy algorithm more efficient.
   * Added safety checks to prevent changing data on a slave server.
   * Added --skipslavecheck to prevent safety checks on destination server.
   * Made the GroupBy algorithm the default replacement for Stream.
   * Added the GroupBy algorithm, which can sync tables without unique keys.
   * Syncing could stop and leave a row to delete in the destination.
   * Generate command-line help from the POD.

Changelog for mk-visual-explain:

2008-08-11: version 1.0.9

   * Files downloaded directly from SVN crashed due to version information.

How to select the Nth greatest/least/first/last row in SQL

This is a continuation of my articles on how to select the desired rows from ranked data. A user recently posed a question in the comments that I thought was particularly intriguing:

What is the best way to query 1) Sum of min price of all types? 2) Sum of 2nd highest price of all types?

Sounds like fun! Let me start by saying the sum is the easy part. You can always do that like so:

select sum(price) from (
   -- find desired rows here
) as x;

Finding the desired rows is the hard part. In my previous articles I focused on extrema:

  • The single biggest/smallest/extremest row in each group. (Pretty easy.)
  • The N most extreme rows in each group. (Doable, but harder.)

In this article, we’re going to see how to get not the most extreme row, not the N most extreme rows, but — hold your breath — the single Nth most extreme row per group. (In a future article I might talk about how to get the Nth through Mth most extreme rows.)

The setup

Let’s create some sample data to get started.

drop table if exists fruits;

create table fruits (
   type varchar(20) not null,
   variety varchar(20) not null,
   price int not null,
   primary key (type, variety)
);

insert into fruits values
('apple',  'fuji',       1),
('apple',  'gala',       2),
('apple',  'limbertwig', 3),
('cherry', 'bing',       4),
('cherry', 'chelan',     5),
('orange', 'navel',      6),
('orange', 'valencia',   7),
('pear',   'bartlett',   8),
('pear',   'bradford',   9);

To make the ordering easy to see, I’ve listed the fruits alphabetically and given each one a unique price. (Unique prices also avoid ties, which would complicate the ranking below.)

The desired results — second-cheapest prices for each fruit — are as follows:

+--------+-----------------+
| type   | second_cheapest |
+--------+-----------------+
| apple  |               2 | 
| cherry |               5 | 
| orange |               7 | 
| pear   |               9 | 
+--------+-----------------+

The solution

The insight you need here is that if you get the 2 cheapest fruits in each group and then take the most expensive of those, you have the second cheapest. The same trick works for any N. Let’s begin with one of the queries from my earlier article. (You should be able to use any of them. I’m just using this one because it’s convenient and pretty clear.)

select type, variety, price
from fruits
where (
   select count(*) from fruits as f
   where f.type = fruits.type and f.price < fruits.price
) <= 1;
+--------+----------+-------+
| type   | variety  | price |
+--------+----------+-------+
| apple  | fuji     |     1 | 
| apple  | gala     |     2 | 
| cherry | bing     |     4 | 
| cherry | chelan   |     5 | 
| orange | navel    |     6 | 
| orange | valencia |     7 | 
| pear   | bartlett |     8 | 
| pear   | bradford |     9 | 
+--------+----------+-------+

The result is the 2 cheapest fruits from each type. (Notice that all we really did was eliminate one row — the most expensive apple.) Now let’s get the second cheapest — and what is that? It’s simply the most expensive of the fruits we found in that query. And that’s just a MAX().

select type, max(price) as second_cheapest
from (
   select type, variety, price
   from fruits
   where (
      select count(*) from fruits as f
      where f.type = fruits.type and f.price < fruits.price
   ) <= 1
) as x
group by type;
+--------+-----------------+
| type   | second_cheapest |
+--------+-----------------+
| apple  |               2 | 
| cherry |               5 | 
| orange |               7 | 
| pear   |               9 | 
+--------+-----------------+

That’s it!

Sum of the second cheapest

By now you probably see the pattern: do it one step at a time, turning each thing into a simpler question that’s easy to answer. So how do we sum the second cheapest prices for each type of fruit? First, we find them (done!), then we sum them.

select sum(second_cheapest) from (
   select type, max(price) as second_cheapest
   from (
      select type, variety, price
      from fruits
      where (
         select count(*) from fruits as f
         where f.type = fruits.type and f.price < fruits.price
      ) <= 1
   ) as x
   group by type
) as y;
+----------------------+
| sum(second_cheapest) |
+----------------------+
|                   23 | 
+----------------------+
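To check the logic, here is a Python sketch that runs the same approach against SQLite on the sample data, generalized from 2 to any N. Two changes are mine, not part of the queries above. First, the outer table gets an alias (o) so the correlated subquery is unambiguous in SQLite. Second, HAVING COUNT(*) = N drops groups with fewer than N rows; without it, MAX() would report those groups as if they had an Nth row.

```python
import sqlite3

FRUITS = [
    ("apple", "fuji", 1), ("apple", "gala", 2), ("apple", "limbertwig", 3),
    ("cherry", "bing", 4), ("cherry", "chelan", 5),
    ("orange", "navel", 6), ("orange", "valencia", 7),
    ("pear", "bartlett", 8), ("pear", "bradford", 9),
]

# Step 1: keep the N cheapest rows per type (rows with at most N-1 cheaper rows).
# Step 2: the most expensive of those is the Nth cheapest.
NTH_CHEAPEST_SQL = """
    select type, max(price) as nth_cheapest
    from (
       select o.type, o.price
       from fruits as o
       where (
          select count(*) from fruits as f
          where f.type = o.type and f.price < o.price
       ) <= ?
    ) as x
    group by type
    having count(*) = ?
"""


def load_fruits(conn):
    conn.execute("""create table fruits (
        type varchar(20) not null,
        variety varchar(20) not null,
        price int not null,
        primary key (type, variety))""")
    conn.executemany("insert into fruits values (?, ?, ?)", FRUITS)


def nth_cheapest_per_type(conn, n):
    return conn.execute(NTH_CHEAPEST_SQL + " order by type", (n - 1, n)).fetchall()


def sum_of_nth_cheapest(conn, n):
    # Step 3: wrap the whole thing and sum it, as in the article.
    sql = f"select sum(nth_cheapest) from ({NTH_CHEAPEST_SQL}) as y"
    return conn.execute(sql, (n - 1, n)).fetchone()[0]


conn = sqlite3.connect(":memory:")
load_fruits(conn)
print(nth_cheapest_per_type(conn, 2))  # [('apple', 2), ('cherry', 5), ('orange', 7), ('pear', 9)]
print(sum_of_nth_cheapest(conn, 2))    # 23
```

With duplicate prices within a type, the count-of-cheaper ranks can tie and skip. This sketch therefore assumes unique prices per type, as the sample data has.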

Conclusion

In this post I showed you how to decompose the problem into simpler and simpler pieces. Often what’s hardest about a complex query is trying to do it all at once. I have lots of tips elsewhere on this blog about how to make things faster — this is not a particularly fast query — but here I just wanted to show how to get the correct answer.

My new favorite comic: The Adventures of Ace, DBA

I follow only a few comics in my feed reader: Get Fuzzy, Dilbert, XKCD, and now The Adventures of Ace, DBA. It’s kind of XKCD-ish, only it’s about Oracle and it doesn’t have the extra punch line when you hover your mouse over the picture. And it’s proof that a DBMS (even one I don’t use) can be pretty damn funny.

How to scale writes with master-master replication in MySQL

This post is SEO bait for people trying to scale MySQL’s write capacity by writing to both servers in master-master replication. The short answer: you can’t do it. It’s impossible.

I keep hearing this line of reasoning: “if I make a MySQL replication ‘cluster’ and move half the writes to machine A and half of them to machine B, I can increase my overall write capacity.” It’s a fallacy. All writes are repeated on both machines: the writes you do on machine A are repeated via replication on machine B, and vice versa. You don’t shield either machine from any of the load.

In addition, doing this introduces a very dangerous side effect: in case of a problem, neither machine has the authoritative data. Neither machine’s data can be trusted, but neither machine’s data can be discarded either. This is a very difficult situation to recover from. Save yourself grief, work, and money. Never write to both masters.
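The arithmetic is easy to check with a toy model. This Python sketch is a deliberate simplification that ignores replication lag and treats every write as equal cost. In the model, each server applies its own writes plus every write replicated from its peer.

```python
def writes_applied_per_server(local_writes):
    """local_writes maps server name -> writes/sec sent directly to it."""
    applied = {}
    for server, own in local_writes.items():
        # Everything written to the other master(s) arrives here via replication.
        replicated_in = sum(n for other, n in local_writes.items() if other != server)
        applied[server] = own + replicated_in
    return applied


# One master taking everything vs. the same load split 50/50 across two masters.
print(writes_applied_per_server({"A": 1000}))            # {'A': 1000}
print(writes_applied_per_server({"A": 500, "B": 500}))   # {'A': 1000, 'B': 1000}
```

Splitting the writes doesn't lower either server's total. Each one still applies all 1000 writes per second.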

How I hacked the HP Media Vault to support OGG and FLAC files

Let me begin by saying “I am so not a gadget guy.” I don’t have an iPhone. Heck, I didn’t have a cellphone at all until April when I joined Percona as a consultant. I don’t ooh and aah over other people’s gadgets most of the time. I don’t have, you know, that kind of envy. I’m sure you see where this is going: I got a gadget and I think it’s really cool.

Anyway, my wife and I have a bunch of computers (desktops and laptops) and we had been feeling the pain for a long time: the files were only on one computer, and we wanted them available. I built a file server and then realized that it was going to be really expensive in terms of power alone, so I went back to USB drives for backups, and kept thinking about it.

HP Media Vault

After a long time I decided to buy an HP Media Vault and install ultra-low-power, oversized disks in it — I did that, and will write about it elsewhere. And then I discovered that it has a media server in it. And not being a gadget guy, I had honestly never heard about these things before. Really. I read up on it a little bit and decided hell, sharing files is nice, but I have about a thousand CDs that could go on this thing, and my wife has hundreds too. That’s even better than file sharing! I copied the music from her iTunes library to the shared Music folder on the server and boom, Rhythmbox magically saw it all. I couldn’t believe I’d never heard about this before. Best thing since sliced bread.

I even had all my music ripped already to my iRiver HD340. In OGG format. And then I found out the HP Media Vault doesn’t support OGG format. Boo! Boo! Rubbish! Filth! Slime!

So I fixed that. Now I’ll show you how.

Disclaimer

If you try what’s on this page and something breaks, it is your fault, not mine. I make this information available without any warranties or representations.

The basics: log into the server

The HP documentation for the Media Vault is totally incomplete and assumes you want to install their GUI program and control the thing from your Windows desktop. There’s a much better way: the Media Vault has a full-featured web interface. Log into the web console. I’m going to assume that your HP Media Vault’s DNS name is hpmediavault, so you can reach the console by pointing your browser at that name. Once you do, set the admin password to secure the server. Remember it.

The next fun thing: the server runs GNU/Linux and has SSH enabled by default. Yes, that’s right: you can just SSH into the thing! The password you set in the previous step is now your SSH password. Your SSH username is root, no matter what you set the admin username to.

Next, open up a terminal and SSH right in:

ssh root@hpmediavault

Type the password you chose in the previous step. You should see the following:

baron@kanga:~$ ssh root@hpmediavault
root@hpmediavault's password: 


BusyBox v1.01 (2008.02.08-22:41+0000) Built-in shell (ash)
Enter 'help' for a list of built-in commands.

-sh: can't access tty; job control turned off
# 

As you can see, the server runs with a stripped-down set of command-line tools called BusyBox. You’re golden. Let’s get working on installing OGG and FLAC support. This will not be hard at all if you can use a command-line editor.

Step 1: install ipkg

Behind the scenes, the Media Vault’s media streaming is provided by Firefly, formerly known as mt-daapd (DAAP is the iTunes server protocol). This is a Free Software media server, and it’s highly capable. But the version that ships on the device is old and doesn’t support OGG. You’re going to fix that by installing a newer version. But first, you have to install a package management system that will install the newest Firefly software for you.

The package management system is ipkg, the Itsy Package Management System. It’s really easy to install. First, let’s see where your hard drives are mounted:

# mount
/dev/md6 on /share/1000 type ext3 (data=writeback)

If yours isn’t /share/1000, use a different value in the following commands. Now you want to make an installation directory and change to that directory:

# mkdir -p /share/1000/tmp && cd /share/1000/tmp

Now let’s find the installation image to download. Go look here for the latest version of the image:

http://ipkg.nslu2-linux.org/feeds/optware/cs05q3armel/cross/unstable/

Search for “hpmv2-bootstrap” on that page. You should find a file something like this: hpmv2-bootstrap_1.2-4_arm.xsh. Copy the link location for that, and go back to your command prompt. Now download the file to the Media Vault, substituting the correct URL into the command below:

# wget http://ipkg.nslu2-linux.org/feeds/optware/cs05q3armel/cross/unstable/hpmv2-bootstrap_1.2-4_arm.xsh

And now, just execute it:

# sh ./hpmv2-bootstrap_1.2-4_arm.xsh

You should see “Setup complete” when it’s done. That’s it. It installs itself and mounts the installation directory as /opt, which is where all your software will appear in the future. This will persist after a reboot. You can see the changes with the mount command:

# mount
/dev/md6 on /share/1000 type ext3 (data=writeback)
/share/1000/.optware on /opt type ext3 (rw)

Before you move on, update its cache of available software:

# ipkg update

I got this installation procedure from the Yahoo group on hacking the Media Vault.

Step 2: Install Firefly Nightly

I wasn’t able to determine whether the latest stable Firefly release has OGG streaming enabled, so I installed the latest nightly release. At some point in the future I’m sure a stable release will have it, but I breathed a prayer to Saint Hewlett and installed the nightly, following instructions I also found on Hacking the Media Vault. Fortunately it seems to work fine for me. Here’s how I did it:

# ipkg install mt-daapd-svn

Pretty easy. After you do this, it will download a bunch of things and install them until it says “Successfully terminated.” Now you need to configure it.

You probably noticed that the installation said “To complete this installation, make any necessary changes to the config file in /opt/etc/mt-daapd/mt-daapd.conf, and start the daemon by running /opt/etc/init.d/S60mt-daapd”. Here’s how to do that.

# vi /opt/etc/mt-daapd/mt-daapd.conf

If you prefer a different editor, feel free to use it; I like vi. Here are the lines you need to change:

mp3_dir = /share/1000/Music                                            
servername = HPMediaVault                                              
extensions = .mp3,.m4a,.m4p,.ogg,.flac                                 

I’m assuming you are keeping the defaults, as I did on mine. All my music is in the Music share, I want to keep the same server name (what shows up in iTunes/Rhythmbox), and I want to add .ogg and .flac to the extensions Firefly will index and stream.
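If you'd rather not edit the file by hand, you can make the same three changes with sed. This is a minimal sketch and not part of the procedure I followed. set_conf is a hypothetical helper. It assumes each key already appears once, uncommented, at the start of a line, and that the values contain no | or & characters, because sed would misread those. If a key isn't found, the file is left unchanged for that key.

```shell
# Hypothetical helper: set "KEY = VALUE" in an mt-daapd.conf-style file by
# rewriting the existing "KEY = ..." line.
# Usage: set_conf KEY VALUE FILE
set_conf() {
  key=$1 value=$2 file=$3
  # Keep a pristine copy of the original the first time only.
  [ -e "$file.orig" ] || cp "$file" "$file.orig" || return 1
  # Rewrite the matching line into a temp file, then move it into place.
  sed "s|^$key[[:space:]]*=.*|$key = $value|" "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}
```

For the changes above:

# set_conf mp3_dir /share/1000/Music /opt/etc/mt-daapd/mt-daapd.conf
# set_conf servername HPMediaVault /opt/etc/mt-daapd/mt-daapd.conf
# set_conf extensions .mp3,.m4a,.m4p,.ogg,.flac /opt/etc/mt-daapd/mt-daapd.conf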

Step 3: Stop the built-in server, start the new one

Next you need to stop the built-in media server and start the one you just installed. Here’s how to see what’s running:

# ps -eaf | grep daap
32530 nobody     1096 S < /usr/sbin/mt-daapd 
32531 nobody     1984 S < /usr/sbin/mt-daapd 
32160 root        488 S   grep daap 

There are two processes running. This is normal. Let’s stop them:

# killall mt-daapd

If you run the ps command above again, you should see only the grep process itself. Now start the new server:

# /opt/etc/init.d/S60mt-daapd

Now you should be able to see the daemon running:

# ps -eaf | grep daap
32681 nobody     3796 S   /opt/sbin/mt-daapd -c /opt/etc/mt-daapd/mt-daapd.conf
32682 nobody     4512 D   /opt/sbin/mt-daapd -c /opt/etc/mt-daapd/mt-daapd.conf
32703 root        488 S   grep daap 

Notice that a different binary is running now: /opt/sbin/mt-daapd, not the one in /usr/sbin.
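To check which server is running without eyeballing the ps listing, you can pull the binary paths out of it. This is a small sketch. daapd_binaries is a hypothetical helper that prints each distinct path ending in /mt-daapd, and it assumes your system has awk and sort available.

```shell
# Hypothetical helper: read `ps` output on stdin and print each distinct
# mt-daapd binary path found (any field ending in "/mt-daapd").
# The "grep daap" line has no such field, so it is ignored.
daapd_binaries() {
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /\/mt-daapd$/) print $i }' | sort -u
}
```

Running `ps -eaf | daapd_binaries` should print /usr/sbin/mt-daapd before the switch and /opt/sbin/mt-daapd after it.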

At this point you ought to be able to start up your favorite music player (iTunes, Rhythmbox) and stream OGG and FLAC files from the media server. Test that before you go on to the next little bit.

Step 4: Change which media server starts on boot

There’s one last little detail. If you shut down your Media Vault and restart it, the old media server will start instead of the new one. The GNU/Linux variant on the Media Vault doesn’t have any nice init scripts, so I had to hunt around to find out how to do this.

After a bit of poking, I found that the /etc/inc/func_daapd.inc script has the start and stop commands. The startup process for the Media Vault is written in PHP, oddly enough. Here are the relevant lines:

        $ret=mwexec("/usr/sbin/mt-daapd -k");
        killbyname("mt-daapd","");
        $ret=mwexec("/usr/sbin/mt-daapd");

I commented out the two mwexec calls and added new ones above them that call the new init script:

        $ret=mwexec("/opt/etc/init.d/S60mt-daapd -k");
#       $ret=mwexec("/usr/sbin/mt-daapd -k");
        $ret=mwexec("/opt/etc/init.d/S60mt-daapd");
#       $ret=mwexec("/usr/sbin/mt-daapd");

Notice I didn’t change the killbyname command, since once it is started the binary has the same command name as the old one did. I tested restarting the Media Vault and after restart, it was working OK again. I do not know whether the built-in command to reset the media server will work with these changes; I suspect not. But if you want to do that, you can log in and do it from the command line.
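The same edit can be scripted. This sketch mirrors the hand edit above. patch_daapd_inc is a hypothetical name. It keeps a pristine func_daapd.inc.orig and writes the /opt/etc/init.d/S60mt-daapd version of each mwexec("/usr/sbin/mt-daapd...") line in front of the original. It then comments the original out with #. It leaves the killbyname line alone and skips lines that are already commented, so running it twice does no harm. Still, check the result with grep before you reboot.

```shell
# Hypothetical helper: point the mwexec() calls in the PHP startup include
# at the ipkg-installed init script, keeping the old lines commented out.
# Usage: patch_daapd_inc /etc/inc/func_daapd.inc
patch_daapd_inc() {
  file=$1
  # Keep a pristine copy of the original the first time only.
  [ -e "$file.orig" ] || cp "$file" "$file.orig" || return 1
  awk '
    # Uncommented lines that run the built-in binary via mwexec().
    index($0, "mwexec(\"/usr/sbin/mt-daapd") && $0 !~ /^[ \t]*#/ {
      new = $0
      sub(/\/usr\/sbin\/mt-daapd/, "/opt/etc/init.d/S60mt-daapd", new)
      print new          # replacement line first
      print "#" $0       # then the original, commented out
      next
    }
    { print }            # everything else (including killbyname) unchanged
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

Then look over the result:

# patch_daapd_inc /etc/inc/func_daapd.inc
# grep -n mt-daapd /etc/inc/func_daapd.inc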

Conclusion

If you followed the steps above, your Media Vault ought to be streaming FLAC and OGG files to your music player, transcoded to WAV format on the fly (audiophiles, rejoice: your FLAC is not downconverted to MP3!).

After doing this, I have to say I think this piece of equipment is pretty darned awesome, and I’m really happy I bought a low-power, quiet, small, fun gadget that I have full control over. And I haven’t even talked about sharing files yet! That’ll be another post.

Postscript

A few miscellaneous things I’ve learned:

The default mt-daapd configuration file doesn't define a rescan_interval, so the server never notices when you add music to your filesystem. You can prod it into updating via the web interface (http://hpmediavault:3689/index.html; the username is empty, and the password is defined in your config file). There's also an option to gzip the list of songs, which might make startup quite a bit faster when iTunes or Rhythmbox connects and fetches the song list. I'm not sure how well that works. It's documented in the config file too.
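If you want the server to pick up new music on its own, you can add a rescan_interval line. This is a sketch under assumptions I haven't verified on this build. It assumes the interval is given in seconds (300 is only an example value) and that putting the line directly after mp3_dir places it in the right section of the file. add_rescan_interval is a hypothetical helper. Restart the server afterwards so it rereads the configuration (killall mt-daapd, then /opt/etc/init.d/S60mt-daapd).

```shell
# Hypothetical helper: insert "rescan_interval = SECONDS" right after the
# mp3_dir line. Usage: add_rescan_interval FILE [SECONDS], default 300.
# Assumption: the interval is in seconds.
add_rescan_interval() {
  file=$1 seconds=${2:-300}
  # Do nothing if a rescan_interval line is already present.
  grep -q '^rescan_interval[[:space:]]*=' "$file" && return 0
  # Keep a pristine copy of the original the first time only.
  [ -e "$file.orig" ] || cp "$file" "$file.orig" || return 1
  awk -v line="rescan_interval = $seconds" '
    { print }
    /^mp3_dir[ \t]*=/ { print line }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

For example:

# add_rescan_interval /opt/etc/mt-daapd/mt-daapd.conf 300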
