Archive for category Tips and Tricks

HOWTO: Determine the size of a Java Object or Class

Finding out the size (memory/heap usage) of Java Objects and Classes can be somewhat tricky as there is no built-in sizeof()-like functionality.  Luckily most people don’t have to face this problem.  But when you do, it can be difficult and in the end is almost always a best-guess approximation.

The reason that I started to look in to this subject was to build a memory-aware, LRU eviction cache for the open source database project HBase.  You can see the final result of that effort here.

When sizing a Class you can take two approaches.  One is to use the methods totalMemory and freeMemory from the Java Runtime class. The other strategy is to analyze the internal structure of your Class.  The first approach has the benefit that it gives you the actual memory footprint of your object rather than the expected one, but has the downside of relying on the garbage collector to do it’s thing.  So for this to work you often have to make repeated calls to the GC and sleep in between to give it time to process and minimize heap usage.  As one can see, this is not a solution suitable to be used in real time system, but can be a solid solution that you go back to when everything else fails, or just to verify your findings.

The second way of sizing your Class is to dissect it into it’s fundamental building blocks.  Let’s start by introducing the notion of a reference, known in other languages as a pointer.  A reference, or REF for short, is 4 bytes on a 32 bit system and 8 bytes on a 64 bit system.  Also note, the total size of an Object will always be word-aligned (4 byte aligned on 32bit, 8 byte aligned on 64bit).

Every instantiated Object in Java has a fixed overhead of 2 REFs.  To that, you add the contributions from all the non-static variables/fields contained within the Object.  For the most basic primitives, these sizes are:

byte = 1 byte, int = 4 bytes, long = 8 bytes

For example, let’s define and size the Class PrimitiveContainer

public class PrimitiveContainer {
  public int x = 1;
  public long y = 2;
}

The calculation for this would be:

overhead + sizeof(int) + sizeof(long) = (2 * REF) + 4 + 8 = 28 bytes on 64-bit, after word-alignment we get 32 bytes

Another example that uses Objects instead of primitives:

public class ObjectContainer {
  public Integer x = 1;
  public Long y = 2;
}

Since this Object contains references to other Objects (not primitives), we will calculate and include both the size of the referenced Objects (Integer and Long) as well as the references to those Objects (in ObjectContainer).  Both Integer and Long will have the fixed Object overhead of 2 * REF in addition to either an int or long primitive in each.  So:

sizeof(Integer) = (2 * REF) + 4 = 20
sizeof(Long) = (2 * REF) + 8 = 24

The total calculation would be:

overhead + reference to Integer + sizeof(Integer) + reference to Long + sizeof(Long) = (2 * REF) + (REF + 20) + (REF + 24) = 80 bytes

As you can see that there is a significant increase in memory usage between using primitives when compared to their Object counterparts, in this case 2.5X larger.

Arrays in Java, though underneath are really Objects, are unlike other complex Classes (Lists, Maps, etc) because they can be used with primtives directly.  Through experimentation and instrumentation of the JVM, the fixed overhead of any array is 3 * REF.

So with this in mind you can treat it as a regular object that aligns with it’s daughter objects.

For example if you have:

byte [] bs = {1,2,3,4};

on a 32 bit system, you get a total size of : 3 * REF + 4 = 3 * 4 + 4 = 16 bytes
on a 64 bit system, you get a total size of:  3 * REF + 4 = 3 * 8 + 4 = 28 which aligns to 32 bytes (twice the memory usage in this case)

One more thing that is worth mentioning is the cost for static and final variables. Usually static variables are not taken into consideration when sizing you object, since this is only done once and not for each instantiated Object.  Final variables should almost always be included, but of course it depends on your use case and the reason that you are sizing your objects.

, , , , , , ,

9 Comments

Debugging FastCGI and Lighttpd with spawn-fcgi and fcgi-debug

The web server of choice here at Streamy is Lighttpd. Written in C, it’s designed for performance, efficiency, security and concurrency. In addition to being one of the best performing http daemons for serving static content, it’s integration with FastCGI makes it simple to turn applications in almost any language into a CGI script with little or no code modification.  The communication between your script’s process and lighttpd is done with unix sockets or via tcp, making it easy to load-balance and have scripts that run on remote nodes.

Lighttpd makes it very easy to launch your cgi processes automatically without having to worry much about the interface.  However, debugging them like this can be arduous, if even possible.  When you want to get serious about developing fastcgi apps, you’ll need to run them yourself using spawn-fcgi and can then debug them using fcgi-debug.  spawn-fcgi is a way to separate the launching of the web server from the launching of the cgi proccess(es).  fcgi-debug is a nifty program that sits in between the web server and your script that allows you to see everything that’s going on.

A HUGE thanks to stbuehler who not only develops and maintains both of these applications but also spent time helping me debug a nasty issue we were having at Streamy.  IRC (and for developers, Freenode) is still one of the most powerful social networks out there! :)

Let’s imagine you have a compiled binary FastCGI program called myscript and you want to debug it.  At first, a simple lighttpd configuration for it might look like this:

fastcgi.server ( "/myscript" => ( "myscript" => (
  "socket" => "/home/user/myscript.sock",
  "bin-path" => "/home/user/myscript",
  "min-procs" => 1,
  "max-procs" => 1
)))

Full configuration information for mod_fastcgi is available here

The first step towards debugging is to change from Lighttpd spawning the process to using spawn-fcgi.  There are more detailed instructions available here and here, but in short what you will do is modify your lighty config to just point to the socket.  Remove bin-path and all process/load settings, otherwise you will run into conflicts.

fastcgi.server ( "/myscript" => ( "myscript" => (
  "socket" => "/home/user/myscript.sock"
)))

If we were not interested in debugging the interface/communications between lighttpd and fastcgi, we would spawn the script like this:

spawn-fcgi -s /home/user/myscript.sock -C 0 -F 1 -n -- /home/user/myscript

Adding fcgi-debug in the middle is as easy as calling it first rather than the script:

spawn-fcgi -s /home/user/myscript.sock -C 0 -F 1 -n -- fcgi-debug /home/user/myscript

You should now see lots of debug information directly to stdout.

You should also ensure that you have fastcgi debug turned on in your lighttpd logs. This is done with the setting:

server.errorlog = "/home/user/lighttpd_error.log"
fastcgi.debug = 1
If using fcgi-debug, you are required to use spawn-fcgi 1.6.2 rather than the one packaged with your lighttpd installation. spawn-fcgi 1.4.22 will run but will not show any debug output.

, , , , , , , , , , ,

No Comments

HBase 101: Row key design for paging (LIMIT, OFFSET) queries

Paging is a very common use-case for web sites and many other applications.  In relational databases, this is easily implemented with LIMIT and OFFSET, or by selecting the row number in the query and adding conditionals based on it’s value.  HBase 0.19.x, on the other hand, does not provide any queries or filters that support paging directly. After a quick example using SQL, I will show how to implement the same functionality in HBase.

Let’s assume that we have a large number of users.  Each user has performed a number of actions.  Each action has a unique identifier, a timestamp, and a name.

This is how you might get the third page of an individual users’ actions using SQL:

SELECT id, name, stamp FROM actions WHERE userid = 1
ORDER BY stamp DESC LIMIT 10 OFFSET 20;

This utilizes secondary indexes on both userid and stamp, meaning to accomplish this query you need at least three indexes on this table as id is the primary key.  Though a simple query to write, you will run into problems as the actions table grows to millions of rows and beyond.  Insertions would look like:

INSERT INTO actions (id, userid, name, stamp) VALUES (newid(), 1, 'Joe User', epoch());

HBase has no real indexes.  Rows are stored in sorted order, and columns in a family are sorted.  For more information, read the HBase Architecture page on the HBase Wiki.

Very conscious of the primary queries we will run on user-actions, we will design an HBase table to support paging queries on per-user, time-ordered lists of actions.

We will use the Java Client API for HBase, specifically the HTable class.  What we are looking for are two methods:

public static List<Action> getUserActions(int userid, int offset, int limit)
public static void putUserAction(Action action)

Please note, I am using a custom object, Action, for simplicity.  It is a client-side holder for the four action fields (id, userid, name, stamp).

There are a number of ways to store your data in HBase that will allow the getUserActions query, but in this case we will go with a very tall table design (lots of rows with few columns in them) rather than wide (lots of columns in each row).  Specifically, the difference here would be whether you have a row-per-action or a row-per-user.  We will do a row-per-action, but will be designing our row key (the primary key) to be a composite key to allow for grouping and sorting of actions, rather than just the action id.  This means we will not have random-access to an action by it’s id, so rather than defining this as the actions table (which might also exist if you needed actionid random access) we will define it as the useractions table, and we will only store a single column in a single family, content:name.

The row key that we will use in our HBase useractions table is:

<userid><reverse_order_stamp><actionid>

It’s very important that each of these fields is fixed-length and binary so that the lexicographical/ascending byte-ordering of HBase will properly sort our rows.

The userid field will be a 4 byte, big endian integer.  reverse_order_stamp is an 8 byte, big endian long with a value of (Long.MAX_VALUE - epoch).  This is so the most recent stamp is at the top rather than the bottom.  actionid is another 4 byte, big endian integer.  Thankfully, HBase provides helpful utilties in the org.apache.hadoop.hbase.util.Bytes class to deal with this (unfortunately it lacked some key features in 0.19, so the code below makes use of the Bytes class available in 0.20/TRUNK).  Before we get into HBase code, let’s define the helper methods makeActionRow and readActionRow to deal with the composite key:

public static byte [] makeActionRow(int userid, long stamp, int actionid)
throws Exception {
  byte [] useridBytes = Bytes.toBytes(userid);
  byte [] stampBytes = Bytes.toBytes(stamp);
  byte [] actionidBytes = Bytes.toBytes(actionid);
  return Bytes.add(useridBytes, stampBytes, actionidBytes);
}
 
public static Action readActionRow(byte [] row)
throws Exception {
  // Bytes.toInt(byte [] buf, int offset, int length)
  int userid = Bytes.toInt(row,0,4);
  long stamp = Long.MAX_VALUE - Bytes.toLong(row,4,8);
  int actionid = Bytes.toInt(row,12,4);
  return new Action(userid,stamp,actionid);
}

Now that we can deal with the composite keys, insertion is very straightforward:

public static void putUserAction(Action action) throws Exception {
  // Get the fields from the Action object
  int userid = action.getUserID();
  long stamp = Long.MAX_VALUE - action.getStamp();
  int actionid = action.getID();
  String name = action.getName();
 
  // Build the composite row, column, and value
  byte [] row = makeActionRow(userid,stamp,actionid);
  byte [] column = Bytes.toBytes("content:name");
  byte [] value = Bytes.toBytes(name);
 
  // Insert to HBase
  HTable ht = new HTable("useractions");
  BatchUpdate bu = new BatchUpdate(row);
  bu.put(column,value)
  ht.commit(bu);
}

We just serialize the fields into the composite row, and write the single column to HBase in a BatchUpdate.  Reading will deal with unserializing the fields and Scanners.  In addition to matching for the content:name column, we will also specify a startRow and stopRow so that the Scanner only returns results from the user we are looking at.  This way we do not have to worry about jumping to the next user in our code, the Scanner will just stop.

public static List<Action> getUserActions(int userid, int offset, int limit)
throws Exception {
  // Initialize counter and List to return
  int count = 0;
  List<Action> actions = new ArrayList<Action>(limit);
 
  // Initialize startRow, stopRow, and columns to match
  byte [] startRow = makeActionRow(userid,0,0);
  byte [] stopRow = makeActionRow(userid,Long.MAX_VALUE,Integer.MAX_VALUE);
  byte [][] columns = {Bytes.toBytes("content:name")};
 
  // Open Scanner
  HTable ht = new HTable("useractions");
  Scanner s = ht.getScanner(columns,startRow,stopRow);
  RowResult res = null;
 
  // Iterate over Scanner
  while((res = s.next()) != null) {
    // Check if past offset
    if(++count <= offset) continue;
 
    // Get data from RowResult
    byte [] row = res.getRow();
    byte [] value = res.get(columns[0]).getValue();
 
    // Build Action
    Action action = readActionRow(row);
    String name = Bytes.toString(value);
    action.setName(name);
    actions.add(action);
 
    // Check limit
    if(count == offset + limit) break;
  }
  // Cleanup and return
  s.close();
  return actions;
}

The storage of your data must be tied to how you need to query it.  Without a sophisticated query engine or indexing capabilities, you must design to take advantage of sorted rows and columns, potentially designing a table per query type.  Denormalization is okay!

In my next posts, I will show more interesting ways to use HBase for persisted dictionary/keyval/Object storage and directly address secondary indexing with HBase.

Disclaimer: The code in these examples is designed to illustrate the practical use of HBase.  While the design is sound, the code itself may not optimized for performance.

, , , , , , , , , , , , , , , ,

No Comments