Archive for category Hadoop/HBase
HUG7: HBase User Group Wrap-Up
Posted by Jonathan Gray in Hadoop/HBase, Meetup, Presentations on August 11th, 2009
The biggest ever HBase User Group went off this past Friday, August 7 at StumbleUpon HQ in San Francisco. We had over 30 attendees, ranging from new users to committers, and four presentations. Big thanks to everyone for coming out, and especially to StumbleUpon for hosting us.
Jonathan Gray (me) covered the new features, architecture, and API of the latest HBase 0.20 release, Michael Stack introduced the proposed features for 0.21, Ryan Rawson led a great practical discussion from his experiences using HBase at StumbleUpon, and finally Bradford Stephens gave an interesting talk on building an analytics framework atop Hadoop and HBase.
To read about what’s planned for the next release of HBase, 0.21, check out the HBase 0.21 Roadmap, notes from the HBase Hackathon Day One Wrap-Up, or issues in JIRA.
HBase 0.20 Introduction
Practical HBase at StumbleUpon
Hadoop Analytics - A DB for 80% of Big Data sites
JG
HBase Hackathon 2 Wrap-Up: Day One
Posted by Jonathan Gray in Hadoop/HBase, Meetup on August 9th, 2009
The second HBase Hackathon took place this weekend at the StumbleUpon headquarters in San Francisco. Big thanks to them for hosting us and feeding us.
The primary goal of this Hackathon was to roadmap, design, and hack on new features for the next release of HBase, 0.21. The major targets are a Master rewrite, expanded use of Zookeeper, and cross-cluster replication.
Here are notes from Day One:
HBase Hackathon Notes
Saturday, August 8, 2009
· Data Integrity
  o WAL
    § HADOOP-4379/HDFS-200 needs testing on large clusters
  o HBfsck (HBASE-7)
    § Sequence IDs
      · Change to stamps?
      · Must be unique
      · Fix for merges
    § Start/end keys match
      · Figures out / runs merges
    § Repair references
    § Two modes
      · Quick mode scans META
      · Full mode reads HFiles and verifies fully
· Master Rewrite (HBASE-1110)
  o HCD + HTD
    § Schema -> JSON
    § Stored in ZK
  o JNI ZK Client – Nitay
    § Fix ZK being a black box
    § Not kicked out from GC pauses
    § Only responsible for ephemeral, other stuff still Java API
    § Monitor existence of JVM RS process
  o Master is “powerless”
    § Should be written as though it's OK that we are offline for minutes
  o Region assignment to ZK
    § Consensus to definitely do this
    § Master verify assignment state periodically?
  o META information in ZK? (HBASE-1755)
    § Replacement for getClosestRowBefore
    § /hbase/TABLE/region
    § Goal: 10k regions * 10k clients
  o Load balancing
    § RS publish load statistics to ZK
    § Master makes all decisions
    § Zookeeper coordination
  o Get rid of heartbeats (HBASE-1502)
    § Master uses RPC?
    § Zookeeper coordination / messaging instead?
· Gets -> Scan
  o Timestamp collisions, known issues here (HBASE-1485)
  o Bloom filters?
· Replication (HBASE-1295)
  o Sharded counters possible?
  o Log shipping initially
  o Separate process alongside HRS
    § Who does he talk to on the other cluster?
    § Coordination with shared ZK?
  o When network partition
    § Buffer in memory, then buffer on hdfs
      · How big can it get? Hours, days, years?
    § Eventually must fall back on snapshot or full table replay?
· Snapshot Backup Mechanism
  o Flush, stop compactions
  o Local copy first
  o Should be per table
  o Utilize distcp
  o Can be used to initialize a slave replicated cluster
· New RPC
  o Punted to 0.22
· Unit test speedup (HBASE-1556)
  o HTableInterface, HRSInterface, HRInterface, etc
  o Assigned to: Stack
· New UI
  o Punt with redirect
  o JG building something that taps into ZK and can publish JSON/XML/etc
    § Used for nagios plugin also being built
· Behind API Upload Process
  o Patch exists, need to test HBASE-48
  o Does not currently support multi-family or importing into table with data
· Cascading
· Distributed log splitting (HBASE-1364)
  o Stripped down version of MR?
  o Use ZK to map each HLog to which regions it contains edits for?
  o We want to open any regions during a recovery that do not have edits immediately
  o JD leading on the design
· Intra-row Scanning (HBASE-1537)
  o We need it, but it’s hard
Thanks again to everyone at StumbleUpon and to all the committers and contributors for the good ideas and the good times!
JG
Streamy @ Hadoop Summit: HBase Goes Realtime
Posted by Jonathan Gray in Hadoop/HBase, Meetup, Presentations, Video on July 24th, 2009
The Hadoop Summit was a great success this year and I had a ton of fun giving the presentation on HBase in front of a standing-room-only crowd. Videos from the conference are now available online from Yahoo here.
Jean-Daniel Cryans and I presented on the (any day now) HBase 0.20.0 release.
Check out the slides and video from HBase Goes Realtime below…
HBase 101: Row key design for paging (LIMIT, OFFSET) queries
Posted by Jonathan Gray in Hadoop/HBase, Tips and Tricks on April 23rd, 2009
Paging is a very common use-case for web sites and many other applications. In relational databases, this is easily implemented with LIMIT and OFFSET, or by selecting the row number in the query and adding conditionals based on its value. HBase 0.19.x, on the other hand, does not provide any queries or filters that support paging directly. After a quick example using SQL, I will show how to implement the same functionality in HBase.
Let’s assume that we have a large number of users. Each user has performed a number of actions. Each action has a unique identifier, a timestamp, and a name.
This is how you might get the third page of an individual user’s actions using SQL:
SELECT id, name, stamp FROM actions WHERE userid = 1 ORDER BY stamp DESC LIMIT 10 OFFSET 20;
This relies on secondary indexes on both userid and stamp; since id is the primary key, the table needs at least three indexes to serve this query. Though it is a simple query to write, you will run into problems as the actions table grows to millions of rows and beyond. Insertions would look like:
INSERT INTO actions (id, userid, name, stamp) VALUES (newid(), 1, 'Joe User', epoch());
HBase has no real indexes. Rows are stored in sorted order, and columns in a family are sorted. For more information, read the HBase Architecture page on the HBase Wiki.
Keeping in mind the primary queries we will run on user actions, we will design an HBase table to support paging queries on per-user, time-ordered lists of actions.
We will use the Java Client API for HBase, specifically the HTable class. What we are looking for are two methods:
public static List<Action> getUserActions(int userid, int offset, int limit)
public static void putUserAction(Action action)
Please note, I am using a custom object, Action, for simplicity. It is a client-side holder for the four action fields (id, userid, name, stamp).
There are a number of ways to store your data in HBase that will allow the getUserActions query, but in this case we will go with a very tall table design (lots of rows with few columns in them) rather than wide (lots of columns in each row). Specifically, the difference here would be whether you have a row-per-action or a row-per-user. We will do a row-per-action, but will be designing our row key (the primary key) to be a composite key to allow for grouping and sorting of actions, rather than just the action id. This means we will not have random-access to an action by its id, so rather than defining this as the actions table (which might also exist if you needed actionid random access) we will define it as the useractions table, and we will only store a single column in a single family, content:name.
The row key that we will use in our HBase useractions table is:
<userid><reverse_order_stamp><actionid>
It’s very important that each of these fields is fixed-length and binary so that the lexicographical/ascending byte-ordering of HBase will properly sort our rows.
The userid field will be a 4 byte, big endian integer. reverse_order_stamp is an 8 byte, big endian long with a value of (Long.MAX_VALUE - epoch). This is so the most recent stamp is at the top rather than the bottom. actionid is another 4 byte, big endian integer. Thankfully, HBase provides helpful utilities in the org.apache.hadoop.hbase.util.Bytes class to deal with this (unfortunately it lacked some key features in 0.19, so the code below makes use of the Bytes class available in 0.20/TRUNK). Before we get into HBase code, let’s define the helper methods makeActionRow and readActionRow to deal with the composite key:
public static byte [] makeActionRow(int userid, long stamp, int actionid)
throws Exception {
    byte [] useridBytes = Bytes.toBytes(userid);
    byte [] stampBytes = Bytes.toBytes(stamp);
    byte [] actionidBytes = Bytes.toBytes(actionid);
    return Bytes.add(useridBytes, stampBytes, actionidBytes);
}

public static Action readActionRow(byte [] row)
throws Exception {
    // Bytes.toInt(byte [] buf, int offset, int length)
    int userid = Bytes.toInt(row,0,4);
    long stamp = Long.MAX_VALUE - Bytes.toLong(row,4,8);
    int actionid = Bytes.toInt(row,12,4);
    return new Action(userid,stamp,actionid);
}
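To see why the fixed-length, big endian layout sorts the way we want, here is a minimal standalone sketch that uses only the JDK. It is not the HBase Bytes class: RowKeySketch, makeKey, readStamp, and compareKeys are hypothetical names made up for illustration, and unlike makeActionRow above, makeKey does the (Long.MAX_VALUE - epoch) reversal itself. It assumes non-negative user ids and stamps, because negative two's complement values would sort after positive ones in an unsigned byte comparison.

```java
import java.nio.ByteBuffer;

final class RowKeySketch {
    // 4-byte userid + 8-byte reversed stamp + 4-byte actionid
    static final int KEY_LENGTH = 4 + 8 + 4;

    // ByteBuffer writes big endian by default, matching the layout in the post.
    static byte[] makeKey(int userId, long epoch, int actionId) {
        return ByteBuffer.allocate(KEY_LENGTH)
                .putInt(userId)
                .putLong(Long.MAX_VALUE - epoch) // reversed so newer sorts first
                .putInt(actionId)
                .array();
    }

    // Recover the original stamp from bytes 4..11 of the key.
    static long readStamp(byte[] key) {
        return Long.MAX_VALUE - ByteBuffer.wrap(key).getLong(4);
    }

    // Unsigned, byte-by-byte lexicographic comparison of two keys.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] older = makeKey(1, 1000L, 7);
        byte[] newer = makeKey(1, 2000L, 8);
        // A negative result means the newer action sorts ahead of the older one.
        System.out.println(compareKeys(newer, older));
    }
}
```

Because every field has a fixed width, all keys for one user share the same 4-byte prefix and sit next to each other, ordered newest first.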
Now that we can deal with the composite keys, insertion is very straightforward:
public static void putUserAction(Action action)
throws Exception {
    // Get the fields from the Action object
    int userid = action.getUserID();
    long stamp = Long.MAX_VALUE - action.getStamp();
    int actionid = action.getID();
    String name = action.getName();

    // Build the composite row, column, and value
    byte [] row = makeActionRow(userid,stamp,actionid);
    byte [] column = Bytes.toBytes("content:name");
    byte [] value = Bytes.toBytes(name);

    // Insert to HBase
    HTable ht = new HTable("useractions");
    BatchUpdate bu = new BatchUpdate(row);
    bu.put(column,value);
    ht.commit(bu);
}
We just serialize the fields into the composite row and write the single column to HBase in a BatchUpdate. Reading involves deserializing the fields and using a Scanner. In addition to matching the content:name column, we will also specify a startRow and stopRow so that the Scanner only returns results for the user we are looking at. This way we do not have to worry about running into the next user's rows in our code; the Scanner will just stop.
public static List<Action> getUserActions(int userid, int offset, int limit)
throws Exception {
    // Initialize counter and List to return
    int count = 0;
    List<Action> actions = new ArrayList<Action>(limit);

    // Initialize startRow, stopRow, and columns to match
    byte [] startRow = makeActionRow(userid,0,0);
    byte [] stopRow = makeActionRow(userid,Long.MAX_VALUE,Integer.MAX_VALUE);
    byte [][] columns = {Bytes.toBytes("content:name")};

    // Open Scanner
    HTable ht = new HTable("useractions");
    Scanner s = ht.getScanner(columns,startRow,stopRow);
    try {
        RowResult res = null;

        // Iterate over Scanner
        while((res = s.next()) != null) {
            // Check if past offset
            if(++count <= offset) continue;

            // Get data from RowResult
            byte [] row = res.getRow();
            byte [] value = res.get(columns[0]).getValue();

            // Build Action
            Action action = readActionRow(row);
            String name = Bytes.toString(value);
            action.setName(name);
            actions.add(action);

            // Check limit
            if(count == offset + limit) break;
        }
    } finally {
        // Cleanup, even if reading a row fails
        s.close();
    }
    return actions;
}
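If you want to experiment with the paging logic without a cluster, the scan can be imitated in memory. The sketch below is only a stand-in under stated assumptions: a java.util.TreeMap ordered by unsigned byte comparison plays the role of the sorted useractions table, subMap plays the role of a Scanner with a startRow and an exclusive stopRow, and PagingSketch, key, put, and page are hypothetical names, not HBase APIs.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PagingSketch {
    // Unsigned lexicographic byte order, the order the post relies on for row keys.
    static final Comparator<byte[]> UNSIGNED_LEX = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    // In-memory stand-in for the sorted table: row key -> action name.
    final TreeMap<byte[], String> rows = new TreeMap<>(UNSIGNED_LEX);

    static byte[] key(int userId, long reversedStamp, int actionId) {
        return ByteBuffer.allocate(16)
                .putInt(userId).putLong(reversedStamp).putInt(actionId).array();
    }

    void put(int userId, long epoch, int actionId, String name) {
        rows.put(key(userId, Long.MAX_VALUE - epoch, actionId), name);
    }

    // Walk one user's key range, skip `offset` rows, return up to `limit` names.
    List<String> page(int userId, int offset, int limit) {
        byte[] start = key(userId, 0L, 0);
        byte[] stop = key(userId, Long.MAX_VALUE, Integer.MAX_VALUE);
        List<String> out = new ArrayList<>();
        int seen = 0;
        // subMap is [start, stop): start inclusive, stop exclusive.
        for (Map.Entry<byte[], String> e : rows.subMap(start, stop).entrySet()) {
            if (seen++ < offset) {
                continue;
            }
            if (out.size() >= limit) {
                break;
            }
            out.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        PagingSketch table = new PagingSketch();
        for (int i = 1; i <= 25; i++) {
            table.put(1, i, i, "a" + i);
        }
        // Third page of ten for user 1, newest first.
        System.out.println(table.page(1, 20, 10));
    }
}
```

Note that the offset is still paid for by reading and discarding rows, just as in getUserActions above; the row key design only guarantees that the rows arrive grouped by user and newest first.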
The storage of your data must be tied to how you need to query it. Without a sophisticated query engine or indexing capabilities, you must design to take advantage of sorted rows and columns, potentially designing a table per query type. Denormalization is okay!
In my next posts, I will show more interesting ways to use HBase for persisted dictionary/keyval/Object storage and directly address secondary indexing with HBase.
Disclaimer: The code in these examples is designed to illustrate the practical use of HBase. While the design is sound, the code itself may not be optimized for performance.
HBase Hackathon Wrap-up
Posted by Jonathan Gray in Hadoop/HBase, Meetup on February 4th, 2009
HBase contributors came together last weekend for the first ever HBase Hackathon here at Streamy HQ in Manhattan Beach, California. In attendance were most of the HBase committers, guys from Sun and StumbleUpon… nearly 20 developers in total. There are photos posted on the Meetup Page (for members only). If anyone else who attended has pictures please post links in the comments!
We spent a great deal of time discussing the new features set for the 0.20 release of HBase. You can follow all the issues slated for 0.20 here. It’s been a few days and so much has already come out of the weekend so I thought I’d post a quick follow-up to share some of the cool stuff being worked on now.
Cascading Support for HBase
Chris Wensel, of Concurrent Inc., has successfully implemented the first version of HBase adapters for Cascading. Streamy devs are really looking forward to refactoring some of their MapReduce jobs for Cascading! We will report back soon.
HBase New File Format
We have hit a performance wall in HBase with the Hadoop MapFile format. It was never intended for a random-access read pattern which is really the primary purpose of HBase. Based on the hard work by guys over at Yahoo on the TFile binary file format, work is well under way on a new HBase-specific file format, currently being called HFile. Michael Stack of Powerset and Ryan Rawson of StumbleUpon are leading the effort.
The emphasis is on speed and efficiency. By switching to a block-based segmenting/indexing of the file, we can have predictable memory usage and an ideal abstraction for caching. Once in memory, we can use something like Java’s NIO ByteBuffers to allow high numbers of concurrent scanners with minimal memory copying. Remember, even random-access reads require scanning. The new format also supports meta blocks for additional indexes, bloom filters, meta data and anything else we want to add in the future.
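As a rough illustration of what a block index buys you, here is a minimal sketch. It is not the HFile format or its code: BlockIndexSketch, addBlock, and locate are hypothetical names, and String keys stand in for byte arrays to keep the example short. The idea shown is simply that if you keep the first key of every block in a sorted structure, a lookup only needs to load the one block whose first key is the greatest key less than or equal to the target.

```java
import java.util.Map;
import java.util.TreeMap;

final class BlockIndexSketch {
    // First key stored in each block -> byte offset of that block in the file.
    private final TreeMap<String, Long> firstKeyToOffset = new TreeMap<>();

    void addBlock(String firstKey, long offset) {
        firstKeyToOffset.put(firstKey, offset);
    }

    // Offset of the only block that could hold `key`,
    // or -1 if `key` sorts before the first block.
    long locate(String key) {
        Map.Entry<String, Long> entry = firstKeyToOffset.floorEntry(key);
        return entry == null ? -1L : entry.getValue();
    }

    public static void main(String[] args) {
        BlockIndexSketch index = new BlockIndexSketch();
        index.addBlock("a", 0L);
        index.addBlock("m", 65536L);
        index.addBlock("t", 131072L);
        System.out.println(index.locate("q"));
    }
}
```

The index holds one entry per block rather than one per cell, which is where the predictable memory usage comes from; the block itself is then the natural unit to cache.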
Cell Caching
Streamy’s own Erik Holstad is wrapping up testing, benchmarking and optimizing the new Cell Cache. We’re seeing a 5-10X improvement in random-access speed when serving out of the cache. A big part of this feature is implementing a memory-aware LRU in Java. Since Java will not tell you the size of an Object in memory, we have had to hack our way around through profiling and some tools we’ve built to determine sizes. More on this in a later post.
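For readers who have not built one, a memory-aware LRU can be sketched with the JDK's access-ordered LinkedHashMap. This is not the Cell Cache code: SizeBoundedLru and its methods are hypothetical names, and the estimate function is a deliberately crude assumption (two bytes per key character plus the value length) that ignores object headers and map overhead, which is exactly the part that needs profiling in practice.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

final class SizeBoundedLru {
    private final long maxBytes;
    private long usedBytes = 0;
    // accessOrder=true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, byte[]> map =
            new LinkedHashMap<>(16, 0.75f, true);

    SizeBoundedLru(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Assumed size estimate; a real cache would account for JVM overhead too.
    static long estimate(String key, byte[] value) {
        return 2L * key.length() + value.length;
    }

    void put(String key, byte[] value) {
        byte[] old = map.put(key, value);
        if (old != null) {
            usedBytes -= estimate(key, old);
        }
        usedBytes += estimate(key, value);
        evict();
    }

    // A hit also marks the entry as most recently used.
    byte[] get(String key) {
        return map.get(key);
    }

    boolean contains(String key) {
        return map.containsKey(key);
    }

    long usedBytes() {
        return usedBytes;
    }

    // Drop least recently used entries until we are back under the byte budget.
    private void evict() {
        Iterator<Map.Entry<String, byte[]>> it = map.entrySet().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            Map.Entry<String, byte[]> eldest = it.next();
            usedBytes -= estimate(eldest.getKey(), eldest.getValue());
            it.remove();
        }
    }
}
```

The difference from a plain LRU is that eviction is driven by estimated bytes rather than entry count, so a handful of large cells cannot silently blow the heap.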
Zookeeper Integration
Jean-Daniel Cryans and Nitay Joffe have made leaps and bounds with Zookeeper integration into HBase. Initially designed to remove the single point of failure, discussions at the Hackathon opened the door to future improvements such as configuration management and even to eventually distribute master functionality and eliminate the HMaster altogether. They have already committed 5 issues to 0.20 trunk. You can follow their progress here.
Datanode Network I/O Improvements
Andrew Purtell at Trend Micro is working on a Hadoop issue that creates a big headache for the users of HBase. We keep a large number of files open at a time, and since the Datanode currently uses a thread-per-connection model, we end up with thousands of idle threads and have to keep increasing the total number of receivers. I’m not sure where this currently stands.
My Random Contributions
In addition to voicing my opinion and contributing to design and decision making, I’m currently working on a number of small issues: a binary key range splitting algorithm, the ability to run more than one mapper per region or to specify start and stop rows for MR jobs sourcing from HBase, and a number of benchmarking tools to evaluate all the new stuff.
All in all, it was a terrific weekend. The weather was absolutely perfect and it was as friendly and smart a group as I could have imagined. Thanks to everyone who came, especially those who made the trip down from Norcal and JD who came from Canada to defrost for a bit in the Cali sun. Summer is coming soon, so stay tuned for the next beach city hackathon.
Streamy hosting the LA HBase Hackathon
Posted by Jonathan Gray in Hadoop/HBase, Meetup on January 6th, 2009
To help kick off development on some of the major improvements coming in the 0.20 release of HBase, Streamy will be hosting a hackathon at our office in Manhattan Beach on January 30th.
Also of interest this month is the San Francisco based HBase User Group meetup on January 14th and the Los Angeles Hadoop Meetup on January 13th.
HOWTO: Change replication factor of existing files in HDFS
Posted by Jonathan Gray in Hadoop/HBase on December 23rd, 2008
There are references around the web regarding changing the replication factor on a running Hadoop system. For example, if you don’t have even distribution of blocks across your Datanodes, you can increase replication temporarily and then bring it back down.
To set replication of an individual file to 4:
./bin/hadoop dfs -setrep -w 4 /path/to/file
You can also do this recursively. To change replication of entire HDFS to 1:
./bin/hadoop dfs -setrep -R -w 1 /
Chris Wensel of Cascading talks Hadoop with Sohrab Modi of Sun
Posted by Jonathan Gray in Cascading, Hadoop/HBase, MapReduce, Video on December 16th, 2008
Interesting conversation between Chris Wensel, founder of Concurrent Inc and author of Cascading, and Sohrab Modi, VP Chief Technology Office at Sun Microsystems.
Sun has definitely been paying attention to Hadoop (and increasingly HBase), so it will be interesting to see if they can make a case for using (typically high-end) Sun hardware to run this new distributed, commodity hardware driven software model. Sohrab mentions increasing the Disk-to-Core ratio on Hadoop nodes above the 1-to-1 ratio typical in many clusters today.
This thinking seems at odds with most of the Hadoop community, who are often CPU bound or who feel that more nodes with fewer disks are better than fewer nodes with more disks.
They don’t speak about HBase, but from that perspective it might make sense to squeeze more disks per node and fewer total nodes, especially with the new findings from George Porter about tapping the local FS when possible. However it still seems to me that on the surface Sun hardware does not necessarily fit the new distributed, commodity hardware model.
Part one of that conversation (originally posted here):
[youtube=http://www.youtube.com/watch?v=CMt-IqQlnQ8&hl=en&fs=1&rel=0&color1=0x234900&color2=0x4e9e00]
Part two:
[youtube=http://www.youtube.com/watch?v=YtkaDQOuJ4k&hl=en&fs=1&rel=0&color1=0x234900&color2=0x4e9e00]
Another Hadoop-related video of interest by Stefan Groschupf of Scale Unlimited visualizing the evolution of the Hadoop codebase:
[vimeo vimeo.com/2513321]
Hadoop and HBase Presentation
Posted by Jonathan Gray in Hadoop/HBase, Presentations on December 12th, 2008
Today I gave a presentation on Hadoop, MapReduce, and HBase to the Los Angeles CTO Forum. In addition to introducing the technologies and basic information about their implementations, there was a focus on how they compare to a traditional RDBMS.
You can find the presentation here.