Fluo 1.0.0-beta-1 released

Fluo 1.0.0-beta-1 is the second official release of Fluo. Fluo is an implementation of Google’s Percolator paper, which adds large-scale incremental processing of data using distributed transactions and notifications. It runs on YARN and stores its data in Accumulo.

Below are resources for this release:

Download the Fluo binary tarball for 1.0.0-beta-1 from GitHub.
View the documentation for help getting started with Fluo.
Javadocs are available for this release.
A tag of the Fluo codebase for 1.0.0-beta-1 is available.
Fluo jars have been deployed to Maven Central.
The Quickstart and Phrasecount applications were updated to work with this release.

This release closed 133 tickets. This release is not recommended for production use. There is no upgrade path from 1.0.0-alpha-1 to 1.0.0-beta-1.

Significant features

This release contains many new features that make it easier to run, develop, and monitor Fluo applications.

Simplified Fluo administration on a local machine or EC2 cluster

Developers can now run Fluo and its dependencies on their local machine (#92) or an AWS EC2 cluster (#356) using a few simple commands. This was done by creating two administration tools called Fluo-dev and Fluo-deploy whose scripts and configuration reside in repos separate from the Fluo code base. These tools allow developers to collaborate and share configuration for running Fluo.

Transaction metrics are viewable using common monitoring tools

Fluo now publishes metrics (#20) about transactions, collisions, and timestamps using Dropwizard metrics. These metrics are by default published using JMX and are viewable using JConsole or JVisualVM. Fluo can also be configured to publish metrics to Graphite or Ganglia. View the metrics documentation for more information.

Improved processing of notifications in Fluo workers

Fluo workers were refactored to separate the code used for finding work from the code used for executing it. Each worker uses a single thread for finding work (#19). A new method was introduced to partition work among workers using a hash+mod of notifications (#282). The commit message for 4100e23 contains a good description of some of the benefits and drawbacks of the current hashing approach.
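
The hash+mod idea can be sketched as follows. This is only an illustration under stated assumptions, not Fluo's actual code: the class name NotificationPartitioner, its methods, the use of row and column strings as the notification key, and the choice of Arrays.hashCode as the hash function are all hypothetical. Each worker hashes a notification's key, takes the result modulo the number of workers, and processes the notification only if the result equals its own index.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of hash+mod partitioning.
// None of these names come from the Fluo code base.
final class NotificationPartitioner {
  private final int workerIndex;
  private final int numWorkers;

  NotificationPartitioner(int workerIndex, int numWorkers) {
    if (numWorkers <= 0 || workerIndex < 0 || workerIndex >= numWorkers) {
      throw new IllegalArgumentException("invalid worker index or worker count");
    }
    this.workerIndex = workerIndex;
    this.numWorkers = numWorkers;
  }

  // Maps a notification (identified here by row and column) to a worker index
  // in the range [0, numWorkers). The hash function is an assumption.
  static int ownerOf(String row, String column, int numWorkers) {
    byte[] key = (row + "\u0000" + column).getBytes(StandardCharsets.UTF_8);
    // floorMod keeps the result non-negative even when the hash is negative.
    return Math.floorMod(Arrays.hashCode(key), numWorkers);
  }

  // True if this worker is the one that should process the notification.
  boolean isResponsibleFor(String row, String column) {
    return ownerOf(row, column, numWorkers) == workerIndex;
  }

  public static void main(String[] args) {
    int workers = 4;
    for (String row : new String[] {"row1", "row2", "row3"}) {
      System.out.println(row + " -> worker " + ownerOf(row, "fam:qual", workers));
    }
  }
}
```

With a scheme like this, every notification has exactly one owner for a given worker count, so workers do not compete for the same work. The assignment of notifications shifts whenever the number of workers changes.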

Improved the deletion of observer notifications

When a cell is deleted in Accumulo, a delete marker is inserted. Delete markers stay around until all files in a tablet are compacted. For Fluo, this could cause a lot of notification delete markers to build up over time. To avoid this buildup, the way Fluo deletes notifications was changed (#457). Accumulo delete markers are no longer used; Fluo now uses a custom delete marker for notifications. The custom delete marker allows Fluo to do analysis when Accumulo flushes memory to disk and avoid writing many delete markers to persistent storage.

Easier management of Fluo from the command line

Fluo now provides different scripts (fluo, mini-fluo, & local-fluo) for managing Fluo using a YARN cluster, MiniFluo, or local processes. Several commands were created for these scripts. A scan command allows users to print a snapshot of a Fluo table (#319). An info command shows locations of containers when running Fluo in YARN (#297). A classpath command gives users a list of jars needed to execute Fluo client code (#436). A wait command will sleep until all notifications are processed (#434).

Support for running multiple Fluo applications on a single cluster

Users can now run multiple Fluo applications using a single cluster (#454). This enables different Fluo users to share the same cluster. Fluo applications can be started and stopped independently. Each application has its own configuration.

Fluo build improvements

On each build, all Java code is automatically formatted based on Google Java Style (#479). Also, checkstyle and findbugs will fail the build if certain standards are not reached (#185). The POM is also sorted (#493).

Organized Fluo code base

The Fluo stress test was moved to its own repo and is no longer a sub-module (#385). MiniFluo was moved from fluo-core to the fluo-mini module/jar (#439). This reduced the number of dependencies in fluo-core. However, developers will now need to include the fluo-mini jar in their Maven POM if they start MiniFluo.

Fluo testing improvements

Integration tests can now be run from Eclipse (#322). Several new unit tests were created.

Other important improvements and bug fixes

#470 - Replaced FluoFileOutputFormat with an Accumulo Key/Value generator
#460 - Reduced Fluo API module dependencies
#456 - Fixed bug with notifications being lost when processes died
#446 - Simplified log configuration and configured rolling log files in YARN
#442 - Reduced the number of curator clients in FluoAdmin
#383 - Improved transaction logging to help users debug collisions. See debugging documentation.
#365 - Analyze Fluo code to see what non-public Accumulo APIs are used
#362 - Made API data objects immutable
#349 - Support application level configuration in fluo.properties
#342 - Add a configurable retry timeout to Fluo clients
#294 - Fluo now uses chroot suffix in its Zookeeper connection.
#293 - Add argument checking to FluoConfiguration
#244 - Make re-initialization easier for user

Testing

A long stress test was run successfully using Fluo built from commit fb647dd. Unlike a previous long stress run, this test never fell behind. The test had the following properties:

Initialized stress test using 1 billion random integers.
Ran 150 incremental loads of 100 thousand integers. Slept 3 minutes between loads.
Used 19 m3.xlarge nodes on EC2: 16 workers and 3 masters.
Configuration for the test was committed and tagged in git: fluo-deploy tag and fluo-stress tag.
Opened two issues as a result of the test: #499 and fluo-stress#30.

Below is the trailing output from running the test.

*****Generating and loading incremental data set 148*****
*****Generating and loading incremental data set 149*****
*****Generating and loading incremental data set 150*****
*****Calculating # of unique integers using MapReduce*****
                UNIQUE=1014486419
*****Wait for Fluo to finish processing*****
05:33:40.158 [main] INFO  io.fluo.cluster.runner.AppRunner - The wait command will exit when all notifications are processed
05:33:40.417 [Thread-3] INFO  io.fluo.core.oracle.OracleClient - Connected to oracle at worker4:9913
05:33:41.308 [main] INFO  io.fluo.cluster.runner.AppRunner - All processing has finished!
*****Printing # of unique integers calculated by Fluo*****
Total at root : 1014486419
Nodes Scanned : 59605
*****Verifying Fluo & MapReduce results match*****
Success! Fluo & MapReduce both calculated 1014486419 unique integers
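
As a rough sanity check on the figures above, the arithmetic can be sketched as below. The class and method names are hypothetical. Two assumptions are made here: every generated integer was loaded, and there was one 3-minute sleep per load (the description says "between loads", so the real total may be one sleep fewer).

```java
// Hypothetical helper; all input figures are copied from the test
// description and trailing output above.
final class StressTestTotals {
  static final long INITIAL_INTEGERS = 1_000_000_000L;
  static final long INCREMENTAL_LOADS = 150L;
  static final long INTEGERS_PER_LOAD = 100_000L;
  static final long UNIQUE_REPORTED = 1_014_486_419L;
  static final long SLEEP_MINUTES_PER_LOAD = 3L;

  // Integers generated by the initial load plus all incremental loads.
  static long totalGenerated() {
    return INITIAL_INTEGERS + INCREMENTAL_LOADS * INTEGERS_PER_LOAD;
  }

  // Generated integers that repeated an already seen value
  // (assumes every generated integer was loaded).
  static long duplicates() {
    return totalGenerated() - UNIQUE_REPORTED;
  }

  // Total time spent sleeping, assuming one sleep per load.
  static long totalSleepMinutes() {
    return INCREMENTAL_LOADS * SLEEP_MINUTES_PER_LOAD;
  }

  public static void main(String[] args) {
    System.out.println("Total generated: " + totalGenerated());       // 1015000000
    System.out.println("Duplicates: " + duplicates());                 // 513581
    System.out.println("Sleep minutes: " + totalSleepMinutes());       // 450
  }
}
```

Under these assumptions, about 7.5 hours of the run were spent sleeping between loads, and roughly half a million of the 1.015 billion generated integers were repeats.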

The test ran for a little over 12 hours. Below are two plots pulled from Graphite showing the number of notifications queued and the transaction rate, in transactions per second, over the entire test run. Load transactions are not included in the rate. The accuracy of these plots is uncertain because no Graphite configuration changes were made, but the plots do seem to be within the ballpark.

[Plots: Notifications Queued | Transaction rate]