Fluo 1.0.0-beta-2 is the third release of Fluo and likely the final release before 1.0.0. Fluo is now at a point where its two cluster test suites, WebIndex and Stress, are running well for long periods on EC2.

This release closed 48 tickets. There is no upgrade path from 1.0.0-beta-1 to 1.0.0-beta-2. Many improvements in this release were driven by the creation of two new Fluo-related projects:

  • Fluo recipes is a collection of common development patterns designed to make Fluo application development easier. Creating Fluo recipes required new Fluo functionality and updates to the Fluo API. The first release of Fluo recipes has been made and is available in Maven Central.

  • WebIndex is an example Fluo application that indexes links to web pages in multiple ways. WebIndex enabled the testing of Fluo on real data at scale. It also inspired improvements that allow Fluo to work better with Apache Spark.

Significant features

This release contains many new features that make it easier to run, develop, and monitor Fluo applications.

Improved Fluo metrics that can be sent to InfluxDB and viewed in Grafana

In #569, #570, and #580, Fluo metrics and monitoring were refactored to fix several bugs and to allow metrics to be sent to InfluxDB and viewed in Grafana. Fluo metrics are still instrumented using Dropwizard metrics, but in #574 the metrics configuration was moved from its own file into fluo.properties. While Fluo metrics can still be sent to many different tools (e.g. Graphite, Ganglia), Fluo now ships with configuration that can be used to set up a Fluo dashboard in Grafana that queries InfluxDB. To send Fluo metrics to InfluxDB/Grafana, view this documentation or consider using Fluo-dev or Zetten to run Fluo, as they can install InfluxDB and Grafana and set up metrics for you.

Improved Fluo administration

Several commands were added to the fluo script, which is used to administer Fluo. A fluo exec command (#581) was created to provide an easy way to execute application code using Fluo classes and dependencies. A fluo list command (#523) was created to let users list all Fluo applications within a Fluo instance. The fluo scan command now has a --raw option (#597) that prints Fluo data as stored in Accumulo. This was accomplished by moving the Fluo formatter from the Accumulo shell to the scan command. The scan command can now format non-ASCII characters as hex (#568). The fluo new command was improved to prevent users from setting invalid Fluo application names (#510). A bug in the fluo start command that caused timeouts when starting applications was fixed (#562). Finally, most fluo commands can now be run without the apps/ directory configured for an application (#524). Only the init and start commands need this directory configured. All other commands only require the default properties file to be configured at conf/fluo.properties.
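
As a rough illustration of what formatting non-ASCII characters as hex can look like, the sketch below prints printable ASCII bytes unchanged and escapes everything else. The class and method names are hypothetical, and the \xNN escape style is an assumption made for this example; it is not taken from Fluo's formatter.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; this is not Fluo's formatter. */
final class HexFormatSketch {

  /** Returns printable ASCII bytes as-is and every other byte as \xNN (assumed style). */
  static String encodeNonAscii(byte[] data) {
    StringBuilder sb = new StringBuilder();
    for (byte b : data) {
      int c = b & 0xff; // treat the byte as unsigned
      // The backslash itself is escaped so the output stays unambiguous.
      if (c >= 32 && c <= 126 && c != '\\') {
        sb.append((char) c);
      } else {
        sb.append(String.format("\\x%02x", c));
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // U+00E9 is encoded in UTF-8 as the two bytes 0xc3 0xa9.
    byte[] row = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
    System.out.println(encodeNonAscii(row)); // prints caf\xc3\xa9
  }
}
```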

Made Fluo work better with Spark

Several changes were made to Fluo to allow it to work better with Apache Spark. All Fluo data types now implement Serializable and can be used in Spark RDDs (#539). Fluo data types also now implement Comparable, which allows RDDs of Fluo data types to be sorted (#544). Also, a no-arg constructor was created for the MutableBytes data type to enable Kryo serialization in Spark (#549). Finally, a new InputFormat called FluoEntryInputFormat was created that returns key/value entries, and the existing FluoInputFormat that returns rows was renamed FluoRowInputFormat (#538, #540).
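
To show why these three properties matter, here is a minimal sketch using a hypothetical ByteKey class as a stand-in for a Fluo data type. It is not part of the Fluo API and uses plain Java collections and Java serialization rather than Spark. Comparable is what lets a collection of keys be sorted, Serializable is what lets a key be shipped between processes, and a no-arg constructor is the kind of hook that reflection-based serializers such as Kryo typically rely on.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a Fluo data type; not part of the Fluo API. */
final class ByteKey implements Serializable, Comparable<ByteKey> {
  private static final long serialVersionUID = 1L;
  private byte[] data;

  /** No-arg constructor, as reflection-based serializers commonly require. */
  ByteKey() {
    this.data = new byte[0];
  }

  ByteKey(String s) {
    this.data = s.getBytes(StandardCharsets.UTF_8);
  }

  /** Lexicographic order over unsigned bytes; a shorter key sorts before its extensions. */
  @Override
  public int compareTo(ByteKey other) {
    int n = Math.min(data.length, other.data.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(data[i] & 0xff, other.data[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(data.length, other.data.length);
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof ByteKey && Arrays.equals(data, ((ByteKey) o).data);
  }

  @Override
  public int hashCode() {
    return Arrays.hashCode(data);
  }

  @Override
  public String toString() {
    return new String(data, StandardCharsets.UTF_8);
  }

  /** Serializes and deserializes a key with standard Java serialization. */
  static ByteKey roundTrip(ByteKey key) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
        out.writeObject(key);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
        return (ByteKey) in.readObject();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    List<ByteKey> keys = new ArrayList<>(
        Arrays.asList(new ByteKey("row2"), new ByteKey("row10"), new ByteKey("row1")));
    Collections.sort(keys); // works because ByteKey implements Comparable
    System.out.println(keys); // prints [row1, row10, row2]
    System.out.println(roundTrip(keys.get(0)).equals(keys.get(0))); // prints true
  }
}
```

Note that the ordering is byte-wise, so "row10" sorts before "row2".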

Performance improvements

Fluo was profiled while running to determine where time is spent when executing transactions. This analysis showed that a significant portion of the time was spent committing transactions. Changes were made in Fluo and Accumulo to decrease commit time. For Fluo, the following changes were made:

  • #591 - Shared batch writer increases transaction latency
  • #590 - Increased batch writer threads and made configurable
  • #589 - Added 2nd conditional writer and logging of commit times
  • #584 - Adjust number of conditional writer threads based on cluster size

For Accumulo, changes are being made in ACCUMULO-4066 to decrease the time it takes to process conditional mutations. Conditional mutations are used when Fluo commits a transaction.

These changes resulted in nice improvements over beta-1 in testing. However, there is probably still room for improvement, and more analysis is needed.
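
For readers unfamiliar with conditional mutations, the sketch below shows the general idea: a write is applied only if the cell still holds an expected value, and the check and the write happen atomically. This is a hypothetical in-memory illustration built on ConcurrentHashMap. It does not use Accumulo's ConditionalWriter, and Fluo's real commit involves more steps than shown here; the cell and value names are made up for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory illustration; this is not Accumulo's ConditionalWriter. */
final class ConditionalWriteSketch {
  private final ConcurrentMap<String, String> cells = new ConcurrentHashMap<>();

  /**
   * Applies the write only if the cell currently holds {@code expected}.
   * A null {@code expected} means the cell must be absent.
   * Returns true when the write was accepted and false when the condition failed.
   */
  boolean writeIf(String cell, String expected, String newValue) {
    if (expected == null) {
      return cells.putIfAbsent(cell, newValue) == null;
    }
    return cells.replace(cell, expected, newValue);
  }

  String read(String cell) {
    return cells.get(cell);
  }

  public static void main(String[] args) {
    ConditionalWriteSketch table = new ConditionalWriteSketch();

    // Two transactions try to claim the same cell; only the first condition holds.
    boolean tx1 = table.writeIf("row1:count", null, "locked-by-tx1");
    boolean tx2 = table.writeIf("row1:count", null, "locked-by-tx2");
    System.out.println(tx1 + " " + tx2); // prints true false

    // The winner replaces its own marker with the final value.
    boolean committed = table.writeIf("row1:count", "locked-by-tx1", "42");
    System.out.println(committed + " " + table.read("row1:count")); // prints true 42
  }
}
```

Because every commit pays for this kind of check-then-write round trip, the time a server takes to process conditional mutations directly affects commit time.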

API Changes

Once Fluo 1.0.0 is released, all subsequent releases will follow semantic versioning (semver). For now, some small API changes are still being made. The following API changes happened between beta-1 and beta-2.

  • #566 - Added RowColumnValue and made Accumulo init code use it
  • #551 - Added method to get start timestamp of transaction
  • #550 - Changed setObservers() to addObservers()

Other important improvements and bug fixes

  • #598 - Upgraded Hadoop to 2.6.3 and Accumulo to 1.6.4
  • #587 - Specified datasource for all graphs in Fluo’s Grafana dashboard
  • #586 - Added efficient and easy way to build Bytes objects
  • #578 - Plot nothing in Grafana when no data exists
  • #573 - Fixed issues building against Accumulo 1.8.0-SNAPSHOT
  • #561 - Stopped checkstyle mvn plugin from running at validate
  • #559 - Eventually drop deleted data
  • #558 - Added arguments to deploy command to skip findbugs, checkstyle, and auto-formatting
  • #556 - Make TravisCI deploy snapshot jars after successful builds
  • #552 - Made eclipse stop complaining about unknown plugins
  • #547 - Provide better documentation for LoaderExecutor
  • #535 - Upgraded Twill to 0.6.0-incubating
  • #520 - Consolidate all implementation properties into FluoConfigurationImpl
  • #518 - Make Oracle run on a random port
  • #513 - Unable to pass spaces to scan command
  • #495 - Add support for notifications to Fluo formatter

Testing

For this release, a long run of the WebIndex application was performed and is documented in a blog post. A long run of Fluo Stress was also performed and is documented in another blog post.