Fluo 1.0.0-beta-2 is the third release of Fluo and likely the final release before 1.0.0. Fluo is now at a point where its two cluster test suites, Webindex and Stress, are running well for long periods on EC2.
Below are resources for this release:
- Download the Fluo binary tarball for 1.0.0-beta-2 from GitHub.
- View the documentation for help getting started with Fluo.
- Javadocs are available for this release.
- A tag of the Fluo codebase for 1.0.0-beta-2 is available.
- The Quickstart and Phrasecount applications were updated to work with this release.
This release closed 48 tickets. There is no upgrade path from 1.0.0-beta-1 to 1.0.0-beta-2. Many improvements in this release were driven by the creation of two new Fluo-related projects:
- Fluo recipes is a collection of common development patterns designed to make Fluo application development easier. Creating Fluo recipes required new Fluo functionality and updates to the Fluo API. The first release of Fluo recipes has been made and is available in Maven Central.
- WebIndex is an example Fluo application that indexes links to web pages in multiple ways. Webindex enabled the testing of Fluo on real data at scale. It also inspired improvements to Fluo to allow it to work better with Apache Spark.
Significant features
This release contains many new features that make it easier to run, develop, and monitor Fluo applications.
Improved Fluo metrics that can be sent to InfluxDB and viewed in Grafana
In #569, #570, and #580, Fluo metrics and monitoring were refactored to fix several bugs and allow metrics to be sent to InfluxDB and viewed in Grafana. Fluo metrics are still instrumented using Dropwizard metrics, but in #574 the metrics configuration was moved from its own file into fluo.properties. While Fluo metrics can still be sent to many different tools (e.g., Graphite, Ganglia), Fluo now ships with configuration that can be used to set up a Fluo dashboard in Grafana that queries InfluxDB. To send Fluo metrics to InfluxDB/Grafana, view this documentation or consider using Fluo-dev or Zetten to run Fluo, as they can install InfluxDB and Grafana and set up metrics for you.
Improved Fluo administration
Several commands were added to the fluo script, which is used to administer Fluo. A fluo exec command (#581) was created to provide an easy way to execute application code using Fluo classes and dependencies. A fluo list command (#523) was created to let users list all Fluo applications within a Fluo instance. The fluo scan command now has a --raw option (#597) that prints Fluo data as stored in Accumulo. This was accomplished by moving the Fluo formatter from the Accumulo shell to the scan command. The scan command can now format non-ASCII characters as hex (#568). The fluo new command was improved to prevent users from setting invalid Fluo application names (#510). A bug was fixed in the fluo start command that was causing timeouts when starting applications (#562). Finally, most fluo commands can now be run without the apps/ directory configured for an application (#524). Only the init and start commands need this directory configured. All other commands only require the default properties file to be configured at conf/fluo.properties.
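To make the hex formatting of non-ASCII characters (#568) concrete, here is a minimal sketch of one way such escaping could work. The class name HexEscapeSketch, the method escape, and the \xNN output style are assumptions made for illustration; this is not Fluo's formatter code, and the real scan output may differ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Fluo.
class HexEscapeSketch {

  // Keeps printable ASCII bytes as-is and writes every other byte
  // (and the backslash itself, to keep output unambiguous) as \xNN.
  static String escape(byte[] data) {
    StringBuilder sb = new StringBuilder();
    for (byte b : data) {
      int c = b & 0xff; // treat the byte as unsigned
      if (c >= 0x20 && c <= 0x7e && c != '\\') {
        sb.append((char) c);
      } else {
        sb.append(String.format("\\x%02x", c));
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // "café" encoded as UTF-8 ends with the two bytes 0xc3 0xa9.
    byte[] row = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
    System.out.println(escape(row)); // prints caf\xc3\xa9
  }
}
```

Escaping byte by byte, rather than decoding to characters first, keeps the output faithful to the stored data even when a value is not valid text.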
Made Fluo work better with Spark
Several changes were made to Fluo to allow it to work better with Apache Spark. All Fluo data types now implement Serializable and can be used in Spark RDDs (#539). Fluo data types also now implement Comparable, which allows RDDs of Fluo data types to be sorted (#544). Also, a no-args constructor was created for the MutableBytes data type to enable Kryo serialization in Spark (#549). Finally, a new InputFormat called FluoEntryInputFormat was created that returns key/value entries, and the existing FluoInputFormat that returns rows was renamed FluoRowInputFormat (#538, #540).
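The sketch below illustrates why the two interfaces matter: Serializable lets an object be shipped between JVMs (as Spark does with RDD elements), and Comparable gives a natural ordering that sorting can rely on. RowColSketch is a hypothetical stand-in written for this example, not a Fluo class, and the example uses only the JDK rather than Spark or Fluo APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a Fluo data type; not part of Fluo.
class RowColSketch implements Serializable, Comparable<RowColSketch> {
  private static final long serialVersionUID = 1L;
  final String row;
  final String col;

  RowColSketch(String row, String col) {
    this.row = row;
    this.col = col;
  }

  // Natural ordering: by row first, then by column.
  @Override
  public int compareTo(RowColSketch other) {
    int c = row.compareTo(other.row);
    return c != 0 ? c : col.compareTo(other.col);
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof RowColSketch)) {
      return false;
    }
    RowColSketch rc = (RowColSketch) o;
    return row.equals(rc.row) && col.equals(rc.col);
  }

  @Override
  public int hashCode() {
    return Objects.hash(row, col);
  }

  // Java-serializes and deserializes a value, as happens when an
  // object crosses a process boundary.
  static RowColSketch roundTrip(RowColSketch in) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(in);
      }
      try (ObjectInputStream ois =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (RowColSketch) ois.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  // Returns a sorted copy using the natural ordering from compareTo.
  static List<RowColSketch> sorted(List<RowColSketch> in) {
    List<RowColSketch> copy = new ArrayList<>(in);
    Collections.sort(copy);
    return copy;
  }
}
```

A type lacking either interface would fail here in the same places it would fail in a distributed job: writeObject throws for a non-Serializable object, and Collections.sort does not compile for a type without a natural ordering.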
Performance improvements
Fluo was analyzed while running to determine where time is spent when executing transactions. This analysis found that a significant amount of time was spent committing transactions. Changes were made in Fluo and Accumulo to decrease commit time. For Fluo, the following changes were made:
- #591 - Shared batch writer increases transaction latency
- #590 - Increased batch writer threads and made configurable
- #589 - Added 2nd conditional writer and logging of commit times
- #584 - Adjust number of conditional writer threads based on cluster size
For Accumulo, changes are being made in ACCUMULO-4066 to decrease the time it takes to process conditional mutations. Conditional mutations are used when Fluo commits a transaction.
These changes resulted in improvements over beta-1 in testing. However, there is probably still room for improvement, and more analysis is needed.
API Changes
Once Fluo 1.0.0 is released, all subsequent releases will follow semver. For now, some small API changes are still being made. The following API changes happened between beta-1 and beta-2.
- #566 - Added RowColumnValue and made Accumulo init code use it
- #551 - Added method to get start timestamp of transaction
- #550 - Changed setObservers() to addObservers()
Other important improvements and bug fixes
- #598 - Upgraded Hadoop to 2.6.3 and Accumulo to 1.6.4
- #587 - Specified datasource for all graphs in Fluo's Grafana dashboard
- #586 - Added efficient and easy way to build Bytes objects
- #578 - Plot nothing in Grafana when no data exists
- #573 - Fixed issues building against Accumulo 1.8.0-SNAPSHOT
- #561 - Stopped checkstyle mvn plugin from running at validate
- #559 - Eventually drop deleted data
- #558 - Added arguments to deploy command to skip findbugs, checkstyle, and auto-formatting
- #556 - Make TravisCI deploy snapshot jars after successful builds
- #552 - Made eclipse stop complaining about unknown plugins
- #547 - Provide better documentation for LoaderExecutor
- #535 - Upgraded Twill to 0.6.0-incubating
- #520 - Consolidate all implementation properties into FluoConfigurationImpl
- #518 - Make Oracle run on a random port
- #513 - Unable to pass spaces to scan command
- #495 - Add support for notifications to Fluo formatter
Testing
For this release, a long run of the Webindex application was performed and is documented in a blog post. A long run of the Fluo Stress test was also performed and is documented in another blog post.