Fluo 1.0.0-beta-2 is the third release of Fluo and likely the final release before 1.0.0. Fluo is now at a point where its two cluster test suites, WebIndex and Stress, are running well for long periods on EC2.
Below are resources for this release:
- Download the Fluo binary tarball for 1.0.0-beta-2 from GitHub.
- View the documentation for help getting started with Fluo.
- Javadocs are available for this release.
- A tag of Fluo codebase for 1.0.0-beta-2 is available.
- The Quickstart and Phrasecount applications were updated to work with this release.
This release closed 48 tickets. There is no upgrade path from 1.0.0-beta-1 to 1.0.0-beta-2. Many improvements in this release were driven by the creation of two new Fluo-related projects:
Fluo recipes is a collection of common development patterns designed to make Fluo application development easier. Creating Fluo recipes required new Fluo functionality and updates to the Fluo API. The first release of Fluo recipes has been made and is available in Maven Central.
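Since the recipes library is published to Maven Central, it can be pulled in as a regular dependency. The coordinates below are an assumption for illustration; verify the exact group id, artifact id, and version on Maven Central:

```xml
<!-- Coordinates are illustrative; confirm them on Maven Central -->
<dependency>
  <groupId>io.fluo</groupId>
  <artifactId>fluo-recipes-core</artifactId>
  <version>1.0.0-beta-1</version>
</dependency>
```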
WebIndex is an example Fluo application that indexes links to web pages in multiple ways. WebIndex enabled the testing of Fluo on real data at scale. It also inspired improvements to Fluo to allow it to work better with Apache Spark.
This release contains many new features that make it easier to run, develop, and monitor Fluo applications.
Improved Fluo metrics that can be sent to InfluxDB and viewed in Grafana
In #569, #570, & #580, Fluo metrics and monitoring were refactored to fix several bugs and to allow metrics to be sent to InfluxDB and viewed in Grafana. Fluo metrics are still instrumented using Dropwizard metrics, but in #574 the metrics configuration was moved from its own file into fluo.properties. While Fluo metrics can still be sent to many different tools (e.g., Graphite, Ganglia), Fluo now ships with configuration that can be used to set up a Fluo dashboard in Grafana that queries InfluxDB. To set up the sending of Fluo metrics to InfluxDB/Grafana, view this documentation, or consider using Fluo-dev or Zetten to run Fluo, as they can install InfluxDB and Grafana and set up metrics for you.
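As a sketch of what this configuration looks like, the fragment below shows metrics reporter settings of the kind that now live in fluo.properties. The property names and values here are assumptions for illustration only; consult the commented fluo.properties file and the metrics documentation shipped with this release for the exact keys:

```properties
# Illustrative only -- check fluo.properties for the real property names.
# Dropwizard's Graphite reporter can feed InfluxDB via its Graphite input.
fluo.metrics.reporter.graphite.enable=true
fluo.metrics.reporter.graphite.host=localhost
fluo.metrics.reporter.graphite.port=2003
fluo.metrics.reporter.graphite.frequency=30
```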
Improved Fluo administration
Several commands were added to the fluo script, which is used to administer Fluo. A fluo exec command (#581) was created to provide an easy way to execute application code using Fluo classes and dependencies. A fluo list command (#523) was created to let users list all Fluo applications within a Fluo instance. The fluo scan command now has a --raw option (#597) that prints Fluo data as it is stored in Accumulo; this was accomplished by moving the Fluo formatter from the Accumulo shell to the scan command. The scan command can now also format non-ASCII characters as hex (#568). The fluo new command was improved to prevent users from setting invalid Fluo application names (#510). A bug was fixed in the fluo start command that was causing timeouts when starting applications (#562). Finally, most fluo commands can now be run without the directory configured for an application (#524). Only the start command needs this directory configured; all other commands only require the default properties file to be configured.
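As an illustration, a session with the updated script might look like the following; the application name and class are hypothetical:

```
# List all Fluo applications in the instance
fluo list

# Print an application's data as it is stored in Accumulo
fluo scan myapp --raw

# Run application code with Fluo classes and dependencies on the classpath
fluo exec myapp com.example.MyTool
```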
Made Fluo work better with Spark
Several changes were made to Fluo to allow it to work better with Apache Spark. All Fluo data types now implement Serializable and can be used in Spark RDDs (#539). Fluo data types can also now be compared, which allows RDDs of Fluo data types to be sorted (#544). Also, a no-args constructor was created for the MutableBytes data type to enable Kryo serialization in Spark (#549). Finally, a new InputFormat called FluoEntryInputFormat was created that returns key/value entries, and the existing FluoInputFormat that returns rows was renamed.
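To see why these two interfaces matter to Spark, the sketch below uses a hypothetical stand-in for a Fluo data type (it is not the real Bytes/MutableBytes class): implementing Serializable lets Spark ship instances between executors, and implementing Comparable lets operations like sortByKey() order them. Plain Java stands in for the RDD here.

```java
import java.io.*;
import java.util.*;

// Hypothetical stand-in for a Fluo data type such as Bytes.
class ImmutableBytes implements Serializable, Comparable<ImmutableBytes> {
  private final byte[] data;

  ImmutableBytes(byte[] data) {
    this.data = Arrays.copyOf(data, data.length);
  }

  @Override
  public int compareTo(ImmutableBytes other) {
    // Lexicographic, unsigned byte-by-byte comparison
    int len = Math.min(data.length, other.data.length);
    for (int i = 0; i < len; i++) {
      int cmp = Integer.compare(data[i] & 0xff, other.data[i] & 0xff);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(data.length, other.data.length);
  }

  @Override
  public String toString() {
    return new String(data);
  }
}

public class SparkTypeSketch {
  public static void main(String[] args) throws Exception {
    List<ImmutableBytes> rows = new ArrayList<>(Arrays.asList(
        new ImmutableBytes("row2".getBytes()),
        new ImmutableBytes("row1".getBytes())));

    // Comparable is what lets a sorted RDD order these values.
    Collections.sort(rows);
    System.out.println(rows.get(0)); // prints "row1"

    // Round-trip through Java serialization, as Spark does when shuffling.
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    ObjectOutputStream oos = new ObjectOutputStream(bos);
    oos.writeObject(rows.get(0));
    oos.flush();
    ImmutableBytes copy = (ImmutableBytes) new ObjectInputStream(
        new ByteArrayInputStream(bos.toByteArray())).readObject();
    System.out.println(copy); // prints "row1"
  }
}
```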
Improved performance
Time was spent analyzing Fluo while it was running to determine where time goes when executing transactions. This analysis found that a significant portion of the time was spent committing transactions, so changes were made in Fluo and Accumulo to decrease commit time. For Fluo, the following changes were made:
- #591 - Shared batch writer increases transaction history
- #590 - Increased batch writer threads and made configurable
- #589 - Added 2nd conditional writer and logging of commit times
- #584 - Adjust number of conditional writer threads based on cluster size
For Accumulo, changes are being made in ACCUMULO-4066 to decrease the time it takes to process conditional mutations. Conditional mutations are used when Fluo commits a transaction.
These changes resulted in nice improvements over beta-1 in testing. However, there is probably still room for improvement, and more analysis is needed.
API changes
Once Fluo 1.0.0 is released, all releases after that will follow semantic versioning. For now, some small API changes are still being made. The following API changes happened between beta-1 and beta-2:
- #566 - Added RowColumnValue and made Accumulo init code use it
- #551 - Added method to get start timestamp of transaction
- #550 - Changed setObservers() to addObservers()
Other important improvements and bug fixes
- #598 - Upgraded Hadoop to 2.6.3 and Accumulo to 1.6.4
- #587 - Specified datasource for all graphs in Fluo's Grafana dashboard
- #586 - Added efficient and easy way to build Bytes objects
- #578 - Plot nothing in Grafana when no data exists
- #573 - Fixed issues building against Accumulo 1.8.0-SNAPSHOT
- #561 - Stopped checkstyle mvn plugin from running at validate
- #559 - Eventually drop deleted data
- #558 - Added arguments to deploy command to skip findbugs, checkstyle, and auto-formatting
- #556 - Make TravisCI deploy snapshot jars after successful builds
- #552 - Made eclipse stop complaining about unknown plugins
- #547 - Provide better documentation for LoaderExecutor
- #535 - Upgraded Twill to 0.6.0-incubating
- #520 - Consolidate all implementation properties into FluoConfigurationImpl
- #518 - Make Oracle run on a random port
- #513 - Unable to pass spaces to scan command
- #495 - Add support for notifications to Fluo formatter