Apache Fluo™ lets users make incremental updates to large data sets stored in Apache Accumulo.

Overview

Apache Fluo is an open source implementation of Percolator (which populates Google's search index) for Apache Accumulo. Fluo makes it possible to update the results of a large-scale computation, index, or analytic as new data is discovered. For more information, take the Fluo tour.

Major Features

Reduced Latency

When combining new data with existing data, Fluo offers lower latency than batch processing frameworks (e.g., Spark, MapReduce).

Reliable

Incremental updates are implemented using transactions, which allow thousands of updates to happen concurrently without corrupting data.

Core API

The core Fluo API supports simple, cross-node transactional updates using get/set methods.
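To make the get/set transaction idea concrete, here is a minimal in-memory sketch. It is not the Fluo API: ToyStore and ToyTransaction are hypothetical names invented for illustration, and everything runs in one process rather than across nodes. The sketch only detects write-write conflicts at commit time, and reads return the latest committed value rather than a snapshot.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a transactional table. NOT the Fluo API.
class ToyStore {
    private final Map<String, String> values = new HashMap<>();
    // Version (logical clock value) at which each key was last written.
    private final Map<String, Long> versions = new HashMap<>();
    private long clock = 0;

    synchronized long now() {
        return clock;
    }

    synchronized String read(String key) {
        return values.get(key);
    }

    // Applies all writes atomically, unless any written key was committed
    // by another transaction after startVersion (a write-write conflict).
    synchronized boolean tryCommit(long startVersion, Map<String, String> writes) {
        for (String key : writes.keySet()) {
            if (versions.getOrDefault(key, 0L) > startVersion) {
                return false;
            }
        }
        clock++;
        for (Map.Entry<String, String> e : writes.entrySet()) {
            values.put(e.getKey(), e.getValue());
            versions.put(e.getKey(), clock);
        }
        return true;
    }
}

// Hypothetical transaction that buffers writes until commit.
class ToyTransaction {
    private final ToyStore store;
    private final long startVersion;
    private final Map<String, String> writes = new HashMap<>();

    ToyTransaction(ToyStore store) {
        this.store = store;
        this.startVersion = store.now();
    }

    // Returns this transaction's own pending write if present,
    // otherwise the latest committed value.
    String get(String key) {
        if (writes.containsKey(key)) {
            return writes.get(key);
        }
        return store.read(key);
    }

    void set(String key, String value) {
        writes.put(key, value);
    }

    boolean commit() {
        return store.tryCommit(startVersion, writes);
    }
}
```

If two ToyTransaction instances start at the same time and both set the same key, the first commit() succeeds and the second returns false, so the caller would retry with fresh reads. This mirrors the general optimistic approach only loosely and leaves out locking, timestamps, and failure recovery.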

Avoid Reprocessing Data

Combine new data with existing data without having to reprocess the entire data set.
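As a rough illustration of the difference, the sketch below keeps a running word count that is updated per new document, alongside a batch-style method that re-reads every document. The class IncrementalWordCount and its methods are hypothetical names made up for this example and have nothing to do with Fluo's own classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class. NOT part of Fluo.
class IncrementalWordCount {
    private final Map<String, Integer> counts = new HashMap<>();

    // Folds one new document into the running counts.
    // Previously added documents are not read again.
    void addDocument(String text) {
        for (String word : text.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
    }

    int count(String word) {
        return counts.getOrDefault(word, 0);
    }

    Map<String, Integer> snapshot() {
        return new HashMap<>(counts);
    }

    // Batch-style baseline: rebuilds the counts by reading every document.
    static Map<String, Integer> recompute(List<String> allDocuments) {
        IncrementalWordCount fresh = new IncrementalWordCount();
        for (String doc : allDocuments) {
            fresh.addDocument(doc);
        }
        return fresh.snapshot();
    }
}
```

Both paths end with the same counts. The incremental path does work proportional to the new document, while recompute does work proportional to everything seen so far.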

General Purpose

Fluo applications consist of a series of observers that execute user code when observed data is updated.
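The observer pattern described here can be sketched with a toy, single-process table. ToyObserverTable and its nested Observer interface are hypothetical and are not Fluo's observer API. The sketch assumes one observer per column and runs notifications synchronously from a queue, whereas a real deployment would run them on distributed workers inside transactions.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical in-memory table with column observers. NOT the Fluo API.
class ToyObserverTable {
    // User code that runs when a cell in an observed column is updated.
    interface Observer {
        void process(ToyObserverTable table, String row, String column);
    }

    private final Map<String, String> cells = new HashMap<>();
    private final Map<String, Observer> observers = new HashMap<>();
    // Pending notifications, each stored as {row, column}.
    private final Queue<String[]> notifications = new ArrayDeque<>();

    private static String key(String row, String column) {
        return row + "\u0000" + column;
    }

    // Registers the observer for a column, replacing any earlier one.
    void observe(String column, Observer observer) {
        observers.put(column, observer);
    }

    String get(String row, String column) {
        return cells.get(key(row, column));
    }

    // Writes a cell and queues a notification if the column is observed.
    void set(String row, String column, String value) {
        cells.put(key(row, column), value);
        if (observers.containsKey(column)) {
            notifications.add(new String[] {row, column});
        }
    }

    // Drains the queue. Observers may call set(), which can queue
    // further notifications and so chain observers together.
    void runObservers() {
        String[] n;
        while ((n = notifications.poll()) != null) {
            observers.get(n[1]).process(this, n[0], n[1]);
        }
    }
}
```

For example, an observer on a "raw" column could write a derived value to an "upper" column, and a second observer on "upper" could write its length to a "len" column. A single set on "raw" followed by runObservers() would then fill in both derived cells, which is the "series of observers" shape in miniature.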

Recipes API

The Fluo Recipes API builds on the core API to offer complex transactional updates.