Apache Fluo™ lets users make incremental updates to large data sets stored in Apache Accumulo


Overview

Apache Fluo is an open source implementation of Percolator (the system that populates Google's search index) for Apache Accumulo. With Fluo, users can continuously join new data into large existing data sets without reprocessing the entire data set. Unlike batch and streaming frameworks, Fluo offers much lower latency and can operate on extremely large data sets. To try Fluo, take the Fluo tour. If you have any questions, contact us.

Major Features

Reduced Latency

When combining new data with existing data, Fluo offers lower latency than batch processing frameworks (e.g., Spark, MapReduce).

Reliable

Incremental updates are implemented using transactions, which allow thousands of updates to happen concurrently without corrupting data.

Core API

The core Fluo API supports simple, cross-node transactional updates using get/set methods.
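
To make the idea of transactional get/set updates concrete, here is a minimal single-process sketch in Python. It is not the Fluo API (Fluo is a Java library running against Accumulo), and the names `Store`, `Transaction`, and `CommitConflict` are hypothetical. It assumes a simple optimistic scheme: writes are buffered, and a commit fails if another transaction has committed to one of the written cells since this transaction began.

```python
class CommitConflict(Exception):
    """Raised when another transaction committed to a cell this one wrote."""


class Store:
    """Hypothetical in-memory table: (row, column) -> (value, commit version)."""

    def __init__(self):
        self.cells = {}
        self.version = 0  # incremented on every successful commit

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.start_version = store.version  # version seen when we started
        self.writes = {}                    # buffered until commit

    def get(self, row, col, default=None):
        # Prefer our own buffered write, then the latest committed value.
        if (row, col) in self.writes:
            return self.writes[(row, col)]
        value, _ = self.store.cells.get((row, col), (default, 0))
        return value

    def set(self, row, col, value):
        self.writes[(row, col)] = value

    def commit(self):
        # Reject the commit if any written cell changed after we began.
        for key in self.writes:
            _, committed_at = self.store.cells.get(key, (None, 0))
            if committed_at > self.start_version:
                raise CommitConflict(key)
        self.store.version += 1
        for key, value in self.writes.items():
            self.store.cells[key] = (value, self.store.version)


# Two overlapping transactions increment the same counter.
store = Store()
t1 = store.begin()
t2 = store.begin()
t1.set("page1", "count", t1.get("page1", "count", 0) + 1)
t2.set("page1", "count", t2.get("page1", "count", 0) + 1)
t1.commit()

try:
    t2.commit()
    conflict = False
except CommitConflict:
    conflict = True
    # Retry against the newly committed state instead of losing an update.
    t3 = store.begin()
    t3.set("page1", "count", t3.get("page1", "count", 0) + 1)
    t3.commit()
```

In this sketch the second commit is rejected and retried, so both increments are preserved rather than one silently overwriting the other. A real distributed implementation would also need snapshot reads, locking, and failure recovery, which are omitted here.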

Avoid Reprocessing Data

Combine new data with existing data without having to reprocess the entire data set.

General Purpose

Fluo applications consist of a series of observers that execute user code when observed data is updated.
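
The observer pattern described above can be illustrated with a small, self-contained Python sketch. It is not the Fluo API: `ObservedTable`, `index_document`, and the `w:` row prefix are hypothetical names chosen for illustration, and the sketch leaves out transactions and distribution entirely. It shows only the core idea that a write to an observed column queues a notification, and user code then runs against just the row that changed.

```python
from collections import deque


class ObservedTable:
    """Hypothetical in-memory table that runs observers when a column changes."""

    def __init__(self):
        self.cells = {}         # (row, column) -> value
        self.observers = {}     # column -> callback(table, row, column)
        self.pending = deque()  # notifications waiting to be processed

    def observe(self, column, callback):
        self.observers[column] = callback

    def get(self, row, column, default=None):
        return self.cells.get((row, column), default)

    def set(self, row, column, value):
        self.cells[(row, column)] = value
        # Only writes to observed columns generate notifications.
        if column in self.observers:
            self.pending.append((row, column))

    def run_observers(self):
        # Drain the queue; observers may enqueue further notifications.
        while self.pending:
            row, column = self.pending.popleft()
            self.observers[column](self, row, column)


def index_document(table, row, column):
    """Observer: fold one new document's words into the running counts."""
    for word in table.get(row, column).split():
        key = f"w:{word}"
        table.set(key, "count", table.get(key, "count", 0) + 1)


table = ObservedTable()
table.observe("content", index_document)

# Each new document is processed once; earlier documents are never re-read.
table.set("doc1", "content", "to be or not to be")
table.run_observers()
table.set("doc2", "content", "to do")
table.run_observers()
```

After the second document arrives, only its two words are processed, yet the counts reflect both documents. This is the same shape as an incremental index update: the work done is proportional to the new data, not to the size of the existing data set.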

Recipes API

The Fluo Recipes API builds on the core API to offer complex transactional updates.