The Apache Software Foundation has announced that CarbonData is being made a Top-Level Project.
Apache CarbonData is an indexed columnar data format for fast analytics on big data platforms such as Apache Hadoop and Apache Spark. Its key features include:
- Unique data organization to allow faster filtering and better compression;
- Multi-level indexing to enable faster search and speed up query processing;
- Deep Apache Spark integration for DataFrame and SQL compliance;
- Advanced push-down optimization to minimize the amount of data being read, processed, converted, transmitted, and shuffled;
- Efficient compression and global encoding schemes to further improve aggregation query performance;
- Dictionary encoding for reduced storage space and faster processing; and
- Data update and delete support using standard SQL syntax.
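
To make two of the listed ideas concrete, here is a minimal Python sketch of dictionary encoding and of block skipping with a min/max index. It is a generic illustration only and does not reflect CarbonData's actual file layout or API; the function names (`dictionary_encode`, `build_min_max_index`, `filter_equals`) and the fixed `block_size` are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[Dict[str, int], List[int]]:
    """Replace each distinct string with a small integer code (hypothetical helper)."""
    dictionary: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        # Assign the next free code the first time a value is seen.
        code = dictionary.setdefault(value, len(dictionary))
        codes.append(code)
    return dictionary, codes


def build_min_max_index(column: List[int], block_size: int) -> List[Tuple[int, int, int]]:
    """Record (start_offset, min, max) for each fixed-size block of a column."""
    index = []
    for start in range(0, len(column), block_size):
        block = column[start:start + block_size]
        index.append((start, min(block), max(block)))
    return index


def filter_equals(column: List[int], index: List[Tuple[int, int, int]],
                  block_size: int, target: int) -> List[int]:
    """Return row positions where column == target, skipping blocks that cannot match."""
    hits = []
    for start, low, high in index:
        # Filter push-down: a block whose [min, max] range excludes the target is never read.
        if target < low or target > high:
            continue
        for offset, value in enumerate(column[start:start + block_size]):
            if value == target:
                hits.append(start + offset)
    return hits


if __name__ == "__main__":
    dictionary, codes = dictionary_encode(["cn", "us", "cn", "in"])
    print(dictionary, codes)

    column = [1, 2, 3, 10, 11, 12]
    index = build_min_max_index(column, block_size=3)
    print(filter_equals(column, index, block_size=3, target=11))
```

The encoded column stores small integers instead of repeated strings, which is where the storage savings come from, and the min/max index lets an equality filter skip whole blocks without scanning them.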