byte
, int
, etc.) from a single template
to conserve memory and boost performance.See: Description
Package | Description |
---|---|
com.carrotsearch.hppc | |
com.carrotsearch.hppc.cursors | |
com.carrotsearch.hppc.predicates | |
com.carrotsearch.hppc.procedures | |
com.carrotsearch.hppc.sorting |
byte
, int
, etc.) from a single template
to conserve memory and boost performance.
The Java Collections package is in many ways excellent, but it cannot be used for primitive types without autoboxing (which kills the runtime performance due to increased memory use and garbage collector overhead).
This library has slightly different design goals than Java Collections (and many other collection packages). For example, the internals of each class are open and subject to free hacking and tuning up to one's wishes.
There are a few projects implementing collections over primitive types: fastutil, Koloboke, PCJ, GNU Trove, Apache Primitive Collections.
Historically, some of these libraries were released on commercially forbidding licenses (fastutil) and others appeared to be abandoned (PCJ), so HPPC was created to fill this gap. Since then things have changed quite a lot (fastutil is now Apache-licensed, Koloboke has been released), but HPPC still remains a small memory footprint and does the job. If you require full Java collections compliance, try fastutil, or Koloboke, they are both very, very good.
We assume that:
From these assumptions stem the following design drivers:
-ea
switch). When the algorithm is
tested and verified, it can run with no additional overhead from
contract checks.Object
types), but at the same time we want
specialized classes to be automatically-generated for primitive-types
(to limit memory consumption and boost performance due to JIT optimisations).HPPC is not strictly modeled after Java Collections API, although we did
try to make the APIs look similar enough for comfortable use.
One particular thing largely missing in HPPC are "view" projections
(sublists or views over the collection of keys or values). Certain classes provide such views
(like ObjectObjectHashMap
), but for the most part, specific methods are
provided that accept ranges or closure-like filters. If performance is still unsatisfactory,
the internals of each class are available for direct manipulation.
Rough relationships between Java Collections classes and HPPC classes are presented in the table below.
Java Collections | HPPC (primitives) | HPPC (generics) | |
---|---|---|---|
bit sets | java.util.BitSet |
BitSet |
n/a |
array-backed lists | java.util.ArrayList , java.util.Vector |
[type]ArrayList |
ObjectArrayList<T> |
stacks | java.util.Stack |
[type]Stack |
ObjectStack<T> |
deques | java.util.ArrayDeque |
[type]ArrayDeque |
ObjectArrayDeque<T> |
hash sets | java.util.HashSet |
[keyType]HashSet [keyType]ScatterSet |
ObjectHashSet<K> [keyType]HashSet ObjectScatterSet<K> [keyType]ScatterSet |
hash maps | java.util.HashMap |
[keyType][valueType]HashMap |
ObjectObjectHashMap<K, V> [keyType]ObjectHashMap<V> Object[valueType]HashMap<K> ObjectObjectScatterMap<K, V> [keyType]ObjectScatterMap<V> Object[valueType]ScatterMap<K>
|
The method-level API of the corresponding types is also similar, but distinct differences exist (consult the JavaDoc of each class).
All key-redistribution containers (maps and sets) implemented in HPPC use open addressing with linear probing. Given the speed of access to CPU caches on modern processors this type of key lookup is the fastest one in practice. However, there is a potential problem with this approach: because keys are distributed in the underlying array at approximately their hash value, when they are copied to another hash container, the number of conflicts to find an empty slot may quickly increase. In the worst case this may lead to an exponential cost of any operation that requires a slot-lookup (adding, removal of a key). The "worst" case is, unfortunately, not as unlikely as it may seem; it's enough that keys are just copied from one container to another, in particular with load factors higher than 0.5.
In HPPC there is a distinction between "hash" and "scatter" sets and maps. Any hash-based structure will try to redistribute keys differently on a per-object basis. So the likelihood of the worst case (described above) is minimized to nearly zero at the cost of slightly increased slot lookup cost and lower CPU cache utilization when keys are copied from one container to another. The "scatter" sets and maps will use a very simple hashing strategy to be as fast as possible, but this can lead to conflict avalanches if scatter container's keys are copied to another scatter container. So when are scatter sets (or maps) useful? In short, whenever there is a need to check for existence of a key or for collecting (aggregating) some value for an existing key but assuming the content of the scatter container is not later propagated to some other (scatter) set or map.
If you're unsure which one to use, use regular hash containers. The difference shows only for large data sets (measured in millions of elements) anyway.
HPPC's interfaces are modeled to encapsulate the common functionality enforced from all the classes that implement a given interface. The interface hierarchy is loosely inspired by STL.
An overview of interfaces and their relationship to data structures implemented in HPPC is depicted graphically below.
Copyright © 2015 Carrot Search s.c.. All Rights Reserved.