Testing Algorithm Robustness

Do you know what your heuristics are doing?

http://bit.ly/testing-algorithmic-robustness

Problem

We have some algorithm that works on data.
We want to test that it's correct.
How? We feed it data and assert it got the correct answer!
How do we know what the correct answer is...?

Solution

We don't test that the answer is right only that it is robust.

What does that mean?

We test the behaviour of our algorithms under changes to the data.
If it shouldn't improve the answer, it doesn't.
If it should improve the answer, it does.

How?

Using Hypothesis!
You write tests that something should be true for all inputs.
Hypothesis runs your test against a wide range of input.

Knapsack Packing Problem

Given \(n\) items with weights \(w_i\) and values \(v_i\), and \(C \geq 0\), find a set of items with total weight \(\leq C\) that maximizes the total value.

NP-hard, so we'll use a \(O(n \log(n))\) heuristic solution instead.


def solve_knapsack(items, capacity):
    """
    items: List of (value, weight) pairs
    capacity: Non-negative number.

    Returns a list of items from the list with weight summing
    to <= capacity that tries to maximize sum(value).
    """
    items.sort(key=lambda x: (x[0] / x[1], x[1]), reverse=True)
    result = []
    for item in items:
        if item[1] + current_fill <= capacity:
            result.append(item)
            current_fill += item[1]
    return result


from hypothesis import given, assume
import hypothesis.strategies as st

Item = st.tuples(
  st.integers(min_value=1), st.integers(min_value=1))
ItemSet = st.lists(Item)
Capacity = st.integers(min_value=0)


@given(ItemSet, Capacity)
def test_increasing_score_improves_things(
    items, capacity
):
    result = solve_knapsack(items, capacity)
    original_score = score_solution(result)
    assert result
    for item in result:
        new_items = list(items)
        new_items.append((item[0] + 1, item[1]))
        new_result = solve_knapsack(new_items, capacity)
        assert score_solution(new_result) > original_score


Falsifying example:

test_increasing_score_improves_things(
  items=[(6, 3), (5, 2)],
  capacity=5
)

If we add a (6, 2) item in, our chosen solution is [(6, 2), (5, 2)], scoring 11. We could have picked [(6, 2), (6, 3)], scoring 12.


@given(ItemSet, Capacity)
def test_increasing_weight_does_not_improve_things(
    items, capacity
):
    result = solve_knapsack(items, capacity)
    original_score = score_solution(result)
    for item in result:
        new_items = list(items)
        new_items.remove(item)
        new_items.append((item[0], item[1] + 1))
        new_result = solve_knapsack(new_items, capacity)
        assert score_solution(new_result) <= original_score


Falsifying example:

test_increasing_weight_of_chosen_item_does_not_improve_things(
  items=[(66, 33), (65, 32)],
  capacity=33
)

The chosen example is (65, 32), which has a lower score but higher score per unit weight. Raising its weight causes us to choose the other one which has a higher score.

In Summary

Not knowing the right answer shouldn't stop you from testing your code.
By testing your algorithms on random data, you'll get a much better handle on how it behaves in the wild.
By defining how your algorithm responds to changing data, you will get more robust heuristics.

Obligatory Sales Pitch

I'm available for hire for consulting and training.

drmaciver.com / @DRMacIver

https://hypothesis.readthedocs.org/

http://bit.ly/testing-algorithmic-robustness