Sergii Dymchenko

Facebook Hacker Cup 2015 Round 1: full score with Python, Prolog, and GAP

The results of the Facebook Hacker Cup 2015 Round 1 have been announced. I've got a full score using Python, Prolog, and GAP, and I want to share my solutions. All code in this article is the actual code I wrote and submitted during the round.

Homework: 10 points

The first problem of the round, Homework, asks to find how many integers in the range [A, B] have exactly K different prime divisors.
2 ≤ A, B ≤ 107, and there can be up to 100 test cases in one input file.

My (very inefficient, but still fast enough for a given time constraints) solution precalculates the number of different prime divisors for every integer up to 107 using GAP (Groups, Algorithms, Programming, http://www.gap-system.org/):

#!/bin/sh
# GAP (http://www.gap-system.org):
echo 'for n in [2..10^7] do  Print(n, " ", Length(PrimeDivisors(n)), "\n"); od;' | gap -q > prime_data.txt

After that my Python (I used Python 2.7.5+) program uses that info to construct a hash table, where the keys are numbers of different prime divisors, and the values are lists of integers that have that number of prime divisors. Then for every test case the program loops through the list of integers with the given number of different prime divisors and counts the ones in the given range.

MEM = {}

def do_case(case_num, a, b, k):
    result = 0
    if k in MEM:
        for n in MEM[k]:
            if n >= a and n <= b:
                result += 1
    print "Case #%d: %d" % (case_num, result)

def main():
    # prime_data.txt must be created with create_prime_data.sh
    with open('prime_data.txt', 'r') as f:
        prime_data = f.readlines()
    for line in prime_data:
        try:
            n, primacity = map(int, line.split())
            if primacity in MEM:
                MEM[primacity].append(n)
            else:
                MEM[primacity] = [n]
        except ValueError:
            pass

    T = int(raw_input())
    for case_num in range(1, T + 1):
        a, b, k = map(int, raw_input().split())
        do_case(case_num, a, b, k)

if __name__ == "__main__":
    main()

The try...except construct is needed because I haven't found a way to make GAP (I use version 4.7.2 on Linux) not to put a carriage return at the beginning of the output.

Autocomplete: 25 points

The essence of second problem of the round, Autocomplete, is as follows:

  • you are given a list of words (only lowercase letters a-z are used)
  • there is a dictionary, initially empty
  • for every word, in the given order, you find the length of its shortest prefix that is not a prefix of any word previously added to the dictionary, or the length of the word itself if such prefix does not exist, and afterwards add the word to the dictionary
  • output the sum of these lengths.

There can be up to 100,000 words with a total length of no more than 1,000,000 characters in a test case, and up to 100 test cases in the input.

For me it was obvious that the problem is about using the trie datastructure. I found a comparison of various trie impementations in Python and used the first one from the list, from BioPython:

# http://biopython.org
from Bio.trie import trie

def do_case(case_num, words):
    Dict = trie()
    result = 0
    for w in words:
        lo = 0
        hi = len(w)
        while lo < hi:
            mid = 1 + (lo + hi) // 2
            if Dict.has_prefix(w[:mid]):
                lo = mid
            else:
                hi = mid - 1
        result += min(len(w), (lo + 1))
        Dict[w] = True
    print "Case #%d: %d" % (case_num, result)

def main():
    T = int(raw_input())
    for case_num in range(1, T + 1):
        n = int(raw_input())
        words = []
        for i in range(n):
            words.append(raw_input().strip())
        do_case(case_num, words)

if __name__ == "__main__":
    main()

The program uses binary search and has_prefix method of the trie to find the length of the minimum prefix that is not already in the dictionary for every word.

Winning at Sports: 25 points

The essence of the third problem, Winning at Sports, is as following:

Given a winning score (two integers from 0 to 2000, the first one is always larger), find the number of ways (modulo 1,000,000,007) to obtain this score from 0-0 by adding one point at a time, such that:

  • you score the first point and from then on you always have more points than your opponent (stress-free victory)
  • your opponent's score if always greater until after their score is equal to their final score (stressful victory).

Interestingly, according to the problem statement, 0-0 → 1-0 is at the same time a stress-free way and a stressful way to obtain the 1-0 score.

I implemented a top-down dynamic programming solution in B-Prolog:

% B-Prolog 8.1 - http://www.probp.com/
% Usage:
% sed 's/^ */[/; s/ *$/]./; s/ \+/, /g' in-file | bp -g "cl('WinningAtSports.pl'),main,halt" -l | tail -n +5 > out-file

:- table stressfree(+,+,-).
stressfree(0, 0, 1).
stressfree(X, Y, W) :-
    X >= 0, Y >= 0,
    X > Y,
    X1 is X - 1,
    ( stressfree(X1, Y, Wx) ; Wx = 0 ),
    Y1 is Y - 1,
    ( stressfree(X, Y1, Wy) ; Wy = 0 ),
    W is (Wx + Wy) mod 1000000007.

:- table stressful(+,+,+,-).
stressful(0, 0, _, 1).
stressful(X, Y, End, W) :-
    X >= 0, Y >= 0,
    ( X =< Y ; End == 1),
    X1 is X - 1,
    ( stressful(X1, Y, End, Wx) ; Wx = 0 ),
    Y1 is Y - 1,
    ( stressful(X, Y1, 0, Wy) ; Wy = 0 ),
    W is (Wx + Wy) mod 1000000007.

do_case(Case_num, X, Y) :-
    stressfree(X, Y, W1),
    stressful(X, Y, 1, W2),
    format("Case #~w: ~w ~w\n", [Case_num, W1, W2]).

main :-
    read([T]),
    foreach(Case_num in 1..T, [X, Y], (
            read([X-Y]),
            do_case(Case_num, X, Y)
        )).

The solution assumes that the input is preprocessed to simplify reading in Prolog with the sed script given in the comments: every line becomes a list in Prolog format and ends with a full stop.

The stressfree predicate has two input parameters – the numbers in the given score, and one output parameter – the number of ways to achieve the score in a stress-free manner. There is one way to get 0-0 score – this is the base case of recursion. For a X Y score, the number of stress-free ways to get this score is the sum of the numbers of stress-free ways to get scores X-1 Y and X Y-1.

The stressful predicate is similar, but it has an additional input parameter that indicates whether the current opponent's score is their final score.

Both stressfree and stressful are memoized using the B-Prolog's table construct. The predicates work as memoized functions and don't rely on a changing state, so we can reuse the results even between different test cases.

For more details about tabling and another example of dynamic programming in B-Prolog see Dynamic programming solution for Facebook Hacker Cup problem "AAAAAA" in B-Prolog.

Corporate Gifting: 40 points

The hardest problem of the round, Corporate Gifting, has a lengthy and convoluted statement, but in essence asks to find a chromatic sum of a tree described as a list of parents of its nodes. Chromatic sum of a graph is the smallest possible sum of all proper colorings of the graph using positive integers.

A paper that introduced the concept of chromatic sum, E. Kubicka and A. J. Schwenk. 1989. An introduction to chromatic sums, also contains a pseudocode of a linear time algorithm for finding a chromatic sum of a tree.

My Python solution is basically just an implementation of the algorithm from the paper (after constructing a list of children for each vertex and identifying "leaves to root" order of the vertices):

from collections import deque

class Node:
    def __init__(self):
        self.minsum = 0
        self.rcolor = 0
        self.delta = 0
        self.ncolor = 0
        self.sons = []
        self.nosons = 0

def minsum(managers):
    n = len(managers)
    tree = {}
    for i in xrange(1, n + 1):
        tree[i] = Node()
    for i in xrange(2, n + 1):
        tree[managers[i - 1]].sons.append(i)
        tree[managers[i - 1]].nosons += 1

    leaf_to_root = [1]
    frontier = deque([1])
    while frontier:
        x = frontier.popleft()
        leaf_to_root.extend(tree[x].sons)
        frontier.extend(tree[x].sons)
    leaf_to_root = leaf_to_root[::-1]

    # Algo:  
    # E. Kubicka and A. J. Schwenk. 1989. An introduction to chromatic sums. 
    # In Proceedings of the 17th conference on ACM Annual Computer Science Conference (CSC '89). ACM, New York, NY, USA, 39-45. 
    # DOI=10.1145/75427.75430 http://doi.acm.org/10.1145/75427.75430 
    coloradd = {}
    for i in leaf_to_root:
        if not tree[i].sons:
            tree[i].minsum = 1
            tree[i].rcolor = 1
            tree[i].delta = 1
            tree[i].ncolor = 2
        else:
            mintotal = 0
            for k in xrange(1, tree[i].nosons + 2 + 1):
                coloradd[k] = k
            for son in tree[i].sons:
                mintotal += tree[son].minsum
                coloradd[tree[son].rcolor] += tree[son].delta
            sum1 = float("Inf")
            sum2 = float("Inf")

            for k in xrange(1, tree[i].nosons + 2 + 1):
                value = coloradd[k]
                if value < sum1:
                    sum2 = sum1
                    color1 = k
                    color2 = color1
                    sum1 = value
                else:
                    if value < sum2:
                        color2 = k
                        sum2 = value
            tree[i].minsum = sum1 + mintotal
            tree[i].rcolor = color1
            tree[i].delta = sum2 - sum1
            tree[i].ncolor = color2
    return tree[1].minsum

def do_case(case_num, managers):
    print "Case #%d: %d" % (case_num, minsum(managers))

def main():
    T = int(raw_input())
    for case_num in range(1, T + 1):
        n = int(raw_input())
        managers = map(int, raw_input().split())
        do_case(case_num, managers)

if __name__ == "__main__":
    main()

Overall, the round was not well-prepared. Two problems have very large input files; that caused problems for many contestants with poor Internet connection and added a lot of confusion after the organizers decided to accept solutions by email. The hardest problem can be solved by directly implementing the algorithm from the paper (and also very similar problems have been already posed in other programming competitions). All of the problems promised 100 test cases, but only 20 were given.

I'm looking forward to the second round and I hope it will be better prepared.

Comments

comments powered by Disqus