Account Mapping to SKR04 with OpenAI Assistant

Keywords: #kotaicode #code #tax #ai

For the uninitiated in accounting and tax reporting: although accounting must be absolutely standardised, everyone uses their own accounting “standards” (at the very least, their own account descriptions and numbers).

And when I say account numbers, I don’t mean bank account numbers. Everything that happens in accounting goes to and comes from one account or another. That’s something we’ll address another day.

Coming to the issue at hand: when clients report their accounting to tax consultants (who then must calculate the tax-adjusted P&L1 and Balance Sheet), the account numbering is based on the client’s internal system.

The tax consultant then has to take these and map them to the accepted standard2 - SKR (StandardKontenRahmen) in Germany.3
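To make the mismatch concrete, here is a purely hypothetical client chart of accounts next to SKR04-style target entries (the numbers and labels are illustrative, not an official SKR04 excerpt):

```python
# Hypothetical client chart of accounts: internal numbering, English wording
client_accounts = {
    "4010": "Office supplies",
    "2100": "Petty cash",
}

# SKR04-style target entries (illustrative, not an official excerpt)
skr04_accounts = {
    "6815": "Bürobedarf",
    "1600": "Kasse",
}

# The mapping task: pair each client description with its SKR04 account,
# e.g. "Office supplies" -> "6815 Bürobedarf", "Petty cash" -> "1600 Kasse"
```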

As we have been working a lot with taxation - though this problem does not affect us directly - we decided to spend half a day on this. The members of our team were @Steffen, @Dimitar, @Pam, and yours truly.

How hard can it be?

Poor man’s attempt

It all starts with taking the client’s input (it is always Excel) on one hand, and the SKR044 definitions on the other.

A simple loop inside a loop was the first attempt: loop through the two lists, compare all pairs, and finally sort them by (string) similarity.

However rudimentary this may seem, it was actually somewhat effective, and already gave us a number of matches in the best-match list. At least enough to raise our confidence.

But it wasn’t enough in the general sense, and it was a failure when the client input was in English (failed matches all around).

Here’s the code5

from difflib import SequenceMatcher


# The helpers below are assumed implementations; they were not shown in the snippet
def similar(a: str, b: str) -> float:
    # Character-level similarity ratio in [0.0, 1.0]
    return SequenceMatcher(None, a, b).ratio()


def sort_by_similarity(
    matches: list[tuple[float, str]]
) -> list[tuple[float, str]]:
    # Highest similarity first
    return sorted(matches, key=lambda pair: pair[0], reverse=True)


def remove_keys_with_no_matches(
    matches: dict[str, list[tuple[float, str]]]
) -> dict[str, list[tuple[float, str]]]:
    return {key: value for key, value in matches.items() if value}


def create_dict_to_list_from_dict_to_list_of_tuples(
    matches: dict[str, list[tuple[float, str]]]
) -> dict[str, list[str]]:
    # Drop the scores, keep only the matched account names
    return {key: [name for _, name in value] for key, value in matches.items()}


def get_matching_account_names(
    account_names: list[str], datev_account_names: list[str]
) -> tuple[
    dict[str, list[str]],
    dict[str, list[str]],
    dict[str, list[str]],
    dict[str, list[str]],
]:
    # Initialize dictionaries for different match levels
    match_levels = {
        "best": dict[str, list[tuple[float, str]]](),
        "close": dict[str, list[tuple[float, str]]](),
        "okay": dict[str, list[tuple[float, str]]](),
        "rest": dict[str, list[tuple[float, str]]](),
    }

    # Define similarity thresholds
    thresholds = {
        "best": 0.95,
        "close": 0.8,
        "okay": 0.6,
    }

    for account_name in account_names:
        # Initialize lists for each match level
        for level in match_levels:
            match_levels[level][account_name] = []

        for datev_account_name in datev_account_names:
            similarity = similar(account_name, datev_account_name)
            if similarity >= thresholds["best"]:
                match_levels["best"][account_name].append(
                    (similarity, datev_account_name)
                )
            elif similarity >= thresholds["close"]:
                match_levels["close"][account_name].append(
                    (similarity, datev_account_name)
                )
            elif similarity >= thresholds["okay"]:
                match_levels["okay"][account_name].append(
                    (similarity, datev_account_name)
                )
            else:
                match_levels["rest"][account_name].append(
                    (similarity, datev_account_name)
                )

        # Sort the lists by similarity
        for level in match_levels:
            match_levels[level][account_name] = sort_by_similarity(
                match_levels[level][account_name]
            )

    best, close, okay, rest = [
        create_dict_to_list_from_dict_to_list_of_tuples(
            remove_keys_with_no_matches(match_levels[level])
        )
        for level in ["best", "close", "okay", "rest"]
    ]
    return best, close, okay, rest
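The English failure follows directly from the character-level matching. Assuming `similar` is based on `difflib.SequenceMatcher` (not shown in the snippet), a cross-language pair simply shares too few characters and scores below every threshold:

```python
from difflib import SequenceMatcher


def similar(a: str, b: str) -> float:
    # Character-level similarity ratio in [0.0, 1.0]
    return SequenceMatcher(None, a, b).ratio()


# Identical German names match perfectly ...
print(similar("Kasse", "Kasse"))  # 1.0

# ... but an English description vs. its German SKR04 counterpart
# scores below the 0.6 "okay" threshold and lands in the "rest" bucket
print(similar("Office supplies", "Bürobedarf"))
```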

Rich man’s attempt

Now that we were thrilled with two lists and two loops, we were ready to bring out the big guns: throw everything at an LLM (OpenAI) and get the result.

The result was quite underwhelming. It wasn’t too bad, but compared to what we got from the simple loops, the LLM didn’t impress us.

What did impress us was the speed with which our billing went up. With every request (test) we made, we could see our usage climb in the OpenAI billing dashboard.

The cost-effectiveness (or lack thereof) discouraged us from continuing in this direction.
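The cost scales badly by construction: each call resends the entire DATEV list as context. A back-of-the-envelope sketch (the per-token prices and token counts below are placeholders, not current OpenAI rates):

```python
def estimate_cost_usd(
    n_accounts: int,
    prompt_tokens_per_call: int,
    completion_tokens_per_call: int,
    usd_per_1k_prompt: float,
    usd_per_1k_completion: float,
) -> float:
    # One API call per client account; the full DATEV list rides along every time
    per_call = (
        prompt_tokens_per_call / 1000 * usd_per_1k_prompt
        + completion_tokens_per_call / 1000 * usd_per_1k_completion
    )
    return n_accounts * per_call


# 500 client accounts, ~4k prompt tokens each (the DATEV list), ~100 tokens back
print(round(estimate_cost_usd(500, 4000, 100, 0.005, 0.015), 2))  # 10.75
```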

The code is here, though6

import openai  # assumes OPENAI_API_KEY is set in the environment


# Compare account descriptions using the Chat Completions API
def find_best_match(new_account_desc: str, datev_accounts: list[str]) -> str:
    # Prepare the conversation prompt
    system_message = (
        "You are an expert assistant that matches account descriptions to find "
        "the 3 closest semantic matches. Just provide the description of the new "
        "accounts without any further explanation or numbering, just split them "
        "by semi colon"
    )
    user_message = (
        f"Here is a list of account descriptions:\n"
        f"{', '.join(datev_accounts)}\n"
        f"Match the following description to the closest one in the list: '{new_account_desc}'"
    )

    # Send request to OpenAI API
    response = openai.chat.completions.create(
        model="gpt-4o",  # Replace with gpt-3.5-turbo if preferred
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message},
        ],
        max_tokens=100,
    )

    # Extract the model's reply: the 3 matches as one semicolon-separated string
    return response.choices[0].message.content.strip()
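Per the system prompt, the assistant returns its three matches separated by semicolons, but `find_best_match` hands back the raw string. A small parser turns it into a list (`parse_matches` is a sketch of ours, not part of the original code):

```python
def parse_matches(raw: str) -> list[str]:
    # Split the assistant's semicolon-separated reply into clean descriptions
    return [part.strip() for part in raw.split(";") if part.strip()]


print(parse_matches("Kasse; Bank; Sonstige Vermögensgegenstände"))
# ['Kasse', 'Bank', 'Sonstige Vermögensgegenstände']
```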

The pragmatist’s approach

WIP


  1. Profit and Loss ↩︎

  2. Accepted accounting standard is set by DATEV (effectively) ↩︎

  3. Our story happens in Germany ↩︎

  4. SKR04 was our chosen version of SKR as the cases we usually deal with had to do with this version. ↩︎

  5. In the time restricted hackathon day, we didn’t care much about “style” ↩︎

  6. Perhaps we should have worked more on our prompt; but given the usage stats we saw, we stopped pretty quickly. ↩︎