Recently we ran into a bug in our software. We couldn’t store long titles because our database columns are constrained to 180 characters. Until then we hadn’t anticipated on, nor entered titles this long.

To store the records in the database again, we truncated the product title in our application code. Which was easy enough, but the problem remained.

Time to look into it in more detail, what could be wrong?

Our Oracle database is expressing its column length in bytes. Java’s String, however, uses Unicode Standard.

This means that more complex characters like an ä have a String length of 1 in Java but need more bytes in the database, which can result in failing to store the text in the database.

Let’s look at the word Knäckebröd. This has a String length of 10. And a crispness of about 12.


Photo by MatthiasKabel at the German language Wikipedia, CC BY-SA 3.0

The length in bytes can be determined like this:

So, how to solve this in a bit more decent fashion in our code?

For that, we use the CharSetDecoder. To transform our Unicode String into the character set that is being used in the database:

What are the ignore configurations for? Basically for two situations:

  • Malformed input: when the character sequence is not a legal Unicode sequence.
  • Unmappable characters: for when a byte sequence is valid, but the result is not mappable.

We should specify in our code what we want to do in those cases. This can be specified with the CodingErrorAction. There are three options:

  • IGNORE: Drop input and continue to next
  • REPLACE: Replace the input by specified replacement
  • REPORT: Throw a CharacterCodingException

Since there is no logical replacement and reporting makes no sense in our case, since we would want to truncate anyway, we choose to ignore any malformed or mapping errors.

This results in the following code:

Of course, a more structural approach is to make sure lengthy input does not reach our service in te first place. But while we are working on that, for now we won’t get any database errors, and our service can continue its work.