11.07.10

Sébastien

Unmappable character for encoding UTF8

This classically happens in the following scenario: developers happily code in their Windows environment in Eclipse or whatever IDE they love, check in their stuff, and suddenly CruiseControl spits out a whole lot of warnings, or even errors, depending on how the build is configured. Yet the code compiles cleanly on the developer’s machine:

public class EncodingExample {
	private final static String TEXT = "Éáíó";
	public static void main(String[] args) {
		System.out.println(EncodingExample.TEXT);
	}
}

Here is the Ant file used by the CruiseControl build:

<?xml version="1.0" encoding="utf-8" ?>
<project name="test" default="compile">
	<target name="compile">
		<javac srcdir="src" destdir="classes" debug="true" />
	</target>
</project>

And yet, the CruiseControl logs show the following:

    [javac] Compiling 1 source file to /home/sebastien/workspace/sandbox/classes
    [javac] /home/sebastien/workspace/sandbox/src/EncodingExample.java:2: warning: unmappable character for encoding UTF8
    [javac] 	private final static String TEXT = "����";
    [javac] 	                                    ^
    [javac] /home/sebastien/workspace/sandbox/src/EncodingExample.java:2: warning: unmappable character for encoding UTF8
    [javac] 	private final static String TEXT = "����";
    [javac] 	                                     ^
    [javac] /home/sebastien/workspace/sandbox/src/EncodingExample.java:2: warning: unmappable character for encoding UTF8
    [javac] 	private final static String TEXT = "����";
    [javac] 	                                      ^
    [javac] /home/sebastien/workspace/sandbox/src/EncodingExample.java:2: warning: unmappable character for encoding UTF8
    [javac] 	private final static String TEXT = "����";
    [javac] 	                                       ^
    [javac] 4 warnings

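Here is what happens: when working on Windows, the IDE is more than likely configured to save files in Cp1252, which is a Microsoft adaptation of latin-1 [1]. The developer checks in, and the Continuous Integration server (usually running on Linux, which nowadays is all UTF-8) picks up the file. Since the <javac> task above does not specify an encoding, the compiler falls back to the platform default and tries to read the file as UTF-8. The Cp1252 bytes for the accented characters are not valid UTF-8 sequences, hence the warnings.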
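To make the mismatch concrete, here is a minimal sketch that reproduces it outside of the compiler. The class name `EncodingMismatchDemo` and the helper `isValidUtf8` are made up for this illustration, and it assumes the JVM ships the `windows-1252` charset (standard JDKs do). It encodes the same four characters both ways and checks whether each byte sequence can be read back as UTF-8:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original build.
class EncodingMismatchDemo {
    // Unicode escapes for "Éáíó", so this demo does not depend on
    // the encoding of its own source file.
    static final String TEXT = "\u00C9\u00E1\u00ED\u00F3";

    /** Returns true if the bytes form a valid UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the windows-1252 charset is available on this JVM.
        Charset cp1252 = Charset.forName("windows-1252");
        byte[] windowsBytes = TEXT.getBytes(cp1252);               // one byte per character
        byte[] utf8Bytes = TEXT.getBytes(StandardCharsets.UTF_8);  // two bytes per character

        System.out.println("Cp1252 bytes valid as UTF-8: " + isValidUtf8(windowsBytes));
        System.out.println("UTF-8 bytes valid as UTF-8:  " + isValidUtf8(utf8Bytes));
        // Decoding with the charset the bytes were written in round-trips cleanly.
        System.out.println("Cp1252 round trip intact:    "
                + new String(windowsBytes, cp1252).equals(TEXT));
    }
}
```

The first check fails because the Cp1252 byte for É (0xC9) looks like the start of a two-byte UTF-8 sequence, but the byte that follows is not a valid continuation byte. That is essentially what javac is complaining about.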

The way to solve this is:

– Either save the file as UTF-8. You can configure Eclipse, for example, to use UTF-8; check in the Eclipse preference files as well, so that everybody uses the same settings. Everybody then has to stick to that encoding.
– Or modify the Ant script to compile the files as Cp1252:

<?xml version="1.0" encoding="utf-8" ?>
<project name="test" default="compile">
	<target name="compile">
		<javac srcdir="src" destdir="classes"
		       encoding="cp1252" debug="true" />
	</target>
</project>

You can also try encoding="iso-8859-1", which works as long as the sources avoid the few characters that Cp1252 adds on top of latin-1, such as €. There is nothing inherently wrong with not using UTF-8 (Cp1252 is not a “bad” encoding); you just have to keep the same encoding everywhere. When working with Windows and Linux at the same time, that can sometimes prove tricky.
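If you go the UTF-8 route and have existing Cp1252 sources to migrate, the conversion itself is small. The following is only a sketch under stated assumptions: the class `Cp1252ToUtf8` and its `reencode` method are hypothetical names, the input file is assumed to really be Cp1252, and the file is overwritten in place, so try it on a copy first.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical one-off migration helper, not part of the original build.
class Cp1252ToUtf8 {
    /** Re-encodes bytes that were written as Cp1252 into UTF-8. */
    static byte[] reencode(byte[] cp1252Bytes) {
        // Assumption: the input really is Cp1252; this is not detected.
        String text = new String(cp1252Bytes, Charset.forName("windows-1252"));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // The path of the file to convert is passed as the first argument.
        Path source = Paths.get(args[0]);
        // Overwrites the file in place with its UTF-8 re-encoding.
        Files.write(source, reencode(Files.readAllBytes(source)));
    }
}
```

Running it twice on the same file would garble the accented characters, since the second pass would treat bytes that are already UTF-8 as Cp1252. Convert once, then switch the IDE and the build to UTF-8.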

[1] Cp1252 contains, in particular, French characters missing from latin-1 such as œ, Œ, and Ÿ, as well as our beloved European €.

 