JAXB DataFormat encoding

Stig Døssing
Hi,

I have a route like the following:

from(sjms2)
    .unmarshal().jaxb("myjaxbpackage")

When I send an XML message with the following content to the sjms2 endpoint:

<?xml version="1.0" encoding="ISO-8859-1"?>
... Rest of XML content here ...

any Danish characters (e.g. ø) in the message get mangled. The Camel message body in this example is a String.

Looking at the unmarshal implementation, it looks like Camel forces messages to InputStream (seemingly with UTF-8 encoding by default) before passing them to the JAXB data format. See https://github.com/apache/camel/blob/3312243b32af03ac39c3af170e318f03e01d64f0/core/camel-support/src/main/java/org/apache/camel/support/processor/UnmarshalProcessor.java#L56

I can work around this by converting the message body to a Latin-1 InputStream before unmarshalling, or by setting the encoding property on the data format, but I'm wondering why Camel is implemented this way. For JAXB unmarshalling at least, there is no reason to serialize a String to an InputStream before handing it off to JAXB. Doing so is also less flexible than passing the String directly, because my code now has to decide the input message's charset, which JAXB would otherwise handle for me.
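The mismatch described above can be reproduced without Camel or JAXB. The following is a minimal sketch that assumes the JDK's built-in DOM parser treats the encoding declaration the way a JAXB unmarshaller reading bytes would. The class and method names are hypothetical. It serializes the same String once as UTF-8 and once as ISO-8859-1, then lets the parser decode the bytes according to the declaration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class EncodingMismatchDemo {

    // XML whose declaration claims ISO-8859-1; the escape below is the Danish o-slash.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>D\u00f8ssing</name>";

    // Serializes the String with the given charset, then lets the parser decode
    // the bytes according to the encoding named in the XML declaration.
    static String parseAsBytes(String xml, Charset charset) {
        try {
            byte[] bytes = xml.getBytes(charset);
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // UTF-8 bytes decoded as ISO-8859-1: the two-byte character becomes two characters.
        System.out.println(parseAsBytes(XML, StandardCharsets.UTF_8).length());      // 8
        // Bytes that match the declaration round-trip correctly.
        System.out.println(parseAsBytes(XML, StandardCharsets.ISO_8859_1).length()); // 7
    }
}
```

The second call corresponds to the "convert to a Latin-1 InputStream" workaround: the caller has to know the charset up front, which is exactly the decision the message argues should not be necessary for a String body.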

In the current code, the serialization looks to be necessary because DataFormat.unmarshal takes an Exchange and an InputStream. Wouldn't it be more flexible to pass only the Exchange to the DataFormat, and leave the implementation free to check whether the message is already in a form it can process before serializing it to bytes? For instance, the JAXB data format could check whether the input is a Reader or a String, and use the matching JAXB Unmarshaller methods.
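A rough sketch of that idea follows, again using the JDK's DOM parser as a stand-in for JAXB so that it runs without extra dependencies. The class and method names are hypothetical and this is not Camel code. It picks a character-based input when the body is already a String or Reader, and only falls back to byte decoding for an InputStream.

```java
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class BodyAwareUnmarshalSketch {

    // Hypothetical helper: chooses the parser input that matches the body type
    // instead of always serializing to bytes first.
    static InputSource toInputSource(Object body) {
        if (body instanceof String) {
            // Already characters: no charset decision is needed.
            return new InputSource(new StringReader((String) body));
        }
        if (body instanceof Reader) {
            return new InputSource((Reader) body);
        }
        if (body instanceof InputStream) {
            // Raw bytes: the parser decodes them using the XML declaration.
            return new InputSource((InputStream) body);
        }
        throw new IllegalArgumentException("Unsupported body: " + body);
    }

    // Parses the body and returns the text content of the root element.
    static String unmarshalRootText(Object body) {
        InputSource source = toInputSource(body);
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(source)
                .getDocumentElement()
                .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }
}
```

With a character stream the parser reads the characters as given and does not apply the encoding declaration, so a String body containing ø survives regardless of what the declaration says.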



Re: JAXB DataFormat encoding

Claus Ibsen-2
Hi

The data format API was designed that way when Camel was created.

On Mon, Sep 30, 2019 at 12:46 PM Stig Døssing
<[hidden email]> wrote:

> [...]


--
Claus Ibsen
-----------------
http://davsclaus.com @davsclaus
Camel in Action 2: https://www.manning.com/ibsen2

RE: JAXB DataFormat encoding

Stig Døssing
Thanks for responding. Is there any interest in updating the API by adding an unmarshal method that doesn't require serialization to bytes? I think it could probably be done in a backwards compatible way, given Java 8's default methods.
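To illustrate why default methods would make such a change backwards compatible, here is a small sketch. The interface, its method names, and both implementations are hypothetical stand-ins and do not reflect Camel's actual DataFormat API. An existing implementation compiles unchanged, while a newer one can override the default to consume a String without going through bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a data format interface; not Camel's actual API.
interface SketchDataFormat {

    // Existing contract: every implementation accepts a byte stream.
    Object unmarshal(InputStream stream);

    // New method added as a default, so existing implementations keep compiling.
    // By default it serializes non-stream bodies to UTF-8 bytes, as before.
    default Object unmarshalBody(Object body) {
        if (body instanceof InputStream) {
            return unmarshal((InputStream) body);
        }
        byte[] bytes = String.valueOf(body).getBytes(StandardCharsets.UTF_8);
        return unmarshal(new ByteArrayInputStream(bytes));
    }

    // Small helper so the toy implementations below have something to report.
    static int countBytes(InputStream in) {
        try {
            int n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Written before the new method existed and left untouched.
final class LegacyFormat implements SketchDataFormat {
    @Override
    public Object unmarshal(InputStream stream) {
        return "bytes:" + SketchDataFormat.countBytes(stream);
    }
}

// Updated implementation: handles String bodies directly, defers otherwise.
final class StringAwareFormat implements SketchDataFormat {
    @Override
    public Object unmarshal(InputStream stream) {
        return "bytes:" + SketchDataFormat.countBytes(stream);
    }

    @Override
    public Object unmarshalBody(Object body) {
        if (body instanceof String) {
            return "chars:" + ((String) body).length();
        }
        return SketchDataFormat.super.unmarshalBody(body);
    }
}
```

For the body "Døssing", LegacyFormat still sees the serialized bytes, while StringAwareFormat sees the seven characters and never has to pick a charset.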

-----Original Message-----
From: Claus Ibsen <[hidden email]>
Sent: 30 September 2019 17:38
To: [hidden email]
Subject: Re: JAXB DataFormat encoding

[...]

Re: JAXB DataFormat encoding

Claus Ibsen-2
On Tue, Oct 1, 2019 at 9:45 AM Stig Døssing <[hidden email]> wrote:
>
> Thanks for responding. Is there any interest in updating the API by adding an unmarshal method that doesn't require serialization to bytes? I think it could probably be done in a backwards compatible way, given Java 8's default methods.
>

Yeah, possible. Let's create a JIRA ticket first.

For unmarshalling, the unmarshal processor would need to adapt to
this: try the newer API first, and if it is not in use (e.g. it
returns null), fall back to the current one. Or something like
that.
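The fallback described in this reply could look roughly like the sketch below. All names are hypothetical and none of this is Camel's actual API. The processor first offers the raw body to the newer method and only serializes to a stream when that method returns null.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical interface; the names are illustrative, not Camel's API.
interface FallbackDataFormat {

    // Current API: always receives bytes.
    Object unmarshal(InputStream stream);

    // Newer API: returning null signals "not in use" to the processor.
    default Object unmarshalDirect(Object body) {
        return null;
    }
}

final class FallbackUnmarshalProcessor {

    // Tries the newer API first and falls back to the stream-based one.
    static Object process(FallbackDataFormat format, Object body) {
        Object result = format.unmarshalDirect(body);
        if (result != null) {
            return result;
        }
        InputStream stream = body instanceof InputStream
            ? (InputStream) body
            : new ByteArrayInputStream(String.valueOf(body).getBytes(StandardCharsets.UTF_8));
        return format.unmarshal(stream);
    }
}
```

One caveat of using null as the signal is that a data format which legitimately unmarshals to null cannot be told apart from one that does not implement the newer method. A separate capability check would avoid that ambiguity, at the cost of a slightly larger interface.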




RE: JAXB DataFormat encoding

Stig Døssing
Sounds good. Created https://issues.apache.org/jira/browse/CAMEL-14028, and described how I think we can implement this change. I'll try implementing it and put up a PR when I get a chance.

-----Original Message-----
From: Claus Ibsen <[hidden email]>
Sent: 2 October 2019 09:19
To: [hidden email]
Subject: Re: JAXB DataFormat encoding

[...]