Parsing issue with unmarshal and bindy.csv with double quotes and commas in a single field

sadiq
I'm experiencing a parsing issue using Bindy to unmarshal a CSV file into a List of POJOs (my POJO class is annotated with @CsvRecord).

The CSV file contains the following:

partNumber,longDescription,status
123,"1970-84 Windshield Washer Jar, Multi Application",Available
234,"1967-75 6 Cyl/ Small Block 9"" Clutch Bellcrank Assembly",Available
345,"1971-79 Fan Blade 19-1/2"", 7 Blade",Available

It's the fourth line that is causing an issue: java.lang.IllegalArgumentException: No position 4 defined for the field: Available, line: 3 must be specified (it says line 3, but I believe that's because my CsvRecord class has the skipFirstLine=true)

The CSV parsing seems to be handling commas within a field surrounded by double quotes since the 2nd line is okay. It's also handling escaping double quotes correctly within fields since the 3rd line is fine too.

But the issue is when commas are present after an escaped double quote within a field that is surrounded by double quotes -- it seems to want to separate each comma into a new field when it should just be treating it as a single field.
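One way the symptom above could arise is sketched here. This is a hypothetical tokenizer, not Bindy's actual code: it assumes the parser first splits on every comma and then glues tokens back together, from a token that starts with a quote up to the next token that ends with a quote. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveQuoteRejoin {

    // Hypothetical approach (an assumption, not taken from Bindy's source):
    // split on every comma, then re-join tokens that belong to a quoted field.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder pending = null; // non-null while a quoted field is still open
        for (String token : line.split(",", -1)) {
            if (pending != null) {
                pending.append(',').append(token);
                if (token.endsWith("\"")) { // treated as the closing quote
                    fields.add(pending.toString());
                    pending = null;
                }
            } else if (token.startsWith("\"")
                    && !(token.length() > 1 && token.endsWith("\""))) {
                pending = new StringBuilder(token); // quoted field continues
            } else {
                fields.add(token); // unquoted, or quoted and apparently complete
            }
        }
        if (pending != null) {
            fields.add(pending.toString()); // unterminated quote: keep what we have
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] rows = {
            "123,\"1970-84 Windshield Washer Jar, Multi Application\",Available",
            "234,\"1967-75 6 Cyl/ Small Block 9\"\" Clutch Bellcrank Assembly\",Available",
            "345,\"1971-79 Fan Blade 19-1/2\"\", 7 Blade\",Available"
        };
        for (String row : rows) {
            System.out.println(split(row).size() + " fields: " + split(row));
        }
    }
}
```

Under this assumption the first two data rows yield three fields each, while the third yields four: the token ending in the escaped quote looks like a complete quoted field, so the text after the comma becomes its own field and "Available" lands in position 4, which matches the exception message.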

I'm using Camel 2.18.3, currently the latest version, and still see this issue.

My route is:

from("sftp://me@myhost.com?sortBy=file:modified&antInclude=*.csv&password=xxxx")
        .unmarshal()
        .bindy(BindyType.Csv, ProductDeltaCsvDataModel.class) //throwing exception here
        .to("direct:processCsv");

This is my CsvRecord POJO class:

@CsvRecord(separator = ",", skipFirstLine = true, quote = "\"", quoting = true)
public class ProductDeltaCsvDataModel {

        @DataField(pos = 1, required = true)
        private String partNumber;
       
        @DataField(pos = 2)
        private String longDescription;
       
        @DataField(pos = 3)
        private String status;

        //setters and getters
        ...
}

I believe this to be a bug unless there is some configuration I need to set?

Can someone confirm that this is a bug and how I go about logging this bug?

Thanks!
Sadiq

Re: Parsing issue with unmarshal and bindy.csv with double quotes and commas in a single field

Onder SEZGIN
This is not a bug. Since your separator is a comma, the line that throws
the exception already has too many fields, so the IllegalArgumentException
seems correct in this case.


On Wed, 19 Apr 2017 at 08:10, sadiq <[hidden email]> wrote:

> I'm experiencing a parsing issue using Bindy to unmarshal a CSV file into a
> List of POJOs (my POJO class is annotated with @CsvRecord).
>
> The CSV file contains the following:
>
> partNumber,longDescription,status
> 123,"1970-84 Windshield Washer Jar, Multi Application",Available
> 234,"1967-75 6 Cyl/ Small Block 9"" Clutch Bellcrank Assembly",Available
> 345,"1971-79 Fan Blade 19-1/2"", 7 Blade",Available
>
> It's the fourth line that is causing an issue:
> java.lang.IllegalArgumentException: No position 4 defined for the field:
> Available, line: 3 must be specified (it says line 3, but I believe that's
> because my CsvRecord class has the skipFirstLine=true)

Re: Parsing issue with unmarshal and bindy.csv with double quotes and commas in a single field

souciance
In reply to this post by sadiq
You have four fields in the fourth row, hence the error.

Re: Parsing issue with unmarshal and bindy.csv with double quotes and commas in a single field

sadiq
Hi souciance and Onder,

Thanks for your reply.

In a CSV, a field can be encapsulated by double quotes in case there are commas within the field that should not split the field.

So the following should be just 3 fields:
345,"1971-79 Fan Blade 19-1/2, 7 Blade",Available

And Bindy handles this scenario.

The second field also contains a double quote (representing inches), which must be escaped. In CSV, a literal double quote inside a quoted field is written as two double quotes:
345,"1971-79 Fan Blade 19-1/2""",Available

Bindy also handles this scenario.

However, when both an escaped double quote and a comma are present within a double quote wrapped field, this is when the parsing breaks:
345,"1971-79 Fan Blade 19-1/2"", 7 Blade",Available

This is what I believe to be a bug.
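The three cases above can be checked against a small quote-aware splitter. This is a minimal sketch of the usual CSV convention (a doubled quote inside a quoted field stands for one literal quote), written with plain Java and no Camel classes. The class and method names are hypothetical, and it is not meant to represent how Bindy or camel-csv is implemented.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedCsvSplit {

    // Walk the line character by character, tracking whether we are
    // inside a quoted field. Commas only separate fields outside quotes.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote: one literal quote
                        i++;                 // skip the second quote
                    } else {
                        inQuotes = false;    // closing quote of the field
                    }
                } else {
                    current.append(c);       // commas are kept while quoted
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a field
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        List<String> fields =
            split("345,\"1971-79 Fan Blade 19-1/2\"\", 7 Blade\",Available");
        System.out.println(fields.size());
        fields.forEach(System.out::println);
    }
}
```

With this approach the problem row splits into three fields, and the second field comes out as the description with one literal double quote followed by ", 7 Blade".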

Re: Parsing issue with unmarshal and bindy.csv with double quotes and commas in a single field

Gregor Zurowski-2
Hi Sadiq,

I also believe this is a bug.  I ran into the exact same issue
recently, but did not have the time to further look into this.  It
would be great if you could file a bug in JIRA
(https://issues.apache.org/jira/browse/CAMEL) and attach a failing
unit test with your use case.

As a temporary alternative, you can use the Camel CSV component
(http://camel.apache.org/csv.html) which should work with your data.

Thanks,
Gregor

On Mon, Apr 24, 2017 at 6:04 PM, sadiq <[hidden email]> wrote:

> However, when both an escaped double quote and a comma are present within a
> double quote wrapped field, this is when the parsing breaks:
> 345,"1971-79 Fan Blade 19-1/2"", 7 Blade",Available
>
> This is what I believe to be a bug.

Re: Parsing issue with unmarshal and bindy.csv with double quotes and commas in a single field

sadiq
Hi Gregor,

I'd be happy to do so; however, I have not created a unit test before in order to be able to attach one.

Can you share an example and I can try to create it?

Thanks!
Sadiq

Re: Parsing issue with unmarshal and bindy.csv with double quotes and commas in a single field

Gregor Zurowski-2
Hi Sadiq,

A unit test would be ideal, so it can be included in the test suite
when the issue is addressed.  Take a look at the other tests that
already exist for camel-bindy:
https://github.com/apache/camel/tree/master/components/camel-bindy/src/test

If you have problems with coming up with a formal unit test, attaching
some code that demonstrates the issue should be sufficient.

Thanks,
Gregor


On Wed, Apr 26, 2017 at 2:41 AM, sadiq <[hidden email]> wrote:

> Hi Gregor,
>
> I'd be happy to do so; however, I have not created a unit test before in
> order to be able to attach one.
>
> Can you share an example and I can try to create it?
>
> Thanks!
> Sadiq