Split streaming and order of sub-messages

Pawch
Hello,

I'm currently trying to set up a route that reads large CSV files and writes their content to XML. The XML has to look like the following:
<base>
<head>
static content here
</head>
<body>
csv content here by line
</body>
<body>
...
</base>


The following is an example of how my route operates:

// read files
from("file:/path/to/file/file.csv")
.multicast().to("direct:header", "direct:data", "direct:footer");

// write XML header
from("direct:header")
.process(WriteXmlHeaderProcessor)
.to("file:/path/to/file/?fileName=${file:onlyname.noext}.xml");

// write data
from("direct:data")
.split(body().tokenize("\r\n|\n|\r")).streaming()
.process(ReadSingleLine)
.to("file:/path/to/file/?fileName=${file:onlyname.noext}.xml&fileExist=Append")
.end();

// write closing XML tags
from("direct:footer")
.process(WriteXmlFooterProcessor)
.to("file:/path/to/file/?fileName=${file:onlyname.noext}.xml&fileExist=Append");

Now my question: http://camel.apache.org/splitter says that streaming does not guarantee the order of sub-messages, but I need the XML to contain the lines in exactly the same order as the CSV. As the input files can be rather large, I need some form of streaming to read them one line or chunk at a time. Does streaming still cause lines to be out of order even if I'm not using parallel processing? In the end I will use 40+ routes for different directories containing files; could that be a problem?

I tried writing my own splitter using an Iterator over a BufferedReader/LineIterator, but it still loads the whole file into memory.
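
To make clear what "an Iterator over a BufferedReader" means here, below is a minimal sketch of a lazy line iterator in plain java.io. LazyLineIterator is a hypothetical name, not a Camel class, and the sketch only illustrates reading one line per next() call; it does not show how the iterator is wired into the route.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical helper (not part of Camel): yields lines one at a time,
// so only the current line is held in memory.
final class LazyLineIterator implements Iterator<String> {
    private final BufferedReader reader;
    private String next;   // look-ahead line; null if nothing is buffered
    private boolean done;  // true once the end of the input was reached

    LazyLineIterator(Reader source) {
        this.reader = new BufferedReader(source);
    }

    @Override
    public boolean hasNext() {
        if (next == null && !done) {
            try {
                // readLine() accepts \n, \r and \r\n as line terminators
                next = reader.readLine();
                if (next == null) {
                    done = true;
                    reader.close();
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return next != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String line = next;
        next = null;
        return line;
    }

    public static void main(String[] args) {
        // Small in-memory input with mixed line endings, for illustration only
        Iterator<String> it = new LazyLineIterator(new StringReader("a;1\r\nb;2\nc;3"));
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

Lines come out in file order because the reader is only ever advanced by the single consumer calling next(). Whether the whole file is still loaded depends on what is passed in as the source; if the body has already been converted to a String before the iterator is created, the iterator itself cannot help.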

What would be the best course of action to take?

Thanks for any help
Christopher

Re: Split streaming and order of sub-messages

Pawch
I looked through the Splitter class, and the streaming flag seems to decide between iterating over an Iterable directly and first copying all of its entries into a list. So as long as parallel processing is not active, it looks like the sub-messages should still be in order, whether streaming is enabled or not. I don't know whether streaming does anything else behind the scenes, though.
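
The reasoning can be illustrated with a small model in plain Java. This is only a sketch of the behaviour as I understand it, not Camel's actual Splitter code; SplitOrderSketch, process and handle are made-up names, and handle just stands in for whatever the route does with each line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified model of the two modes described above; NOT Camel's Splitter.
final class SplitOrderSketch {

    // streaming == true:  consume the iterator directly, one element at a time.
    // streaming == false: copy every element into a list first, then iterate.
    // Both branches run on a single thread, so order is preserved either way.
    static List<String> process(Iterator<String> parts, boolean streaming) {
        List<String> processed = new ArrayList<>();
        if (streaming) {
            while (parts.hasNext()) {
                processed.add(handle(parts.next()));
            }
        } else {
            List<String> all = new ArrayList<>();
            parts.forEachRemaining(all::add);
            for (String part : all) {
                processed.add(handle(part));
            }
        }
        return processed;
    }

    // Stand-in for the per-line work (hypothetical)
    static String handle(String line) {
        return "<body>" + line + "</body>";
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("row1", "row2", "row3");
        System.out.println(process(lines.iterator(), true));
        System.out.println(process(lines.iterator(), false));
    }
}
```

In this model the only difference between the two modes is memory use: the non-streaming branch holds every element at once, the streaming branch holds one. Ordering would only be at risk if the elements were handed to several threads, which is the parallel processing case.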