We are a huge retail organization with over 1,500 stores as well as online shopping websites. Message volume at the middleware layer probably runs to over a billion a day. Currently a commercial product from another vendor handles this.
We plan to move our systems to a completely open source stack soon. That will include products like Camel, Fuse, Jenkins, Gerrit, etc.
We have two scenarios where we are trying to fit Camel in:
1) Real-time traffic. This will be the typical busy Fuse ESB layer, which takes care of messaging, transformation, reliable delivery, etc. between enterprise applications. We are clear on this and are going ahead with Camel and OSGi containers.
2) Scheduled batch/ETL jobs. These will be really heavy jobs and may carry payloads running to a few GB in size. We want to use Camel for these; the alternative, in case we hit a roadblock, will be Talend ETL.
My questions are below:
1) Is Camel capable of running heavy batch ETL jobs reliably? If so, what are the best practices and strategies for dealing with heavy CSV files? I do not want to see OOM (out of memory) issues.
2) What would be the most suitable container for running batch ETL jobs? Should I use ServiceMix / JBoss Fuse, or consider other Java-based containers?
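To make the first question more concrete, below is a rough sketch of the memory behaviour I am after, written in plain Java rather than Camel. It reads a CSV source line by line and hands rows to a sink in small batches, so only one batch is held in memory at a time regardless of file size. The class name CsvBatchReader, the process method, and the batch size are hypothetical names for illustration, and the comma split is deliberately naive (no quoted fields). In Camel I would presumably look for the equivalent in the Splitter EIP's streaming mode, but that is an assumption on my part.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: streams a CSV source in fixed-size batches.
public class CsvBatchReader {

    /**
     * Reads the source one line at a time and passes rows to the sink
     * in batches of at most batchSize. Returns the number of rows read.
     */
    public static long process(Reader source, int batchSize,
                               Consumer<List<String[]>> sink) throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long rows = 0;
        List<String[]> batch = new ArrayList<>(batchSize);
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // Naive split: assumes no quoted or escaped commas.
                batch.add(line.split(",", -1));
                rows++;
                if (batch.size() == batchSize) {
                    sink.accept(batch);
                    // Start a fresh list so the previous batch can be collected.
                    batch = new ArrayList<>(batchSize);
                }
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // flush the final partial batch
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Small in-memory sample; a real job would pass a FileReader instead.
        String csv = "1,apple,0.50\n2,pear,0.75\n3,plum,0.20\n";
        long rows = process(new StringReader(csv), 2,
                b -> System.out.println("batch of " + b.size()));
        System.out.println("rows: " + rows);
    }
}
```

The point of the sketch is only that peak memory depends on the batch size and not on the size of the file; whatever framework we pick needs to preserve that property end to end.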
Any insights would help us evaluate the options.