Does Camel MongoDB use cursors on findAll ?

Does Camel MongoDB use cursors on findAll ?

Ephemeris Lappis
Hello.

After some tests, it seems that the Camel MongoDB "findAll" operation tries to load all the data matching the query into memory before processing it. With collections containing tens of millions of documents, this naturally leads to OutOfMemoryErrors...

Can this component use cursors to read the input data and stream it?

Any ideas?

Thanks in advance.

Regards.

Re: Does Camel MongoDB use cursors on findAll ?

Raul Kripalani
Hi,

We use Mongo cursors to read from the DB. But a DBCursor is not
something we can return to the route because not all technologies
support Streams, Cursors, Chunking, etc. For example, how would you go
about returning a DBCursor to a JMS endpoint?

That's why we offer the skipping and limiting option so you can
perform pagination in such scenarios. You can also specify a batch
size. Take a look at the component page for further details.
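
To make the skip and limit semantics concrete, here is a minimal sketch in plain Java. It is not Camel or MongoDB driver code: the class `SkipLimitSketch` and its `page` helper are hypothetical names, and an in-memory list stands in for the collection.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class SkipLimitSketch {
    // Stand-in for a query with skip and limit (hypothetical helper):
    // returns at most `limit` items, starting at offset `skip`.
    static <T> List<T> page(List<T> source, int skip, int limit) {
        return source.stream().skip(skip).limit(limit).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Ten fake "documents", numbered 1 to 10.
        List<Integer> docs = IntStream.rangeClosed(1, 10).boxed().collect(Collectors.toList());
        System.out.println(page(docs, 0, 4)); // prints [1, 2, 3, 4]
        System.out.println(page(docs, 4, 4)); // prints [5, 6, 7, 8]
        System.out.println(page(docs, 8, 4)); // prints [9, 10]
    }
}
```

Only one page is held in memory at a time, which is the point of paginating instead of materializing the whole result.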

Hope that helps!
Raúl.


Re: Does Camel MongoDB use cursors on findAll ?

Ephemeris Lappis
Hello.

I have tried different options, such as the batch size, to evaluate a few scenarios and optimize some cases.
But for cases with a really big volume of data, retrieving it all into memory always leads to an error.

Our current case should be something as simple as:
A first route:
- receive a SOAP request from a web client with some kind of filter form to select data
- push the XML request to an active queue, and send back a simple SOAP response
A main route:
- get the XML request back from the queue
- build a JSON body to set the query from the XML request (10 or 15 lines of Groovy, for example)
- set a header to select the needed collection's attributes
- call Mongo findAll
- marshal the result to CSV
- write the result into a file
- send a mail to the caller to say the job is done

This may be done with a very simple blueprint with very few lines and no complexity at all.

Do you mean that the only way to process a big volume of Mongo data is to set up a "smarter" algorithm like:
- build a first request to count the data
- loop over the data set, reading it in batches using "skip" and "page size"
- append each page of results to the file
- etc.?

Do you have an example of such a paging process?

Thanks for your help.

Ephemeris Lappis

Re: Does Camel MongoDB use cursors on findAll ?

Raul Kripalani
There are unit tests that showcase this functionality of the component:

https://github.com/apache/camel/blob/e7563a7611667fb9b449d8a7f8c3fa7e3a0524bd/components/camel-mongodb/src/test/java/org/apache/camel/component/mongodb/MongoDbFindOperationTest.java#L90

I think we could enhance it anyway to enable returning the DBCursor
directly to the route, so you can then handle the result in any manner you
want, e.g. writing it to an OutputStream or whatever.
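
As a rough illustration of what handling the results one at a time could look like, here is a minimal sketch in plain Java. It makes several assumptions: a `java.util.Iterator` stands in for the cursor, each row is already a list of strings, the class `CursorToWriterSketch` and its `writeCsv` method are hypothetical names, and no CSV quoting or escaping is done.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class CursorToWriterSketch {
    // Pulls rows from the iterator one at a time and writes each as a
    // comma-separated line, so only the current row is held in memory.
    // Returns the number of rows written. No CSV escaping (assumption).
    static int writeCsv(Iterator<List<String>> cursor, Writer out) {
        int rows = 0;
        try {
            while (cursor.hasNext()) {
                out.write(String.join(",", cursor.next()));
                out.write('\n');
                rows++;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Two fake rows standing in for query results.
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("1", "alice"),
                Arrays.asList("2", "bob"));
        StringWriter out = new StringWriter();
        int written = writeCsv(rows.iterator(), out);
        System.out.print(out);       // prints "1,alice" and "2,bob" on separate lines
        System.out.println(written); // prints 2
    }
}
```

In a real route the `Writer` would wrap a file stream rather than a `StringWriter`.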

I've created [1] to track this new feature.

[1] https://issues.apache.org/jira/browse/CAMEL-7378

Regards,

*Raúl Kripalani*
Apache Camel PMC Member & Committer | Enterprise Architect, Open Source
Integration specialist
http://about.me/raulkripalani | http://www.linkedin.com/in/raulkripalani
http://blog.raulkr.net | twitter: @raulvk

Re: Does Camel MongoDB use cursors on findAll ?

Ephemeris Lappis
Hello Raul.


Thanks for your help.

I've had a look at the test code and I understand how it should be possible to read the big result of the Mongo query using the batch size. But, unlike in your example, in my case I have no idea of the total count of matching documents, so I cannot divide the reading process and loop over the paged results. I've tested the "count" operation, but just as the documentation says, it returns the total number of documents in the collection and doesn't take the query into account to restrict the count.

If I'm not wrong, there isn't yet any EIP construct for a "<WHILE>"-style block, and "<LOOP>" needs a predefined value, which requires the total number of documents to be evaluated beforehand...

Any idea how to retrieve that information?
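
One possibility that avoids the count altogether, sketched here in plain Java rather than in a Camel route: keep requesting pages and stop as soon as a page comes back empty or shorter than the page size. This is only an illustration under assumptions. The class `PagedReadSketch`, the `readAll` method and the `fetchPage` function are hypothetical names, and `fetchPage` stands in for whatever performs the query with skip and limit. It also assumes the query returns documents in a stable order between calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class PagedReadSketch {
    // Reads pages via fetchPage(skip, limit) until a page is empty or short,
    // handing each page to the sink. Returns the total number of items read.
    static <T> int readAll(BiFunction<Integer, Integer, List<T>> fetchPage,
                           int pageSize,
                           Consumer<List<T>> sink) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int skip = 0;
        int total = 0;
        while (true) {
            List<T> page = fetchPage.apply(skip, pageSize);
            if (page.isEmpty()) {
                break; // nothing left to read
            }
            sink.accept(page);
            total += page.size();
            if (page.size() < pageSize) {
                break; // short page: this was the last one
            }
            skip += pageSize;
        }
        return total;
    }

    public static void main(String[] args) {
        // Ten fake "documents"; the lambda simulates a query with skip and limit.
        List<Integer> docs = IntStream.rangeClosed(1, 10).boxed().collect(Collectors.toList());
        BiFunction<Integer, Integer, List<Integer>> fetch = (skip, limit) ->
                docs.stream().skip(skip).limit(limit).collect(Collectors.toList());
        List<List<Integer>> pages = new ArrayList<>();
        Consumer<List<Integer>> sink = pages::add;
        int total = readAll(fetch, 4, sink);
        System.out.println(pages.size()); // prints 3
        System.out.println(total);        // prints 10
    }
}
```

The sink is where each page would be appended to the output file. When the total happens to be an exact multiple of the page size, the loop costs one extra query that returns an empty page.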

Thanks again.

Regards.
Ephemeris Lappis