Quartz job data deletion in clustered quartz2

lakshmi.prashant
Hi,

  When camel quartz2 runs in clustered mode, the job data is not deleted from the Quartz DB when we un-deploy the bundles.

As a result, when we re-deploy the bundles or stop & start the cluster, we encounter these errors:

a) After the camel blueprint bundle is un-deployed, we get the below error (as the quartz entries remain while the actual camel job instance has been undeployed):

        Failed to execute CamelJob.
        org.quartz.JobExecutionException: No CamelContext could be found with name: 621-Quartz2_Mig_Test
        at org.apache.camel.component.quartz2.CamelJob.getCamelContext(CamelJob.java:77)

b) If we try to re-deploy the same bundle again, we are unable to do so & get an error on many occasions:

  org.quartz.ObjectAlreadyExistsException: Unable to store Trigger with name: 'myTimerName5' and group: 'myGroup5', because one already exists with this identification.

c) On re-deployment, the job data map was sometimes not updated, and the job kept running with the old trigger data.

      I also noticed that addJobInScheduler() in QuartzEndpoint.java adds the job only if no trigger exists yet; otherwise it only updates the job data. So why are we facing the above 2 issues?


d) Even if we delete the specific Quartz entries from the Job_details and Trigger tables after un-deploying the bundles, the other camel quartz2 routes (jobs / triggers) that share that Quartz scheduler instance sometimes stop running permanently or misfire after the deletion. Hence we have to use a different Quartz instance for each schedule (i.e. for each camel quartz2 route).

Exception trace:
CAMEL_CLUSTER_SCHEDULER_QuartzSchedulerThread##avatarcl#aq3appaq4t1#iflmap##An error occurred while scanning for the next triggers to fire.
org.quartz.JobPersistenceException: Couldn't acquire next trigger: Unable to load class org.apache.camel.component.quartz2.CamelJob by any known loaders.
       at org.quartz.impl.jdbcjobstore.JobStoreSupport.acquireNextTrigger(JobStoreSupport.java:2856)
       at org.quartz.impl.jdbcjobstore.JobStoreSupport$40.execute(JobStoreSupport.java:2759)
       at org.quartz.impl.jdbcjobstore.JobStoreSupport$40.execute(JobStoreSupport.java:2757)
       at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3795)
       at org.quartz.impl.jdbcjobstore.JobStoreSupport.acquireNextTriggers(JobStoreSupport.java:2757)
       at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:272)
Caused by: java.lang.ClassNotFoundException: Unable to load class org.apache.camel.component.quartz2.CamelJob by any known loaders.
       at org.quartz.simpl.CascadingClassLoadHelper.loadClass(CascadingClassLoadHelper.java:126)
       at org.quartz.simpl.CascadingClassLoadHelper.loadClass(CascadingClassLoadHelper.java:138)
       at org.quartz.impl.jdbcjobstore.StdJDBCDelegate.selectJobDetail(StdJDBCDelegate.java:852)
       at org.quartz.impl.jdbcjobstore.JobStoreSupport.acquireNextTrigger(JobStoreSupport.java:2824)
       ... 5 common frames omitted
Caused by: java.lang.IllegalStateException: Bundle "Quartz2_Mig_Test_5Min" has been uninstalled
       at org.eclipse.osgi.framework.internal.core.AbstractBundle.checkValid(AbstractBundle.java:1175)
       at org.eclipse.osgi.framework.internal.core.BundleHost.checkLoader(BundleHost.java:183)
       at org.eclipse.osgi.framework.internal.core.BundleHost.loadClass(BundleHost.java:225)
       at org.eclipse.osgi.framework.internal.core.AbstractBundle.loadClass(AbstractBundle.java:1212)
       at org.apache.camel.core.osgi.utils.BundleDelegatingClassLoader.findClass(BundleDelegatingClassLoader.java:47)
       at org.apache.camel.core.osgi.utils.BundleDelegatingClassLoader.loadClass(BundleDelegatingClassLoader.java:69)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:415)


e) Why doesn't Camel try to remove the job / job data from Quartz when the routes are stopped (bundles are un-deployed) in the cluster? I know that the job details are stored for recovery when a cluster re-starts after a failure. But why do they remain after we have un-installed the camel quartz routes?

f) To circumvent this, we have tried to add a RoutePolicySupport class that tries to delete the job data when the routes are stopped.

g) Whenever the cluster is re-started, the bundles will be re-deployed / re-started.
When the camel quartz IFlow bundles get active, the quartz data will be re-created.

h) Route stop event will be triggered:

   a) When camel blueprint bundle is un-deployed
   b) When a cluster Node goes down & there are other nodes in the cluster
   c) When a cluster Node goes down & there are no more nodes in the cluster
   d) When cluster goes down / cluster is stopped, during a planned downtime.

We need to trigger the clean-up of quartz job data, during a, c & d only.
The flip side is: we need to check the quartz scheduler state to know if there are other nodes, after the associated check-in interval and delete the quartz data, only if there are no other nodes in the cluster.
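
The decision described in (h) could be sketched in plain Java roughly as follows. This is only a hypothetical helper, not something camel quartz2 provides: the StopReason enum and the otherActiveNodes parameter are assumptions, and in practice the node count would have to be derived from the scheduler check-ins in the Quartz DB after the check-in interval has elapsed. Treating case (a) as unconditional is also an assumption.

```java
// Hypothetical helper; none of these names exist in Camel or Quartz.
final class QuartzCleanupDecision {

    // Why the route stop event was raised (cases a-d in the list above).
    enum StopReason {
        BUNDLE_UNDEPLOYED, // case (a)
        NODE_DOWN,         // case (b) or (c), depending on the remaining nodes
        CLUSTER_STOPPED    // case (d)
    }

    // Returns true when the Quartz job data should be deleted.
    // otherActiveNodes: number of other cluster nodes that are still checking in.
    static boolean shouldDeleteJobData(StopReason reason, int otherActiveNodes) {
        switch (reason) {
            case BUNDLE_UNDEPLOYED:
                return true;                  // (a): the route is gone for good
            case NODE_DOWN:
                return otherActiveNodes == 0; // (c) yes, (b) no: others may recover the job
            case CLUSTER_STOPPED:
                return true;                  // (d): planned downtime, data is re-created on start
            default:
                return false;
        }
    }
}
```

Only the NODE_DOWN case depends on the cluster state, which is why the check-in interval matters there and nowhere else.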


i) Am I missing something here, or have I overlooked anything in the camel quartz2 documentation?

As this is a generic issue, can it be handled easily through camel quartz2 endpoint configuration, without our custom route policy?
Please help.

j) My blueprint xml:beans_quartz2.xml

k) My basic question is:

 In org.apache.camel.component.quartz2.QuartzEndpoint.java: removeJobInScheduler()  removes the quartz job in the scheduler, only if it is not clustered.

   1. Then how is the issue of 'ObjectAlreadyExists' handled in clustered quartz on re-deployment / re-start of the quartz routes / bundles?

   2. If the bundle with the route is removed / undeployed, the route is stopped and terminated as well. What (routes) will the job data in the DB then try to trigger, as per the job schedule?

Thanks,
Lakshmi

Re: Quartz job data deletion in clustered quartz2

lakshmi.prashant
Hi,

   We get many misfires, while quartz is working in clustered mode.

   This is when the trigger is acquired / executed on another VM than the one that inserted the job data:

   We get an error when the CamelJob in that VM gets executed for a trigger. The camel job tries to locate the camel context & the route, by looking up using the QUARTZ_CAMEL_CONTEXT_NAME in the Quartz schedulerContext.
 
    No CamelContext could be found with name: 572-Quartz2_Mig_Test1
       at org.apache.camel.component.quartz2.CamelJob.getCamelContext(CamelJob.java:77)
       at org.apache.camel.component.quartz2.CamelJob.execute(CamelJob.java:48)
       at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
       at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)

I think that the QUARTZ_CAMEL_CONTEXT_NAME is derived by the QuartzComponent using the DefaultManagementNameStrategy, stored when the Quartz schedulerContext is created, and also set in the job data map.

The prefix of the camelcontext name is generated using a counter and may not be the same in all VM's.
Hence, if any other VM in the cluster gets the trigger callback to execute the CamelJob, it throws the above error.

Is this a known issue - can someone kindly tell me how to get this resolved?

Thanks,
Lakshmi


Re: Quartz job data deletion in clustered quartz2

Willem.Jiang
Administrator
Hi,

Can you specify the camel context name in your cluster environment?

--  
Willem Jiang

Red Hat, Inc.
Web: http://www.redhat.com
Blog: http://willemjiang.blogspot.com (English)
http://jnn.iteye.com (Chinese)
Twitter: willemjiang  
Weibo: 姜宁willem



On October 20, 2014 at 12:01:04 PM, lakshmi.prashant ([hidden email]) wrote:


Re: Quartz job data deletion in clustered quartz2

lakshmi.prashant
Hi Willem,
 Quartz2_Mig_Test1 is the camelContext id that we set in our blueprint XML configuration. I attached the beans.xml to my earlier message for reference.

<camel:camelContext id="Quartz2_Mig_Test1" streamCache="true">

 Camel calculates the name for the camel context by calling getName() of DefaultManagementNameStrategy in line no. 76 of addTrigger() method in QuartzEndpoint.java.

Thanks,
Lakshmi

Re: Quartz job data deletion in clustered quartz2

lakshmi.prashant
In reply to this post by Willem.Jiang
We are setting the camel Context id in the blueprint xml and have deployed it to the osgi environment.

Eg: <camel:camelContext id="Quartz2_Mig_Test1" streamCache="true">

Then we get misfires when other VM's in the cluster try to do load balancing of the trigger :

 No CamelContext could be found with name: *572-Quartz2_Mig_Test1* .

Why is the osgi bundle id (572) being appended to the camelContext id to generate the name?
If the OSGI bundle id is different for the deployed route bundle in the different VM's, we are getting misfires when those VM's acquire the triggers & read the job data from DB.

We need a way in which the same name / key is used to store / look-up a specific camel context / Timer route across VM's.

a) In createScheduler() of QuartzComponent.java, the camelContext is stored against the camelcontext name derived as above.

b) Hence, whenever the derived camel context name differs between VMs, or the route bundle is re-deployed, the camel context stored in the scheduler context (in memory) has a different name from the one stored in the DB as part of the job data map.

c) This results in misfires due to ' No CamelContext could be found with name: *572-Quartz2_Mig_Test1*' in the above 2 scenarios.

Thanks,
Lakshmi

Re: Quartz job data deletion in clustered quartz2

Willem.Jiang
Administrator
We need to do some additional work to let clustered quartz endpoints share the same camel context id. I just created a JIRA [1] for it.


[1] https://issues.apache.org/jira/browse/CAMEL-7947




On October 28, 2014 at 1:56:35 PM, lakshmi.prashant ([hidden email]) wrote:


Re: Quartz job data deletion in clustered quartz2

Claus Ibsen-2
In reply to this post by lakshmi.prashant
Hi

You can configure the jmx management name to not include the bundle
id, see details at
http://camel.apache.org/camel-jmx

On Tue, Oct 28, 2014 at 6:56 AM, lakshmi.prashant
<[hidden email]> wrote:




--
Claus Ibsen
-----------------
Red Hat, Inc.
Email: [hidden email]
Twitter: davsclaus
Blog: http://davsclaus.com
Author of Camel in Action: http://www.manning.com/ibsen
hawtio: http://hawt.io/
fabric8: http://fabric8.io/

Re: Quartz job data deletion in clustered quartz2

lakshmi.prashant
Hi Claus,

  Thanks a lot. Adding managementNamePattern="#name#" to <camelcontext> in blueprint  XML seems to click.

  This resolved the 2 issues  with both re-deployment of the same bundle & also the load-balancing issue when the other VM's acquire the trigger & look up the camel context.
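
  A toy model of why the name pattern matters, in plain Java. The classes below are stand-ins, not Camel's or Quartz's, and the "#bundleId#-#name#" pattern is an assumption inferred from the 572- prefix in the error message. The node that schedules the job writes a context name into the shared job data; the node that fires the trigger looks that name up in its own in-memory registry.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: not Camel's management name strategy.
final class ContextNameLookup {

    // Expands the two tokens used in this sketch.
    static String managementName(String pattern, long bundleId, String contextId) {
        return pattern.replace("#bundleId#", Long.toString(bundleId))
                      .replace("#name#", contextId);
    }

    // True if the name stored by the scheduling node can be found
    // in the registry of the node that fires the trigger.
    static boolean canResolve(String pattern, long schedulingBundleId,
                              long firingBundleId, String contextId) {
        String storedInJobData = managementName(pattern, schedulingBundleId, contextId);

        Map<String, Object> firingNodeRegistry = new HashMap<>();
        firingNodeRegistry.put(managementName(pattern, firingBundleId, contextId), new Object());

        return firingNodeRegistry.containsKey(storedInJobData);
    }
}
```

  With a bundle-id prefix the lookup only succeeds when both nodes happen to assign the same bundle id; with "#name#" the bundle id no longer takes part in the key, which also covers re-deployment on the same node.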

  We still have 1 pending issue that I reported: sharing a scheduler across camel contexts with clustered Quartz.

 a) If we expose the SchedulerFactory as an OSGi service and refer to the same scheduler instance across the blueprint bundles (to control the number of Quartz DB accesses related to check-in and failover):

  The CamelJob class gets uninstalled when we undeploy 1 route bundle. The rest of the quartz route bundles also stop firing once we have undeployed 1 camel-quartz2 bundle:

'Caused by: java.lang.ClassNotFoundException: Unable to load class org.apache.camel.component.quartz2.CamelJob by any known loaders.'

This is not a blocker, but a performance issue.
Appreciate any help in resolving the above issue, as well.

Thanks,
Lakshmi

Re: Quartz job data deletion in clustered quartz2

Claus Ibsen-2
Hi

Have you tried setting deleteJob=false?
http://camel.apache.org/quartz2

On Wed, Oct 29, 2014 at 8:43 AM, lakshmi.prashant
<[hidden email]> wrote:


Re: Quartz job data deletion in clustered quartz2

lakshmi.prashant
Hi,

  That does not help.

  If we have a shared scheduler instance (by exposing the StdSchedulerFactory as an OSGi service) used by the different camel quartz components / routes, we face the following issue:

    After 1 camel quartz route is un-deployed & removed, the scheduler instance starts misfiring, due to ClassLoader issues in loading the CamelJob class.
    The other camel quartz routes / bundles sharing the scheduler instance start misfiring after the CamelJob class in the first camel route bundle gets uninstalled (when that bundle is undeployed).
    When the scheduler instance tries to acquire the next triggers & load the Job class, the Quartz CascadingClassLoadHelper tries to reuse the scheme that was last used to load the CamelJob class & reports the following exception.

Caused by: java.lang.ClassNotFoundException: Unable to load class org.apache.camel.component.quartz2.CamelJob by any known loaders.
       at org.quartz.simpl.CascadingClassLoadHelper.loadClass(CascadingClassLoadHelper.java:126)
       at org.quartz.simpl.CascadingClassLoadHelper.loadClass(CascadingClassLoadHelper.java:138)
       at org.quartz.impl.jdbcjobstore.StdJDBCDelegate.selectJobDetail(StdJDBCDelegate.java:852)
       at org.quartz.impl.jdbcjobstore.JobStoreSupport.acquireNextTrigger(JobStoreSupport.java:2824)
       ... 5 common frames omitted
Caused by: java.lang.IllegalStateException: Bundle "Quartz2_Camel_Test_5Min" has been uninstalled


  I have raised this issue in the Quartz forum as well. If 1 job class bundle is undeployed & the associated job is still in the jobstore, an exception is thrown & all other jobs / triggers associated with that scheduler also start to misfire.

Currently, we work around this behavior by listening to the undeploy event of the OSGi bundle and deleting the Quartz job data of that camel route, so that the scheduler does not acquire the corresponding triggers again after the bundle is undeployed.


Thanks,
Lakshmi

Re: Quartz job data deletion in clustered quartz2

Claus Ibsen-2
Hi

Yeah, unfortunately class loading in OSGi with 3rd party libraries
that are NOT OSGi friendly is a challenge, and you can hit these kinds
of issues here.

I am not sure if Quartz offers an API where you can provide a custom
classloader, so we can better control this when the store wants to
load a class.



On Thu, Nov 6, 2014 at 12:58 PM, lakshmi.prashant
<[hidden email]> wrote:

>   I have raised this issue in  Quartz forum
> <https://groups.google.com/forum/#!topic/quartz-scheduler/Ptek0hAhQJw>   as
> well.
>
>
>
>  Can you please let me know if I can configure in quartz.properties any
> recommended value for the org.quartz.scheduler.classLoadHelper.class, so
> that quartz will load the CamelJob class correctly, when multiple camel
> quartz bundles share the same scheduler instance...
>
> <http://camel.465427.n5.nabble.com/file/n5758609/Acquire_Triggers_After_Undeploy_Route2.png>
>
>

Re: Quartz job data deletion in clustered quartz2

Claus Ibsen-2
Hi

I found an API and logged a ticket:
https://issues.apache.org/jira/browse/CAMEL-8020

On Sun, Nov 9, 2014 at 7:59 AM, Claus Ibsen <[hidden email]> wrote:


Re: Quartz job data deletion in clustered quartz2

lakshmi.prashant
Hi Claus,

  There is a mis-communication - we need not have a special classloader helper, I think.

  The issue was because on the un-deployment of 1 camel blueprint bundle (with camel quartz2 route), the quartz job data is not deleted from db - if it is clustered quartz.

 Unfortunately, we do not want to delete the job data whenever a route is stopped (using a RoutePolicySupport class), as the main intent of clustered Quartz is job recovery.
  - The scheduler is shut down (QuartzComponent: doStop()) if there are no more jobs (when the scheduler is not shared across camel context bundles), and that works fine.
  - But if the scheduler configuration / scheduler instance is shared across camel quartz routes / bundles, the scheduler continues to run.
  - When the scheduler acquires the next triggers, the trigger related to the undeployed bundle is also obtained, and the scheduler then tries to execute it by loading the CamelJob class from the uninstalled bundle, using CascadingClassLoadHelper.
  - If it cannot load the class for that trigger, it throws an exception and the rest of the triggers are not executed at that time, so we get misfires.

Please refer to  line no. 876 in  org.quartz.impl.jdbcjobstore.StdJDBCDelegate.java - this quartz class throws exception, if job class is not loaded and does not proceed further.
     job.setJobClass(loadHelper.loadClass(rs.getString(COL_JOB_CLASS)));

  1. I have written an osgi EventHandler service that will listen to 'bundle undeploy' events, that get published.
  2. If the osgi bundle related to camel quartz2 is undeployed, it will remove the corresponding job data from DB.
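
The behaviour described above, and the effect of the clean-up on undeploy, can be illustrated with a small toy model in plain Java. It is not Quartz or OSGi code: SharedSchedulerModel and all of its methods are hypothetical, and the "abort the whole acquisition on the first unloadable job class" rule is simply the behaviour reported in this thread.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of a job store shared by several route bundles.
final class SharedSchedulerModel {
    private final Map<String, String> jobToBundle = new LinkedHashMap<>();
    private final Set<String> uninstalledBundles = new HashSet<>();

    void storeJob(String jobName, String owningBundle) {
        jobToBundle.put(jobName, owningBundle);
    }

    // Undeploy without clean-up: the job data stays in the store.
    void uninstall(String bundle) {
        uninstalledBundles.add(bundle);
    }

    // Undeploy with the clean-up handler: the bundle's job data is removed too.
    void uninstallAndCleanUp(String bundle) {
        uninstall(bundle);
        jobToBundle.values().removeIf(bundle::equals);
    }

    // Fails as a whole if any stored job belongs to an uninstalled bundle,
    // so the jobs of healthy bundles are not acquired either.
    List<String> acquireNextTriggers() {
        List<String> acquired = new ArrayList<>();
        for (Map.Entry<String, String> entry : jobToBundle.entrySet()) {
            if (uninstalledBundles.contains(entry.getValue())) {
                throw new IllegalStateException(
                        "Bundle \"" + entry.getValue() + "\" has been uninstalled");
            }
            acquired.add(entry.getKey());
        }
        return acquired;
    }
}
```

In this model, one stale job entry is enough to block every other job on the shared scheduler, and removing the entry at undeploy time restores normal acquisition for the remaining bundles.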

If this can be handled by camel quartz2, it will become simple for end-users.
 
a) There is an issue in addJobInScheduler() in Camel's QuartzEndpoint.java. We were getting misfires on some nodes of the cluster because of it:

   1. If the trigger does not exist in the DB, the endpoint tries to schedule the job.
   2. But this check-then-schedule sequence is not atomic: after the call that looks up the trigger in the DB, some other node in the cluster may create the trigger, so the subsequent call to schedule the job fails with ObjectAlreadyExistsException.
   3. Misfires then happen on that cluster node, as the Quartz component / camel context itself does not get started.

  private void addJobInScheduler() throws Exception {
        // Add or use existing trigger to/from scheduler
        Scheduler scheduler = getComponent().getScheduler();
        JobDetail jobDetail;
        Trigger trigger = scheduler.getTrigger(triggerKey);
        if (trigger == null) {
            jobDetail = createJobDetail();
            trigger = createTrigger(jobDetail);

            updateJobDataMap(jobDetail);

            // Schedule it now. The scheduler might not be started yet, but we can already schedule.
            try {
                Date nextFireDate = scheduler.scheduleJob(jobDetail, trigger);
                if (LOG.isInfoEnabled()) {
                    LOG.info("Job {} (triggerType={}, jobClass={}) is scheduled. Next fire date is {}",
                             new Object[] {trigger.getKey(), trigger.getClass().getSimpleName(),
                                           jobDetail.getJobClass().getSimpleName(), nextFireDate});
                }
            } catch (ObjectAlreadyExistsException e) {
                // In clustered mode another VM may have stored the job & trigger
                // between our getTrigger() check and scheduleJob(), so double-check.
                if (!getComponent().isClustered()) {
                    throw e;
                }
                trigger = scheduler.getTrigger(triggerKey);
                if (trigger == null) {
                    throw new SchedulerException("Trigger could not be found in quartz scheduler.");
                }
            }
        } else {
            ensureNoDupTriggerKey();
        }
        // ...
  }

Can the above correction be made in QuartzEndpoint.java?

Thanks,
Lakshmi

Re: Quartz job data deletion in clustered quartz2

Willem.Jiang
Administrator
Hi Lakshmi,

I just had some time to revisit the issue of clustered quartz2.

It is the scheduler's job to avoid triggering the job data when the bundle is stopped; camel-quartz cannot monitor the OSGi bundle events for that.

But I think I can catch the ObjectAlreadyExistsException and do a double check for it.





Re: Quartz job data deletion in clustered quartz2

lakshmi.prashant
Hi Willem,

 We are listening for the un-deployment event ourselves.

 1. If a job is deleted from a UI that is used to schedule jobs, that UI has to take care of removing the job data from the scheduler.

 2. But in the camel quartz scenarios, the jobs are created when the routes start. Once the routes are removed (un-deployed), the quartz job data also has to be removed.

  The Quartz scheduler will not find out about this automatically. I have already raised an issue in the quartz forum asking for an improvement:

    The Quartz scheduler should handle the following situation while acquiring the triggers to run: if the job class for any job is missing, it should either remove that job data, or log the issue, ignore that job, and continue processing the other triggers.

 3. Unless the issue is resolved in either camel or quartz, users of clustered camel quartz will continue to face it, because all camel quartz routes stop running.

Thanks,
Lakshmi




Re: Quartz job data deletion in clustered quartz2

Willem.Jiang
Administrator
The camel-quartz component manages the scheduler correctly when it creates the scheduler itself. But it is not its job to clean up job data if the scheduler is created from outside.

The Quartz scheduler should be more resilient and keep processing the other triggers, as you suggested.
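
To make the suggested behaviour concrete, here is a rough sketch of "skip what cannot be loaded and carry on". It is plain Java and not Quartz code: `ResilientAcquire` and `acquireLoadable` are hypothetical names, and the job store is reduced to a list of job class names. It only illustrates the requested improvement, not how Quartz behaves today.

```java
import java.util.ArrayList;
import java.util.List;

class ResilientAcquire {
    // Returns the job class names that can be loaded, in their original order.
    // Names that cannot be loaded are appended to 'skipped' instead of aborting the scan.
    static List<String> acquireLoadable(List<String> jobClassNames, List<String> skipped) {
        List<String> loadable = new ArrayList<>();
        ClassLoader loader = ResilientAcquire.class.getClassLoader();
        for (String name : jobClassNames) {
            try {
                // Load without initializing; only availability of the class matters here.
                Class.forName(name, false, loader);
                loadable.add(name);
            } catch (ClassNotFoundException | LinkageError e) {
                // The owning bundle is gone: record the job and keep processing the rest.
                skipped.add(name);
            }
        }
        return loadable;
    }
}
```

With this shape, one job whose class belongs to an uninstalled bundle ends up in `skipped`, while the triggers of the remaining jobs are still returned and can fire.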



Re: Quartz job data deletion in clustered quartz2

lakshmi.prashant
Hi Willem,

 The scheduler is not created from outside; it is created on deployment of the camel blueprint bundles.
 But in clustered mode the scheduler jobs are not deleted when the camel bundles are un-deployed. This is what needs to be handled, while still taking care of durable jobs and job recovery.
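
 As an illustration of what I mean, here is a small sketch of the bookkeeping. It is plain Java, with no OSGi or Quartz types: `UndeployCleanup`, its `BundleEvent` enum and the job-to-bundle map are hypothetical stand-ins. The assumption is that job data is kept when a bundle is merely stopped (so recovery still works) and removed only when the bundle is uninstalled. In a real handler the removal step would presumably go through the scheduler, for example `Scheduler.deleteJob(JobKey)`.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class UndeployCleanup {
    // Hypothetical stand-in for the OSGi bundle events of interest.
    enum BundleEvent { STOPPED, UNINSTALLED }

    // Hypothetical bookkeeping: job key -> name of the bundle that created the job.
    private final Map<String, String> jobOwners = new LinkedHashMap<>();

    void register(String jobKey, String bundleName) {
        jobOwners.put(jobKey, bundleName);
    }

    boolean hasJob(String jobKey) {
        return jobOwners.containsKey(jobKey);
    }

    // Removes the jobs of a bundle only when it is uninstalled; returns how many were removed.
    int onBundleEvent(String bundleName, BundleEvent event) {
        if (event != BundleEvent.UNINSTALLED) {
            return 0; // a stopped bundle keeps its job data so jobs can be recovered
        }
        int removed = 0;
        Iterator<Map.Entry<String, String>> it = jobOwners.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue().equals(bundleName)) {
                it.remove(); // a real handler would delete the job from the scheduler here
                removed++;
            }
        }
        return removed;
    }
}
```

 Jobs of other bundles that share the same scheduler instance are left untouched, which is the part that matters when the scheduler is shared across routes.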

Thanks,
Lakshmi

Re: Quartz job data deletion in clustered quartz2

Willem.Jiang
Administrator
Hi Lakshmi,

We did some work in camel-quartz2 recently, such as rescheduling jobs in a cluster [1]. Would you mind testing those changes to see whether they fix some of your cluster issues?

You can use Camel 2.14.1-SNAPSHOT for verification.

[1]https://issues.apache.org/jira/browse/CAMEL-7627
