inProgressRepository Not clearing for items in idempotentRepository


skelly
I'm attempting to consume messages from an FTP server using an idempotent repository to ensure that I do not re-download a file unless it has been modified.

Here is my (quite simple) Camel configuration:
        <beans:bean id="downloadRepo" class="org.apache.camel.processor.idempotent.FileIdempotentRepository">
                <beans:property name="fileStore" value="/tmp/.repo.txt"/>
                <beans:property name="cacheSize" value="25000"/>
                <beans:property name="maxFileStoreSize" value="1000000"/>
        </beans:bean>

        <camelContext trace="true" xmlns="http://camel.apache.org/schema/spring">
                <endpoint id="myFtpEndpoint" uri="ftp://me@localhost?password=****&amp;binary=true&amp;recursive=true&amp;consumer.delay=15000&amp;readLock=changed&amp;passiveMode=true&amp;noop=true&amp;idempotentRepository=#downloadRepo&amp;idempotentKey=$simple{file:name}-$simple{file:modified}" />
                <endpoint id="myFileEndpoint" uri="file:///tmp/files"/>

                <route>
                        <from uri="ref:myFtpEndpoint" />
                        <to uri="ref:myFileEndpoint" />
                </route>
        </camelContext>

When I start my application for the first time, all files are correctly downloaded from the FTP server and stored in the target directory, as well as recorded in the idempotent repo.

When I restart my application, all files are correctly detected as being in the idempotent repo already on the first poll of the FTP server, and are not re-downloaded:
2013-11-04 16:52:10,811 TRACE [Camel (camel-1) thread #0 - ftp://me@localhost] org.apache.camel.component.file.remote.FtpConsumer: FtpFile[name=test1.txt, dir=false, file=true]
2013-11-04 16:52:10,811 TRACE [Camel (camel-1) thread #0 - ftp://me@localhost] org.apache.camel.component.file.remote.FtpConsumer: This consumer is idempotent and the file has been consumed before. Will skip this file: RemoteFile[test1.txt]

However, on all subsequent polls to the FTP server the idempotent check is short-circuited because the file is "in-progress":
2013-11-04 16:53:10,886 TRACE [Camel (camel-1) thread #0 - ftp://me@localhost] org.apache.camel.component.file.remote.FtpConsumer: FtpFile[name=test1.txt, dir=false, file=true]
2013-11-04 16:53:10,886 TRACE [Camel (camel-1) thread #0 - ftp://me@localhost] org.apache.camel.component.file.remote.FtpConsumer: Skipping as file is already in progress: test1.txt

I am using camel-ftp 2.11.1 (and I see the same behavior with 2.12.1). When I inspect the source code, I notice two interesting things.
First, the GenericFileConsumer check that determines whether a file is already in progress, which is called from isValidFile(), always adds the file to the inProgressRepository:
    protected boolean isInProgress(GenericFile<T> file) {
        String key = file.getAbsoluteFilePath();
        return !endpoint.getInProgressRepository().add(key);
    }

Second, if a file matches an entry already present in the idempotent repository, it is discarded (GenericFileConsumer.isValidFile() returns false). This means it is never published to an exchange, and thus never reaches the code that would remove it from the inProgressRepository.

Because the in-progress check happens before the idempotent check, every poll after the file enters the in-progress state short-circuits, and the file is never checked against the idempotent repository again.
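
To make the ordering problem concrete, here is a minimal, dependency-free sketch of the behavior described above. It is a simplified model, not Camel's actual code: the class PollModel, its Result enum, and the preload() helper are hypothetical names, and plain in-memory sets stand in for the two repositories.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model of the polling logic described in this post.
// NOT Camel's code: all names here are hypothetical.
class PollModel {
    enum Result { CONSUMED, SKIPPED_IDEMPOTENT, SKIPPED_IN_PROGRESS }

    private final Set<String> inProgress = new HashSet<>();
    private final Set<String> idempotent = new HashSet<>();

    // Simulates a restart where the idempotent repo was loaded from disk.
    void preload(String idempotentKey) {
        idempotent.add(idempotentKey);
    }

    // Same order of checks as reported: in-progress first, idempotent second.
    Result poll(String path, String idempotentKey) {
        // Set.add() returns false when the entry already exists, so the
        // check itself registers the file as in progress.
        if (!inProgress.add(path)) {
            return Result.SKIPPED_IN_PROGRESS;
        }
        if (idempotent.contains(idempotentKey)) {
            // Rejected here: nothing removes the path from inProgress.
            return Result.SKIPPED_IDEMPOTENT;
        }
        idempotent.add(idempotentKey);
        inProgress.remove(path); // cleanup that only happens after consumption
        return Result.CONSUMED;
    }

    public static void main(String[] args) {
        PollModel model = new PollModel();
        model.preload("test1.txt-100");
        System.out.println(model.poll("test1.txt", "test1.txt-100")); // SKIPPED_IDEMPOTENT
        System.out.println(model.poll("test1.txt", "test1.txt-100")); // SKIPPED_IN_PROGRESS
        // Even a modified file (new key) is never re-checked:
        System.out.println(model.poll("test1.txt", "test1.txt-200")); // SKIPPED_IN_PROGRESS
    }
}
```

In this model the third poll shows the practical consequence: a file whose modification time has changed is still skipped, because its path was left in the in-progress set.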

Am I reading this code correctly?  Am I missing something here?  This seems like a bug in the implementation of the isInProgress(GenericFile<T> file) method to me.
Re: inProgressRepository Not clearing for items in idempotentRepository

Claus Ibsen-2
Hi

Yeah, sounds like a bug. Feel free to log a JIRA ticket.

On Mon, Nov 4, 2013 at 11:05 PM, skelly wrote:

> [...]



--
Claus Ibsen
-----------------
Red Hat, Inc.
Twitter: davsclaus
Blog: http://davsclaus.com
Author of Camel in Action: http://www.manning.com/ibsen
Re: inProgressRepository Not clearing for items in idempotentRepository

skelly
Thanks.  I've submitted a bug: https://issues.apache.org/jira/browse/CAMEL-6936

In the meantime, do you have any alternative recommendations for my requirements? Basically, I want to consume files from an FTP server only if they are new or modified. I guess I would need to roll my own filter that implements the idempotent behavior?
Re: inProgressRepository Not clearing for items in idempotentRepository

skelly
I worked around this by removing the idempotent configuration and instead writing a GenericFileFilter which basically copies the idempotent repo's behavior.  This is working great.
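
The filter itself was not posted, so the following is only a guess at the core idea, written without Camel dependencies so it runs on its own. The class name ModifiedSinceLastSeenFilter and its accept(String, long) signature are hypothetical. In a real route, logic like this would presumably sit inside GenericFileFilter.accept(GenericFile<T> file), reading file.getFileName() and file.getLastModified(), and be referenced through the endpoint's filter option.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the workaround; the poster's real filter
// was not shown in the thread.
class ModifiedSinceLastSeenFilter {
    // file name -> last-modified timestamp recorded on the previous call
    private final Map<String, Long> seen = new ConcurrentHashMap<>();

    // Accepts a file when it is new or its timestamp differs from the
    // one recorded earlier; always records the latest timestamp.
    boolean accept(String fileName, long lastModified) {
        Long previous = seen.put(fileName, lastModified);
        return previous == null || previous != lastModified;
    }

    public static void main(String[] args) {
        ModifiedSinceLastSeenFilter filter = new ModifiedSinceLastSeenFilter();
        System.out.println(filter.accept("test1.txt", 100L)); // true  (new file)
        System.out.println(filter.accept("test1.txt", 100L)); // false (unchanged)
        System.out.println(filter.accept("test1.txt", 200L)); // true  (modified)
    }
}
```

Two simplifications are worth noting: this sketch keeps its state in memory only, so unlike FileIdempotentRepository it forgets everything on restart, and it records a file as seen at filter time, even if the later download fails.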