HDFS component seg numbering

Dr. Martin Menzel
Hi,
wouldn't it make sense to number the seg1, seg2 files the way Hadoop numbers
its part files?

i.e.

seg0000001
seg0000002

etc.

Further, it would make sense for me to be able to put a date / timestamp
part in the base path, so that, for example, the seg counter is reset every
day and the files are written to one directory per day.
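
To make the proposed layout concrete, here is a minimal sketch of how such a path could be built. It is only an illustration, not code from the Camel HDFS component: the function name segment_path, the base path /data/out, the ISO date format, and the width of 7 digits are all assumptions.

```python
from datetime import date
from pathlib import PurePosixPath


def segment_path(base: str, day: date, counter: int, width: int = 7) -> str:
    """Build <base>/<yyyy-mm-dd>/seg<zero-padded counter> (hypothetical layout)."""
    if counter < 1:
        raise ValueError("counter starts at 1")
    # Zero-pad the counter so that names sort alphabetically in numeric order.
    name = f"seg{counter:0{width}d}"
    # One directory per day; the counter restarts at 1 in each new directory.
    return str(PurePosixPath(base) / day.isoformat() / name)


print(segment_path("/data/out", date(2014, 2, 20), 1))
print(segment_path("/data/out", date(2014, 2, 21), 1))
```

Because the date is part of the directory, the same counter value can be reused on the next day without a name clash.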

On the Hadoop side, a MapReduce job can then merge the part files if
they are too small, while preserving their order just by sorting them
alphabetically.
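
A small sketch of why the padding matters for that alphabetical sort. The file names and the helper merge_order are made up for illustration; a real merge job would read the names from a directory listing.

```python
def merge_order(names):
    """Return the names in plain alphabetical order, as a merge job would see them."""
    return sorted(names)


counters = (1, 2, 10, 11)
unpadded = [f"seg{i}" for i in counters]      # current style: seg1, seg2, ...
padded = [f"seg{i:07d}" for i in counters]    # proposed style: seg0000001, ...

# Without padding, seg10 and seg11 sort before seg2, so write order is lost.
print(merge_order(unpadded))
# With padding, alphabetical order equals write order.
print(merge_order(padded))
```

The padded scheme only keeps this property as long as the counter fits in the chosen number of digits.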

Regards

Martin

Re: HDFS component seg numbering

Claus Ibsen-2
Hi

Yeah, sounds like a good idea. Not sure how easy resetting the counter
is, as you would need to initialize it on startup and check how many
files are already there to avoid a naming clash.
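
A rough sketch of that startup check, assuming the existing file names of the target directory are already available as a list (how they are listed from HDFS is left out). The name next_counter and the seg<digits> pattern are assumptions for illustration, not the component's actual behavior.

```python
import re

# Matches names such as seg1 or seg0000042; anything else is ignored.
SEG_PATTERN = re.compile(r"seg(\d+)")


def next_counter(existing_names):
    """Return the first counter value that cannot clash with an existing segment."""
    highest = 0
    for name in existing_names:
        match = SEG_PATTERN.fullmatch(name)
        if match:
            highest = max(highest, int(match.group(1)))
    # Continue after the highest number found, or start at 1 in an empty directory.
    return highest + 1


print(next_counter([]))
print(next_counter(["seg0000001", "seg0000003", "_SUCCESS"]))
```

This sketch takes the highest existing number rather than the file count, so a gap in the sequence (for example after a deleted file) does not lead to a clash.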

We love contributions, so feel free to log a JIRA and work on a patch.
http://camel.apache.org/contributing



--
Claus Ibsen
-----------------
Red Hat, Inc.
Email: [hidden email]
Twitter: davsclaus
Blog: http://davsclaus.com
Author of Camel in Action: http://www.manning.com/ibsen
Make your Camel applications look hawt, try: http://hawt.io