Parsing unstructured Text in Camel


Jan Bernhardt
Hi Camel Users,


Is there any component that helps me parse plain text? Not JSON, XML or CSV.


My use case is that I receive an email with multiple keywords in the subject as well as in the body.

I could not find any component that would help me extract certain values from my multiline plain text.


I need something like FreeMarker, but the other way around: taking the full text and parsing certain values out of it (for example with regex).


Any help would be much appreciated.


Many thanks

Jan

Re: Parsing unstructured Text in Camel

Jean-Baptiste Onofré
Hi Jan

You can always use a custom processor for that.

Regards
JB

On Nov 25, 2016, at 08:16, Jan Bernhardt <[hidden email]> wrote:

>Hi Camel Users,
>
>
>is there any component which helps me to parse plain text? Not JSON,
>XML or CSV.
>
>
>My use case is that I receive an E-Mail with multiple keywords in the
>Subject as well as in the body.
>
>I could not find any component that would help me to parse certain
>values from my multiline plaintext.
>
>
>I need something like freemarker but the other way around. Getting the
>fulltext and parsing certain values from this text (for example with
>regex).
>
>
>Any help would be much appreciated.
>
>
>Many thanks
>
>Jan

Re: Parsing unstructured Text in Camel

Jan Matèrne (jhm)
In reply to this post by Jan Bernhardt
I don't think that there is such a component.

Unless you have validated the input, you can't rely on a structure.
So I would write a simple bean which parses the text, e.g. using the regexp
you mentioned.
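
As a rough illustration of the bean approach, here is a minimal sketch in plain Java. It assumes the mail text carries its keywords as "Key: value" lines, which the thread does not confirm; the class name MailKeywordParser and the pattern are made up for this example. Because it is a plain POJO, a Camel route could call it through Camel's bean integration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical bean; the class name and the "Key: value" format are assumptions. */
final class MailKeywordParser {

    // Matches lines such as "Ticket: ABC-123". MULTILINE makes ^ and $ anchor at every line.
    private static final Pattern KEY_VALUE = Pattern.compile(
            "^[ \\t]*([A-Za-z][A-Za-z0-9 _-]*?)[ \\t]*:[ \\t]*(.+?)[ \\t]*$",
            Pattern.MULTILINE);

    /** Returns every "Key: value" pair found in the text, in order of appearance. */
    public Map<String, String> parse(String text) {
        Map<String, String> values = new LinkedHashMap<>();
        if (text == null) {
            return values;
        }
        Matcher m = KEY_VALUE.matcher(text);
        while (m.find()) {
            // Keep the first occurrence when a key appears more than once.
            values.putIfAbsent(m.group(1), m.group(2));
        }
        return values;
    }
}
```

Lines without a colon are simply skipped, so greetings and signatures do not end up in the result. A line such as a URL would still be picked up as a key, so a real parser would probably restrict the accepted keys to a known list.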


Jan


Re: Parsing unstructured Text in Camel

Anton-2
It might be overkill, but you could use Apache UIMA -
https://uima.apache.org/d/uima-as-current/apidocs/org/apache/uima/camel/UimaAsEndpoint.html

On Fri, Nov 25, 2016 at 11:33 AM, Jan Matèrne (jhm) <[hidden email]>
wrote:

> I dont think that there is such a component.
>
> Unless you have validated the input you can't rely on a structure.
> So I would write a simple bean which parses the text. E.g. using the regexp
> you mentioned.
>
>
> Jan

Re: Parsing unstructured Text in Camel

souciance
Hi,

Actually, there is no tool that can handle arbitrary unstructured data, unless you want to put everything in some sort of NoSQL database and run queries against it. Otherwise, even integration software with advanced mapping capabilities, like IBM's IIB 10, requires you to describe the structure of the data if it is not XML or JSON. Of course, this applies only if you are interested in the content; if you just want to transfer the data, there is no need. The big problem is that the open source world lacks this kind of advanced mapping capability. In IIB 10 you can describe pretty much any kind of textual format, and as long as you can describe it, IIB will parse it.

Best
Souciance

On Fri, Nov 25, 2016 at 9:26 PM, Anton-2 [via Camel] <[hidden email]> wrote:
It might be over-kill, but you could use Apache UIMA -
https://uima.apache.org/d/uima-as-current/apidocs/org/apache/uima/camel/UimaAsEndpoint.html



Re: Parsing unstructured Text in Camel

Anton-2
On Sat, Nov 26, 2016 at 12:23 AM, souciance <
[hidden email]> wrote:

>
> Actually there is no tool that can handle any unstructured data


That is not correct.

https://uima.apache.org/doc-uima-why.html

Re: Parsing unstructured Text in Camel

souciance
UIMA seems quite similar conceptually to how the IIB mapping framework works, where you provide a message model and work with that. How does UIMA work with data that does not necessarily conform to a particular model at all times?

On Sat, Nov 26, 2016 at 7:22 AM, Anton-2 [via Camel] <[hidden email]> wrote:
On Sat, Nov 26, 2016 at 12:23 AM, souciance <
[hidden email]> wrote:

>
> Actually there is no tool that can handle any unstructured data


That is not correct.

https://uima.apache.org/doc-uima-why.html





Re: Parsing unstructured Text in Camel

Anton-2
On Sat, Nov 26, 2016 at 12:29 PM, souciance <
[hidden email]> wrote:

> How does UIMA work
> with data that does not necessarily with a particular model at all times?
>

UIMA was donated to the Apache Software Foundation by IBM. UIMA is the framework
that powers IBM Watson.
As to how it extracts knowledge from unstructured data, it uses a Common
Analysis Structure (CAS), the shared data structure that Analysis Engines read and annotate.
Typically these are text-based NLP processes, but they can also handle audio and video.

UIMA is a big topic. It has a strong and helpful community behind it.

Re: Parsing unstructured Text in Camel

souciance
Is UIMA useful as a tool for processing basic CSV files as well as complicated EDIFACT data? Or is it meant to be applied on other types of unstructured data?

On Sat, Nov 26, 2016 at 1:03 PM, Anton-2 [via Camel] <[hidden email]> wrote:
On Sat, Nov 26, 2016 at 12:29 PM, souciance <
[hidden email]> wrote:

> How does UIMA work
> with data that does not necessarily with a particular model at all times?
>

UIMA was donated to the Apache Foundation by IBM. UIMA is the framework
that powers IBM Watson.
As to how it extracts knowledge from unstructured data, it uses A Common
Analysis Structure(CAS), which is a way of defining Analysis Engines.
Typically these are text based NLP process but can also be audio and video.

UIMA is a big topic. It has a strong and helpful community behind it.





Re: Parsing unstructured Text in Camel

Anton-2
UIMA can work with any unstructured data.

On Nov 26, 2016 1:27 PM, "souciance" <[hidden email]>
wrote:

> Is UIMA useful as a tool for processing basic CSV files as well as
> complicated EDIFACT data? Or is it meant to be applied on other types of
> unstructured data?

Re: Parsing unstructured Text in Camel

Jan Bernhardt
In reply to this post by Jean-Baptiste Onofré
Hi JB,

I know self-coding is always possible; I was just wondering if there is an easier way. For example, Logstash provides a grok parser for this:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html

I was wondering if Camel provides something similar, or if it would be a good idea to add a camel-grok component.
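
To make the grok idea concrete, here is a small sketch in plain Java of what such a parser does: it expands %{PATTERN:field} references into named regex groups and returns the captured fields. This is not the Logstash implementation; the class name MiniGrok and the four simplified pattern definitions are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical mini version of the grok idea; not the Logstash implementation. */
final class MiniGrok {

    // Grok-style reference: %{PATTERN_NAME:fieldName}
    private static final Pattern REFERENCE =
            Pattern.compile("%\\{(\\w+):([A-Za-z][A-Za-z0-9]*)\\}");

    // Simplified stand-ins for a few of grok's stock patterns.
    private static final Map<String, String> LIBRARY = new HashMap<>();
    static {
        LIBRARY.put("WORD", "\\w+");
        LIBRARY.put("INT", "[+-]?\\d+");
        LIBRARY.put("NOTSPACE", "\\S+");
        LIBRARY.put("GREEDYDATA", ".*");
    }

    private final Pattern compiled;
    private final List<String> fields = new ArrayList<>();

    MiniGrok(String expression) {
        StringBuilder regex = new StringBuilder();
        Matcher ref = REFERENCE.matcher(expression);
        int last = 0;
        while (ref.find()) {
            String definition = LIBRARY.get(ref.group(1));
            if (definition == null) {
                throw new IllegalArgumentException("Unknown pattern: " + ref.group(1));
            }
            // Copy the text before the reference as-is (it is treated as regex),
            // then add a named group for the field.
            regex.append(expression, last, ref.start());
            regex.append("(?<").append(ref.group(2)).append('>')
                 .append(definition).append(')');
            fields.add(ref.group(2));
            last = ref.end();
        }
        regex.append(expression.substring(last));
        compiled = Pattern.compile(regex.toString());
    }

    /** Returns the captured fields of the first match, or an empty map if nothing matches. */
    Map<String, String> match(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = compiled.matcher(text);
        if (m.find()) {
            for (String field : fields) {
                result.put(field, m.group(field));
            }
        }
        return result;
    }
}
```

For example, the expression "Order %{INT:orderId} for %{WORD:customer}: %{GREEDYDATA:note}" applied to a subject line like "Order 4711 for acme: please expedite" would yield the fields orderId, customer and note. The appeal over hand-written regex is mainly that the pattern library can be shared and the expressions stay readable.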

Best regards
Jan

> -----Original Message-----
> From: Jean-Baptiste Onofré [mailto:[hidden email]]
> Sent: Friday, November 25, 2016 08:23
> To: [hidden email]
> Cc: [hidden email]
> Subject: Re: Parsing unstructured Text in Camel
>
> Hi Jan
>
> You can always use a custom processor for that.
>
> Regards
> JB

Re: Parsing unstructured Text in Camel

Claus Ibsen-2
Hi

No, there is no grok component in Camel. It would be really nice to
have, so you are welcome to log a JIRA ticket and help work on such a
component. We love contributions:
http://camel.apache.org/contributing

I guess we can try to see if we can use the grok parser from Elasticsearch:
https://github.com/elastic/elasticsearch/tree/master/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common





On Mon, Nov 28, 2016 at 9:03 AM, Jan Bernhardt <[hidden email]> wrote:

> Hi JB,
>
> I know self-coding is always possible, I was just wondering if there is an easier way. For example logstash provides a grok parser for this:
> https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
>
> I was wondering if camel provides something similar or if it would be a good idea to add a camel-grok component.
>
> Best regards
> Jan



--
Claus Ibsen
-----------------
http://davsclaus.com @davsclaus
Camel in Action 2: https://www.manning.com/ibsen2

Re: Parsing unstructured Text in Camel

Andrea Cosentino-2
+1 for a Grok component :-)
 --
Andrea Cosentino
----------------------------------
Apache Camel PMC Member
Apache Karaf Committer
Apache Servicemix Committer
Email: [hidden email]
Twitter: @oscerd2
Github: oscerd



On Monday, November 28, 2016 9:29 AM, Claus Ibsen <[hidden email]> wrote:
Hi

No there is no grok component in Camel. It would be really nice to
have, so you are welcome to log a JIRA ticket and help work on such a
component. We love contributions
http://camel.apache.org/contributing

I guess we can try to see if we can use the grok parser for elasticsearch
https://github.com/elastic/elasticsearch/tree/master/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common







Re: Parsing unstructured Text in Camel

Jan Bernhardt
In reply to this post by Claus Ibsen-2
The JIRA ticket is created. I'm not sure if I can find the time to implement this feature myself.

https://issues.apache.org/jira/browse/CAMEL-10540

Best regards
Jan
