text mining - Lucene Entity Extraction


Given a finite dictionary of entity terms, I'm looking for a way to do entity extraction with intelligent tagging using Lucene. I've been able to use Lucene for:
- searching for complex phrases with fuzziness
- highlighting results

However, I'm not aware of how to:
- get accurate offsets of the matched phrases
- do entity-specific annotations per match (not just tags for every single hit)

I have tried using the explain() method, but it only gives the terms in the query that got the hit, not the offsets of the hit within the original text.

Has anybody faced a similar problem and is willing to share a potential solution?

Thank you in advance for your help!

For the offsets, see this question: how to get the offset of a term in Lucene?

I don't quite understand the second question. It sounds to me like you want the data from a stored field, though. To get the data from a stored field:

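    // Assumes searcher, query, filter and num are already set up.
    TopDocs results = searcher.search(query, filter, num);
    for (ScoreDoc result : results.scoreDocs) {
        // Load the stored document for this hit and read its stored field value.
        Document resultDoc = searcher.doc(result.doc);
        String valOfField = resultDoc.get("my field");
    }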
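To make the two goals from the question concrete (accurate offsets, plus an entity-specific annotation per match), here is a minimal sketch in plain Java. It deliberately does not use Lucene: it swaps in a simple case-insensitive regex scan over the raw text, so it handles exact dictionary phrases only, with no fuzziness. The class name DictionaryTagger, the Match type, and the sample dictionary are all hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: tags dictionary phrases in raw text.
final class DictionaryTagger {

    /** One tagged hit: entity type plus character offsets into the original text. */
    public static final class Match {
        public final String entityType;
        public final int start;   // inclusive character offset
        public final int end;     // exclusive character offset
        public final String surface;

        Match(String entityType, int start, int end, String surface) {
            this.entityType = entityType;
            this.start = start;
            this.end = end;
            this.surface = surface;
        }

        @Override
        public String toString() {
            return entityType + "[" + start + "," + end + ") \"" + surface + "\"";
        }
    }

    /**
     * Scans the text once per dictionary phrase (phrase -> entity type) and
     * returns all whole-word, case-insensitive matches ordered by start offset.
     */
    public static List<Match> tag(String text, Map<String, String> dictionary) {
        List<Match> matches = new ArrayList<>();
        for (Map.Entry<String, String> entry : dictionary.entrySet()) {
            // Pattern.quote keeps regex metacharacters in the phrase literal.
            Pattern pattern = Pattern.compile(
                    "\\b" + Pattern.quote(entry.getKey()) + "\\b",
                    Pattern.CASE_INSENSITIVE);
            Matcher matcher = pattern.matcher(text);
            while (matcher.find()) {
                matches.add(new Match(entry.getValue(),
                        matcher.start(), matcher.end(), matcher.group()));
            }
        }
        matches.sort(Comparator.comparingInt((Match m) -> m.start));
        return matches;
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new LinkedHashMap<>();
        dictionary.put("new york", "LOCATION");
        dictionary.put("apache lucene", "SOFTWARE");

        String text = "Apache Lucene was presented in New York.";
        for (Match match : tag(text, dictionary)) {
            System.out.println(match);
        }
    }
}
```

Each Match carries its own entity type, so annotations are attached per hit rather than per query. Scanning once per phrase is fine for a small dictionary; for a large one, an index-backed approach that records offsets at analysis time would likely scale better.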
