
Kea 5.1

KEA is an algorithm for extracting keyphrases from text documents

 


Kea 5.1 details

Vendor: Digital Libraries and Machine Learning Labs, Computer Science Department, The University of Waikato
Vendor's website: http://www.nzdl.org
OS: Windows XP, Windows Vista, Windows 7, Windows 8, Mac OS, Linux
Limitations: not specified
Updated: more than a year ago
Downloads: 559
Localized: English
License: Freeware

Kea manufacturer description

Keywords and keyphrases (multi-word units) are widely used in large document collections. They describe the content of single documents and provide a kind of semantic metadata that is useful for a wide variety of purposes. The task of assigning keyphrases to a document is called keyphrase indexing. For example, academic papers are often accompanied by a set of keyphrases freely chosen by the author. In libraries, professional indexers select keyphrases from a controlled vocabulary (also called Subject Headings) according to defined cataloguing rules. On the Internet, digital libraries and other data repositories (Flickr, del.icio.us, blog articles, etc.) also use keyphrases, here called content tags or content labels, to organize their data and provide thematic access to it.

Kea 5.1 is an algorithm for extracting keyphrases from text documents. It can be used either for free indexing or for indexing with a controlled vocabulary.

1. Documents - Kea takes a directory name and processes all documents in that directory that have the extension ".txt". The default language and encoding are set for English, but this can be changed as long as a corresponding stopword file and stemmer are provided.

2. Thesaurus - If a vocabulary is provided, Kea matches the documents' phrases against this file. To process SKOS files stored as RDF files, Kea uses the Jena API. For free indexing, use the option "-v none".

3. Extracting candidates - Kea extracts n-grams of a predefined length (e.g. 1 to 3 words) that do not start or end with a stopword. In controlled indexing, it collects only those n-grams that match thesaurus terms. If the thesaurus defines relations between non-allowed terms (non-descriptors) and allowed terms (descriptors), Kea replaces each non-descriptor with an equivalent descriptor.
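The candidate extraction in step 3 can be sketched roughly as follows. This is an illustration, not Kea's code: the stopword list, the tokenizer, the `crude_stem` stand-in for a real stemmer and the thesaurus format (a dictionary mapping the pseudo-phrase of every term, descriptor or non-descriptor, to its descriptor) are all hypothetical. For matching it uses the pseudo-phrase form described in the next paragraph (stopwords removed, remaining words stemmed and sorted).

```python
import re
from typing import Dict, List, Optional

# Hypothetical English stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "are", "for", "from", "in", "is",
             "of", "on", "or", "the", "to", "with"}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word: str) -> str:
    """Stand-in stemmer (assumption): strip a plural 's'. Kea uses a real stemmer."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def pseudo_phrase(words: List[str]) -> str:
    """Pseudo-phrase: drop stopwords, stem the remaining words, sort them."""
    return " ".join(sorted(crude_stem(w) for w in words if w not in STOPWORDS))


def extract_candidates(text: str, max_len: int = 3,
                       thesaurus: Optional[Dict[str, str]] = None) -> List[str]:
    """Return unique candidate phrases: all unigrams first, then longer n-grams.

    thesaurus (hypothetical format) maps the pseudo-phrase of every term,
    descriptor or non-descriptor, to the descriptor that should be used.
    """
    tokens = tokenize(text)
    found: Dict[str, None] = {}  # a dict keeps insertion order and removes duplicates
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue  # n-grams may not start or end with a stopword
            if thesaurus is None:
                found[" ".join(gram)] = None  # free indexing: keep every n-gram
            else:
                descriptor = thesaurus.get(pseudo_phrase(gram))
                if descriptor is not None:  # controlled indexing: keep matches only
                    found[descriptor] = None
    return list(found)
```

With free indexing, `extract_candidates("Extraction of keyphrases from text documents")` keeps "text documents" and "extraction of keyphrases" but drops "extraction of" because it ends with a stopword. Passing a thesaurus such as `{"document text": "text files"}` turns the same call into controlled indexing, where only the descriptor "text files" survives.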
Pseudo-phrase matching means removing stopwords from a phrase, then stemming the remaining words and putting them in a fixed order, so that variants such as "text documents" and "documents of text" are treated as the same phrase.

4. Features - For each candidate phrase, Kea computes four feature values:

  • TFxIDF is a measure describing the specificity of a term for the document under consideration, compared to all other documents in the corpus. Candidate phrases that have a high TFxIDF value are more likely to be keyphrases.

  • First occurrence is computed as the percentage of the document preceding the first occurrence of the term in the document. Terms that tend to appear at the start or at the end of a document are more likely to be keyphrases.

  • Length of a phrase is the number of its component words. Two-word phrases are usually preferred by human indexers.

  • Node degree of a candidate phrase is the number of phrases in the candidate set that are semantically related to this phrase. This is computed with the help of the thesaurus. Phrases with high degree are more likely to be keyphrases.
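The four features above can be sketched as small functions. The TFxIDF formulation is an assumption: term frequency freq(P,D)/size(D) multiplied by -log2(df(P)/N), with +1 smoothing added here so that a phrase unseen in the corpus does not cause log(0). The `related` dictionary is a hypothetical stand-in for the semantic relations a thesaurus would supply.

```python
import math
from typing import Dict, List, Set


def count_occurrences(tokens: List[str], phrase: List[str]) -> int:
    """Number of positions where the phrase's tokens occur contiguously."""
    n = len(phrase)
    return sum(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def tfxidf(tokens: List[str], phrase: List[str], doc_freq: int, num_docs: int) -> float:
    """Term frequency in this document times inverse document frequency.

    doc_freq: number of corpus documents containing the phrase.
    The +1 smoothing is an assumption that avoids log(0).
    """
    tf = count_occurrences(tokens, phrase) / len(tokens)
    idf = -math.log2((doc_freq + 1) / (num_docs + 1))
    return tf * idf


def first_occurrence(tokens: List[str], phrase: List[str]) -> float:
    """Fraction of the document that precedes the phrase's first occurrence."""
    n = len(phrase)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            return i / len(tokens)
    return 1.0  # not found: treat as occurring at the very end (assumption)


def phrase_length(phrase: List[str]) -> int:
    """Number of words in the phrase."""
    return len(phrase)


def node_degree(phrase: str, candidates: Set[str], related: Dict[str, Set[str]]) -> int:
    """Number of other candidates that the (hypothetical) thesaurus marks as related."""
    return len((related.get(phrase, set()) & candidates) - {phrase})
```

For example, in the seven-word document "kea extracts keyphrases and kea ranks keyphrases", the phrase "keyphrases" occurs twice and first appears after two of the seven words, so its first-occurrence value is 2/7.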


5. Building the model - Before it can extract keyphrases from new documents, Kea first needs to create a model that learns the extraction strategy from manually indexed documents. This means that for each document in the input directory there must be a file with the same name and the extension ".key". This file should contain the manually assigned keyphrases, one per line.
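The file layout required for training (every ".txt" document paired with a same-named ".key" file holding one keyphrase per line) can be read with a short sketch like this. The function name and the UTF-8 encoding are assumptions; Kea itself reads these files internally.

```python
from pathlib import Path
from typing import List, Tuple


def load_training_documents(directory: str) -> List[Tuple[str, str, List[str]]]:
    """Pair every *.txt document with its same-named *.key file.

    Returns (name, text, manual_keyphrases) tuples. UTF-8 is assumed.
    """
    examples = []
    for doc_path in sorted(Path(directory).glob("*.txt")):
        key_path = doc_path.with_suffix(".key")
        if not key_path.exists():
            raise FileNotFoundError(f"no keyphrase file for {doc_path.name}")
        text = doc_path.read_text(encoding="utf-8")
        # One manually assigned keyphrase per line; blank lines are ignored.
        keyphrases = [line.strip()
                      for line in key_path.read_text(encoding="utf-8").splitlines()
                      if line.strip()]
        examples.append((doc_path.stem, text, keyphrases))
    return examples
```

Failing loudly on a missing ".key" file is a design choice of this sketch: a document without manual keyphrases would otherwise be silently treated as having none.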
Given the list of candidate phrases (step 3), Kea marks those that were manually assigned as positive examples and all the rest as negative examples. By analyzing the feature values (step 4) of the positive and negative candidate phrases, it computes a model that reflects the distribution of feature values for keyphrases and non-keyphrases.

6. Extracting keyphrases - When extracting keyphrases from new documents, Kea takes the model (step 5) and the feature values of each candidate phrase and computes the probability that the phrase is a keyphrase. The phrases with the highest probabilities are selected for the final set of keyphrases. The user can specify how many keyphrases should be selected.
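This page does not name the learning scheme, so the sketch below assumes a Naive Bayes classifier over discretized feature values, the approach described for the original Kea. The fixed cut points passed to `discretize` are hypothetical (Kea's own discretization may differ), and all names are illustrative. Training uses candidates labeled `True` (manually assigned) or `False`; extraction ranks new candidates by their probability and keeps the top n.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Bins = Tuple[int, ...]  # one discretized bin index per feature


def discretize(values: Sequence[float], cut_points: Sequence[Sequence[float]]) -> Bins:
    """Map each feature value to a bin index using (hypothetical) fixed cut points."""
    return tuple(sum(v > c for c in cuts) for v, cuts in zip(values, cut_points))


class NaiveBayesModel:
    """Naive Bayes over discretized features, with Laplace smoothing."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.value_counts: Counter = Counter()  # keys: (label, feature index, bin)
        self.seen_bins: Dict[int, Set[int]] = {}

    def fit(self, examples: List[Tuple[Bins, bool]]) -> "NaiveBayesModel":
        # label True = manually assigned keyphrase, False = any other candidate
        for bins, label in examples:
            self.class_counts[label] += 1
            for i, b in enumerate(bins):
                self.value_counts[(label, i, b)] += 1
                self.seen_bins.setdefault(i, set()).add(b)
        return self

    def prob_keyphrase(self, bins: Bins) -> float:
        """Estimated probability that a candidate with these bins is a keyphrase."""
        total = sum(self.class_counts.values())
        log_score = {}
        for label in (True, False):
            n = self.class_counts[label]
            score = math.log((n + 1) / (total + 2))  # smoothed class prior
            for i, b in enumerate(bins):
                k = len(self.seen_bins.get(i, ())) + 1  # one extra slot for unseen bins
                score += math.log((self.value_counts[(label, i, b)] + 1) / (n + k))
            log_score[label] = score
        # Normalize in log space to avoid floating-point underflow.
        top = max(log_score.values())
        pos = math.exp(log_score[True] - top)
        neg = math.exp(log_score[False] - top)
        return pos / (pos + neg)


def top_keyphrases(model: NaiveBayesModel, candidates: Dict[str, Bins], n: int) -> List[str]:
    """Rank candidate phrases by probability and keep the n most likely."""
    return sorted(candidates, key=lambda p: model.prob_keyphrase(candidates[p]),
                  reverse=True)[:n]
```

Working with log probabilities keeps the product of many small per-feature likelihoods numerically stable; the smoothing keeps a bin never seen during training from zeroing out a candidate's score.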