Tuesday, February 12, 2013

Tutorial on Jazzy Spell Checker


Jazzy is a useful open-source spell checker for Java. This post is a short tutorial on how to use it:

1. Download jazzy-core-0.5.2.jar from http://repo1.maven.org/maven2/net/sf/jazzy/jazzy-core/0.5.2/jazzy-core-0.5.2.jar and add it as a library to your project. (The URL corresponds to the Maven coordinates net.sf.jazzy:jazzy-core:0.5.2.)


2. Create a folder named dictionary in your project and put a dictionary.txt text file in it. The file is a plain list of English words, one per line, such as http://www.cs.princeton.edu/introcs/data/words.utf-8.txt or any other good word list.

3. Copy the code below into a new file, JazzySpellChecker.java, in a package of your project (the example uses the package test). Adjust it to your needs, then use it to find and correct spelling errors.

package test;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;



import com.swabunga.spell.engine.SpellDictionaryHashMap;
import com.swabunga.spell.engine.Word;
import com.swabunga.spell.event.SpellCheckEvent;
import com.swabunga.spell.event.SpellCheckListener;
import com.swabunga.spell.event.SpellChecker;
import com.swabunga.spell.event.StringWordTokenizer;
import com.swabunga.spell.event.TeXWordFinder;

public class JazzySpellChecker implements SpellCheckListener {
 
 private SpellChecker spellChecker;
 private List<String> misspelledWords;
 
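 /**
  * Get the misspelled words found in the text.
  * The backing list is never cleared, so words found by earlier calls
  * on the same instance are included as well.
  * @param text the text to check
  * @return the misspelled words collected so far
  */
 public List<String> getMisspelledWords(String text) {
  StringWordTokenizer texTok = new StringWordTokenizer(text,
    new TeXWordFinder());
  // spellingError() is called back for each misspelled word reported.
  spellChecker.checkSpelling(texTok);
  return misspelledWords;
 }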
 
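 // Shared dictionary, loaded once when the class is initialized.
 private static SpellDictionaryHashMap dictionaryHashMap;
 
 static{
  // Relative path: resolved against the working directory of the program.
  File dict = new File("dictionary/dictionary.txt");
  try {
   dictionaryHashMap = new SpellDictionaryHashMap(dict);
  } catch (FileNotFoundException e) {
   // Show where the file was expected; dictionaryHashMap stays null.
   System.err.println("Dictionary not found: " + dict.getAbsolutePath());
   e.printStackTrace();
  } catch (IOException e) {
   e.printStackTrace();
  }
 }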
 
 private void initialize(){
   spellChecker = new SpellChecker(dictionaryHashMap);
   spellChecker.addSpellCheckListener(this);  
 }
 
 
 public JazzySpellChecker() {
  
  misspelledWords = new ArrayList<String>();
  initialize();
 }

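 /**
  * Correct the misspelled words in the input string and return the result.
  * Each misspelled word is replaced by its first suggestion, if there is one.
  */
 public String getCorrectedLine(String line){
  List<String> misSpelledWords = getMisspelledWords(line);
  
  for (String misSpelledWord : misSpelledWords){
   List<String> suggestions = getSuggestions(misSpelledWord);
   if (suggestions.isEmpty())
    continue;
   String bestSuggestion = suggestions.get(0);
   // Note: replace() substitutes every occurrence of the character
   // sequence, including occurrences inside longer words.
   line = line.replace(misSpelledWord, bestSuggestion);
  }
  return line;
 }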
 
 /**
  * Correct a line word by word: split on spaces, and replace each word
  * that fails isCorrect() with its first suggestion, if there is one.
  */
 public String getCorrectedText(String line){
  StringBuilder builder = new StringBuilder();
  String[] tempWords = line.split(" ");
  for (String tempWord : tempWords){
   String replacement = tempWord;
   if (!spellChecker.isCorrect(tempWord)){
    // Look the suggestions up once and reuse the result.
    List<?> suggestions = spellChecker.getSuggestions(tempWord, 0);
    if (!suggestions.isEmpty()){
     replacement = suggestions.get(0).toString();
    }
   }
   builder.append(replacement).append(" ");
  }
  return builder.toString().trim();
 }
 
 
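 /**
  * Return the suggestions for a misspelled word as plain strings,
  * in the order the spell checker returns them.
  */
 public List<String> getSuggestions(String misspelledWord){
  
  // The Jazzy call returns a raw List; its elements are Word objects.
  @SuppressWarnings("unchecked")
  List<Word> wordSuggestions = spellChecker.getSuggestions(misspelledWord, 0);
  List<String> suggestions = new ArrayList<String>();
  for (Word suggestion : wordSuggestions){
   suggestions.add(suggestion.getWord());
  }
  
  return suggestions;
 }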

 
 @Override
 public void spellingError(SpellCheckEvent event) {
  event.ignoreWord(true);
  misspelledWords.add(event.getInvalidWord());
 }

 public static void main(String[] args) {
  JazzySpellChecker jazzySpellChecker = new JazzySpellChecker();
  String line = jazzySpellChecker.getCorrectedLine("This is a boook");
  System.out.println(line);
 }
}
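
One thing to watch in getCorrectedLine: String.replace substitutes every occurrence of the character sequence, so a misspelled word that also appears inside a longer word is rewritten there too. Below is a minimal sketch of a whole-word replacement that uses only java.util.regex. The class name WholeWordReplace and the method replaceWholeWord are made up for this illustration and are not part of Jazzy; word boundaries follow the regex \b rule, so treat it as a starting point for plain alphanumeric words rather than a complete solution.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Jazzy.
class WholeWordReplace {

    /**
     * Replace only whole-word occurrences of 'wrong' with 'right'.
     * Word boundaries follow the regex \b rule.
     */
    static String replaceWholeWord(String line, String wrong, String right) {
        // Pattern.quote keeps characters in the word from being read as regex syntax.
        Pattern wholeWord = Pattern.compile("\\b" + Pattern.quote(wrong) + "\\b");
        // Matcher.quoteReplacement does the same for '$' and '\' in the suggestion.
        return wholeWord.matcher(line).replaceAll(Matcher.quoteReplacement(right));
    }

    public static void main(String[] args) {
        String line = "ther is another one";
        // Plain replace also rewrites the "ther" inside "another".
        System.out.println(line.replace("ther", "there"));
        // Whole-word replace leaves "another" alone.
        System.out.println(replaceWholeWord(line, "ther", "there"));
    }
}
```

With this input, the plain replace prints "there is anothere one", while the whole-word version prints "there is another one". In getCorrectedLine, the call to line.replace could be swapped for such a helper if that behavior is preferred.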


PS:
1. The "string ... string" above is caused by a bug in the syntax highlighter and can be ignored.
2. I found a bug and corrected the code on April 10th.
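
If a line comes back unchanged, one thing worth checking is whether the word list was actually loaded: the static initializer above only prints a stack trace when dictionary/dictionary.txt cannot be read, and that relative path is resolved against the working directory of the program. The sketch below is a small standalone check that uses only the JDK. The class DictionaryCheck and its methods are hypothetical names for this illustration, and it assumes the word list is UTF-8 encoded with one word per line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Jazzy.
class DictionaryCheck {

    /** Keep the non-blank lines of a word list, trimmed of surrounding whitespace. */
    static List<String> cleanWords(List<String> lines) {
        List<String> words = new ArrayList<String>();
        for (String line : lines) {
            String word = line.trim();
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    /** Read a one-word-per-line file, assumed here to be UTF-8 encoded. */
    static List<String> readWords(Path wordList) throws IOException {
        return cleanWords(Files.readAllLines(wordList, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Same relative path as the static initializer in JazzySpellChecker.
        Path path = Paths.get("dictionary/dictionary.txt");
        try {
            List<String> words = readWords(path);
            System.out.println(words.size() + " words in " + path.toAbsolutePath());
        } catch (IOException e) {
            // Printing the absolute path shows where the program really looked.
            System.out.println("Cannot read " + path.toAbsolutePath() + ": " + e);
        }
    }
}
```

Run it from the same working directory as the spell checker; if it reports that the file cannot be read, the dictionary in JazzySpellChecker was most likely not loaded either.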

19 comments:

  1. I want to highlight the misspelled words. Can you suggest some code?

  2. Hi chauhan:

    Check the getCorrectedText(String line) method.

    When "if (!spellChecker.isCorrect(tempWord))" is true, the word is misspelled, so that is the place to mark it.

  3. It looks like this could be applied to JSP pretty easily. Can you point me in the right direction?

  4. Hi, sorry: I am not familiar with JSP.

  5. Can we use a database in place of the txt file?
    It is a requirement of my project.

    Waiting for suggestions...

  6. Hi Anonymous~ Sorry again, I don't know enough about this yet to offer a good idea. The simplest approach is to check spelling in your program after retrieving ResultSets from a database table. If I learn more later, I will add a new comment with further suggestions.

  7. I like your ideas about reducing costs in the health care system is too good

  8. good post thanks for sharing..............!

  9. Hello, StringWordTokenizer only takes one String argument for me. How did you pass it two?

  10. Hi Tom,

    Can you help me understand why the suggestions for a misspelled word include out-of-vocabulary words? Do I need to take each suggestion and check it with isCorrect?

  11. It's a very good article. I am enhancing the same code on my blog. Thanks for sharing, keep it up.

  12. I am using a Maven project and included this dependency:
    <dependency>
      <groupId>net.sf.jazzy</groupId>
      <artifactId>jazzy</artifactId>
      <version>0.5.2-rtext-1.4.1</version>
    </dependency>

    but TeXWordFinder is not resolved. Can anyone help me?

    Replies
    1. http://alvinalexander.com/java/jwarehouse/jazzy/src/com/swabunga/spell/event/TeXWordFinder.java.shtml

  13. Wherever String is declared, it appears as "string" instead of "String". Could you please update your code accordingly :)

  14. I am getting an error. It doesn't seem to correct anything: I get the same output, "This is a boook". "boook" is not replaced.
