Spell checker: run LanguageTool server in Docker


While looking for a better spell checker for the browser, I came across the open source software LanguageTool. LanguageTool checks English, Spanish, French, German, Portuguese, Polish, Dutch and more than 20 other languages, and it also finds errors that a simple spell checker cannot detect. Those who do not want to send their texts to a cloud service can run a LanguageTool server themselves. Since the server is also available as a Docker image, it can be started on almost any computer or server and used within your own network.

Browser plugin

LanguageTool is available as a browser plugin for the well-known web browsers, such as Google Chrome, Firefox or Edge. By default, the plugin sends all text entered in the browser to the URL: https://languagetool.org.

Functionality

LanguageTool examines all input fields and can be used universally for all web pages or web applications.

Advanced settings - own server service

Those who run their own LanguageTool server can store its address in the browser plugin.

Launch Docker container

Docker Basics

Docker allows applications to be launched with a single command in a so-called container.
A container is an isolated environment that is independent of the operating system (OS).
When a container is first launched, Docker automatically downloads all the necessary sources
from the internet.
Docker can be installed on Windows, macOS or a Linux distribution.

To start LanguageTool, I created a docker-compose.yml file with the following content:

version: "3"

services:
  languagetool:
    image: erikvl87/languagetool
    container_name: languagetool
    ports:
        - 8010:8010  # Using default port from the image
    environment:
        - langtool_languageModel=/ngrams  # OPTIONAL: Using ngrams data
        - Java_Xms=2g  # OPTIONAL: Setting a minimum Java heap size of 2 GiB
        - Java_Xmx=4g  # OPTIONAL: Setting a maximum Java heap size of 4 GiB
        - timeoutRequestLimit=120
    volumes:
        - ./ngrams:/ngrams        
    restart: always

To make LanguageTool work with longer texts, I set the Java heap size in the example: "Java_Xms" to 2g and "Java_Xmx" to 4g. The folder ./ngrams should be filled with NGRAM data for a more accurate spell check.
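
Once the file is saved, the container can be started and tested from the terminal. The following is a minimal sketch, assuming Docker Compose v2 (the "docker compose" command; older installations use "docker-compose"), an installed curl, and a server running on the same machine on port 8010. The function names lt_server_url, lt_start and lt_check are hypothetical helpers and not part of LanguageTool; the /v2/check endpoint with its "text" and "language" parameters belongs to LanguageTool's HTTP API.

```shell
# Hypothetical helpers for starting and testing a self-hosted LanguageTool server.

# Print the API base URL for a host and port (defaults match the compose file above).
lt_server_url() {
    printf 'http://%s:%s/v2\n' "${1:-localhost}" "${2:-8010}"
}

# Start the container in the background.
# Run this in the folder that contains docker-compose.yml.
lt_start() {
    docker compose up -d
}

# Send a text to the /check endpoint and print the JSON response.
# Usage: lt_check "This are a test." en-US
lt_check() {
    curl --silent --show-error \
        --data-urlencode "text=$1" \
        --data "language=${2:-en-US}" \
        "$(lt_server_url)/check"
}
```

After lt_start, a call such as lt_check "This are a test." should return a JSON document that lists the detected problems. The first request can take a moment, especially when NGRAM data is loaded. The address printed by lt_server_url should also be the form to enter in the browser plugin for your own server; replace localhost with the host name of the server if it runs on another machine.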

NGRAM data

To increase the accuracy of the server, so-called NGRAM data can be used. NGRAM data consists of short word sequences extracted from large amounts of text, together with how often they occur. This allows statistical probabilities to be included in the check, for example to detect words that are spelled correctly but are unlikely in their context. The NGRAM data can be downloaded from the following URL: languagetool.org/download/ngram-data/. The zip files should be unzipped to the ngrams subfolder.

On a Linux machine, the data can be loaded and unzipped via the terminal as follows:

mkdir -p ngrams
wget https://languagetool.org/download/ngram-data/ngrams-de-20150819.zip
unzip ngrams-de-20150819.zip -d ngrams
wget https://languagetool.org/download/ngram-data/ngrams-en-20150817.zip
unzip ngrams-en-20150817.zip -d ngrams
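
After unpacking, each language should have its own subfolder inside ngrams (for example ngrams/en and ngrams/de), which in turn contains the folders 1grams, 2grams and 3grams. The following small sketch checks this. The function name check_ngrams is a hypothetical helper, and the expected layout is an assumption based on the LanguageTool NGRAM archives.

```shell
# Check that an NGRAM folder contains the expected subfolders for one language.
# Usage: check_ngrams ./ngrams en
check_ngrams() {
    base="$1"
    lang="$2"
    for n in 1grams 2grams 3grams; do
        if [ ! -d "$base/$lang/$n" ]; then
            echo "missing: $base/$lang/$n"
            return 1
        fi
    done
    echo "ok: $base/$lang"
}
```

If the check reports a missing folder, the archive was probably unpacked in the wrong place, for example one level too deep. The compose file mounts ./ngrams as /ngrams, so the language folders must sit directly below ./ngrams.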

Conclusion

LanguageTool detects errors very reliably and looks not only at individual words, but also at entire sentences. In addition to spelling and punctuation errors, it also marks sentences where a style improvement is possible, for example repeated words or sentences that are too long. Since LanguageTool can be run on your own server, there is nothing to stop it from being used for sensitive texts.

 


Publication: 2022-10-23 by Bernhard

