How to install and run the Polyglot package (Python) in Docker

I was having trouble getting Polyglot (an NLP library) to run on my laptop, but I eventually got it working in Docker. Here are some notes, in case anyone else wants to experiment with it.

The difficult part was getting the ICU dependencies to work.[1] I switched from a regular Python image to Anaconda Python (see the Dockerfile below), and that cleared up the dependency problems.

Code

Below are the contents of src/Dockerfile. (There’s an unfinished Flask app in my project, which is why it’s exposing port 4444. I didn’t want it to clash with another Flask app running on port 5000.)

FROM continuumio/anaconda3

RUN apt-get update && apt-get install -qq -y \
    build-essential libpq-dev vim --no-install-recommends

ENV HOST=0.0.0.0
ENV DEBUG=true
ENV PORT=4444

ENV INSTALL_PATH=/app
RUN mkdir -p $INSTALL_PATH

WORKDIR $INSTALL_PATH

COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

COPY . .

# Install the models
# RUN polyglot download embeddings2.en transliteration2.ar

EXPOSE 4444

CMD ["gunicorn", "--bind", "0.0.0.0:4444", "--workers", "3", "app:app"]

This is src/requirements.txt:

# Data
numpy
pycld2
morfessor
pyicu
polyglot

# Flask
flask
Flask-Cors
gunicorn

And in the root of the project is a docker-compose.yml file:

version: '3'

services:
    py:
        build: "./src"
        ports:
            - "4444:4444"
        volumes:
            - ./src:/app

Then you can start it with:

$ docker-compose up --build

Find the container ID with:

$ docker container ps

Then to enter the container:

$ docker container exec -it <container_id> bash

From there you can start Python:

(base) root@9a34ddbc7255:/app# python
Python 3.7.4 (default, Aug 13 2019, 20:35:49) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
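Before downloading any models, it can be worth confirming that the ICU bindings and the other dependencies actually import inside the container. The following is a minimal sketch, not part of the original project: the helper name find_missing and the module list are assumptions (PyICU is imported as icu, which matches the ModuleNotFoundError quoted in the footnote).

```python
import importlib

# Import names, not pip names: the pyicu package is imported as "icu".
REQUIRED_MODULES = ["icu", "pycld2", "morfessor", "polyglot"]


def find_missing(modules):
    """Return the module names that fail to import."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError and shared-library load failures.
            missing.append(name)
    return missing


# An empty list means every dependency imported cleanly.
print(find_missing(REQUIRED_MODULES))
```

If icu shows up in the printed list, the ICU problem is still there and the model downloads are unlikely to be useful yet.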

Download the models (in this case the transliteration model for Thai, “th”):

>>> from polyglot.downloader import downloader
>>> downloader.download("transliteration2.th", quiet=True)
True
>>> 

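I think it’s possible to download the models from the Dockerfile with the polyglot command-line tool (see the commented-out RUN line in the Dockerfile above), but I haven’t set that up yet. For example, the command to download the sentiment analysis data for English is: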

$ polyglot download sentiment2.en
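One way to move the downloads into the image build would be a small Python script that calls the same downloader used in the interpreter session above. This is only a sketch under assumptions: the names MODELS, download_models and main are hypothetical, and it relies on downloader.download returning True or False, as it does in the output shown in this post.

```python
import sys

# Package names taken from this post; adjust to the models you need.
MODELS = ["transliteration2.th", "sentiment2.en"]


def download_models(names, download):
    """Call download(name) for each name and return the names that failed."""
    failed = []
    for name in names:
        if not download(name):
            failed.append(name)
    return failed


def main():
    # Imported here so download_models can be exercised without polyglot.
    from polyglot.downloader import downloader

    failed = download_models(
        MODELS, lambda name: downloader.download(name, quiet=True)
    )
    if failed:
        print("failed to download:", ", ".join(failed), file=sys.stderr)
    # A non-zero status lets a caller (for example a build step) notice failures.
    return 1 if failed else 0
```

Saved under a hypothetical name such as download_models.py, with a final line sys.exit(main()), it could be run from the Dockerfile after the pip install step, so that a failed download makes the build fail instead of passing silently.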

Downloading every transliteration model at once (TASK:transliteration2) still fails on the Polish model, but I don’t need it for now. The error message was:

>>> from polyglot.downloader import downloader
>>> downloader.download("TASK:transliteration2", quiet=True)
[polyglot_data] Error downloading 'transliteration2.pl' from <http://p
[polyglot_data]     olyglot.cs.stonybrook.edu/~polyglot/transliteratio
[polyglot_data]     n2/pl/transliteration.pl.tar.bz2>:   HTTP Error
[polyglot_data]     403: Forbidden
False
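Since the TASK: form returns a single False for the whole batch, requesting the packages one language at a time makes it easier to see which ones are affected. A rough sketch, assuming package names follow the task.language pattern seen above (transliteration2.th, transliteration2.pl); the function name and the injected download callable are hypothetical.

```python
def download_per_language(task, languages, download):
    """Try each '<task>.<lang>' package separately; map language -> success."""
    results = {}
    for lang in languages:
        results[lang] = bool(download(f"{task}.{lang}"))
    return results


# Possible use inside the container (requires polyglot):
#   from polyglot.downloader import downloader
#   fetch = lambda name: downloader.download(name, quiet=True)
#   print(download_per_language("transliteration2", ["th", "pl"], fetch))
```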

[1] To help people find this page in search engines: the ICU-related error messages included the following text:

KeyError: 'ICU_VERSION'

and

ModuleNotFoundError: No module named 'icu'