Doccano
The following document describes the current Doccano setup.
The Project
The current Doccano setup is built by hand and consists of:
- A virtual machine inside the archipelo-dev project on GCP
- The latest Doccano Docker container. See the official Doccano installation instructions.
- A manual installation of Doccano
- A manual installation of NGINX
- A manual installation of Certbot
Docker Build and Run
First, pull Doccano's latest Docker image:
docker pull doccano/doccano
The next step is to create a Doccano container from the pulled image. Ensure that the VM has sufficient memory and compute, then create the container, setting the following environment variables:
docker container create --name doccano -e "ADMIN_USERNAME=admin" -e "ADMIN_EMAIL=admin@archipelo.co" -e "ADMIN_PASSWORD=<ADMIN_PASSWORD>" -e "WORKERS=8" -e "CELERY_WORKERS=8" -v doccano-db:/data -p 8000:8000 doccano/doccano
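Note that docker container create only creates the container; it does not start it. A minimal sketch of starting it and confirming that it is up, assuming the container name doccano from the command above (the start_doccano function name is illustrative and not part of Doccano or Docker):

```shell
# Starts the container created above and prints its current status.
# "doccano" is the name passed to --name in the create command.
start_doccano() {
    docker container start doccano &&
    docker ps --filter "name=doccano" --format '{{.Names}}: {{.Status}}'
}

# Usage (on the VM):
#   start_doccano
```

If the second command prints nothing, the container is not running; docker logs doccano is usually the first place to look.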
To receive the admin password, contact:
Docker Configuration
Once the Docker container is running, set its restart policy to always so that it comes back up automatically after a VM restart. To do that, run the following command:
docker update --restart=always <CONTAINER_ID>
You can get the container identifier by running the following command:
docker ps
The output of the above command looks as follows:
CONTAINER ID   IMAGE             COMMAND                  CREATED       STATUS       PORTS                    NAMES
01dc2b43e63f   doccano/doccano   "/doccano/tools/run.…"   4 hours ago   Up 3 hours   0.0.0.0:8000->8000/tcp   doccano
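If you prefer not to copy the identifier by hand, a small sketch like the following can pull it out of the docker ps output. The helper name doccano_container_id is hypothetical, and it assumes the container was created with --name doccano as above:

```shell
# Reads `docker ps` output on stdin and prints the CONTAINER ID of the row
# whose last column (NAMES) is "doccano". The header row is skipped.
doccano_container_id() {
    awk 'NR > 1 && $NF == "doccano" { print $1 }'
}

# Usage (on the VM):
#   docker update --restart=always "$(docker ps | doccano_container_id)"
#   docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' doccano
```

Docker also accepts the container name in place of the identifier, so docker update --restart=always doccano is an equivalent shortcut. The docker inspect line prints the restart policy that is currently in effect.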
Setting up the Domain Name
The main domain doccano.archipelo.com is configured and handled by CloudFlare.
To set up the dev-archer.com domain name for Doccano, configure the A record to point to the Doccano external IP address. You can do that in the Cloud DNS section of the archipelo-dev project in Google Cloud Platform.
Currently, the domain used for Doccano is doccano.dev-archer.com. Its A DNS record is named doccano.dev-archer.com. (with the trailing dot), has a TTL of 3600 seconds, and points to the external IP address of the Doccano virtual machine, 35.195.54.139.
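To confirm that the record has propagated, you can compare what DNS returns with the expected address. The sketch below assumes dig is installed on your machine (on Debian and Ubuntu it typically comes from the dnsutils package); has_address is a hypothetical helper name:

```shell
# Succeeds if the address given as $1 appears as a whole line on stdin.
has_address() {
    grep -qxF -- "$1"
}

# Usage:
#   dig +short A doccano.dev-archer.com | has_address 35.195.54.139 \
#       && echo "A record OK" || echo "A record does not match"
```

Matching whole lines avoids false positives such as 135.195.54.139 being accepted for 35.195.54.139.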
Setting Up NGINX
After installing NGINX on the same virtual machine that Doccano is running on, configure it to handle the traffic on port 80. Caching must also be turned off. To do that, remove the /etc/nginx/sites-enabled/default file and create a new file called /etc/nginx/sites-enabled/doccano with the following content:
# Proxies requests to the Doccano container and disables caching.
server {
    server_name doccano.dev-archer.com;

    location / {
        # Forward to the container port published with -p 8000:8000.
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Never store or serve proxied responses from the NGINX cache.
        proxy_no_cache 1;
        proxy_cache_bypass 1;

        add_header X-Frame-Options DENY;
        add_header X-Content-Type-Options nosniff;
        add_header X-XSS-Protection "1; mode=block";

        # Tell browsers and intermediate proxies not to cache either.
        add_header Last-Modified $date_gmt;
        add_header Cache-Control 'no-store, no-cache, must-revalidate, proxy-revalidate, max-age=0';
        if_modified_since off;
        expires off;
        etag off;
    }
}

# Redirects plain HTTP requests to HTTPS.
server {
    server_name doccano.dev-archer.com;
    listen 80;
    return 301 https://$host$request_uri;
}
The above configuration assumes that you will be using doccano.dev-archer.com as the domain name.
After that, restart NGINX by running the following command:
service nginx restart
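A broken configuration file makes the restart fail and can take the proxy down, so it can help to validate the configuration first. A minimal sketch, assuming it is run as root on the VM; restart_nginx_checked is a hypothetical helper name:

```shell
# Restarts NGINX only if the configuration passes NGINX's own syntax check.
restart_nginx_checked() {
    # nginx -t parses the configuration and reports errors
    # without touching the running server.
    if nginx -t; then
        service nginx restart
    else
        echo "nginx configuration test failed; not restarting" >&2
        return 1
    fi
}

# Usage (as root on the VM):
#   restart_nginx_checked
```

The same helper can stand in for the plain service nginx restart in the later sections of this document.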
Upload File Limits
By default, NGINX limits the size of an uploaded file to 1MB. This is not enough for the data that we want to work with during labeling. To change the limit, edit the /etc/nginx/nginx.conf configuration file and add the following line to the http section:
client_max_body_size 100M;
After that, restart NGINX by running the following command:
service nginx restart
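To check that the new limit is part of the configuration NGINX actually loads, you can filter the full configuration dump. This is a sketch: configured_body_limit is a hypothetical helper name, and nginx -T (which prints the complete configuration) generally needs root:

```shell
# Reads NGINX configuration text on stdin and prints the value of every
# uncommented client_max_body_size directive, without the trailing semicolon.
configured_body_limit() {
    awk '$1 == "client_max_body_size" { sub(/;$/, "", $2); print $2 }'
}

# Usage (as root on the VM):
#   nginx -T 2>/dev/null | configured_body_limit
```

If nothing is printed, the directive is missing and the 1MB default still applies.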
Setting Up SSL
Doccano can be exposed via different domains. Currently, it is set up to be accessible via:
- doccano.archipelo.com
- doccano.dev-archer.com (inactive)
The SSL setup differs between the two domains.
Warning: The traffic on doccano.archipelo.com is proxied via CloudFlare, which means it is more secure.
SSL For doccano.archipelo.com
To use doccano.archipelo.com, add an A record that points to the IP address of the Doccano virtual machine. This is done in CloudFlare and requires asking someone with admin rights to do it for you. The people with admin rights are:
- Dom
- Rafał
- Stepan
SSL For doccano.dev-archer.com
In its current setup, Doccano uses Certbot to handle the certificate. We will change that once we move Doccano to the archipelo-prod project.
You need to install Certbot on the same virtual machine that Doccano and NGINX are running on. You can do that by running the following commands:
apt install certbot
apt install python3-certbot-nginx
Once the installation is successful, you can start the certificate installation process by running the following command:
certbot --nginx
Answer the questions Certbot asks and it should guide you through the setup process. The only thing to remember is to provide an appropriate e-mail address so that the whole engineering team is notified when the certificate is about to expire. We suggest using engineering@archipelo.co.
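If you would rather skip the interactive prompts, Certbot can take the same answers as command-line flags. The following is a sketch rather than the procedure used for the current setup: request_certificate is a hypothetical helper name, and passing --agree-tos means you accept the certificate authority's terms of service on behalf of the team.

```shell
# Requests and installs a certificate for one domain without prompts.
# Arguments: $1 = domain name, $2 = contact e-mail address.
request_certificate() {
    certbot --nginx --non-interactive --agree-tos \
        -d "$1" -m "$2"
}

# Usage (as root on the VM):
#   request_certificate doccano.dev-archer.com engineering@archipelo.co
#   certbot certificates   # lists installed certificates and their expiry dates
```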
Once that is done, restart NGINX by running the following command:
service nginx restart
Automatic Certificate Renewal
Certbot can be set up to renew the certificate automatically. To do that, edit the crontab by running the following command:
crontab -e
And add the following entry:
0 12 * * * /usr/bin/certbot renew --quiet
Once that is done, save the file and automatic certificate renewal will be set up and ready.
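The entry above runs certbot renew every day at 12:00. To check that it was saved and that a renewal would succeed, something along these lines can be used. This is a sketch; has_renew_entry is a hypothetical helper name:

```shell
# Succeeds if the crontab text on stdin contains an uncommented line
# that runs "certbot renew".
has_renew_entry() {
    grep -Eq '^[^#]*certbot renew'
}

# Usage (as the user whose crontab was edited):
#   crontab -l | has_renew_entry && echo "renewal entry present"
#   certbot renew --dry-run   # simulates a renewal without replacing certificates
```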