Get rid of PDFs on GitHub

Why you shouldn’t use GitHub for hosting PDFs

GitHub is a hosting service often used to keep track of files of almost any kind, including PDFs. PDFs, however, are binary files, which makes me quite uncomfortable keeping them in my repositories. Furthermore, GitHub itself doesn’t handle them well:

  • It’s not possible to jump across pages
  • It’s not possible to select or search for text
  • Annotations are not visible
  • Diffs are not meaningful

This is why I’m going to present automated alternatives based on cloud storage services (I will focus on Google Drive, but most of what follows also applies to other services such as Dropbox, OneDrive, etc.).

The right way of doing it

PDFs are uploaded to GitHub mostly for two purposes:

  • Backing them up.
  • Having the files available everywhere. This happened to me especially in
    repositories containing LaTeX code, where I ended up pushing the newly
    compiled PDF every time.

File manager integrations

Many desktop environments (DEs) now offer integrations in their file managers that can synchronize with many popular cloud services and expose the cloud folder directly in your file explorer; you don’t need to do anything apart from logging in.

You will find such a solution at least for the most popular ones, such as KDE, GNOME and Cinnamon.

Rclone: the command line all-syncing tool

If you prefer something more customizable, you can use rclone, a command line utility capable of synchronizing with more than 40 different cloud storage services. It offers a lot of interesting features (apart from the basic ones you can expect), such as the ability to sync just a folder, exclude files, and check hashes (MD5/SHA-1) for every transfer. Its site offers an extensive and very detailed guide on how to use each of these features.

You may also want to combine this with some cron jobs.
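As a rough illustration of the rclone and cron combination, here is a minimal sketch. The remote name gdrive, the folder ~/Documents/pdfs and the script path in the cron comment are all hypothetical; the remote is assumed to have been created beforehand with rclone config. Note that rclone sync makes the destination match the source, so it is worth previewing with --dry-run first.

```shell
#!/bin/sh
# Sketch: one-way sync of a local PDF folder to a cloud remote with rclone.
# Assumptions (hypothetical names): a remote called "gdrive" already exists
# (created with `rclone config`) and the PDFs live in ~/Documents/pdfs.
LOCAL_DIR="${LOCAL_DIR:-$HOME/Documents/pdfs}"
REMOTE_DIR="${REMOTE_DIR:-gdrive:pdfs}"

sync_pdfs() {
    # --include restricts the transfer to PDFs; --checksum compares hashes
    # instead of size and modification time; extra flags such as --dry-run
    # can be passed through as arguments.
    rclone sync "$LOCAL_DIR" "$REMOTE_DIR" --include "*.pdf" --checksum "$@"
}

# Only sync when rclone is installed and the local folder exists.
if command -v rclone >/dev/null 2>&1 && [ -d "$LOCAL_DIR" ]; then
    sync_pdfs "$@"
else
    echo "nothing synced: rclone or $LOCAL_DIR is missing" >&2
fi

# Example crontab entry (edit with `crontab -e`) running the script hourly;
# the script location is an assumption:
#   0 * * * * /home/you/bin/sync-pdfs.sh >> /home/you/sync-pdfs.log 2>&1
```

Running the script with --dry-run as its argument prints what would be transferred without changing anything on the remote.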

A CI pipeline for tex repositories

The two previous solutions are clearly suitable for the first of the purposes mentioned above. But what if you have a continuously evolving TeX repository, and you want the PDF available online without your .git folder becoming insanely big?

You can just exploit the wonderful GitHub Actions to let GitHub upload the PDF file to Google Drive for you each time you update the repository.

You need to start by creating a <ACTION-NAME>.yml file in the .github/workflows folder (you will probably need to create the folders too) with the following structure:

# Workflow name, shown in the Actions tab of the repository.
name: <ACTION-NAME>
# Run the workflow on every push.
on: push
# The @v2/@v1 tags are the ones this article was written against; check each
# action's page for its current major version.
jobs:
  # First job: compile the LaTeX source and pass the PDF on as an artifact.
  compile-pdf:
    name: Compile pdf
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      - name: Compile pdf
        uses: xu-cheng/latex-action@v2
        with:
          root_file: <TEX-FILENAME>.tex

      - name: Upload Artifact
        uses: actions/upload-artifact@v2
        with:
          name: drive-pdf
          path: <TEX-FILENAME>.pdf
          retention-days: 1

  # Second job: fetch the artifact and upload it to Google Drive.
  upload-pdf:
    name: Upload pdf
    runs-on: ubuntu-latest
    needs: compile-pdf

    steps:
      - name: Download Artifact
        uses: actions/download-artifact@v2
        with:
          name: drive-pdf

      - name: Upload to Google Drive
        uses: satackey/action-google-drive@v1
        with:
          skicka-tokencache-json: ${{ secrets.SKICKA_TOKENCACHE_JSON }}
          upload-from: ./<TEX-FILENAME>.pdf
          upload-to: /<GOOGLE DRIVE FOLDER PATH>

          google-client-id: ${{ secrets.GOOGLE_CLIENT_ID }}
          google-client-secret: ${{ secrets.GOOGLE_CLIENT_SECRET }}

Let’s break it down:

  • name is the name of the action (shown in the GitHub Actions tab of your repository); it is best kept equal to the name of your file (without .yml).
  • on: push means it will be executed each time you push a commit. You probably don’t want to change it.
  • compile-pdf is the name of the first job. It will produce the PDF from the LaTeX file through the xu-cheng/latex-action@v2 action, which supports different compilers and options. Unless you need to do something fancy, you probably won’t change any option. Replace <TEX-FILENAME> with the path of the tex file (without the extension) each time it appears in the above file.
  • upload-pdf is the name of the second job, which will upload your PDF to the Google Drive folder you specify as <GOOGLE DRIVE FOLDER PATH>. But before doing this you need to store in the repository the secrets GitHub needs to log into your Drive. Go into Settings → Secrets → New repository secret; generate and store them as explained in the Action README. You will probably need to create a Client ID in the Google Developer Console. In that case, don’t forget to put the application in production if you get an error about the application not being secure. Make sure you save the secrets as SKICKA_TOKENCACHE_JSON, GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET.
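As an alternative to the web interface, the three secrets can also be stored from a terminal. The following is only a sketch, assuming the GitHub CLI (gh) is installed and authenticated and that you run it inside a clone of the repository. The function name store_drive_secrets, the TOKENCACHE variable and its default path are my own assumptions; point TOKENCACHE at wherever your token cache file was actually generated.

```shell
#!/bin/sh
# Sketch: store the three secrets with the GitHub CLI instead of the web UI.
# Assumptions: `gh` is installed and logged in, the current directory is a
# clone of the target repository, and TOKENCACHE points at the token cache
# file produced while following the Action README (default path is a guess).
store_drive_secrets() {
    tokencache="${TOKENCACHE:-$HOME/.skicka.tokencache.json}"

    # Refuse to continue when an input is missing, before contacting GitHub.
    if [ -z "${GOOGLE_CLIENT_ID:-}" ] || [ -z "${GOOGLE_CLIENT_SECRET:-}" ]; then
        echo "set GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET first" >&2
        return 1
    fi
    if [ ! -f "$tokencache" ]; then
        echo "token cache not found: $tokencache" >&2
        return 1
    fi

    # --body passes the value inline; without it gh reads the value from stdin.
    gh secret set GOOGLE_CLIENT_ID --body "$GOOGLE_CLIENT_ID" &&
    gh secret set GOOGLE_CLIENT_SECRET --body "$GOOGLE_CLIENT_SECRET" &&
    gh secret set SKICKA_TOKENCACHE_JSON < "$tokencache"
}
```

After defining the function, set GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET in your shell and call store_drive_secrets; the secret names match the ones the workflow reads.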

Commit and push the .yml file to the GitHub repository, and that’s it. Now every time you push a commit, the PDF on Google Drive will be updated accordingly. Notice also that the link doesn’t change across the different uploads, so you can just add it to the README or send it to anyone you want to show it to.
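Once the pipeline takes care of the upload, the compiled PDF no longer needs to be committed at all. A minimal sketch of how to stop tracking it follows; the helper name untrack_pdf and the default file name main.pdf are hypothetical. Keep in mind that this only affects future commits: copies of the PDF already in the history stay there.

```shell
#!/bin/sh
# Sketch: stop tracking a compiled PDF once CI uploads it elsewhere.
# Hypothetical default: the compiled file is called main.pdf.
untrack_pdf() {
    pdf="${1:-main.pdf}"
    # Remove the file from the index only; the working copy stays on disk.
    git rm --cached --quiet --ignore-unmatch -- "$pdf"
    # Ignore it from now on, without duplicating an existing entry.
    grep -qxF -- "$pdf" .gitignore 2>/dev/null || printf '%s\n' "$pdf" >> .gitignore
    # Stage the updated ignore file so it goes into the next commit.
    git add .gitignore
}
```

A typical use would be to call untrack_pdf with the name of your PDF from the repository root, then commit and push as usual.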

You can also check it out in one of my repositories.

If you found this article interesting, follow me on GitHub!