Write your professional CV in LaTeX with ChatGPT in 4 minutes!!!

Are you eager to dive headfirst into the world of work, but find yourself lacking a polished, professional Curriculum Vitae? Fret not: today we embark on a journey to craft a meticulously structured CV in a matter of minutes, guided by the expert hand of ChatGPT. Wondering why I am so confident?

The blueprint has been meticulously curated by a seasoned HR professional boasting over 25 years of industry wisdom. Enter LaTeX, the unrivaled champion in elegantly formatting your documents, revered in the scientific realm as the hallmark of professionalism. You needn’t be well-versed in these intricacies, as my prompt and ChatGPT stand ready to handle every detail, ensuring your CV shines brilliantly.

Step by step:
  • Ask ChatGPT with the correct prompt
  • Compile the LaTeX code (a sample skeleton follows the list)
  • Apply for your next 300k job
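
As a teaser, here is the kind of skeleton ChatGPT produces for step two. This is a hedged sketch with placeholder names and sections, assuming the standard article class, not the exact output of my prompt:

  % Illustrative CV skeleton (placeholder data, not my prompt's exact output)
  \documentclass[11pt]{article}
  \usepackage[margin=2.5cm]{geometry}
  \begin{document}
  \begin{center}
    {\LARGE Jane Doe}\\
    jane.doe@example.com $\cdot$ +1 555 0100
  \end{center}
  \section*{Experience}
  \textbf{Data Scientist}, ACME Corp \hfill 2020--2023
  \section*{Education}
  \textbf{M.Sc.\ Computer Science}, Example University \hfill 2018--2020
  \end{document}

Paste whatever ChatGPT returns into an online compiler such as Overleaf and a polished PDF comes out the other side.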

Let’s get started without any further delay!

Read More

Finetune and validate your version of ChatGPT

Finally, last week OpenAI released the API to finetune the popular model on which ChatGPT is based!

Following their suggestions, we will go step by step from dataset preparation to using the tuned model.

Until a few months ago, finetuning a general-purpose model was not a straightforward task: it required knowledge of training, ML frameworks, and access to computational resources like GPUs. Lately, things have become easier thanks to open-source projects that already implement the heavy parts for you.

OpenAI gives you the chance to train your own LLM even if you don't know anything about the subject; you'll just need a bit of Python and an OpenAI account.

What we will cover (a minimal code sketch follows the list):
  • When and why to finetune
  • Dataset preparation
  • Finetuning
  • Validation
  • Final thoughts
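
To give you a flavor of the dataset and finetune steps before we dive in, here is a minimal sketch. The sample content, file name, and model id are illustrative assumptions of mine, using the pre-1.0 openai Python package:

  # Minimal sketch: build a chat-format JSONL dataset and launch a finetune job.
  # Sample content, file name, and model id are placeholders, not the
  # article's exact code.
  import json
  import openai

  openai.api_key = "sk-..."  # your OpenAI API key

  # One training example per line, in the chat "messages" format.
  samples = [
      {"messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "What is finetuning?"},
          {"role": "assistant", "content": "Further training a pretrained model on your own data."},
      ]},
  ]
  with open("train.jsonl", "w") as f:
      for sample in samples:
          f.write(json.dumps(sample) + "\n")

  # Upload the dataset, then start the job on top of the base model.
  training_file = openai.File.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
  job = openai.FineTuningJob.create(training_file=training_file.id, model="gpt-3.5-turbo")
  print(job.id)  # keep this id to poll the job status later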
Read More

Il transformer illustrato - IT

Disclaimer

Italian translation of The Illustrated Transformer by Jay Alammar.
I'm not a professional translator.
The intellectual property of the article is owned by Jay Alammar.

In the previous post, we looked at attention, a method that is ubiquitous in modern deep learning models. Attention is a tool that has helped improve the performance of neural machine translation applications. In this post, we will look at the Transformer, a model that uses attention to increase the speed at which these networks can be trained. The Transformer even outperforms Google's neural machine translation model on specific tasks. The biggest benefit, however, comes from how the Transformer lends itself to parallelization. It is in fact Google Cloud's recommendation to use the Transformer as a reference model for their Cloud TPU offering. Let's try to take the model apart and see how it works.

The Transformer was proposed in the paper Attention is All You Need. A TensorFlow implementation of it is available as part of the Tensor2Tensor package. Harvard's NLP group created a guide annotating the paper with PyTorch implementations. In this post, we will try to simplify things a bit and introduce the concepts one by one, hoping to make them easier to understand for people without in-depth knowledge of the subject.

2020 update: I have created the "Narrated Transformer" video, which is a gentler approach to the topic:

Read More

How to quantize your finetuned llama model

Imagine you have just trained your brand-new large language model on a supercluster with 8xA100 80GB GPUs across multiple nodes, but now the butterflies are flying out of your pocket and you can run inference on your creation only on a low-budget CPU machine; or maybe you are simply looking for a cheap way to put your buddy into production. In this guide, we will see how to shrink our model's memory usage as much as we can and run it with resources as small as 8GB of RAM.

To get there, we will exploit two tricks:

  • integer-precision quantization
  • conversion to C++ inference code

All of this will be possible thanks to the amazing work of llama.cpp!!!

Disclaimer

This guide has been tested with a finetuned version of llama 7B from the Hugging Face hub, trained with the Vicuna pipeline, but in general it should work with any llama model saved in the PyTorch format.

High-level summary (rough commands follow the list):
  • Clone the llama.cpp repo on a machine equipped with a GPU.
  • Compile the repo and quantize your model.
  • Enjoy inference from a terminal, web server, Python, or Docker on almost any device.
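
In terminal terms, the first two bullets boil down to something like the following. The paths are placeholders, and the script and output names are assumptions based on the llama.cpp README, so they may differ between versions (older ones emit GGML .bin files, newer ones .gguf):

  # clone and build llama.cpp
  git clone https://github.com/ggerganov/llama.cpp
  cd llama.cpp && make
  # convert the PyTorch checkpoint to llama.cpp's format, then quantize to 4-bit
  python convert.py /path/to/finetuned-llama-7b
  ./quantize /path/to/finetuned-llama-7b/ggml-model-f16.gguf \
             /path/to/finetuned-llama-7b/ggml-model-q4_0.gguf q4_0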
Read More

Batches with texts of different lengths

When I was experimenting with nanoGPT by Andrej Karpathy, I saw that, as in many other machine learning training setups, it is common to concatenate the samples of the dataset, separating them with an end-of-text token. This makes sense when your model has to learn the structure of human language, but not during the finetuning process. This is because we could concatenate texts with completely different and uncorrelated contexts, inducing the model to predict the next token based on, possibly, two different topics.

To solve this, I tried to create a simple batching pipeline, which I call the batchization process, that groups the texts by number of tokens, plus some tricks. Let's check this out!

High-level guideline:

For clarity, we will divide the process into two steps (a toy sketch follows the list):

  • Grouping the texts by length
  • Creating a pseudo dataloader
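
As a toy illustration of both steps, here is a minimal sketch; batchize and its length buckets are my simplified stand-ins, not the article's actual implementation:

  # Toy sketch of the batchization idea: bucket token sequences by length,
  # then yield batches, so no unrelated texts get concatenated together.
  from collections import defaultdict

  def batchize(tokenized_texts, batch_size):
      # Step 1: group the texts by their number of tokens.
      groups = defaultdict(list)
      for tokens in tokenized_texts:
          groups[len(tokens)].append(tokens)
      # Step 2: a pseudo dataloader walking each length bucket in turn.
      for length in sorted(groups):
          bucket = groups[length]
          for i in range(0, len(bucket), batch_size):
              yield bucket[i:i + batch_size]  # all samples share the same length

  # Usage with texts already mapped to token ids:
  texts = [[1, 2, 3], [4, 5, 6], [7, 8], [9, 10, 11, 12]]
  for batch in batchize(texts, batch_size=2):
      print(batch)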
Read More

How to connect Lambdalabs to PyCharm via SSH

In this article, we are going to see how to connect PyCharm, the famous editor by JetBrains, to one of the cheapest cloud GPU providers around these days: Lambda Cloud.

For those new to the topic, GPU cloud services let you run your scripts on powerful machines with plenty of RAM and the computational performance of GPUs. In particular, Lambdalabs provides a cheap, fast, and reliable service.

High-level guideline (the key command follows the list):
  • Instantiate a machine on Lambdalabs
  • Open an SSH connection from the terminal and set up the environment
  • Link PyCharm
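
The terminal step amounts to something like the command below; the key path and address are placeholders for your own instance details, and the ubuntu user is an assumption based on Lambda's default images:

  # connect to the instance (placeholder key path and IP)
  ssh -i ~/.ssh/lambda_key.pem ubuntu@<instance-ip>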
Read More