What is Chat GPT

ChatGPT, which stands for Chat Generative Pre-trained Transformer, is a large language model–based chatbot developed by OpenAI and launched on November 30, 2022, which enables users to refine and steer a conversation towards a desired length, format, style, level of detail, and language used. Successive prompts and replies, known as prompt engineering, are considered at each conversation stage as a context.[2]

ChatGPT is built upon GPT-3.5 and GPT-4 —members of OpenAI's proprietary series of generative pre-trained transformer (GPT) models, based on the transformer architecture developed by Google[3]—and it is fine-tuned for conversational applications using a combination of supervised and reinforcement learning techniques.[4] ChatGPT was released as a freely available research preview, but due to its popularity, OpenAI now operates the service on a freemium model. It allows users on its free tier to access the GPT-3.5-based version. In contrast, the more advanced GPT-4 based version and priority access to newer features are provided to paid subscribers under the commercial name "ChatGPT Plus".

By January 2023, it had become what was then the fastest-growing consumer software application in history, gaining over 100 million users and contributing to OpenAI's valuation growing to US$29 billion.[5][6] Within months, Google, Baidu, and Meta accelerated the development of their competing products: Bard, Ernie Bot, and LLaMA.[7] Microsoft launched its Bing Chat based on OpenAI's GPT-4. Some observers expressed concern over the potential of ChatGPT to displace or atrophy human intelligence and its potential to enable plagiarism or fuel misinformation.[4][8]

How to use the Chat:

Prompt engineering, primarily used in communication with a text-to-text model and text-to-image model, is the process of structuring text that can be interpreted and understood by a generative AI model.[1][2] Prompt engineering is enabled by in-context learning, defined as a model's ability to temporarily learn from prompts. The ability for in-context learning is an emergent ability[3] of large language models.

A prompt is natural language text describing the task that an AI should perform.[4] A prompt for a text-to-text model can be a query such as "what is Fermat's little theorem?"[5] a command such as "write a poem about leaves falling",[6] a short statement of feedback (for example, "too verbose", "too formal", "rephrase again", "omit this word") or a longer statement including context, instructions,[7] and input data. Prompt engineering may involve phrasing a query, specifying a style,[6] providing relevant context[8] or assigning a role to the AI such as "Act as a native French speaker".[9] Prompt engineering may consist of a single prompt that includes a few examples for a model to learn from, such as "maison -> house, chat -> cat, chien ->",[10] an approach called few-shot learning.[11]

When communicating with a text-to-image or a text-to-audio model, a typical prompt is a description of a desired output such as "a high-quality photo of an astronaut riding a horse"[12] or "Lo-fi slow BPM electro chill with organic samples".[13] Prompting a text-to-image model may involve adding, removing, emphasizing and re-ordering words to achieve a desired subject, style,[1] layout, lighting,[14] and aesthetic.

In-context learning

Prompt engineering techniques are enabled by in-context learning. In-context learning itself is an emergent property of model scale, meaning breaks[15] in downstream scaling laws occur such that its efficacy increases at a different rate in larger models than in smaller models.[16][17]

In contrast to training and fine tuning for each specific task, which are not temporary, what has been learnt during in-context learning is of a temporary nature. It does not carry the temporary contexts or biases, except the ones already present in the (pre)training dataset, from one conversation to the other.[18] This result of "mesa-optimization"[19][20] within transformer layers, is a form of meta-learning or "learning to learn"

Chain-of-thought[edit]

Chain-of-thought (CoT) prompting is a technique that allows large language models (LLMs) to solve a problem as a series of intermediate steps[27] before giving a final answer. Chain-of-thought prompting improves reasoning ability by inducing the model to answer a multi-step problem with steps of reasoning that mimic a train of thought.[28][17][29] It allows large language models to overcome difficulties with some reasoning tasks that require logical thinking and multiple steps to solve, such as arithmetic or commonsense reasoning questions.[30][31][32]

For example, given the question "Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?", a CoT prompt might induce the LLM to answer "A: The cafeteria had 23 apples originally. They used 20 to make lunch. So they had 23 - 20 = 3. They bought 6 more apples, so they have 3 + 6 = 9. The answer is 9."[17]

As originally proposed,[17] each CoT prompt included a few Q&A examples. This made it a few-shot prompting technique. However, simply appending the words "Let's think step-by-step",[33] has also proven effective, which makes CoT a zero-shot prompting technique. This allows for better scaling as a user no longer needs to formulate many specific CoT Q&A examples.[34]

When applied to PaLM, a 540B parameter language model, CoT prompting significantly aided the model, allowing it to perform comparably with task-specific fine-tuned models on several tasks, even setting a new state of the art at the time on the GSM8K mathematical reasoning benchmark.[17] It is possible to fine-tune models on CoT reasoning datasets to enhance this capability further and stimulate better interpretability.[35][36]

Other techniques

Chain-of-thought prompting is just one of many prompt-engineering techniques. Various other techniques have been proposed.

Generated knowledge prompting

Generated knowledge prompting[37] first prompts the model to generate relevant facts for completing the prompt, then proceed to complete the prompt. The completion quality is usually higher, as the model can be conditioned on relevant facts.

Least-to-most prompting

Least-to-most prompting[38] prompts a model to first list the sub-problems to a problem, then solve them in sequence, such that later sub-problems can be solved with the help of answers to previous sub-problems.

Self-consistency decoding

Self-consistency decoding[39] performs several chain-of-thought rollouts, then selects the most commonly reached conclusion out of all the rollouts. If the rollouts disagree by a lot, a human can be queried for the correct chain of thought.[40]

Complexity-based prompting

Complexity-based prompting[41] performs several CoT rollouts, then select the rollouts with the longest chains of thought, then select the most commonly reached conclusion out of those.

Self-refine

Self-refine[42] prompts the LLM to solve the problem, then prompts the LLM to critique its solution, then prompts the LLM to solve the problem again in view of the problem, solution, and critique. This process is repeated until stopped, either by running out of tokens, time, or by the LLM outputting a "stop" token.

Tree-of-thought

Tree-of-thought prompting[43] generalizes chain-of-thought by prompting the model to generate one or more "possible next steps", and then running the model on each of the possible next steps by breadth-first, beam, or some other method of tree search.[44]

Maieutic prompting

Maieutic prompting is similar to tree-of-thought. The model is prompted to answer a question with an explanation. The model is then prompted to explain parts of the explanation, and so on. Inconsistent explanation trees are pruned or discarded. This improves performance on complex commonsense reasoning.[45]

Directional-stimulus prompting

Directional-stimulus prompting[46] includes a hint or cue, such as desired keywords, to guide a language model toward the desired output.

Prompting to disclose uncertainty

By default, the output of language models may not contain estimates of uncertainty. The model may output text that appears confident, though the underlying token predictions have low likelihood scores. Large language models like GPT-4 can have accurately calibrated likelihood scores in their token predictions,[47] and so the model output uncertainty can be directly estimated by reading out the token prediction likelihood scores.

But if one cannot access such scores (such as when one is accessing the model through a restrictive API), uncertainty can still be estimated and incorporated into the model output. One simple method is to prompt the model to use words to estimate uncertainty. Another is to prompt the model to refuse to answer in a standardized way if the input does not satisfy conditions.[citation needed]

Automatic prompt generation

Retrieval-augmented generation

Prompts often contain a few examples (thus "few-shot"). Examples can be automatically retrieved from a database with document retrieval, sometimes using a vector database. Given a query, a document retriever is called to retrieve the most relevant (usually measured by first encoding the query and the documents into vectors, then finding the documents with vectors closest in Euclidean norm to the query vector). The LLM then generates an output based on both the query and the retrieved documents.[48]

Using language models to generate prompts

Large language models (LLM) themselves can be used to compose prompts for large language models.

The automatic prompt engineer algorithm uses one LLM to beam search over prompts for another LLM:[49]

There are two LLMs. One is the target LLM, and another is the prompting LLM.
Prompting LLM is presented with example input-output pairs, and asked to generate instructions that could have caused a model following the instructions to generate the outputs, given the inputs.
Each of the generated instructions is used to prompt the target LLM, followed by each of the inputs. The log-probabilities of the outputs are computed and added. This is the score of the instruction.
The highest-scored instructions are given to the prompting LLM for further variations.
Repeat until some stopping criteria is reached, then output the highest-scored instructions.

CoT examples can be generated by LLM themselves. In "auto-CoT",[50] a library of questions are converted to vectors by a model such as BERT. The question vectors are clustered. Questions nearest to the centroids of each cluster are selected. An LLM does zero-shot CoT on each question. The resulting CoT examples are added to the dataset. When prompted with a new question, CoT examples to the nearest questions can be retrieved and added to the prompt.