AI Resources Master Resource List
Overview
Sources:
NOTE
Items with a ⭐ are "favorites" or "recommended" resources.
NOTE
All lists are displayed in alphabetical order.
- Libraries & Tools
- Guides
- Videos
- Papers
Diagram

```mermaid
graph LR
    promptinglibs[Prompting Libraries & Tools]
    promptingguides[Prompting Guides]
```
Prompt Engineering
Libraries & Tools
| Name | Note | Rating | Description |
|---|---|---|---|
| Arthur Shield | N/A | N/A | A paid product for detecting toxicity, hallucination, prompt injection, etc. |
| Baserun | N/A | N/A | A paid product for testing, debugging, and monitoring LLM-based apps. |
| Chainlit | Chainlit | ⭐⭐ | A Python library for making chatbot interfaces. |
| Embedchain | Embedchain | ⭐⭐ | A Python library for managing and syncing unstructured data with LLMs. |
| FLAML | N/A | N/A | A Python library for automating selection of models, hyperparameters, and other tunable choices. |
| Guardrails.ai | N/A | N/A | A Python library for validating outputs and retrying failures. Still in alpha. |
| Guidance | N/A | N/A | A Python library from Microsoft using Handlebars templating. |
| Haystack | N/A | N/A | Open-source LLM orchestration framework. |
| HoneyHive | N/A | N/A | An enterprise platform to evaluate, debug, and monitor LLM apps. |
| LangChain | Langchain | ⭐⭐⭐ | A popular Python/JavaScript library for chaining language model prompts. |
| LiteLLM | N/A | N/A | A minimal Python library for calling LLM APIs (see the sketch after this table). |
| LlamaIndex | LlamaIndex | ⭐⭐⭐ | A Python library for augmenting LLM apps with data. |
| LMQL | N/A | N/A | A programming language for LLM interaction with various supports. |
| OpenAI Evals | N/A | N/A | An open-source library for evaluating language models and prompts. |
| Outlines | N/A | N/A | A Python library for simplifying prompting and constraining generation. |
| Parea AI | N/A | N/A | A platform for debugging, testing, and monitoring LLM apps. |
| Portkey | N/A | N/A | A platform for observability and management in LLM apps. |
| Promptify | N/A | ⭐ | A small Python library for using language models in NLP tasks. |
| PromptPerfect | N/A | ⭐ | A paid product for testing and improving prompts. |
| Prompttools | N/A | ⭐⭐ | Open-source Python tools for testing and evaluating models. |
| Scale Spellbook | N/A | ⭐⭐⭐ | A paid product for building and shipping language model apps. |
| Semantic Kernel | N/A | N/A | A library from Microsoft supporting prompt templating and more. |
| Vellum | N/A | N/A | A paid AI product development platform for LLM apps. |
| Weights & Biases | N/A | N/A | A paid product for tracking model training and prompt engineering. |
| YiVal | N/A | N/A | An open-source GenAI-Ops tool for tuning and evaluating prompts and more. |
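For orientation, here is a minimal sketch of what calling a model through one of these libraries can look like, using LiteLLM's OpenAI-style `completion()` interface. The model name, environment setup, and prompt are illustrative assumptions, not recommendations from this list.

```python
# Minimal sketch, assuming `pip install litellm` and an OPENAI_API_KEY
# set in the environment; the model name below is only an example.
from litellm import completion

response = completion(
    model="gpt-3.5-turbo",  # any provider/model string LiteLLM supports
    messages=[{"role": "user", "content": "Explain prompt engineering in one sentence."}],
)

# LiteLLM returns an OpenAI-style response object.
print(response.choices[0].message.content)
```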
Guides
- Brex's Prompt Engineering Guide: Brex's introduction to language models and prompt engineering.
- learnprompting.org: An introductory course to prompt engineering.
- Lil'Log Prompt Engineering: An OpenAI researcher's review of the prompt engineering literature (as of March 2023).
- OpenAI Cookbook: Techniques to improve reliability: A slightly dated (Sep 2022) review of techniques for prompting language models.
- promptingguide.ai: A prompt engineering guide that demonstrates many techniques.
- Xavi Amatriain's Prompt Engineering 101 Introduction to Prompt Engineering and 202 Advanced Prompt Engineering: A basic but opinionated introduction to prompt engineering and a follow-up collection of many advanced methods, starting with CoT.
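Several of these guides build up to chain-of-thought prompting. As a minimal illustration of the zero-shot variant they describe, the sketch below simply appends a "think step by step" cue to a question; `ask_llm` is a hypothetical stand-in for whichever client you use.

```python
# Zero-shot chain-of-thought: append a reasoning cue to the question.
# `ask_llm` is a hypothetical callable standing in for any LLM client.
def zero_shot_cot(ask_llm, question: str) -> str:
    prompt = f"{question}\n\nLet's think step by step."
    return ask_llm(prompt)

# With a real client, the model returns its reasoning followed by an answer:
# answer = zero_shot_cot(my_client, "If I have 3 apples and buy 2 more, how many do I have?")
```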
Academic Papers
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (2022): Using few-shot prompts to ask models to think step by step improves their reasoning. PaLM's score on math word problems (GSM8K) rises from 18% to 57%.
- Self-Consistency Improves Chain of Thought Reasoning in Language Models (2022): Taking votes from multiple outputs improves accuracy even more. Voting across 40 outputs raises PaLM's score on math word problems further, from 57% to 74%, and `code-davinci-002`'s from 60% to 78% (see the voting sketch after this list).
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models (2023): Searching over trees of step-by-step reasoning helps even more than voting over chains of thought. It lifts `GPT-4`'s scores on creative writing and crosswords.
- Language Models are Zero-Shot Reasoners (2022): Telling instruction-following models to think step by step improves their reasoning. It lifts `text-davinci-002`'s score on math word problems (GSM8K) from 13% to 41%.
- Large Language Models Are Human-Level Prompt Engineers (2023): Automated searching over possible prompts found a prompt that lifts scores on math word problems (GSM8K) to 43%, 2 percentage points above the human-written prompt in Language Models are Zero-Shot Reasoners.
- Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling (2023): Automated searching over possible chain-of-thought prompts improved ChatGPT's scores on a few benchmarks by 0–20 percentage points.
- Faithful Reasoning Using Large Language Models (2022): Reasoning can be improved by a system that combines: chains of thought generated by alternative selection and inference prompts, a halter model that chooses when to halt selection-inference loops, a value function to search over multiple reasoning paths, and sentence labels that help avoid hallucination.
- STaR: Bootstrapping Reasoning With Reasoning (2022): Chain-of-thought reasoning can be baked into models via fine-tuning. For tasks with an answer key, example chains of thought can be generated by language models.
- ReAct: Synergizing Reasoning and Acting in Language Models (2023): For tasks with tools or an environment, chain of thought works better if you prescriptively alternate between Reasoning steps (thinking about what to do) and Acting (getting information from a tool or environment); see the loop sketch after this list.
- Reflexion: an autonomous agent with dynamic memory and self-reflection (2023): Retrying tasks with memory of prior failures improves subsequent performance.
- Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP (2023): Models augmented with knowledge via a "retrieve-then-read" pipeline can be improved with multi-hop chains of searches.
- Improving Factuality and Reasoning in Language Models through Multiagent Debate (2023): Generating debates between a few ChatGPT agents over a few rounds improves scores on various benchmarks. Math word problem scores rise from 77% to 85%.
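To make the chain-of-thought and self-consistency results above concrete, here is a hedged sketch of the voting procedure: sample several step-by-step solutions at a non-zero temperature, pull out each final answer, and keep the majority. `sample_llm` is a hypothetical LLM callable, the naive answer extraction is an assumption, and the default of 40 samples simply mirrors the paper's setup.

```python
# Self-consistency sketch: majority-vote over sampled chains of thought.
# `sample_llm(prompt, temperature=...)` is a hypothetical LLM callable.
from collections import Counter

def extract_final_answer(chain_of_thought: str) -> str:
    """Naive extraction: treat the last line of the reasoning as the answer."""
    return chain_of_thought.strip().splitlines()[-1]

def self_consistent_answer(sample_llm, question: str, n_samples: int = 40) -> str:
    """Sample diverse step-by-step solutions and return the most common answer."""
    prompt = f"{question}\nLet's think step by step."
    answers = [
        extract_final_answer(sample_llm(prompt, temperature=0.7))
        for _ in range(n_samples)
    ]
    return Counter(answers).most_common(1)[0][0]
```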
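The ReAct entry above describes alternating Reasoning and Acting steps; the schematic loop below shows one way that alternation can be wired up. The "Action:"/"Final Answer:" markers, the `tools` dictionary of plain Python functions, and `ask_llm` are all assumed conventions for illustration, not the paper's exact prompt format.

```python
# Schematic ReAct loop: interleave model reasoning with tool calls.
# `ask_llm` is a hypothetical LLM callable; `tools` maps tool names to functions.
def react_loop(ask_llm, tools: dict, question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = ask_llm(transcript + "Thought:")  # model decides what to do next
        transcript += f"Thought: {step}\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        if "Action:" in step:
            # Assumed convention: "Action: tool_name[argument]"
            action = step.split("Action:", 1)[1].strip()
            name, _, arg = action.partition("[")
            observation = tools[name.strip()](arg.rstrip("]"))
            transcript += f"Observation: {observation}\n"  # feed the result back
    return "No final answer within the step limit."
```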
Conclusion
Appendix
Note created on 2024-04-29 and last modified on 2024-04-29.
(c) No Clocks, LLC | 2024