AI Resources Master Resource List
Overview
Sources:
NOTE
Items with a ⭐ are "favorites" or "recommended" resources.
NOTE
All lists are displayed in alphabetical order.
- Libraries & Tools
- Guides
- Videos
- Papers
Diagram

```mermaid
graph LR
    promptinglibs[Prompting Libraries & Tools]
    promptingguides[Prompting Guides]
```
Prompt Engineering
Libraries & Tools
| Name | Note | Rating | Description |
|---|---|---|---|
| Arthur Shield | N/A | N/A | A paid product for detecting toxicity, hallucination, prompt injection, etc. |
| Baserun | N/A | N/A | A paid product for testing, debugging, and monitoring LLM-based apps. |
| Chainlit | Chainlit | ⭐⭐ | A Python library for making chatbot interfaces. |
| Embedchain | Embedchain | ⭐⭐ | A Python library for managing and syncing unstructured data with LLMs. |
| FLAML | N/A | N/A | A Python library for automating selection of models, hyperparameters, and other tunable choices. |
| Guardrails.ai | N/A | N/A | A Python library for validating outputs and retrying failures. Still in alpha. |
| Guidance | N/A | N/A | A Python library from Microsoft using Handlebars templating. |
| Haystack | N/A | N/A | Open-source LLM orchestration framework. |
| HoneyHive | N/A | N/A | An enterprise platform to evaluate, debug, and monitor LLM apps. |
| LangChain | Langchain | ⭐⭐⭐ | A popular Python/JavaScript library for chaining language model prompts. |
| LiteLLM | N/A | N/A | A minimal Python library for calling LLM APIs (see the sketch after this table). |
| LlamaIndex | LlamaIndex | ⭐⭐⭐ | A Python library for augmenting LLM apps with data. |
| LMQL | N/A | N/A | A programming language for LLM interaction with various supports. |
| OpenAI Evals | N/A | N/A | An open-source library for evaluating language models and prompts. |
| Outlines | N/A | N/A | A Python library for simplifying prompting and constraining generation. |
| Parea AI | N/A | N/A | A platform for debugging, testing, and monitoring LLM apps. |
| Portkey | N/A | N/A | A platform for observability and management in LLM apps. |
| Promptify | N/A | ⭐ | A small Python library for using language models in NLP tasks. |
| PromptPerfect | N/A | ⭐ | A paid product for testing and improving prompts. |
| Prompttools | N/A | ⭐⭐ | Open-source Python tools for testing and evaluating models. |
| Scale Spellbook | N/A | ⭐⭐⭐ | A paid product for building and shipping language model apps. |
| Semantic Kernel | N/A | N/A | A library from Microsoft supporting prompt templating and more. |
| Vellum | N/A | N/A | A paid AI product development platform for LLM apps. |
| Weights & Biases | N/A | N/A | A paid product for tracking model training and prompt engineering. |
| YiVal | N/A | N/A | An open-source GenAI-Ops tool for tuning and evaluating prompts and more. |
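For orientation, here is a minimal sketch of what calling a model through one of these libraries can look like, using LiteLLM's OpenAI-style `completion()` interface. The model name, environment setup, and prompt are illustrative assumptions, not recommendations from this list.

```python
# Minimal sketch, assuming `pip install litellm` and an OPENAI_API_KEY
# set in the environment; the model name below is only an example.
from litellm import completion

response = completion(
    model="gpt-3.5-turbo",  # any provider/model string LiteLLM supports
    messages=[{"role": "user", "content": "Explain prompt engineering in one sentence."}],
)

# LiteLLM returns an OpenAI-style response object.
print(response.choices[0].message.content)
```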
Guides
- Brex's Prompt Engineering Guide: Brex's introduction to language models and prompt engineering.
- learnprompting.org: An introductory course to prompt engineering.
- Lil'Log Prompt Engineering: An OpenAI researcher's review of the prompt engineering literature (as of March 2023).
- OpenAI Cookbook: Techniques to improve reliability: A slightly dated (Sep 2022) review of techniques for prompting language models.
- promptingguide.ai: A prompt engineering guide that demonstrates many techniques.
- Xavi Amatriain's Prompt Engineering 101 Introduction to Prompt Engineering and 202 Advanced Prompt Engineering: A basic but opinionated introduction to prompt engineering and a follow-up collection of many advanced methods, starting with CoT.
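Several of these guides build up to chain-of-thought prompting. As a minimal illustration of the zero-shot variant they describe, the sketch below simply appends a "think step by step" cue to a question; `ask_llm` is a hypothetical stand-in for whichever client you use.

```python
# Zero-shot chain-of-thought: append a reasoning cue to the question.
# `ask_llm` is a hypothetical callable standing in for any LLM client.
def zero_shot_cot(ask_llm, question: str) -> str:
    prompt = f"{question}\n\nLet's think step by step."
    return ask_llm(prompt)

# With a real client, the model returns its reasoning followed by an answer:
# answer = zero_shot_cot(my_client, "If I have 3 apples and buy 2 more, how many do I have?")
```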
Academic Papers
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (2022): Using few-shot prompts to ask models to think step by step improves their reasoning. PaLM's score on math word problems (GSM8K) rises from 18% to 57%.
- Self-Consistency Improves Chain of Thought Reasoning in Language Models (2022): Taking votes from multiple outputs improves accuracy even more. Voting across 40 outputs raises PaLM's score on math word problems further, from 57% to 74%, and `code-davinci-002`'s from 60% to 78% (see the voting sketch after this list).
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models (2023): Searching over trees of step-by-step reasoning helps even more than voting over chains of thought. It lifts `GPT-4`'s scores on creative writing and crosswords.
- Language Models are Zero-Shot Reasoners (2022): Telling instruction-following models to think step by step improves their reasoning. It lifts `text-davinci-002`'s score on math word problems (GSM8K) from 13% to 41%.
- Large Language Models Are Human-Level Prompt Engineers (2023): Automated searching over possible prompts found a prompt that lifts scores on math word problems (GSM8K) to 43%, 2 percentage points above the human-written prompt in Language Models are Zero-Shot Reasoners.
- Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling (2023): Automated searching over possible chain-of-thought prompts improved ChatGPT's scores on a few benchmarks by 0–20 percentage points.
- Faithful Reasoning Using Large Language Models (2022): Reasoning can be improved by a system that combines: chains of thought generated by alternative selection and inference prompts, a halter model that chooses when to halt selection-inference loops, a value function to search over multiple reasoning paths, and sentence labels that help avoid hallucination.
- STaR: Bootstrapping Reasoning With Reasoning (2022): Chain-of-thought reasoning can be baked into models via fine-tuning. For tasks with an answer key, example chains of thought can be generated by language models.
- ReAct: Synergizing Reasoning and Acting in Language Models (2023): For tasks with tools or an environment, chain of thought works better if you prescriptively alternate between Reasoning steps (thinking about what to do) and Acting (getting information from a tool or environment); see the loop sketch after this list.
- Reflexion: an autonomous agent with dynamic memory and self-reflection (2023): Retrying tasks with memory of prior failures improves subsequent performance.
- Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP (2023): Models augmented with knowledge via a "retrieve-then-read" pipeline can be improved with multi-hop chains of searches.
- Improving Factuality and Reasoning in Language Models through Multiagent Debate (2023): Generating debates between a few ChatGPT agents over a few rounds improves scores on various benchmarks. Math word problem scores rise from 77% to 85%.
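To make the chain-of-thought and self-consistency results above concrete, here is a hedged sketch of the voting procedure: sample several step-by-step solutions at a non-zero temperature, pull out each final answer, and keep the majority. `sample_llm` is a hypothetical LLM callable, the naive answer extraction is an assumption, and the default of 40 samples simply mirrors the paper's setup.

```python
# Self-consistency sketch: majority-vote over sampled chains of thought.
# `sample_llm(prompt, temperature=...)` is a hypothetical LLM callable.
from collections import Counter

def extract_final_answer(chain_of_thought: str) -> str:
    """Naive extraction: treat the last line of the reasoning as the answer."""
    return chain_of_thought.strip().splitlines()[-1]

def self_consistent_answer(sample_llm, question: str, n_samples: int = 40) -> str:
    """Sample diverse step-by-step solutions and return the most common answer."""
    prompt = f"{question}\nLet's think step by step."
    answers = [
        extract_final_answer(sample_llm(prompt, temperature=0.7))
        for _ in range(n_samples)
    ]
    return Counter(answers).most_common(1)[0][0]
```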
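The ReAct entry above describes alternating Reasoning and Acting steps; the schematic loop below shows one way that alternation can be wired up. The "Action:"/"Final Answer:" markers, the `tools` dictionary of plain Python functions, and `ask_llm` are all assumed conventions for illustration, not the paper's exact prompt format.

```python
# Schematic ReAct loop: interleave model reasoning with tool calls.
# `ask_llm` is a hypothetical LLM callable; `tools` maps tool names to functions.
def react_loop(ask_llm, tools: dict, question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = ask_llm(transcript + "Thought:")  # model decides what to do next
        transcript += f"Thought: {step}\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        if "Action:" in step:
            # Assumed convention: "Action: tool_name[argument]"
            action = step.split("Action:", 1)[1].strip()
            name, _, arg = action.partition("[")
            observation = tools[name.strip()](arg.rstrip("]"))
            transcript += f"Observation: {observation}\n"  # feed the result back
    return "No final answer within the step limit."
```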
Conclusion
Appendix
Note created on 2024-04-29 and last modified on 2024-04-29.
(c) No Clocks, LLC | 2024