Prompts generated from ChatGPT3.5, ChatGPT4, Llama3-8B, and Mistral-7B with NYT and HC3 topics in different roles and parameter configurations

  1. Gonzalo Martínez 1
  2. José Alberto Hernández 1
  3. Javier Conde 2
  4. Pedro Reviriego 2
  5. Elena Merino 3
  1. Universidad Carlos III de Madrid, Madrid, Spain. ROR: https://ror.org/03ths8210
  2. Universidad Politécnica de Madrid, Madrid, Spain. ROR: https://ror.org/03n6nwv02
  3. Universidad de Valladolid, Valladolid, Spain. ROR: https://ror.org/01fvbaw18

Publisher: Zenodo

Publication year: 2024

Type: Dataset

Abstract

Description

Prompts generated from ChatGPT3.5, ChatGPT4, Llama3-8B, and Mistral-7B with NYT and HC3 topics in different roles and parameter configurations. The dataset is useful for studying lexical aspects of LLMs under different parameter/role configurations.

The 0_Base_Topics.xlsx file lists the topics used for the dataset generation. The rest of the files collect the answers of the LLMs to these topics under different configurations of parameters/context (a hedged generation sketch follows the citation below).

Parameters:

- Temperature: Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
- Frequency penalty: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
- Top probability: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
- Presence penalty: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

Roles (context):

- Default: No role is assigned to the LLM; the default role is used.
- Child: The LLM is requested to answer as a five-year-old child.
- Young adult male: The LLM is requested to answer as a young male adult.
- Young adult female: The LLM is requested to answer as a young female adult.
- Elderly adult male: The LLM is requested to answer as an elderly male adult.
- Elderly adult female: The LLM is requested to answer as an elderly female adult.
- Affluent adult male: The LLM is requested to answer as an affluent male adult.
- Affluent adult female: The LLM is requested to answer as an affluent female adult.
- Lower-class adult male: The LLM is requested to answer as a lower-class male adult.
- Lower-class adult female: The LLM is requested to answer as a lower-class female adult.
- Erudite: The LLM is requested to answer as an erudite who uses a rich vocabulary.

Paper

Beware of Words: Evaluating the Lexical Richness of Conversational Large Language Models

Cite:

@misc{martínez2024beware,
  title={Beware of Words: Evaluating the Lexical Richness of Conversational Large Language Models},
  author={Gonzalo Martínez and José Alberto Hernández and Javier Conde and Pedro Reviriego and Elena Merino},
  year={2024},
  eprint={2402.15518},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
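The record does not include the generation scripts, so the following Python sketch is only a rough illustration of how answers could be collected for the ChatGPT models with the parameters and roles above via the OpenAI Chat Completions API. The prompt wording ("Tell me about ..."), the "Topic" column name in 0_Base_Topics.xlsx, and the model identifier are assumptions, not part of this record; Llama3-8B and Mistral-7B would need a different inference setup.

import pandas as pd
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# A subset of the roles listed above, passed as system messages.
# "Default" assigns no role, so no system message is sent.
ROLES = {
    "Default": None,
    "Child": "Answer as a five-year-old child.",
    "Erudite": "Answer as an erudite who uses a rich vocabulary.",
}

def generate(topic, role="Default", temperature=1.0, top_p=1.0,
             frequency_penalty=0.0, presence_penalty=0.0):
    """Ask the model about one topic under a given role/parameter configuration."""
    messages = []
    if ROLES[role] is not None:
        messages.append({"role": "system", "content": ROLES[role]})
    # Hypothetical prompt template; the record does not specify the wording.
    messages.append({"role": "user", "content": f"Tell me about {topic}."})
    response = client.chat.completions.create(
        model="gpt-4",                        # or "gpt-3.5-turbo"
        messages=messages,
        temperature=temperature,              # e.g. 0.2 (focused) .. 0.8 (random)
        top_p=top_p,                          # nucleus sampling probability mass
        frequency_penalty=frequency_penalty,  # -2.0 .. 2.0
        presence_penalty=presence_penalty,    # -2.0 .. 2.0
    )
    return response.choices[0].message.content

# "Topic" is a hypothetical column name; reading .xlsx requires openpyxl.
topics = pd.read_excel("0_Base_Topics.xlsx")["Topic"]
for topic in topics:
    answer = generate(topic, role="Child", temperature=0.8)
    print(answer)

In a study like this, each parameter would typically be swept independently while the others stay at their defaults, so that its effect on the generated text can be isolated.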
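Since the record positions the dataset for lexical studies, a toy measure such as type-token ratio can be computed over the collected answers. The paper evaluates lexical richness with dedicated metrics; the function below is only a simple stand-in to show how an analysis might start.

def type_token_ratio(text: str) -> float:
    """Crude lexical richness proxy: distinct tokens divided by total tokens."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0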