Prompts generated from ChatGPT3.5, ChatGPT4, Llama3-8B, and Mistral-7B with NYT and HC3 topics in different roles and parameter configurations

  1. Gonzalo Martínez 1
  2. José Alberto Hernández 1
  3. Javier Conde 2
  4. Pedro Reviriego 2
  5. Elena Merino 3
  1. Universidad Carlos III de Madrid, Madrid, Spain. ROR: https://ror.org/03ths8210
  2. Universidad Politécnica de Madrid, Madrid, Spain. ROR: https://ror.org/03n6nwv02
  3. Universidad de Valladolid, Valladolid, Spain. ROR: https://ror.org/01fvbaw18

Publisher: Zenodo

Publication year: 2024

Type: Dataset

Abstract

Description

Prompts generated from ChatGPT3.5, ChatGPT4, Llama3-8B, and Mistral-7B with NYT and HC3 topics in different roles and parameter configurations. The dataset is useful for studying lexical aspects of LLMs under different parameter/role configurations.

The 0_Base_Topics.xlsx file lists the topics used for the dataset generation. The rest of the files collect the answers of the LLMs to these topics under different configurations of parameters/context (a hedged generation sketch follows the citation below).

Parameters:

- Temperature: Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
- Frequency penalty: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
- Top probability: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
- Presence penalty: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

Roles (context):

- Default: No role is assigned to the LLM; the default role is used.
- Child: The LLM is requested to answer as a five-year-old child.
- Young adult male: The LLM is requested to answer as a young male adult.
- Young adult female: The LLM is requested to answer as a young female adult.
- Elderly adult male: The LLM is requested to answer as an elderly male adult.
- Elderly adult female: The LLM is requested to answer as an elderly female adult.
- Affluent adult male: The LLM is requested to answer as an affluent male adult.
- Affluent adult female: The LLM is requested to answer as an affluent female adult.
- Lower-class adult male: The LLM is requested to answer as a lower-class male adult.
- Lower-class adult female: The LLM is requested to answer as a lower-class female adult.
- Erudite: The LLM is requested to answer as an erudite who uses a rich vocabulary.

Paper

Beware of Words: Evaluating the Lexical Richness of Conversational Large Language Models

Cite:

@misc{martínez2024beware,
  title={Beware of Words: Evaluating the Lexical Richness of Conversational Large Language Models},
  author={Gonzalo Martínez and José Alberto Hernández and Javier Conde and Pedro Reviriego and Elena Merino},
  year={2024},
  eprint={2402.15518},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
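The record does not include the generation scripts, so the following Python sketch is only a rough illustration of how answers could be collected for the ChatGPT models with the parameters and roles above via the OpenAI Chat Completions API. The prompt wording ("Tell me about ..."), the "Topic" column name in 0_Base_Topics.xlsx, and the model identifier are assumptions, not part of this record; Llama3-8B and Mistral-7B would need a different inference setup.

import pandas as pd
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# A subset of the roles listed above, passed as system messages.
# "Default" assigns no role, so no system message is sent.
ROLES = {
    "Default": None,
    "Child": "Answer as a five-year-old child.",
    "Erudite": "Answer as an erudite who uses a rich vocabulary.",
}

def generate(topic, role="Default", temperature=1.0, top_p=1.0,
             frequency_penalty=0.0, presence_penalty=0.0):
    """Ask the model about one topic under a given role/parameter configuration."""
    messages = []
    if ROLES[role] is not None:
        messages.append({"role": "system", "content": ROLES[role]})
    # Hypothetical prompt template; the record does not specify the wording.
    messages.append({"role": "user", "content": f"Tell me about {topic}."})
    response = client.chat.completions.create(
        model="gpt-4",                        # or "gpt-3.5-turbo"
        messages=messages,
        temperature=temperature,              # e.g. 0.2 (focused) .. 0.8 (random)
        top_p=top_p,                          # nucleus sampling probability mass
        frequency_penalty=frequency_penalty,  # -2.0 .. 2.0
        presence_penalty=presence_penalty,    # -2.0 .. 2.0
    )
    return response.choices[0].message.content

# "Topic" is a hypothetical column name; reading .xlsx requires openpyxl.
topics = pd.read_excel("0_Base_Topics.xlsx")["Topic"]
for topic in topics:
    answer = generate(topic, role="Child", temperature=0.8)
    print(answer)

In a study like this, each parameter would typically be swept independently while the others stay at their defaults, so that its effect on the generated text can be isolated.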
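Since the record positions the dataset for lexical studies, a toy measure such as type-token ratio can be computed over the collected answers. The paper evaluates lexical richness with dedicated metrics; the function below is only a simple stand-in to show how an analysis might start.

def type_token_ratio(text: str) -> float:
    """Crude lexical richness proxy: distinct tokens divided by total tokens."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0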