Large Language Model (LLM) and Generative Pre-trained Transformer (GPT) References for Teachers
References on Large Language Model (LLM) and Generative Pre-trained Transformer (GPT) resources such as ChatGPT and their use in the chemistry classroom. Please send additions to ssinglet@coe.edu and I will include them in this list.
Using Generative AI Systems for Critical Thinking Engagement in an Advanced Chemistry Course: A Case Study
Pence et al., J. Chem. Educ., 2024, https://doi.org/10.1021/acs.jchemed.4c00242
A series of critical thinking assignments was created for students in an advanced chemistry course to interact with and evaluate generative AI (GenAI) systems. Students explored GenAI’s facility with producing summaries of C&EN articles, analyzing titration data, and closely reading literature articles. For each assignment, the students evaluated the output using a critical thinking exercise and presented their results using written reports. The students found GenAI to be effective at summarizing news articles, although it demonstrated inaccuracies in mathematical calculations and produced mixed results in answering technical questions based on specific literature articles. The assignments provided valuable practice for students’ critical thinking skills.
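Because the paper flags inaccuracies in GenAI's titration arithmetic, a short script gives students an independent check on a chatbot's numbers. A minimal sketch (our own illustration, not taken from the paper) for a monoprotic strong acid titrated with a strong base:

```python
def acid_concentration(v_base_ml, c_base, v_acid_ml):
    """Concentration (M) of a monoprotic acid titrated to the equivalence
    point with a monoprotic strong base (1:1 stoichiometry assumed)."""
    moles_base = c_base * v_base_ml / 1000.0   # mL -> L
    return moles_base / (v_acid_ml / 1000.0)

# Example: 18.50 mL of 0.100 M NaOH neutralizes 25.00 mL of HCl
concentration = acid_concentration(18.50, 0.100, 25.00)  # 0.074 M
```

Comparing a result like this against the chatbot's worked answer is itself a compact critical thinking exercise.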
Enhancing AI Responses in Chemistry: Integrating Text Generation, Image Creation, and Image Interpretation through Different Levels of Prompts
Wilton J. D. Nascimento Júnior, Carla Morais, Gildo Girotto Júnior, J. Chem. Educ., 2024, https://doi.org/10.1021/acs.jchemed.4c00230
Generative Artificial Intelligence technologies can potentially transform education, benefiting teachers and students. This study evaluated various GAIs, including ChatGPT 3.5, ChatGPT 4.0, Google Bard, Bing Chat, Adobe Firefly, Leonardo.AI, and DALL-E, focusing on textual and imagery content. Utilizing initial, intermediate, and advanced prompts, we aim to simulate GAI responses tailored to users with varying levels of knowledge. We aim to investigate the possibilities of integrating content from Chemistry Teaching. The systems presented responses appropriate to the scientific consensus for textual generation, but they revealed alternative chemical content conceptions. In terms of the interpretation of chemical system representations, only ChatGPT 4.0 accurately identified the content in all of the images. In terms of image production, even with more advanced prompts and subprompts, Generative Artificial Intelligence still presents difficulties in content production. The use of prompts involving the Python language promoted an improvement in the images produced. In general, we can consider content production as support for chemistry teaching, but only with more advanced prompts do the answers tend to present fewer errors. The importance of previously understanding chemistry concepts and systems’ functioning is noted.
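The paper's finding that Python-based prompts improved image quality suggests a practical workaround: instead of asking a chatbot for a rendered picture, ask it for matplotlib code that draws the representation. A minimal sketch of the kind of code such a prompt might produce (this particulate diagram of a water molecule is our own illustration, not an example from the paper):

```python
import matplotlib
matplotlib.use("Agg")            # headless backend: write a file, no window
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 4))
# One O atom and two H atoms in a bent arrangement (positions are schematic).
atoms = [("O", (0.0, 0.0), 0.30, "red"),
         ("H", (-0.76, 0.59), 0.18, "lightgray"),
         ("H", (0.76, 0.59), 0.18, "lightgray")]
for label, (x, y), radius, color in atoms:
    ax.add_patch(plt.Circle((x, y), radius, color=color, ec="black"))
    ax.text(x, y, label, ha="center", va="center")
ax.set_xlim(-1.5, 1.5)
ax.set_ylim(-1.0, 1.5)
ax.set_aspect("equal")
ax.axis("off")
fig.savefig("water_molecule.png")
```

Code output like this is also easier to inspect and correct than an opaque generated image, which matches the authors' point about understanding the chemistry before trusting the tool.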
ChatGPT as an Instructor’s Assistant for Generating and Scoring Exams
Alberto A. Fernández, Margarita López-Torres, Jesús J. Fernández, Digna Vázquez-García, J. Chem. Educ., 2024, https://doi.org/10.1021/acs.jchemed.4c00231, CC-BY
This study assessed ChatGPT’s proficiency in responding to questions from University Entrance Exams typically administered to senior secondary students. Our findings indicate that ChatGPT version 4.0 consistently outperformed students, achieving higher average scores across exams from the past four years. However, it still committed errors in about 20% of its responses. Despite this, ChatGPT 4.0 demonstrated a robust capability to comprehend and produce natural language within a chemical context. Consequently, by applying diverse prompt engineering techniques, this AI was able to create short-answer questions and numerical problems that closely mimic the format and conceptual content of University Entrance Exams. We also confirmed that ChatGPT 4.0 could grade exams, showing a significant correlation with scores given by human evaluators but lower than that among human graders. This discrepancy and other practical considerations limit its application in grading exams.
Exploring the Concept of Valence and the Nature of Science via Generative Artificial Intelligence and General Chemistry Textbooks
Rebecca M. Jones, Eva-Maria Rudler, Conner Preston, J. Chem. Educ., 2024, https://doi.org/10.1021/acs.jchemed.4c00271
This paper explores historical and modern perspectives of the concept of valence in the context of collegiate general chemistry and draws comparisons to responses from generative artificial intelligence (genAI) tools such as ChatGPT. A fundamental concept in chemistry, valence in the early and mid-20th century was primarily defined as the “combining capacity” of atoms. Twenty-first century textbooks do not include this historical definition but rather use valence as an adjective to modify other nouns, e.g., valence electron or valence orbital. To explore these different perspectives in other information sources that could be used by students, we used a systematic series of prompts about valence to analyze the responses from ChatGPT, Bard, Liner, and ChatSonic from September and December 2023. Our findings show the historical definition is very common in responses to prompts which use valence or valency as a noun but less common when prompts include valence as an adjective. Regarding this concept, the state-of-the-art genAI tools are more consistent with textbooks from the 1950s than modern collegiate general chemistry textbooks. These findings present an opportunity for chemistry educators to observe and discuss with students the nature of science and how our understanding of chemistry changes. Including implications for educators, we present an example activity that may be deployed in general chemistry classes.
Student Perceptions of Artificial Intelligence Utility in the Introductory Chemistry Classroom
Mendez, James D., J. Chem. Educ., 2024, https://doi.org/10.1021/acs.jchemed.4c00075
This study investigates student perceptions of generative artificial intelligence (AI) in an introductory chemistry course. Students engaged with AI chatbots of their choice to correct missed exam questions, revealing overall positive attitudes toward their usefulness. Despite this positive perception, the study shows a disconnect between the overall media portrayal of AI in academia and how actual students use it. Only a small number of students had used AI before this activity. This study highlights the need for training on responsible AI use to address ongoing ethical concerns over the misuse of these systems and to get ahead of future issues.
Large Language Models are Catalyzing Chemistry Education
Du Y, Duan C, Bran A, Sotnikova A, Qu Y, Kulik H, et al., ChemRxiv, 2024, https://doi.org/10.26434/chemrxiv-2024-h722v (unreviewed preprint)
Large language models (LLMs) have demonstrated outstanding capabilities in general problem-solving and been shown to improve productivity in certain domains. Thanks to their flexibility, recent work has leveraged them for diverse scientific applications, ranging from predictive modeling, scientific Q&A, and even as autonomous agents towards automation in chemistry. The democratization of high-quality chemistry education faces several challenges, including heterogeneity among sub-fields, limited access to personalized guidance, and an uneven distribution of resources. Additionally, hands-on laboratory experiments, a crucial component of chemistry education, are difficult to scale due to inherent safety risks that necessitate close supervision. We propose that LLMs can help overcome these obstacles by providing scalable solutions that tailor educational content to individual needs, enhancing the overall learning experience. In this perspective, we discuss how LLMs can catalyze chemistry education across multiple dimensions, from preparing and delivering lectures, to providing guidance in both wet lab and computational experiments, to re-thinking evaluation methodologies in the classroom. We also discuss some potential risks of this technology, such as the possibility of generating inaccurate or biased content, and emphasize the need for further development to ensure the successful integration of LLMs in the chemistry classroom.
Using ChatGPT to Support Lesson Planning for the Historical Experiments of Thomson, Millikan, and Rutherford
Ted M. Clark, Matthew Fhaner, Matthew Stoltzfus, and Matt Scott Queen, J. Chem. Educ., 2024, https://doi.org/10.1021/acs.jchemed.4c00200
Four General Chemistry instructors investigated the use of ChatGPT-4 to improve their lesson plans for the historical experiments of Thomson, Millikan, and Rutherford. The instructors varied in their prior knowledge of these experiments, and their initial lessons addressed somewhat different learning objectives. This led to different conversations with the chatbot as the instructors used the resource in different ways and discussed topics they each found relevant. The output from ChatGPT-4 was robust, and each instructor identified ways it could be used to improve their instruction. The chatbot was able to accomplish instructional tasks these instructors found useful, such as outlining a lesson plan, recommending resources, discussing instructional strategies, describing calculations, offering explanations for different levels of learners, and generating assessments. A limitation was its ability to create images or visual aids the instructors found useful. Overall, these instructors found the chatbot could support, but not replace, an instructor in a course like General Chemistry.
Students’ Experience of a ChatGPT Enabled Final Exam in a Non-Majors Chemistry Course
Morgan J. Clark, Micke Reynders, and Thomas A. Holme, J. Chem. Educ., 2024, https://doi.org/10.1021/acs.jchemed.4c00161
In the field of education, ChatGPT has become a topic of debate for its usefulness as a learning tool. This article focuses on non-science majors’ (n = 29) perceptions of a ChatGPT enabled final exam, where, prior to the exam, students wrote papers on science and sustainability and, during the final exam, students were asked to compare their paper to one produced on the same topic by ChatGPT. Thus, the underlying chemistry, its broader impacts, and connection to sustainability and writing styles were compared. Students’ perceptions were analyzed through a developed coding framework that enabled the visualization of emerging themes. The most common themes revealed that students believed the ChatGPT essay did not read as “human-like”, used more intricate words, and often did not include enough science to support its arguments. Students also noted that their essays provided more chemistry details and were easier to read as they focused on connecting chemistry concepts to their essay topic as well as sustainable policies and practices. Students were impressed, however, by ChatGPT’s ability to discuss various sustainability solutions, policies, and practices. The final exam inspired self-reflection for the students to improve not only their writing but also their analysis of sustainability responses. Overall, students rated the comparative activity as a final exam to be favorable and remarked on the importance of analyzing AI generated work for the future of learning.
Can ChatGPT Enhance Chemistry Laboratory Teaching? Using Prompt Engineering to Enable AI in Generating Laboratory Activities
José Luís Araújo and Isabel Saúde, J. Chem. Educ., 2024, https://doi.org/10.1021/acs.jchemed.3c00745
The rapid evolution of Artificial Intelligence (AI) is profoundly shaping our society. Among various AI tools, ChatGPT stands out for its user-friendly nature and wide accessibility to the public. However, despite their countless potential benefits, these tools also face significant challenges, especially in sensitive areas like Education. In this publication, we conduct a prompt engineering essay with ChatGPT to understand the potential and challenges of this tool in designing new, high-quality chemistry laboratory activities. We aimed to assess its performance in proposing scientifically and pedagogically suitable protocols for chemistry laboratory activities based on the 11th-grade Portuguese curriculum. The initial exploratory essay was conducted to fine-tune the prompt, followed by the analysis of proposals for the five mandatory laboratory activities in this subject. ChatGPT demonstrates the ability to interpret and reproduce the specialized symbolic language of chemistry, effectively conceptualizing problems and laboratory activities in a clear and understandable manner for a broader audience (i.e., chemistry students). However, it is crucial to highlight the scientific-pedagogical limitations concerning the accuracy and appropriateness of the proposed laboratory activities, particularly in terms of safety and sustainability. Therefore, the use of AI in education should be approached critically and reflectively. While AI holds immense potential to transform the dynamics of teaching and learning, the role and expertise of the Chemistry teacher remain of the utmost importance to ensure the scientific and pedagogical quality of Chemistry classes.
Leveraging ChatGPT for Enhancing Critical Thinking Skills
Ying Guo and Daniel Lee, J. Chem. Educ., 2023, https://doi.org/10.1021/acs.jchemed.3c00505
This article presents a study conducted at Georgia Gwinnett College (GGC) to explore the use of ChatGPT, a large language model, for fostering critical thinking skills in higher education. The study implemented a ChatGPT-based activity in introductory chemistry courses, where students engaged with ChatGPT in three stages: account setup and orientation, essay creation, and output revision and validation. The results showed significant improvements in students’ confidence to ask insightful questions, analyze information, and comprehend complex concepts. Students reported that ChatGPT provided diverse perspectives and challenged their current ways of thinking. They also expressed an increased utilization of ChatGPT to enhance critical thinking skills and a willingness to recommend it to others. However, challenges included low-quality student comments and difficulties in validating information sources. The study highlights the importance of comprehensive training for educators and access to reliable resources. Future research should focus on training educators in integrating ChatGPT effectively and ensuring student awareness of privacy and security considerations. In conclusion, this study provides valuable insights for leveraging AI technologies like ChatGPT to foster critical thinking skills in higher education.
An Analysis of AI-Generated Laboratory Reports across the Chemistry Curriculum and Student Perceptions of ChatGPT
Joseph K. West, Jeanne L. Franz, Sara M. Hein, Hannah R. Leverentz-Culp, Jonathon F. Mauser, Emily F. Ruff, and Jennifer M. Zemke, J. Chem. Educ., 2023, https://doi.org/10.1021/acs.jchemed.3c00581
AI technologies are rapidly pervading many areas of our world. AI-driven text generators such as ChatGPT are at the forefront of this due to their simplicity and accessibility. Their influence on higher education is already being observed, and perceptions among faculty and students vary widely. We have undertaken a cross-curriculum study of ChatGPT’s ability to generate laboratory reports. AI-generated reports from general, organic, analytical, physical, inorganic, and biochemistry courses were graded as if they were student reports and analyzed for grade distributions and common strengths and weaknesses. To further gauge ChatGPT’s current impact, we surveyed all students in our Spring 2023 laboratory courses regarding their awareness and use of ChatGPT. We have also laid out suggestions, guidance, and considerations for instructors who wish to prohibit ChatGPT use by their students as well as for those who wish to begin incorporating this new, powerful tool into their teaching.
Using generative artificial intelligence in chemistry education research: prioritizing ethical use and accessibility
Deng JM, Lalani Z, McDermaid LA, Szozda AR, ChemRxiv, 2023, https://doi.org/10.26434/chemrxiv-2023-24zfl (unreviewed preprint)
Generative artificial intelligence (GenAI) has the potential to drastically alter how we teach and conduct research in chemistry education. There have been many reports on the potential uses, limitations, and considerations for GenAI tools in teaching and learning, but there have been fewer discussions of how such tools could be leveraged in educational research, including in chemistry education research. GenAI tools can be used to facilitate and support researchers in every stage of traditional educational research projects (e.g. conducting literature reviews, designing research questions and methods, communicating results). However, these tools also have existing limitations that researchers must be aware of prior to and during use. In this research commentary, we share insights on how chemistry education researchers can use GenAI tools in their work ethically. We also share how GenAI tools can be leveraged to improve accessibility and equity in research.
ChatGPT Needs a Chemistry Tutor, Too
Alfredo J. Leon and Dinesh Vidhani, J. Chem. Educ., 2023, https://doi.org/10.1021/acs.jchemed.3c00288
Artificial intelligence (AI) technology has the potential to revolutionize the education sector. This study sought to determine how effectively ChatGPT answers the kinds of questions a learner would pose and to elucidate how the AI processes prompts. Our goal was to evaluate the role of prompt formats, response consistency, and reliability of ChatGPT responses. Analyzing prompt format, we see that the data do not demonstrate a statistically significant difference between multiple-choice and free-response questions. Neither format achieved scores higher than 37%, and testing at different locations did not improve scores. Interestingly, ChatGPT's free version provides accurate responses to discipline-specific questions that contain information from unrelated topics as distractors, improving its accuracy over the free-response questions. It is important to consider that, while ChatGPT can identify the correct answer within a given context, it may not be able to determine if the answer it selects is correct computationally or through analysis. The results of this study can guide future AI and ChatGPT training practices and implementations to ensure they are used to their fullest potential.
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
Xiaoxuan Wang et al., arXiv, https://arxiv.org/abs/2307.10635
Abstract: Recent advances in large language models (LLMs) have demonstrated notable progress on many mathematical benchmarks. However, most of these benchmarks only feature problems grounded in junior and senior high school subjects, contain only multiple-choice questions, and are confined to a limited scope of elementary arithmetic operations. To address these issues, this paper introduces an expansive benchmark suite SciBench that aims to systematically examine the reasoning capabilities required for complex scientific problem solving. SciBench contains two carefully curated datasets: an open set featuring a range of collegiate-level scientific problems drawn from mathematics, chemistry, and physics textbooks, and a closed set comprising problems from undergraduate-level exams in computer science and mathematics. Based on the two datasets, we conduct an in-depth benchmark study of two representative LLMs with various prompting strategies. The results reveal that current LLMs fall short of delivering satisfactory performance, with an overall score of merely 35.80%. Furthermore, through a detailed user study, we categorize the errors made by LLMs into ten problem-solving abilities. Our analysis indicates that no single prompting strategy significantly outperforms others and some strategies that demonstrate improvements in certain problem-solving skills result in declines in other skills. We envision that SciBench will catalyze further developments in the reasoning abilities of LLMs, thereby ultimately contributing to scientific research and discovery.
Do Large Language Models Understand Chemistry? A Conversation with ChatGPT
Pimentel et al., J. Chem. Inf. Model., 2023, 63 (6), 1649–1655, https://doi.org/10.1021/acs.jcim.3c00285
Abstract: Large language models (LLMs) have promised a revolution in answering complex questions using the ChatGPT model. Its application in chemistry is still in its infancy. This viewpoint addresses the question of how well ChatGPT understands chemistry by posing five simple tasks in different subareas of chemistry.
Generative AI in Education and Research: Opportunities, Concerns, and Solutions
Alasadi and Baiz, J. Chem. Educ., 2023, 100 (8), 2965–2971, https://doi.org/10.1021/acs.jchemed.3c00323
Abstract: In this article, we discuss the role of generative artificial intelligence (AI) in education. The integration of AI in education has sparked a paradigm shift in teaching and learning, presenting both unparalleled opportunities and complex challenges. This paper explores critical aspects of implementing AI in education to advance educational goals, ethical considerations in scientific publications, and the attribution of credit for AI-driven discoveries. We also examine the implications of using AI-generated content in professional activities and describe equity and accessibility concerns. By weaving these key questions into a comprehensive discussion, this article aims to provide a balanced perspective on the responsible and effective use of these technologies in education, highlighting the need for a thoughtful, ethical, and inclusive approach to their integration.
Exploring the use of large language models (LLMs) in chemical engineering education: Building core course problem models with Chat-GPT
Meng-Lin Tsai et al., Education for Chemical Engineers, 2023, https://doi.org/10.1016/j.ece.2023.05.001
Abstract: This study highlights the potential benefits of integrating Large Language Models (LLMs) into chemical engineering education. In this study, Chat-GPT, a user-friendly LLM, is used as a problem-solving tool. Chemical engineering education has traditionally focused on fundamental knowledge in the classroom with limited opportunities for hands-on problem-solving. To address this issue, our study proposes an LLMs-assisted problem-solving procedure. This approach promotes critical thinking, enhances problem-solving abilities, and facilitates a deeper understanding of core subjects. Furthermore, incorporating programming into chemical engineering education prepares students with vital Industry 4.0 skills for contemporary industrial practices. During our experimental lecture, we introduced a simple example of building a model to calculate steam turbine cycle efficiency, and assigned projects to students for exploring the possible use of LLMs in solving various aspects of chemical engineering problems. Although it received mixed feedback from students, it was found to be an accessible and practical tool for improving problem-solving efficiency. Analyzing the student projects, we identified five common difficulties and misconceptions and provided helpful suggestions for overcoming them. Our course has limitations regarding using advanced tools and addressing complex problems. We further provide two additional examples to better demonstrate how to integrate LLMs into core courses. We emphasize the importance of universities, professors, and students actively embracing and utilizing LLMs as tools for chemical engineering education. Students must develop critical thinking skills and a thorough understanding of the principles behind LLMs, taking responsibility for their use and creations. This study provides valuable insights for enhancing chemical engineering education's learning experience and outcomes by integrating LLMs.
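The lecture example of computing steam turbine (Rankine) cycle efficiency reduces to a few lines of code once the state-point enthalpies are known. A sketch of the kind of model the authors describe, using illustrative round-number enthalpies rather than steam-table data:

```python
def rankine_efficiency(h1, h2, h3, h4):
    """Ideal Rankine cycle thermal efficiency from specific enthalpies (kJ/kg):
    h1 turbine inlet, h2 turbine exit, h3 pump inlet, h4 boiler inlet."""
    w_turbine = h1 - h2          # specific work extracted by the turbine
    w_pump = h4 - h3             # specific work consumed by the pump
    q_in = h1 - h4               # specific heat supplied in the boiler
    return (w_turbine - w_pump) / q_in

# Illustrative enthalpies (kJ/kg), chosen for demonstration only:
eta = rankine_efficiency(h1=3300.0, h2=2200.0, h3=190.0, h4=200.0)  # ~0.352
```

In practice the enthalpies would come from steam tables or a property library, which is exactly the kind of lookup-plus-arithmetic task the paper suggests delegating to an LLM-assisted workflow with student oversight.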
ChatGPT in physics education: A pilot study on easy-to-implement activities
Bitzenbauer, Contemporary Educational Technology, 2023, 15 (3), https://doi.org/10.30935/cedtech/13176
Abstract: Large language models, such as ChatGPT, have great potential to enhance learning and support teachers, but they must be used with care to tackle limitations and biases. This paper presents two easy-to-implement examples of how ChatGPT can be used in physics classrooms to foster critical thinking skills at the secondary school level. A pilot study (n=53) examining the implementation of these examples found that the intervention had a positive impact on students’ perceptions of ChatGPT, with an increase in agreement with statements related to its benefits and incorporation into their daily lives.
Assessment of chemistry knowledge in large language models that generate code
White et al., Digital Discovery, 2023, 2, 368–376, https://doi.org/10.1039/D2DD00087C; unreviewed preprint: https://doi.org/10.26434/chemrxiv-2022-3md3n-v2
Abstract: In this work, we investigate the question: do code-generating large language models know chemistry? Our results indicate, mostly yes. To evaluate this, we introduce an expandable framework for evaluating chemistry knowledge in these models, through prompting models to solve chemistry problems posed as coding tasks. To do so, we produce a benchmark set of problems, and evaluate these models based on correctness of code by automated testing and evaluation by experts. We find that recent LLMs are able to write correct code across a variety of topics in chemistry and their accuracy can be increased by 30 percentage points via prompt engineering strategies, like putting copyright notices at the top of files. Our dataset and evaluation tools are open source which can be contributed to or built upon by future researchers, and will serve as a community resource for evaluating the performance of new models as they emerge. We also describe some good practices for employing LLMs in chemistry. The general success of these models demonstrates that their impact on chemistry teaching and research is poised to be enormous.
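The benchmark's framing, chemistry problems posed as coding tasks, looks roughly like the following (this particular task and function are our illustration, not an item from the paper's dataset):

```python
import re

# Average atomic masses (g/mol) for a handful of common elements.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}

def molar_mass(formula):
    """Molar mass (g/mol) of a simple formula like 'C6H12O6'
    (single elements with optional counts; no parentheses)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element:  # findall also yields empty matches; skip them
            total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total

glucose = molar_mass("C6H12O6")  # about 180.16 g/mol
```

A grader can then score the model automatically by running assertions against known answers, which is the "correctness of code by automated testing" approach the abstract describes.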
Natural language processing models that automate programming will transform chemistry research and teaching
Hocky and White, Digital Discovery, 2022, 1, 79-83, https://doi.org/10.1039/D1DD00009H
Abstract: Natural language processing models have emerged that can generate useable software and automate a number of programming tasks with high fidelity. These tools have yet to have an impact on the chemistry community. Yet, our initial testing demonstrates that this form of artificial intelligence is poised to transform chemistry and chemical engineering research. Here, we review developments that brought us to this point, examine applications in chemistry, and give our perspective on how this may fundamentally alter research and teaching.
What is ChatGPT doing…and why does it work?
Stephen Wolfram Writings: https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/
YouTube video: https://youtu.be/flXrLGPY3SU?t=575
Future Trends Forum: Discussing the future of education and technology
Bryan Alexander, Future Trends Forum YouTube video series on AI
How can we teach creatively with AI?
DePauw University professor Harry Brown describes and shows his class experiments.
How should academics react to AI?
How should higher education respond to new developments in artificial intelligence, such as ChatGPT and image creating applications?
How might Higher Education respond to AI?
Computer scientist and ed tech leader Ruben Puentedura explores the implications of large language model artificial intelligence.
Open Source AI for Higher Education
How can higher education grapple with artificial intelligence? We ask this question with a focus on an underdiscussed aspect: open source AI. Our guide is the excellent Forum favorite, computer scientist Ruben R. Puentedura, widely known as the creator of the SAMR framework for understanding the intersection of teaching and tech.
Comment on “Comparing the Performance of College Chemistry Students with ChatGPT for Calculations Involving Acids and Bases”
Joshua Schrier, J. Chem. Educ., 2024, https://doi.org/10.1021/acs.jchemed.4c00058
In a recent paper in this Journal (J. Chem. Educ. 2023, 100, 3934−3944), Clark et al. evaluated the performance of the GPT-3.5 large language model (LLM) on ten undergraduate pH calculation problems. They reported that GPT-3.5 gave especially poor results for salt and titration problems, returning the correct results only 10% and 0% of the time, respectively, and that, despite a correct application of heuristics, the LLM made mathematical errors and used flawed strategies. However, these problems are partially mitigated using the more advanced GPT-4 model and entirely corrected using simple prompting and calculator tool use patterns demonstrated herein.
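The "calculator tool" pattern Schrier describes amounts to having the LLM emit code for the arithmetic and executing it, rather than trusting the model's mental math. A sketch of what such emitted code might look like for one of the reported weak points, a basic-salt pH problem (our illustration, not Schrier's exact prompt or code):

```python
import math

def basic_salt_ph(c_salt, ka_conjugate_acid, kw=1.0e-14):
    """pH of a solution of a basic salt (e.g. sodium acetate),
    using the x << C approximation for the hydrolysis equilibrium."""
    kb = kw / ka_conjugate_acid          # Kb of the conjugate base
    oh = math.sqrt(kb * c_salt)          # [OH-] from x = sqrt(Kb * C)
    poh = -math.log10(oh)
    return 14.0 - poh

# 0.10 M sodium acetate, with Ka(CH3COOH) = 1.8e-5
ph = basic_salt_ph(0.10, 1.8e-5)         # pH close to 8.87
```

Offloading the square root and logarithm to executed code removes exactly the class of arithmetic error that Clark et al. observed in GPT-3.5's step-by-step answers.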