George A. Baker

Researcher, University of Colorado Boulder's Institute of Cognitive Science
Research Assistant, University of Utah's Department of Operations and Information Systems

george.baker@colorado.edu

I’m an early-career researcher with a background in computer and data science, interested in applied AI. In the past I’ve worked on educational and medical (including medicinal chemistry) applications of language models. I’m especially interested in the intersection of graphs and language (e.g., semantic graphs like AMR/UMR, knowledge graphs, and chemical graphs), as sequential language models struggle to leverage structural information from text.

My present work at CU’s ICS, funded by the NSF National AI Institute for Student-AI Teaming, focuses on leveraging discourse structure to improve classroom AI agents. By enforcing an understanding of discourse structure and dialogue acts, language models can be integrated more effectively and predictably into classroom settings, where there are many simultaneous participants whose intentions are often challenging to resolve.

At the University of Utah, I’m currently studying decentralized content moderation with Bharadwaj Kadiyala. Self-governing online communities largely determine their own rules, norms, and enforcement thereof. We are building data collection systems to study how these choices affect outcome variables like polarization, toxicity, posting activity, and free speech.

selected publications

  1. MALAMUTE: A Multilingual, Highly-granular, Template-free, Education-based Probing Dataset
    Sagi Shaier, George Arthur Baker, Chiranthan Sridhar, Lawrence Hunter, and 1 more author
    In Findings of the Association for Computational Linguistics: ACL 2025, Jul 2025
  2. Generating Harder Cross-document Event Coreference Resolution Datasets using Metaphoric Paraphrasing
    Shafiuddin Rehan Ahmed, Zhiyong Eric Wang, George Arthur Baker, Kevin Stowe, and 1 more author
    In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Aug 2024
  3. Linear Cross-document Event Coreference Resolution with X-AMR
    Shafiuddin Rehan Ahmed, George Arthur Baker, Evi Judge, Michael Reagan, and 3 more authors
    In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024
  4. Multimodal Cross-Document Event Coreference Resolution Using Linear Semantic Transfer and Mixed-Modality Ensembles
    Abhijnan Nath, Huma Jamil, Shafiuddin Rehan Ahmed, George Arthur Baker, and 4 more authors
    In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024
  5. MASSIVE Multilingual Abstract Meaning Representation: A Dataset and Baselines for Hallucination Detection
    Michael Regan, Shira Wein, George Baker, and Emilio Monti
    In Proceedings of the 13th Joint Conference on Lexical and Computational Semantics (*SEM 2024), Jun 2024