Publications | Yining She

2026

Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility

Yining Hong, Yining She, Eunsuk Kang, Christopher S Timperley, and Christian Kästner

Under Review, 2026

AI agents that interact with their environments through tools enable powerful applications, but in high-stakes business settings, unintended actions can cause unacceptable harm, such as privacy breaches and financial loss. Existing mitigations, such as training-based methods and neural guardrails, improve agent reliability but cannot provide guarantees. We study symbolic guardrails as a practical path toward strong safety and security guarantees for AI agents. Our three-part study includes a systematic review of 80 state-of-the-art agent safety and security benchmarks to identify the policies they evaluate, an analysis of which policy requirements can be guaranteed by symbolic guardrails, and an evaluation of how symbolic guardrails affect safety, security, and agent success on Tau2-Bench, CAR-bench, and MedAgentBench. We find that 85% of benchmarks lack concrete policies, relying instead on underspecified high-level goals or common sense. Among the specified policies, 74% of policy requirements can be enforced by symbolic guardrails, often using simple, low-cost mechanisms. These guardrails improve safety and security without sacrificing agent utility. Overall, our results suggest that symbolic guardrails are a practical and effective way to guarantee some safety and security requirements, especially for domain-specific AI agents. We release all code and artifacts at https://github.com/hyn0027/agent-symbolic-guardrails.
@article{hong2026symbolicguardrails,
  title = {Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility},
  author = {Hong, Yining and She, Yining and Kang, Eunsuk and Timperley, Christopher S and Kästner, Christian},
  journal = {Under Review},
  year = {2026},
}

2025

RAG Makes Guardrails Unsafe? Investigating Robustness of Guardrails under RAG-style Contexts

Yining She, Daniel W Peterson, Marianne Menglin Liu, Vikas Upadhyay, Mohammad Hossein Chaghazardi, Eunsuk Kang, and Dan Roth

Under Review, 2025

With the increasing adoption of large language models (LLMs), ensuring the safety of LLM systems has become a pressing concern. External LLM-based guardrail models have emerged as a popular solution to screen unsafe inputs and outputs, but they are themselves fine-tuned or prompt-engineered LLMs that are vulnerable to data distribution shifts. In this paper, taking Retrieval-Augmented Generation (RAG) as a case study, we investigated how robust LLM-based guardrails are against additional information embedded in the context. Through a systematic evaluation of 3 Llama Guards and 2 GPT-oss models, we confirmed that inserting benign documents into the guardrail context alters the judgments of input and output guardrails in around 11% and 8% of cases, respectively, making them unreliable. We separately analyzed the effect of each component in the augmented context: retrieved documents, user query, and LLM-generated response. The two mitigation methods we tested bring only minor improvements. These results expose a context-robustness gap in current guardrails and motivate training and evaluation protocols that are robust to retrieval and query composition.
@article{she2025rag,
  title = {RAG Makes Guardrails Unsafe? Investigating Robustness of Guardrails under RAG-style Contexts},
  author = {She, Yining and Peterson, Daniel W and Liu, Marianne Menglin and Upadhyay, Vikas and Chaghazardi, Mohammad Hossein and Kang, Eunsuk and Roth, Dan},
  journal = {Under Review},
  year = {2025},
}

FairSense: Long-Term Fairness Analysis of ML-Enabled Systems

Yining She, Sumon Biswas, Christian Kästner, and Eunsuk Kang

In 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE), 2025

Algorithmic fairness of machine learning (ML) models has raised significant concern in recent years. Many testing, verification, and bias mitigation techniques have been proposed to identify and reduce fairness issues in ML models. The existing methods are model-centric and designed to detect fairness issues under static settings. However, many ML-enabled systems operate in a dynamic environment where the predictive decisions made by the system impact the environment, which in turn affects future decision-making. Such a self-reinforcing feedback loop can cause fairness violations in the long term, even if the immediate outcomes are fair. In this paper, we propose a simulation-based framework called FairSense to detect and analyze long-term unfairness in ML-enabled systems. Given a fairness requirement, FairSense performs Monte Carlo simulation to enumerate evolution traces for each system configuration. Then, FairSense performs sensitivity analysis on the space of possible configurations to understand the impact of design options and environmental factors on the long-term fairness of the system. We demonstrate FairSense’s potential utility through three real-world case studies: loan lending, opioid risk scoring, and predictive policing.
@inproceedings{she2025fairsense,
  title = {FairSense: Long-Term Fairness Analysis of ML-Enabled Systems},
  author = {She, Yining and Biswas, Sumon and K{\"a}stner, Christian and Kang, Eunsuk},
  booktitle = {2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE)},
  year = {2025},
  organization = {IEEE Computer Society},
}

2023

Towards Safe ML-Based Systems in Presence of Feedback Loops

Sumon Biswas, Yining She, and Eunsuk Kang

In SE4SafeML workshop in ESEC/FSE’2023: The 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, San Francisco, California, Dec 2023

Machine learning (ML) based software is increasingly being deployed in a myriad of socio-technical systems, such as drug monitoring, loan lending, and predictive policing. Although not commonly considered safety-critical, these systems have the potential to cause serious, long-lasting harm to users and the environment due to their close proximity to and effect on society. One type of emerging problem in these systems is unintended side effects from a feedback loop: the decision of an ML-based system induces certain changes in the environment, which, in turn, generates observations that are fed back into the system for further decision-making. When this cyclic interaction between the system and the environment repeats over time, its effect may be amplified and ultimately result in an undesirable outcome. In this position paper, we bring attention to the safety risks that are introduced by feedback loops in ML-based systems, and the challenges of identifying and addressing them. In particular, due to their gradual and long-term impact, we argue that feedback loops are difficult to detect and diagnose using existing techniques in software engineering. We propose a set of research problems in modeling, analyzing, and testing ML-based systems to identify, monitor, and mitigate the effects of an undesirable feedback loop.
@inproceedings{biswas2023towards,
  author = {Biswas, Sumon and She, Yining and Kang, Eunsuk},
  booktitle = {SE4SafeML workshop in ESEC/FSE'2023: The 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
  title = {Towards Safe ML-Based Systems in Presence of Feedback Loops},
  location = {San Francisco, California},
  month = dec,
  year = {2023},
}

2022

Stable Interaction of Autonomous Vehicle Platoons with Human-Driven Vehicles

Mohammad Pirani, Yining She, Renzhi Tang, Zhihao Jiang, and Yash Vardhan Pant

In 2022 American Control Conference (ACC), Dec 2022

@inproceedings{9867210,
  author = {Pirani, Mohammad and She, Yining and Tang, Renzhi and Jiang, Zhihao and Pant, Yash Vardhan},
  booktitle = {2022 American Control Conference (ACC)},
  title = {Stable Interaction of Autonomous Vehicle Platoons with Human-Driven Vehicles},
  year = {2022},
  pages = {633--640},
  doi = {10.23919/ACC53348.2022.9867210},
}