Publications
2025
- RAG Makes Guardrails Unsafe? Investigating Robustness of Guardrails under RAG-style Contexts
  Yining She, Daniel W Peterson, Marianne Menglin Liu, Vikas Upadhyay, Mohammad Hossein Chaghazardi, and 2 more authors
  Under Review, 2025
With the increasing adoption of large language models (LLMs), ensuring the safety of LLM systems has become a pressing concern. External LLM-based guardrail models have emerged as a popular solution to screen unsafe inputs and outputs, but they are themselves fine-tuned or prompt-engineered LLMs that are vulnerable to data distribution shifts. In this paper, taking Retrieval-Augmented Generation (RAG) as a case study, we investigate how robust LLM-based guardrails are against additional information embedded in the context. Through a systematic evaluation of 3 Llama Guards and 2 GPT-oss models, we confirm that inserting benign documents into the guardrail context alters the judgments of input and output guardrails in around 11% and 8% of cases, respectively, making them unreliable. We separately analyze the effect of each component in the augmented context: retrieved documents, user query, and LLM-generated response. The two mitigation methods we tested bring only minor improvements. These results expose a context-robustness gap in current guardrails and motivate training and evaluation protocols that are robust to retrieval and query composition.
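A minimal sketch of the kind of robustness probe the abstract describes: prepend benign retrieved documents to each query and count how often the guardrail's verdict flips. The `classify` stub here is a hypothetical stand-in for an actual guardrail call (e.g., a Llama Guard inference endpoint), not the paper's code.

```python
def classify(guardrail_input: str) -> str:
    """Hypothetical guardrail call: returns 'safe' or 'unsafe'.
    Wire up your own LLM-based guardrail model or endpoint here."""
    raise NotImplementedError("plug in a guardrail model")

def flip_rate(queries: list[str], benign_docs: list[str]) -> float:
    """Fraction of queries whose verdict changes once benign
    retrieved documents are prepended, RAG-style."""
    flips = 0
    for query in queries:
        plain = classify(query)
        augmented = classify("\n\n".join(benign_docs) + "\n\n" + query)
        flips += plain != augmented
    return flips / len(queries)
```

A flip rate in the vicinity of the paper's reported ~11% (input guardrails) or ~8% (output guardrails) would reproduce the same context-robustness gap.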
- FairSense: Long-Term Fairness Analysis of ML-Enabled Systems
  Yining She, Sumon Biswas, Christian Kästner, and Eunsuk Kang
  In 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE), 2025
Algorithmic fairness of machine learning (ML) models has raised significant concern in recent years. Many testing, verification, and bias mitigation techniques have been proposed to identify and reduce fairness issues in ML models. The existing methods are model-centric and designed to detect fairness issues under static settings. However, many ML-enabled systems operate in a dynamic environment where the predictive decisions made by the system impact the environment, which in turn affects future decision-making. Such a self-reinforcing feedback loop can cause fairness violations in the long term, even if the immediate outcomes are fair. In this paper, we propose a simulation-based framework called FairSense to detect and analyze long-term unfairness in ML-enabled systems. Given a fairness requirement, FairSense performs Monte-Carlo simulation to enumerate evolution traces for each system configuration. Then, FairSense performs sensitivity analysis on the space of possible configurations to understand the impact of design options and environmental factors on the long-term fairness of the system. We demonstrate FairSense’s potential utility through three real-world case studies: loan lending, opioid risk scoring, and predictive policing.
@inproceedings{she2025fairsense,
  title     = {FairSense: Long-Term Fairness Analysis of ML-Enabled Systems},
  author    = {She, Yining and Biswas, Sumon and K{\"a}stner, Christian and Kang, Eunsuk},
  booktitle = {2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE)},
  year      = {2025},
  organization = {IEEE Computer Society},
}
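The simulation loop at the heart of the abstract is easy to illustrate. Below is a minimal sketch under my own assumptions (a toy loan-lending setup, not FairSense itself): a threshold classifier's approvals feed back into applicants' scores, so the approval-rate gap between two groups can widen across rounds even when round 0 looks fair.

```python
import random

def simulate(threshold: float, steps: int = 50, n: int = 1000, seed: int = 0):
    """Monte-Carlo trace of the approval-rate gap between two groups
    under a decision -> environment -> decision feedback loop."""
    rng = random.Random(seed)
    # Slightly different initial credit-score distributions per group.
    scores = {"A": [rng.gauss(0.55, 0.10) for _ in range(n)],
              "B": [rng.gauss(0.50, 0.10) for _ in range(n)]}
    gaps = []
    for _ in range(steps):
        rates = {}
        for group, s in scores.items():
            approved = [x >= threshold for x in s]
            rates[group] = sum(approved) / n
            # Feedback: approval nudges a score up, rejection nudges it down.
            scores[group] = [x + (0.01 if a else -0.01)
                             for x, a in zip(s, approved)]
        gaps.append(abs(rates["A"] - rates["B"]))  # demographic-parity gap
    return gaps
```

Sweeping `threshold` (and the drift constants) over many seeds and inspecting how the final gap responds corresponds to the sensitivity-analysis step: it surfaces which design options and environmental factors drive long-term unfairness.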
2023
- Towards Safe ML-Based Systems in Presence of Feedback Loops
  Sumon Biswas, Yining She, and Eunsuk Kang
  In SE4SafeML workshop in ESEC/FSE’2023: The 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, San Francisco, California, Dec 2023
Machine learning (ML) based software is increasingly being deployed in a myriad of socio-technical systems, such as drug monitoring, loan lending, and predictive policing. Although not commonly considered safety-critical, these systems have the potential to cause serious, long-lasting harm to users and the environment due to their close proximity to and effect on society. One type of emerging problem in these systems is unintended side effects from a feedback loop: the decisions of an ML-based system induce certain changes in the environment, which, in turn, generate observations that are fed back into the system for further decision-making. When this cyclic interaction between the system and the environment repeats over time, its effect may be amplified and ultimately result in an undesirable outcome. In this position paper, we bring attention to the safety risks that are introduced by feedback loops in ML-based systems, and the challenges of identifying and addressing them. In particular, due to their gradual and long-term impact, we argue that feedback loops are difficult to detect and diagnose using existing techniques in software engineering. We propose a set of research problems in modeling, analyzing, and testing ML-based systems to identify, monitor, and mitigate the effects of an undesirable feedback loop.
@inproceedings{biswas2023towards,
  author    = {Biswas, Sumon and She, Yining and Kang, Eunsuk},
  booktitle = {SE4SafeML workshop in ESEC/FSE'2023: The 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
  title     = {Towards Safe ML-Based Systems in Presence of Feedback Loops},
  location  = {San Francisco, California},
  month     = dec,
  year      = {2023},
}
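One concrete way to act on the paper's call for monitoring, sketched under my own assumptions rather than taken from the paper: log a per-round system metric (say, a fairness gap) and raise an alarm when its cumulative drift from the initial value exceeds a bound. This catches gradual feedback-loop effects that no single round's value would reveal.

```python
def drift_alarm(metric_trace: list[float], bound: float) -> int | None:
    """Return the first round at which the metric has drifted more than
    `bound` from its round-0 baseline, or None if it never does.

    Aimed at gradual feedback-loop effects: each step looks benign,
    but the cumulative movement does not."""
    baseline = metric_trace[0]
    for t, value in enumerate(metric_trace):
        if abs(value - baseline) > bound:
            return t
    return None
```

For instance, fed the gap trace from the simulator sketched above, `drift_alarm(simulate(0.5), bound=0.05)` reports the first round where the parity gap has drifted past 0.05.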
2022
- Stable Interaction of Autonomous Vehicle Platoons with Human-Driven Vehicles
  Mohammad Pirani, Yining She, Renzhi Tang, Zhihao Jiang, and Yash Vardhan Pant
  In 2022 American Control Conference (ACC), 2022
@inproceedings{9867210,
  author    = {Pirani, Mohammad and She, Yining and Tang, Renzhi and Jiang, Zhihao and Vardhan Pant, Yash},
  booktitle = {2022 American Control Conference (ACC)},
  title     = {Stable Interaction of Autonomous Vehicle Platoons with Human-Driven Vehicles},
  year      = {2022},
  pages     = {633--640},
  doi       = {10.23919/ACC53348.2022.9867210},
}