UC Merced Automates Research Data Extraction with Amazon Bedrock LLMs

September 1, 2025

Researchers at UC Merced, led by Dr. Christian Fons-Rosen, needed to extract structured data from over 10,000 declassified ARPA documents, many lengthy and inconsistently formatted, to study the early internet’s impact on science and innovation. Manual extraction was infeasible, and traditional scripting struggled with the data’s variability.

To automate the process, the team built an AI-powered pipeline using Amazon Bedrock and other AWS services. The solution uses:

Amazon S3 for storing documents
AWS Lambda and Amazon SQS for processing and managing tasks
Claude (via Amazon Bedrock) to extract data fields such as contract numbers and institutions
Amazon DynamoDB for storing structured results

By leveraging Bedrock’s large language models (LLMs), UC Merced automated complex document parsing without custom model training, saving thousands of hours and enabling faster research insights.

To learn more about the research, visit AWS's article.

UC Merced Automates Research Data Extraction with Amazon Bedrock LLMs

Additional Links

Academics

Administration