Data Storage and Maintenance: Managing Data for Long-term Impact [Whitepaper]
The first paper in this series explores the importance of clearly defining and aligning on the metrics that truly measure Patient Support Program (PSP) success. It introduces how data acquisition should be strategically driven by these measurement objectives, emphasizing the four levels of analytics—descriptive, diagnostic, predictive, and prescriptive. The paper underscores the necessity of capturing comprehensive and compliant patient-level, engagement, operational, and outcome data, forming the foundation of meaningful and actionable insights.
Now that we’ve identified the data points needed to measure what matters about our Patient Support Program (PSP), and we’ve established the ways to collect and acquire that data, we need to make sure we can store it in a way that allows continuous, reliable access for analytics. Easy – that’s why the folks in IT talk about data warehouses! Right?
What is a data warehouse, and what happens within it? What are all the data engineers who work “with” the data actually doing in this step, before data analysts can retrieve the data for analysis? Let’s take a quick tour behind the scenes so we know where the data we collect “goes” before it gets analyzed.
An overview of data storage
Where do all the collected values reside when we’re not “accessing” them? Data gets stored physically in servers and hard drives, and more often “in the cloud” when we subscribe to a service to take care of the physical hardware of storage. You may have heard the terms “database” and “data warehouse,” which are both components of a data storage system. (Other fancy terms include “data lake,” “data mesh” or “data fabric.”) Entire careers are dedicated to data architecture, but here’s a simplified explanation of what it all entails.
A data storage system is like a vast, high-security underground archive, filled with different rooms, each designated for a specific type of information. Some rooms house operational data: the information required for the day-to-day functioning of a PSP. This includes patient insurance numbers, health care provider (HCP) submissions of start forms, authorizations via signatures, diagnoses, and other HIPAA-restricted data fields such as patient name, phone number, and date of birth. This data is primarily used to facilitate PSP services, and the “room” in which it is stored needs to be well protected, with strict access restrictions and encryption to ensure regulatory compliance.
Other rooms are dedicated to analytical data, where documents have been copied, reviewed, categorized, and formatted to support research and decision-making. This includes data points that allow us to calculate metrics we’re interested in, such as turnaround times for benefits verification (for which you’d need timestamps of defined milestones) and patient engagement rates (for which you’d want to find out all records of interactions, type of interaction, the associated timestamps, etc.). The data is “analytical” because you can run computations with it. (In contrast, it makes no sense to find the average patient insurance number — that’s why we’d call that operational data.)
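As a quick illustration, a turnaround-time metric is just arithmetic on milestone timestamps. The sketch below uses hypothetical milestone names (bv_requested, bv_completed), not a real PSP schema:

```python
from datetime import datetime

# Illustrative milestone timestamps for one benefits-verification case.
# These field names are assumptions for the example, not a real schema.
case = {
    "bv_requested": datetime(2024, 3, 1, 9, 0),
    "bv_completed": datetime(2024, 3, 4, 15, 30),
}

# Turnaround time = completion timestamp minus request timestamp.
turnaround_hours = (
    case["bv_completed"] - case["bv_requested"]
).total_seconds() / 3600
print(turnaround_hours)  # hours between request and completion
```

Averaging this value across all cases in a period would give the kind of turnaround-time metric described above.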
What happens in these rooms dedicated to analytical data? Data engineers extract and, at times, transform raw data so that analysts can use it. For example: you may want to know the distribution of patient age. To respect privacy and remain compliant, you don’t want to give data analysts every patient’s date of birth. Instead, data engineers copy and extract just the year value from each patient’s date of birth and add it to a new table for analysts to access. It’s through examining and manipulating analytical data that we can evaluate how a PSP is doing and identify areas of opportunity.
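The birth-year example can be sketched in a few lines. The record fields here (patient_id, date_of_birth) are illustrative, not an actual PSP schema:

```python
from datetime import date

# Illustrative operational records; in reality these would sit in a
# restricted, encrypted operational store.
operational_records = [
    {"patient_id": "P001", "date_of_birth": date(1956, 3, 14)},
    {"patient_id": "P002", "date_of_birth": date(1989, 7, 21)},
]

# Copy only the year into a new analytical table; the full date of
# birth never leaves the operational store.
analytical_table = [
    {"patient_id": r["patient_id"], "birth_year": r["date_of_birth"].year}
    for r in operational_records
]
print(analytical_table)
```

Analysts can now compute an age distribution from birth_year without ever seeing a full date of birth.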
Within each room, filing cabinets (databases) store structured collections of records, organized in a way that allows efficient retrieval and analysis.
A data warehouse is not the entire archive itself, but rather a specialized section of it—one designed explicitly for storing and managing analytical data. Unlike rooms that hold raw, unprocessed records, the data warehouse is meticulously curated and optimized for analysis. But before data can be stored in these well-organized cabinets, it must be processed and transformed—and that’s where data engineers come in. Think of them as the archivists and curators of this underground facility.
When new data arrives, it isn’t just tossed into filing cabinets haphazardly; instead, a copy of the raw data is made, before it’s extracted, reviewed, cleaned, and transformed to fit a well-structured system. Data engineers ensure that duplicate records are removed, inconsistencies are corrected, and all information is formatted according to predefined specifications—just like archivists meticulously sort, label, and cross-reference documents so that researchers (data analysts, business intelligence analysts, etc.) can later retrieve what they need with speed and accuracy. They also determine which records belong in which rooms, ensuring that related data is grouped in ways that make analysis efficient. The more data engineers understand the context of the data that they’re working with, and the more closely they are in communication with data analysts who actually retrieve and use the data, the better they’ll be able to help organize the data in a sensible, easy-to-use manner.
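A minimal sketch of this cleansing step, under assumed record shapes: duplicate records are dropped and phone numbers are reformatted to a single predefined convention (digits only):

```python
# Illustrative raw records; field names are assumptions for the example.
raw_records = [
    {"record_id": 1, "phone": "555-0101"},
    {"record_id": 1, "phone": "555-0101"},    # duplicate submission
    {"record_id": 2, "phone": "(555) 0102"},  # inconsistent format
]

def cleanse(records):
    seen, cleaned = set(), []
    for rec in records:
        if rec["record_id"] in seen:
            continue  # remove duplicate records
        seen.add(rec["record_id"])
        # Normalize phone numbers to digits only, per a predefined spec.
        digits = "".join(ch for ch in rec["phone"] if ch.isdigit())
        cleaned.append({**rec, "phone": digits})
    return cleaned

print(cleanse(raw_records))
```

Real pipelines apply many such rules at once, but each one is this same pattern: detect a defect, correct it, and emit records in a consistent format.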
Without the data cleansing, transformation, and storage work of data engineers, analysts would be forced to sift through raw, unprocessed records full of errors, missing values, and conflicting formats—a task as frustrating as searching for a critical legal document in an archive where files have been dumped into random boxes with no labels. By establishing and maintaining the structure within databases, data engineers create a functional, reliable storage environment where analytical insights can be extracted with precision.
“It is important for data engineers to understand the context and business use cases of the data so they can organize the data in user-friendly and sensible ways.”
Where should we store our data: in-house or with a vendor?
The question of whether data is stored in-house or managed by a vendor is often more complex than a binary choice. For many PSPs that include a hub, the hub vendor operates on their own customized CRM platform, built specifically to optimize their workflows. These systems are fine-tuned for their internal operations—such as call center routing, benefits verification processing, and adherence outreach—enabling vendor teams to work efficiently and at scale. Because of this, it is natural for hub vendors to capture and store operational data directly within their own environments.
Pharma companies that want access to this data—particularly for performance monitoring or strategic analytics—typically negotiate to purchase a copy of the vendor’s data. This can be delivered through a real-time API feed or periodic batch uploads. (Both options involve cost: the vendor’s data engineering team is doing work, and the pharma company’s data engineering team is also doing work.) However, raw operational data collected by a vendor’s internal CRM is often not structured for analytics and may include sensitive information such as Protected Health Information (PHI). To bridge this gap, pharma companies frequently engage data aggregator vendors to handle the cleansing, de-identification, and (re)structuring of the data before it ever reaches their internal systems. These aggregators serve a critical function: ensuring that the pharma company’s analytics environment remains HIPAA-compliant and that no unauthorized data fields—such as patient names or contact details—enter the company’s databases or data warehouse.
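One way an aggregator might enforce this, sketched with hypothetical field names, is an allow-list: only pre-approved, de-identified fields pass through, so PHI such as names and phone numbers never reaches the pharma warehouse:

```python
# Hypothetical allow-list of de-identified fields approved for the
# pharma company's analytics environment. Field names are assumptions.
ALLOWED_FIELDS = {"case_id", "birth_year", "enrollment_date", "therapy_status"}

def deidentify(record):
    """Keep only allow-listed fields; anything else (e.g. PHI) is dropped."""
    return {k: v for k, v in record.items() if k in ALLOWED_FIELDS}

vendor_record = {
    "case_id": "C-1001",
    "patient_name": "Jane Doe",  # PHI: must not reach the warehouse
    "phone": "555-0101",         # PHI: must not reach the warehouse
    "birth_year": 1972,
    "therapy_status": "active",
}
print(deidentify(vendor_record))
```

An allow-list is deliberately conservative: a new field added by the vendor is excluded by default until it is explicitly reviewed and approved.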
On the other end of the spectrum, some pharma companies have begun building their own CRM systems and contracting hub vendors to work within their proprietary platforms. In these cases, data is captured and stored in-house from the outset, and the pharma company assumes full responsibility for maintaining the infrastructure, cleaning the data, and ensuring regulatory compliance. This approach provides maximum visibility and control but also requires significant investment in IT systems, data governance, and engineering support.
Ultimately, the decision around who stores the data should be driven by who needs to use the data—and for what purpose. If the brand team is comfortable with the hub vendor reporting on key metrics and there’s no need for granular or real-time insights, then there may be little business value in pulling all that data in-house. On the other hand, companies looking to do advanced analytics (especially analyses that involve integrating additional data sources), cross-vendor comparisons, or long-term strategic planning often need the data internally and in a form they can understand, trust and manipulate.
It’s also common for data engineering and analytics support to be outsourced even when data is stored in-house. A pharma company might own the storage environment but still rely on third-party vendors to manage pipelines, maintain data integrity, and support advanced analysis. Likewise, companies that store their data externally may still employ in-house analysts to interpret results and drive business recommendations.
A comparison of in-house data storage (company-owned systems) versus vendor-managed storage (including external data aggregators):

Data access and flexibility
In-house: Full control allows analysts to run complex queries, build custom reports, and explore raw data without relying on third parties
Vendor-managed: Limited access may require vendor approval for raw data retrieval and can delay analytical timelines

Customization
In-house: Infrastructure can be customized to align with internal workflows and tools, promoting efficiency
Vendor-managed: Typically standardized; customization is limited to what the vendor offers

Regulatory compliance
In-house: Full responsibility for implementing and updating HIPAA/GDPR protocols, offering direct oversight
Vendor-managed: Vendor handles compliance, but the company must still oversee and monitor the vendor’s practices

Cost
In-house: High upfront investment in servers, software, and IT staff; requires ongoing system upkeep and upgrades
Vendor-managed: Lower initial investment; pricing often includes infrastructure and support

Security risk
In-house: Greater exposure to breaches if internal protocols are weak; risks include data leaks, corruption, or loss
Vendor-managed: Centralized vendor responsibility; still carries risk if vendor security fails

Staffing
In-house: Requires an internal team of engineers and cybersecurity professionals
Vendor-managed: Reduces internal staffing needs but increases reliance on the vendor’s skill set

Scalability
In-house: Scaling requires hardware upgrades and IT support, which can be resource-intensive
Vendor-managed: Seamlessly scalable with usage-based pricing, unless the required outcomes exceed the vendor’s existing capabilities

Implementation speed
In-house: Longer timelines due to custom system development
Vendor-managed: Faster implementation with pre-built solutions

Vendor switching
In-house: Not applicable
Vendor-managed: High switching costs and technical friction when transitioning vendors; despite any promises, switching vendors may disrupt services and impact patient experience

Performance and availability
In-house: Internal teams manage performance and availability, and can be more flexible and responsive depending on organizational priorities
Vendor-managed: Reliant on vendor SLAs for uptime, issue resolution, and responsiveness
The (Hidden) Costs of Data Storage and Maintenance
Storing data is not just about selecting and establishing a system; it’s an ongoing investment. Stored data requires continuous upkeep to ensure accuracy, security, and usability. Even when no new data is being ingested, database maintenance is critical to keep the data already stored usable. To use the analogy of a library: books (data) are stored on shelves (databases), but without regular organization and cataloging (maintenance), finding useful information becomes time-consuming and error-prone.
Common data maintenance activities include:
Routine Backups and Disaster Recovery – Ensuring data is duplicated and retrievable in case of system failure
Database Indexing and Optimization – Keeping systems running efficiently by restructuring storage for faster queries
Access Control Reviews – Auditing who has access to sensitive data to prevent breaches
Error Detection and Data Cleansing – Identifying and correcting duplicate, incomplete, or incorrect data before it affects decision-making
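As a small illustration of the last activity, error detection can be as simple as scanning for missing required fields and duplicate IDs. The required fields and record shapes below are assumptions for illustration:

```python
# Illustrative required fields for a stored record; a real system
# would define these in its data-quality specifications.
REQUIRED_FIELDS = {"case_id", "enrollment_date"}

def find_issues(records):
    """Flag incomplete or duplicate records before they reach analysts."""
    issues, seen_ids = [], set()
    for rec in records:
        missing = REQUIRED_FIELDS - rec.keys()
        if missing:
            issues.append((rec.get("case_id"), "missing " + ", ".join(sorted(missing))))
        cid = rec.get("case_id")
        if cid in seen_ids:
            issues.append((cid, "duplicate"))
        seen_ids.add(cid)
    return issues

records = [
    {"case_id": "C-1", "enrollment_date": "2024-01-15"},
    {"case_id": "C-1", "enrollment_date": "2024-01-15"},  # duplicate
    {"case_id": "C-2"},                                   # incomplete
]
print(find_issues(records))
```

Checks like this typically run on a schedule, with flagged records routed for correction rather than silently dropped.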
Without ongoing maintenance, stored data loses value over time, as outdated, inconsistent, or inaccurate records accumulate and create analytical blind spots.
Of course, if a PSP is still running, there is typically always new data coming in: new patient enrollments, continued engagement in the form of refills, re-authorizations, adherence-supporting interactions, and so on. As long as new records are being added to the databases in the data warehouse, data engineers must continue to parse, clean, and organize the data to keep the whole system usable and useful.
Data storage is not just a technical decision—it is a strategic investment that impacts the efficiency, security, and analytical potential of Patient Support Programs. Organizations that take a proactive, thoughtful approach to PSP data management will be well-positioned to leverage their data for long-term impact, compliance, and competitive advantage.
The next paper in this series will explore data analysis, focusing on how organizations can translate stored data into meaningful insights and actionable strategies for PSP success.