# FHIR-Based Healthcare Document Processing POC

This proof-of-concept application demonstrates a secure, FHIR-compliant system for processing healthcare documents with OCR and storing the extracted data via a FHIR API.

## Project Structure

- `ocr_module/`: OCR functionality using Tesseract for document text extraction
- `fhir_module/`: FHIR API implementation using HAPI FHIR for data storage and retrieval
- `security_module/`: Authentication and authorization using OAuth2/OpenID Connect
- `compliance_module/`: Audit logging and compliance features
- `api/`: Main application API that integrates all modules
- `docker/`: Docker and docker-compose configuration files

## Features

- Document processing with OCR to extract healthcare information
- FHIR-compliant data storage and retrieval
- OAuth2/OpenID Connect authentication
- Comprehensive audit logging
- Role-based access control
- Local deployment with Docker

## Requirements

- Docker and Docker Compose
- Python 3.9+
- Tesseract OCR engine (automatically installed in Docker)

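Before a manual setup it can help to confirm that the prerequisites are present. The sketch below is one way to check, not part of the project: `check_prerequisites` is a hypothetical helper that looks for the `tesseract` and `docker` executables on `PATH` and compares the running interpreter against Python 3.9.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 9)):
    """Report which requirements appear to be satisfied on this machine."""
    return {
        # True when the running interpreter is at least min_python
        "python": sys.version_info >= min_python,
        # True when the executable can be found on PATH
        "tesseract": shutil.which("tesseract") is not None,
        "docker": shutil.which("docker") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

Tesseract only needs to be installed locally for the manual setup; the Docker image installs it automatically.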
## Getting Started

### Using Docker Compose (Recommended)

The easiest way to run the application is using Docker Compose:

1. Clone the repository
2. Navigate to the project root directory
3. Run Docker Compose:

```bash
docker-compose up
```

This will start the following services:
- FHIR OCR application at http://localhost:8000
- Keycloak authentication server at http://localhost:8181
- HAPI FHIR server at http://localhost:8090 (included but not integrated in the POC)

### Manual Setup

If you prefer to run the application without Docker:

1. Install Tesseract OCR engine on your system:
   - **Ubuntu/Debian**: `sudo apt-get install tesseract-ocr`
   - **macOS**: `brew install tesseract`
   - **Windows**: Download installer from [Tesseract GitHub page](https://github.com/UB-Mannheim/tesseract/wiki)

2. Create and activate a Python virtual environment:
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install Python dependencies:
   ```bash
   pip install -r requirements.txt
   ```

4. Run the application:
   ```bash
   uvicorn api.app:app --host 0.0.0.0 --port 8000
   ```

## Usage

### 1. Authentication

To access the API, you need to obtain an authentication token:

```bash
curl -X POST http://localhost:8000/auth/token \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "password"}'
```

This will return a JSON response with an access token:

```json
{
  "access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "token_type": "bearer"
}
```

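The same token request can be made from Python. This is a minimal sketch using only the standard library; the helper names `build_token_request` and `fetch_token` are hypothetical, and the URL and payload simply mirror the curl example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # application address from the Docker Compose setup


def build_token_request(username, password, base_url=BASE_URL):
    """Build (but do not send) the POST request for /auth/token."""
    payload = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/auth/token",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_token(username, password):
    """Send the request and return the access_token field of the JSON response."""
    with urllib.request.urlopen(build_token_request(username, password)) as response:
        return json.load(response)["access_token"]
```

With the application running, `fetch_token("admin", "password")` should return the token string to pass in the `Authorization: Bearer` header of later requests.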
### 2. Processing a Document

Use the token to upload and process a document:

```bash
curl -X POST http://localhost:8000/ocr/process \
  -H "Authorization: Bearer YOUR_TOKEN_HERE" \
  -F "file=@/path/to/document.jpg" \
  -F "process_as=insurance_card"
```

The API will return the OCR results and the IDs of the created FHIR resources.

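The `-F` flags in the curl command send a `multipart/form-data` body with two parts, `file` and `process_as`. As a rough sketch of the same upload in Python with no third-party packages, the snippet below encodes that body by hand. `encode_multipart` and `build_process_request` are hypothetical helpers written for this example, not functions provided by the project.

```python
import mimetypes
import urllib.request
import uuid


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file; return (body, content_type_header)."""
    boundary = uuid.uuid4().hex
    mime_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: {mime_type}\r\n\r\n'.encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_process_request(token, filename, file_bytes, process_as="insurance_card",
                          base_url="http://localhost:8000"):
    """Build (but do not send) the POST request for /ocr/process."""
    body, content_type = encode_multipart({"process_as": process_as}, "file", filename, file_bytes)
    return urllib.request.Request(
        f"{base_url}/ocr/process",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
        method="POST",
    )
```

To send it, read the image with `pathlib.Path(path).read_bytes()`, pass the resulting request to `urllib.request.urlopen`, and parse the response with `json.load`.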
### 3. Retrieving FHIR Resources

Get a patient resource:

```bash
curl -X GET http://localhost:8000/fhir/Patient/PATIENT_ID \
  -H "Authorization: Bearer YOUR_TOKEN_HERE"
```

Get an observation resource:

```bash
curl -X GET http://localhost:8000/fhir/Observation/OBSERVATION_ID \
  -H "Authorization: Bearer YOUR_TOKEN_HERE"
```

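Both reads follow the pattern `/fhir/{ResourceType}/{id}`. As a rough sketch of working with the result in Python, the snippet below builds that URL and pulls a display name out of a Patient resource. It assumes the endpoint returns standard FHIR JSON, where `name` is a list of HumanName entries with `family` and `given`; the helper names and the sample patient are made up for illustration.

```python
from urllib.parse import quote


def resource_url(resource_type, resource_id, base_url="http://localhost:8000"):
    """Return the read URL for one resource, e.g. .../fhir/Patient/123."""
    return f"{base_url}/fhir/{quote(resource_type, safe='')}/{quote(resource_id, safe='')}"


def patient_display_name(patient):
    """Join the given and family names of the first HumanName entry, if any."""
    names = patient.get("name") or []
    if not names:
        return ""
    first = names[0]
    parts = list(first.get("given", [])) + [first.get("family", "")]
    return " ".join(part for part in parts if part)


# Illustrative data only, not output from the application.
sample_patient = {
    "resourceType": "Patient",
    "id": "example",
    "name": [{"family": "Doe", "given": ["Jane", "Q"]}],
}

print(resource_url("Patient", sample_patient["id"]))
print(patient_display_name(sample_patient))
```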
### 4. Testing the OCR Flow Locally

For testing the OCR to FHIR flow without the API, use the provided test script:

```bash
python test_ocr_flow.py --image sample_data/your_image.jpg --output test_results
```

## Keycloak Setup

The Docker Compose configuration includes a Keycloak server for authentication.
For production use, you would need to:

1. Access the Keycloak admin console at http://localhost:8181
2. Log in with username `admin` and password `admin`
3. Create a new realm (e.g., `fhir-ocr`)
4. Create a new client (e.g., `fhir-ocr-client`)
5. Configure client access type as "confidential"
6. Add redirect URIs for your application
7. Create roles (e.g., `user`, `admin`)
8. Create users and assign roles

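Once a realm exists, Keycloak publishes its OpenID Connect endpoints in a discovery document under the realm path. The sketch below builds that URL for the example realm `fhir-ocr` and reads a few standard fields from it. The helper names are hypothetical, and the `legacy_auth_prefix` switch is an assumption to cover older Keycloak releases, which served realms under an `/auth` prefix.

```python
import json
import urllib.request


def discovery_url(realm, base_url="http://localhost:8181", legacy_auth_prefix=False):
    """Return the OpenID Connect discovery URL for a Keycloak realm."""
    prefix = "/auth" if legacy_auth_prefix else ""
    return f"{base_url}{prefix}/realms/{realm}/.well-known/openid-configuration"


def fetch_oidc_endpoints(realm):
    """Download the discovery document and keep the fields a client usually needs."""
    with urllib.request.urlopen(discovery_url(realm)) as response:
        config = json.load(response)
    return {key: config[key] for key in ("issuer", "token_endpoint", "jwks_uri")}


print(discovery_url("fhir-ocr"))
```

With Keycloak running and the realm created, `fetch_oidc_endpoints("fhir-ocr")` returns the issuer, token endpoint, and JWKS URL to configure in the application.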
## Security and Compliance

The application includes several security and compliance features:

- **Authentication**: OAuth2/OpenID Connect with Keycloak
- **Authorization**: Role-based access control
- **Audit Logging**: All API calls and data access are logged
- **Privacy Filtering**: Sensitive data can be masked or redacted

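To make the privacy filtering idea concrete, here is a minimal masking sketch. It is not the project's implementation: the field names in `SENSITIVE_KEYS` and both helper functions are hypothetical, and the rule shown (keep the last four characters, star out the rest) is only one possible policy.

```python
SENSITIVE_KEYS = {"ssn", "member_id"}  # hypothetical field names


def mask_value(value, visible=4):
    """Replace all but the last `visible` characters with asterisks."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def mask_record(record, sensitive_keys=SENSITIVE_KEYS):
    """Return a copy of a nested dict/list structure with sensitive scalars masked."""
    if isinstance(record, dict):
        masked = {}
        for key, value in record.items():
            if key in sensitive_keys and not isinstance(value, (dict, list)):
                masked[key] = mask_value(str(value))
            else:
                masked[key] = mask_record(value, sensitive_keys)
        return masked
    if isinstance(record, list):
        return [mask_record(item, sensitive_keys) for item in record]
    return record


print(mask_record({"name": "Ann", "ssn": "123456789"}))
```

Masking a copy rather than the original keeps the stored data intact, so the same record can be shown in full to roles that are allowed to see it.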
## Docker Environment Variables

The following environment variables can be set in the `docker-compose.yml` file:

- `ENVIRONMENT`: `development` or `production`
- `JWT_SECRET_KEY`: Secret key for JWT token signing
- `JWT_ALGORITHM`: Algorithm for JWT token signing
- `ACCESS_TOKEN_EXPIRE_MINUTES`: Token expiration time in minutes

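As an illustration of how these variables might be consumed, the sketch below loads them into a small settings object. The `Settings` class, the `load_settings` helper, and every default value are assumptions made for this example; `HS256` was picked only because the sample token in the Usage section has an HS256 header.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    environment: str
    jwt_secret_key: str
    jwt_algorithm: str
    access_token_expire_minutes: int


def load_settings(env=os.environ):
    """Read the documented variables from a mapping, applying assumed defaults."""
    environment = env.get("ENVIRONMENT", "development")
    if environment not in ("development", "production"):
        raise ValueError(f"ENVIRONMENT must be development or production, got {environment!r}")
    secret = env.get("JWT_SECRET_KEY", "")
    if environment == "production" and not secret:
        # Refuse to start in production without an explicit signing key.
        raise ValueError("JWT_SECRET_KEY must be set in production")
    return Settings(
        environment=environment,
        jwt_secret_key=secret or "dev-only-secret",  # assumed development fallback
        jwt_algorithm=env.get("JWT_ALGORITHM", "HS256"),
        access_token_expire_minutes=int(env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "30")),
    )
```

Passing the environment as a mapping keeps the function easy to test, for example `load_settings({"ENVIRONMENT": "production", "JWT_SECRET_KEY": "change-me"})`.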
## API Documentation

API documentation is available at http://localhost:8000/docs when the application is running.

## License

This project uses open-source components:
- Tesseract OCR (Apache License 2.0)
- HAPI FHIR (Apache License 2.0)
- Keycloak (Apache License 2.0)