Add developer guide and enhance user instructions

- Introduced a comprehensive developer guide for the Data Shield function.
- Detailed deployment workflows, including release creation and dependency management.
- Explained core components like parameter matching, traversal systems, and actions.
- Updated user guide with sanitization modes and usage instructions.
- Added troubleshooting tips for common issues.
Jonathon Broughton
2025-03-25 01:00:12 +00:00
parent 8191c3c743
commit 002dd3de50
2 changed files with 353 additions and 63 deletions
+273
@@ -0,0 +1,273 @@
# Data Shield - Developer Guide
This document provides technical information for developers working on the Data Shield Speckle Automate function. It covers deployment workflows, core components, and guidance for extending the function.
## Deployment to Speckle Automate
### Creating a Release
The function is automatically deployed to Speckle Automate when a new release is created on GitHub:
1. Ensure your changes are committed and pushed to the main branch
2. Create a `requirements.txt` file (see next section) and commit it to the main branch
3. Create a new GitHub release:
   - Go to your repository on GitHub
   - Navigate to "Releases" under the repository name
   - Click "Draft a new release"
   - Create a **new tag** (e.g., `v1.0.1`)
   - Write a descriptive title and release notes
   - Click "Publish release"
Creating a new release triggers the GitHub Actions workflow defined in `main.yml`, which builds and publishes the function to Speckle Automate.
### Managing Dependencies
You can use any dependency management tool of your choice for local development (Poetry, pip, uv, etc.), but Speckle Automate requires a `requirements.txt` file for deployment.
**Important**: You must create and commit the `requirements.txt` file to the repository **before** creating a release. The deployment workflow relies on this file being present in the repository.
To generate and commit the requirements file based on your local environment:
- With standard pip: `pip freeze > requirements.txt`
- With uv: `uv pip freeze > requirements.txt`
- With Poetry: `poetry export -f requirements.txt --output requirements.txt --without-hashes`
- Or manually create/edit the file to include necessary dependencies
Then commit the updated file:
```bash
git add requirements.txt
git commit -m "Update requirements.txt"
git push
```
Only after the requirements.txt is committed should you create a new release as described above.
Note that during deployment, the GitHub Actions workflow uses `uv` to install the dependencies, but your local development environment can use any tool you prefer.
### Deployment Workflow Details
The deployment workflow:
1. Checks out the repository
2. Sets up Python 3.13
3. Installs dependencies from `requirements.txt`
4. Extracts the function schema
5. Uses the Speckle Automate GitHub composite action to:
   - Build a Docker image with the function
   - Push the image to the Speckle Automate registry
   - Update the function in Speckle Automate
## Core Components
### Parameter Matching System
The function uses a strategy pattern for parameter matching, allowing flexible and extensible matching rules:
#### ParameterMatcher Classes
* `ParameterMatcher` (ABC): Abstract base class for all matchers
* `PrefixMatcher`: Matches parameters by prefix (with optional case sensitivity)
* `PatternMatcher`: Uses regex/glob patterns for more complex matching
```python
# Example: Creating a custom matcher
class SuffixMatcher(ParameterMatcher):
    """Matches parameters by suffix."""

    def matches(self, param_name: str) -> bool:
        """Check if the parameter name ends with the match value."""
        if self.strict_mode:
            # Strict mode: case-sensitive comparison
            return param_name.endswith(self.match_value)
        return param_name.lower().endswith(self.match_value.lower())
```
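For context, here is a minimal, self-contained sketch of how the strategy pattern above might fit together. The constructor signature (`match_value`, `strict_mode`) and the `select_matching` helper are assumptions made for illustration; the real base class in this repository may differ.

```python
from abc import ABC, abstractmethod


class ParameterMatcher(ABC):
    """Strategy interface: decides whether a parameter name matches."""

    def __init__(self, match_value: str, strict_mode: bool = False) -> None:
        # Assumed constructor; strict_mode=True means case-sensitive matching.
        self.match_value = match_value
        self.strict_mode = strict_mode

    @abstractmethod
    def matches(self, param_name: str) -> bool:
        """Return True if the parameter name satisfies this matcher."""


class PrefixMatcher(ParameterMatcher):
    """Matches parameter names that start with the match value."""

    def matches(self, param_name: str) -> bool:
        if self.strict_mode:
            return param_name.startswith(self.match_value)
        return param_name.lower().startswith(self.match_value.lower())


def select_matching(names: list[str], matcher: ParameterMatcher) -> list[str]:
    """Hypothetical helper: callers depend only on the matcher interface."""
    return [name for name in names if matcher.matches(name)]
```

Because callers only use `matches()`, a new matcher such as `SuffixMatcher` can be swapped in without touching the code that consumes it.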
#### Pattern Checking
The `PatternChecker` class handles both glob-style patterns (e.g., `speckle_*`) and regular expressions (e.g., `/^speckle_\d+$/i`):
* Glob patterns use `fnmatch` for simple wildcard matching
* Regex patterns must be wrapped in slashes (`/pattern/`)
* Case sensitivity is controlled by:
  - The global `strict_mode` parameter
  - The `/i` flag for regex patterns (overrides `strict_mode`)
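The rules above can be sketched as a single function. This is an illustration under stated assumptions, not the `PatternChecker` implementation: the function name `pattern_matches` is hypothetical, and it assumes that a regex without the `/i` flag follows `strict_mode` for case sensitivity.

```python
import re
from fnmatch import fnmatchcase

# A pattern of the form /body/flags is treated as a regular expression.
_REGEX_FORM = re.compile(r"^/(?P<body>.+)/(?P<flags>[a-z]*)$")


def pattern_matches(pattern: str, name: str, strict_mode: bool = False) -> bool:
    """Return True if name matches a glob or a /regex/ pattern."""
    regex_form = _REGEX_FORM.match(pattern)
    if regex_form:
        # The /i flag forces case-insensitive matching, even in strict mode.
        ignore_case = "i" in regex_form.group("flags") or not strict_mode
        flags = re.IGNORECASE if ignore_case else 0
        return re.search(regex_form.group("body"), name, flags) is not None
    if strict_mode:
        # fnmatchcase never normalises case, unlike fnmatch on some platforms.
        return fnmatchcase(name, pattern)
    return fnmatchcase(name.lower(), pattern.lower())
```

`fnmatchcase` is used instead of `fnmatch` so that case handling is controlled explicitly rather than depending on the operating system.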
### Traversal System
The function uses Speckle's graph traversal system to navigate the complex object hierarchy:
1. `GraphTraversal` from `specklepy.objects.graph_traversal.traversal` defines rules for how to navigate objects
2. `TraversalRule` objects define:
   - Conditions for when a rule applies to an object
   - Methods to extract the next objects to traverse
3. Our custom rules in `traversal.py` focus on:
   - `display_value_rule`: For objects with displayValue/elements properties
   - `default_rule`: General fallback for traversing all object members
The traversal system provides contexts that contain:
- The current object being traversed
- The path taken to reach that object
- Other metadata used during traversal
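To make the idea of rule-driven traversal concrete, here is a dependency-free sketch that uses plain dictionaries in place of Speckle objects. It is not specklepy's API: `Rule`, `Context`, and `traverse` are hypothetical names, and the two rules only approximate the behaviour described above (objects with a `displayValue` are followed through `elements` only, everything else through all members).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Rule:
    applies: Callable[[dict], bool]            # when the rule is used
    members: Callable[[dict], Iterable[str]]   # which members to follow


@dataclass(frozen=True)
class Context:
    current: dict            # the object being visited
    path: tuple[str, ...]    # member names followed to reach it


def traverse(root: dict, rules: list[Rule]) -> Iterator[Context]:
    """Yield a context for every object reachable under the first matching rule."""
    stack = [Context(root, ())]
    while stack:
        ctx = stack.pop()
        yield ctx
        rule = next((r for r in rules if r.applies(ctx.current)), None)
        if rule is None:
            continue
        for name in rule.members(ctx.current):
            value = ctx.current.get(name)
            children = value if isinstance(value, list) else [value]
            for child in children:
                if isinstance(child, dict):
                    stack.append(Context(child, ctx.path + (name,)))


display_value_rule = Rule(lambda o: "displayValue" in o, lambda o: ["elements"])
default_rule = Rule(lambda o: True, lambda o: list(o))
```

Rule order matters in this sketch: the first rule whose condition holds decides which members are followed, so the general fallback goes last.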
### Parameter Actions
Actions implement the logic for what to do when a parameter match is found:
#### ParameterAction Classes
* `ParameterAction` (ABC): Abstract base class for all actions
* `RemovalAction`: Removes matching parameters from objects
* `AnonymizationAction`: Masks email addresses in parameter values
Each action implements:
- `check()`: Determines if the action should be applied
- `apply()`: Performs the action on a matching parameter
- `report()`: Generates feedback for the Automate context
```python
# Example: Creating a custom action
from speckle_automate import AutomationContext


class TransformAction(ParameterAction):
    """Action to transform parameter values based on a rule."""

    def __init__(self, matcher: ParameterMatcher, transform_func) -> None:
        """Initialize with a matcher strategy and transform function."""
        super().__init__()
        self.matcher = matcher
        self.transform_func = transform_func

    def check(self, param_name: str) -> bool:
        """Check if parameter matches using the provided matcher."""
        return self.matcher.matches(param_name)

    def apply(self, parameter, parent_object, containing_dict, parameter_key) -> None:
        """Transform the parameter value."""
        param_name = parameter.get("name", parameter_key)
        object_id = getattr(parent_object, "id", None)
        if "value" in parameter and isinstance(parameter["value"], str):
            parameter["value"] = self.transform_func(parameter["value"])
            # Track affected object and parameter
            # (assumes the base class sets up affected_parameters as a dict of lists)
            self.affected_parameters[object_id].append(param_name)

    def report(self, automate_context: AutomationContext) -> None:
        """Report the transformed parameters."""
        if not self.affected_parameters:
            return
        # Count each distinct parameter name once across all objects
        transformed_params = {param for params in self.affected_parameters.values() for param in params}
        message = f"Transformed {len(transformed_params)} parameters"
        automate_context.attach_info_to_objects(
            category="Transformed_Parameters",
            object_ids=list(self.affected_parameters.keys()),
            message=message,
        )
```
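As one possible `transform_func`, the sketch below masks email addresses in the style the user guide describes (`john.doe@example.com` becomes `j***@example.com`). The regular expression and the name `mask_emails` are assumptions for illustration; this is not the `AnonymizationAction` implementation.

```python
import re

# Capture the first character of the local part and the whole domain.
EMAIL_RE = re.compile(
    r"\b([A-Za-z0-9._%+-])[A-Za-z0-9._%+-]*@([A-Za-z0-9.-]+\.[A-Za-z]{2,})\b"
)


def mask_emails(text: str) -> str:
    """Replace each email's local part with its first character plus '***'."""
    return EMAIL_RE.sub(r"\1***@\2", text)
```

A function like this could be passed to the example above as `TransformAction(matcher, mask_emails)`. The pattern is deliberately simple and will not cover every valid address format.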
#### Parameter Processing
The `ParameterProcessor` class orchestrates the application of actions:
1. Takes an action and a flag indicating whether to check parameter names or values
2. Processes traversal contexts by examining properties and parameters
3. Handles both modern (v3) and legacy (v2) Speckle objects
4. Applies the action to matching parameters
5. Tracks processed objects for reporting
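The matching-and-applying step can be sketched on a plain nested dictionary. This is an illustration only: `remove_matching` is a hypothetical helper, and the real `ParameterProcessor` works on traversal contexts and on both v2 and v3 Speckle object layouts, which this sketch does not model.

```python
from typing import Any, Callable


def remove_matching(properties: dict[str, Any], matches: Callable[[str], bool]) -> list[str]:
    """Delete every key whose name matches, recursing into nested dicts.

    Returns the removed names so they can be reported afterwards.
    """
    removed: list[str] = []
    for key in list(properties):  # copy the keys: the dict is mutated in the loop
        value = properties[key]
        if matches(key):
            del properties[key]
            removed.append(key)
        elif isinstance(value, dict):
            removed.extend(remove_matching(value, matches))
    return removed
```

Returning the removed names mirrors step 5 above: the caller can collect them per object and use them when building the run report.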
### Adding New Sanitization Modes
To add a new sanitization mode:
1. Update the `SanitizationMode` enum in `inputs.py`:
```python
class SanitizationMode(Enum):
    PREFIX_MATCHING = "Prefix Matching"
    PATTERN_MATCHING = "Pattern Matching"
    ANONYMIZATION = "Anonymization"
    NEW_MODE = "Your New Mode"  # Add your new mode here
```
2. Create any necessary new matchers or actions in `actions.py`
3. Update the `automate_function` in `function.py` to handle the new mode:
```python
if function_inputs.sanitization_mode == SanitizationMode.NEW_MODE:
    # Add specific validation for your new mode
    action = create_your_new_action()  # Create a factory function for your action
```
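If the chain of `if` checks grows, one alternative is a lookup table from mode to factory. The sketch below is a design suggestion rather than how `function.py` is written: the factory functions are hypothetical and return plain strings where the real code would build action objects.

```python
from enum import Enum
from typing import Callable


class SanitizationMode(Enum):
    PREFIX_MATCHING = "Prefix Matching"
    PATTERN_MATCHING = "Pattern Matching"
    ANONYMIZATION = "Anonymization"


# Hypothetical factories: stand-ins for functions that would build real actions.
def make_prefix_action(parameter_input: str) -> str:
    return f"remove parameters with prefix {parameter_input!r}"


def make_pattern_action(parameter_input: str) -> str:
    return f"remove parameters matching {parameter_input!r}"


def make_anonymization_action(parameter_input: str) -> str:
    return "mask email addresses"


ACTION_FACTORIES: dict[SanitizationMode, Callable[[str], str]] = {
    SanitizationMode.PREFIX_MATCHING: make_prefix_action,
    SanitizationMode.PATTERN_MATCHING: make_pattern_action,
    SanitizationMode.ANONYMIZATION: make_anonymization_action,
}


def create_action(mode: SanitizationMode, parameter_input: str = "") -> str:
    """Look up the factory for a mode and fail loudly if none is registered."""
    factory = ACTION_FACTORIES.get(mode)
    if factory is None:
        raise ValueError(f"Unsupported sanitization mode: {mode}")
    return factory(parameter_input)
```

With this layout, adding a mode means adding one enum member and one table entry, and a forgotten entry surfaces as a clear error instead of a silently skipped branch.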
## Function Flow
The main function flow is:
1. User selects a sanitization mode and provides parameters via the UI
2. Function creates the appropriate action based on the mode
3. Version data is received from Speckle
4. Traversal rules navigate through the object tree
5. Parameters are processed with the selected action
6. Results are reported back to the Automate context
7. A new sanitized version is created
## Additional Resources
- [Speckle Automate Documentation](https://automate.speckle.dev/)
- [Speckle Python SDK Documentation](https://speckle.guide/dev/python.html)
- [Pydantic Documentation](https://docs.pydantic.dev/) (for function inputs)
## Testing
### Local Testing with pytest
pytest is the recommended way to test Speckle Automate functions locally. This allows you to verify your function works correctly before deploying it.
1. Set up your test environment by creating a `.env` file with your Speckle credentials:
```
SPECKLE_TOKEN="<your-speckle-personal-access-token>"
SPECKLE_SERVER_URL="https://app.speckle.systems/"
SPECKLE_PROJECT_ID="d94c63b75d"
SPECKLE_AUTOMATION_ID="99896f98b6"
```
2. Run the tests with your preferred method:
```bash
# Using pytest directly
python -m pytest
# Or if using a virtual environment tool
# poetry run pytest
```
The tests in `test_function.py` provide examples of how to set up the automation context and run the function with different inputs.
### Setting Up a Test Automation
To test your function properly:
1. Create a test automation in Speckle Automate
2. Use the IDs and token of that automation in your `.env` file
This lets your tests interact with actual Speckle objects and verify the function's behavior.
The `speckle-automate` package provides fixtures that help with loading these environment variables and setting up the test context automatically.
Example test setup:
```python
from speckle_automate import (
    AutomationContext,
    AutomationRunData,
    AutomationStatus,
    run_function,
)

# FunctionInputs, SanitizationMode and automate_function come from this repository's own modules.


def test_function_run(test_automation_run_data: AutomationRunData, test_automation_token: str) -> None:
    """Run an integration test for the automate function."""
    automation_context = AutomationContext.initialize(test_automation_run_data, test_automation_token)
    # Run your function with test inputs
    automate_sdk = run_function(
        automation_context,
        automate_function,
        FunctionInputs(sanitization_mode=SanitizationMode.PATTERN_MATCHING, parameter_input="test_*", strict_mode=True),
    )
    # Verify the results
    assert automate_sdk.run_status == AutomationStatus.SUCCEEDED
```
The fixtures `test_automation_run_data` and `test_automation_token` are provided by the `speckle-automate` package and automatically use the values from your `.env` file.
+80 -63
@@ -1,100 +1,117 @@
# Speckle Automate function template - Python
# 🛡️ Data Shield — User Guide
This template repository is for a Speckle Automate function written in Python
using the [specklepy](https://pypi.org/project/specklepy/) SDK to interact with Speckle data.
**Data Shield** is a Speckle Automate function that helps you keep your model data clean, safe, and share-ready. Whether you're sending models to clients, collaborators, or just tidying up before archiving, Data Shield's got your back.
This template contains the full scaffolding required to publish a function to the Automate environment.
It also has some sane defaults for development environment setups.
---
## Getting started
## ✨ What Data Shield Does
1. Use this template repository to create a new repository in your own / organization's profile.
Data Shield scans your Speckle model for parameters you'd rather not share and takes care of them for you. It creates a fresh, sanitized version of your model while keeping the original intact.
Register the function
### Why you'll love it:
- **Privacy Protection** — Say goodbye to accidentally sharing sensitive data.
- **Data Compliance** — Stay on the right side of data protection policies.
- **Confident Collaboration** — Share models without oversharing.
### Add new dependencies
---
To add new Python package dependencies to the project, use the following:
`$ poetry add pandas`
## Sanitization Modes
### Change launch variables
We know one size doesn't fit all, so Data Shield offers three modes to suit your style:
Describe how the launch.json should be edited.
### Prefix Matching
> **Best for:** Simple, predictable naming conventions.
### Github Codespaces
Remove parameters that start with a specific prefix.
> Example: Want to remove everything starting with `secret_`? Just set that prefix and Data Shield does the rest.
Create a new repo from this template, and use the create new code.
**Setup**:
- Add your prefix (like `internal_`, `private_`, or `secret_`)
- Toggle strict mode for case sensitivity (on or off — your call)
### Using this Speckle Function
---
1. [Create](https://automate.speckle.dev/) a new Speckle Automation.
1. Select your Speckle Project and Speckle Model.
1. Select the deployed Speckle Function.
1. Enter a phrase to use in the comment.
1. Click `Create Automation`.
### Pattern Matching
> **Best for:** Wildcards, regex fans, and complex patterns.
## Getting Started with Creating Your Own Speckle Function
Get fancy and use `*`, `?`, or full regular expressions.
1. [Register](https://automate.speckle.dev/) your Function with [Speckle Automate](https://automate.speckle.dev/) and select the Python template.
1. A new repository will be created in your GitHub account.
1. Make changes to your Function in `main.py`. See below for the Developer Requirements and instructions on how to test.
1. To create a new version of your Function, create a new [GitHub release](https://docs.github.com/en/repositories/releasing-projects-on-github/managing-releases-in-a-repository) in your repository.
**Examples**:
- `client_*` matches anything that starts with `client_`
- `?_internal` matches `a_internal`, `b_internal`
- `/^(secret|private)_.*$/i` matches parameters starting with `secret_` or `private_`, ignoring case
## Developer Requirements
---
1. Install the following:
- [Python 3](https://www.python.org/downloads/)
- [Poetry](https://python-poetry.org/docs/#installing-with-the-official-installer)
1. Run `poetry shell && poetry install` to install the required Python packages.
### Anonymization
> **Best for:** Keeping the structure, hiding the details.
## Building and Testing
Automatically detect email addresses inside parameter values and anonymize them.
> Example: `john.doe@example.com` becomes `j***@example.com`
The code can be tested locally by running `poetry run pytest`.
No setup needed. Just select and go.
### Building and running the Docker Container Image
---
Running and testing your code on your machine is a great way to develop your Function; the following instructions are a bit more in-depth and only required if you are having issues with your Function in GitHub Actions or on Speckle Automate.
## How to Use Data Shield
#### Building the Docker Container Image
1. **Set up your automation:**
- In your Speckle project, head to **Automations**
- Click **Add Automation** and choose **Data Shield**
- Set your trigger (like “on new commit”)
The GitHub Action packages your code into the format required by Speckle Automate. This is done by building a Docker Image, which Speckle Automate runs. You can attempt to build the Docker Image locally to test the building process.
2. **Configure your mode:**
- Choose Prefix, Pattern, or Anonymization
- Add your prefix or pattern if needed
- Toggle strict mode if you want case sensitivity
To build the Docker Container Image, you must have [Docker](https://docs.docker.com/get-docker/) installed.
3. **Run it:**
- It'll run automatically when triggered, or you can manually run it on specific commits
Once you have Docker running on your local machine:
4. **Check results:**
- Sanitized models show up under the `processed/` branch
- You'll get a run report showing what got cleaned
- Highlighted changes can be seen directly in the viewer
1. Open a terminal
1. Navigate to the directory in which you cloned this repository
1. Run the following command:
::: 💡 Tips & Tricks
```bash
docker build -f ./Dockerfile -t speckle_automate_python_example .
```
- **Test first!** — Run it on a small test model before going full production.
- **Start simple.** Use prefix matching for clear conventions, pattern matching for complexity, or anonymization for safe sharing.
- **Regex pro tip:**
- Wrap your regex in `/`
- Add `i` for case-insensitive matching
- Use `^` (start) and `$` (end) for tighter control
:::
#### Running the Docker Container Image
Once the GitHub Action has built the image, it is sent to Speckle Automate. When Speckle Automate runs your Function as part of an Automation, it will run the Docker Container Image. You can test that your Docker Container Image runs correctly locally.
## 📚 Example Workflows
1. To then run the Docker Container Image, run the following command:
### → Prepping for external sharing
- Use pattern matching with `/^(internal|private|confidential)_.*$/i`
- Run before sending out models
- Share confidently!
```bash
docker run --rm speckle_automate_python_example \
python -u main.py run \
'{"projectId": "1234", "modelId": "1234", "branchName": "myBranch", "versionId": "1234", "speckleServerUrl": "https://speckle.xyz", "automationId": "1234", "automationRevisionId": "1234", "automationRunId": "1234", "functionId": "1234", "functionName": "my function", "functionLogo": "base64EncodedPng"}' \
'{}' \
yourSpeckleServerAuthenticationToken
```
### → Anonymizing client data
- Select Anonymization mode
- Run on any models with contact details
- Use sanitized versions for demos, public decks, or sales pitches
Let's explain this in more detail:
### → Stripping out project-specific baggage
- Prefix matching with something like `projectX_`
- Clean your models before turning them into templates
`docker run --rm speckle_automate_python_example` tells Docker to run the Docker Container Image we built earlier. `speckle_automate_python_example` is the name of the Docker Container Image. The `--rm` flag tells Docker to remove the container after it has finished running, freeing up space on your machine.
---
The line `python -u main.py run` is the command run inside the Docker Container Image. The rest of the command is the arguments passed to the command. The arguments are:
## 🛠️ Troubleshooting
- `'{"projectId": "1234", "modelId": "1234", "branchName": "myBranch", "versionId": "1234", "speckleServerUrl": "https://speckle.xyz", "automationId": "1234", "automationRevisionId": "1234", "automationRunId": "1234", "functionId": "1234", "functionName": "my function", "functionLogo": "base64EncodedPng"}'` - the metadata that describes the automation and the function.
- `{}` - the input parameters for the function the Automation creator can set. Here, they are blank, but you can add your parameters to test your function.
- `yourSpeckleServerAuthenticationToken` - the authentication token for the Speckle Server that the Automation can connect to. This is required to interact with the Speckle Server, for example, to get data from the Model.
- **Not matching anything?** Double-check your pattern or prefix.
- **Case mismatch?** Try turning off strict mode.
- **Only partly sanitized?** Some complex models might need multiple passes.
- **Errors?** Check run logs in the automation report for clues.
## Resources
---
- [Learn](https://speckle.guide/dev/python.html) more about SpecklePy and interacting with Speckle from Python.
## 🤔 Still stuck?
No worries, we've got your back.
👉 Post your questions in the [Speckle Community Forum](https://speckle.community) and someone from the team (or one of our awesome community members) will help you out!