Asset Version History extension - prelim feedback


(John gill) #1

Context - https://marketplace.squiz.net/extensions/asset-version-history

Firstly, thanks to Squiz for publishing the new versioning extension on the marketplace so we can take it for a test drive before getting it installed in production. It makes all the difference in easing adoption. I’ve given it a whirl on a test instance and cobbled together some notes and requests, most of which I’m sure the devs have already thought of.

No pagination

Presumably just because this is a bit MVP, there’s no way to get history records past the first page.

Serialised attributes not stored

Serialised attributes (like questions on form assets) don’t appear to be captured in the history JSON.

Suggestion - API field formats

The API behind the Version History screen would be useful to consume programmatically at times, but some of the response fields are already formatted for display (version_date and status_icon).
/__avh/versionList/49

   "versions" : [
      {
         "version_date" : "Feb 20th 2020 - 02:42am",
         "status_icon" : "<span title=\"Live\" class=\"sq-status-square sq-status-16\"></span>",
         "user_name" : "Root User",
         "version" : "0.2.1"
      },

It would be handy for the date to be ISO 8601 or similar, and for the status to be sent as the status code rather than HTML.
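In the meantime, a consumer could probably normalise these display fields on its own side. Here is a minimal Python sketch, assuming the date always follows the "Feb 20th 2020 - 02:42am" pattern shown above and that the status label lives in the span's title attribute. The helper names are my own, and since the response doesn't say what time zone the date is in, the result is a naive timestamp.

```python
import re
from datetime import datetime

# Sample record copied from the /__avh/versionList/49 response quoted above.
record = {
    "version_date": "Feb 20th 2020 - 02:42am",
    "status_icon": '<span title="Live" class="sq-status-square sq-status-16"></span>',
    "user_name": "Root User",
    "version": "0.2.1",
}


def parse_display_date(text):
    """Parse a display date such as 'Feb 20th 2020 - 02:42am' (hypothetical helper)."""
    # Drop the ordinal suffix (st/nd/rd/th) so strptime can read the day number.
    cleaned = re.sub(r"(\d{1,2})(st|nd|rd|th)", r"\1", text)
    # The time zone is unknown, so this returns a naive datetime.
    return datetime.strptime(cleaned, "%b %d %Y - %I:%M%p")


def parse_status_label(html):
    """Pull the human-readable status out of the title attribute (hypothetical helper)."""
    match = re.search(r'title="([^"]*)"', html)
    return match.group(1) if match else None


iso_date = parse_display_date(record["version_date"]).isoformat()
status_label = parse_status_label(record["status_icon"])
print(iso_date, status_label)
```

This only recovers the status label, not the underlying status code, and it breaks as soon as the display format changes, which is the reason for asking for raw values in the API.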

Request - Exempt asset types

It would be very useful to be able to (either by default, or opt-in) exempt form submissions from the version history system. Form submission contents generally don’t change after initial creation and can be very sensitive in nature. You don’t want to have to sanitise the version history to remove things like that, and I suspect that for most use cases submissions would be out of scope for versioning (because they’re not content).

Request - add URLs to version_data

Currently the web paths for an asset are stored with each version, but not the full URLs. Technically this information should be findable from the links (with effort), but it would be useful to store an extra copy of this info with each version_data.

Content Template compatibility

Currently when you update a page it stores a copy of each container’s “contents” against that page, but it doesn’t store the MD fields for any CCT which is attached to those containers. I think that the page should either store neither (because that information is stored in the version history of the container assets) or it should have the MD fields as well to get a complete picture.

Storage requirements and performance?

My main concern is with how much data is going to end up in this table and what that means for performance and the recommended frequency of truncation.

Consider an example page with a big container (say 100 kB of HTML) and a small container, going through these hypothetical changes:

  • Change to Safe Edit
  • Edit Small Container
  • Apply for Approval
  • Approve & Make Live

After that process, which didn’t change the big container at all, sq_ast_vers_history will contain another 7 copies of the big container’s content. By comparison, rollback and sq_internal_msg would not have created any new copies. For large pages (and we have some blobs of HTML approaching 1MB) this might be significant.

I don’t know for sure this is a big problem, but as users we would always want to keep as much history as possible in this table. If it’s significantly over-storing that may impact how much we can safely keep.

This could be partially mitigated by not duplicating container detail on page assets. Instead, version_data could store the container IDs, and the Version History screen could query for the version_data stored against those containers with the same correlation_id. That approach would bring the above example from 7 copies down to 3 (just the status changes) and would also fix the content template compatibility issue (above).
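To put rough numbers on that, here is a back-of-envelope Python sketch. It only uses the figures from the example above (7 copies versus 3, a 100 kB container, and our worst case of roughly 1 MB), and it ignores the small container, the JSON overhead and any database compression, so treat it as an estimate rather than a measurement.

```python
KB = 1000  # decimal kilobytes, an assumption made for round numbers


def extra_bytes(copies, container_bytes):
    """Bytes added to the history table for one unchanged container."""
    return copies * container_bytes


# One Safe Edit -> edit -> approval -> Live cycle, big container untouched.
current_100kb = extra_bytes(7, 100 * KB)    # copies stored today
proposed_100kb = extra_bytes(3, 100 * KB)   # copies if pages only reference containers

# Same cycle for a blob of HTML approaching 1 MB.
current_1mb = extra_bytes(7, 1000 * KB)
proposed_1mb = extra_bytes(3, 1000 * KB)

print(current_100kb, proposed_100kb, current_1mb, proposed_1mb)
```

So each publish cycle on a 100 kB page costs about 700 kB today versus about 300 kB with the reference approach, and about 7 MB versus 3 MB for the 1 MB case.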

Suggestion - correlation_id added to sq_internal_msg?

I was thinking that since sq_ast_vers_history doesn’t capture the detail of “what changed?” we will want to keep using the existing logs for some purposes (probably with attributes.fulllog.scalar blacklisted). It would be helpful for using the two in concert if sq_internal_msg rows had the same correlation_id as sq_ast_vers_history. We’d also need a way to query sq_ast_vers_history by correlation_id to make use of it.
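As a rough illustration of what using the two in concert could look like, here is a Python sketch with an in-memory SQLite database standing in for the real one. Only the two table names and the idea of a shared correlation_id come from this post. The column layouts, the sample rows and the helper function are all made up for the example, and sq_internal_msg does not have a correlation_id column today.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, heavily simplified schemas; the real tables look different.
conn.executescript("""
CREATE TABLE sq_ast_vers_history (
    assetid TEXT, version TEXT, correlation_id TEXT
);
CREATE TABLE sq_internal_msg (
    msgid INTEGER PRIMARY KEY, assetid TEXT, body TEXT, correlation_id TEXT
);
""")

conn.executemany(
    "INSERT INTO sq_ast_vers_history VALUES (?, ?, ?)",
    [("49", "0.2.1", "req-a"), ("49", "0.2.2", "req-b")],
)
conn.executemany(
    "INSERT INTO sq_internal_msg (assetid, body, correlation_id) VALUES (?, ?, ?)",
    [
        ("49", "Status changed to Live", "req-a"),
        ("49", "Attribute 'name' changed", "req-b"),
        ("50", "Unrelated change", "req-c"),
    ],
)


def log_entries_for_snapshot(conn, assetid, version):
    """Return log messages written in the same request as a given snapshot."""
    rows = conn.execute(
        """
        SELECT m.body
        FROM sq_ast_vers_history AS h
        JOIN sq_internal_msg AS m ON m.correlation_id = h.correlation_id
        WHERE h.assetid = ? AND h.version = ?
        ORDER BY m.msgid
        """,
        (assetid, version),
    ).fetchall()
    return [row[0] for row in rows]


print(log_entries_for_snapshot(conn, "49", "0.2.1"))
```

The snapshot answers "what did it look like?" and the joined log rows answer "what changed in that request?".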

Request - Access the history of assets no longer in the system

One of the issues with the existing logs is there is no way to access information for assets that have been deleted from the system. It would be very useful to have a method of accessing this data for arbitrary AssetIDs. Obviously there are permission issues, so maybe this would need to be System Administrators only.

Contexts compatibility

Version history doesn’t seem to play nicely with Contexts (although it’s pretty fine with Variations). I don’t use contexts so it’s not pressing for me, just something I noticed.


#2

Great feedback.
I’d like to add that some of our pages go through dozens of micro versions per minor version, as the editors chop and change between publishing. We would not be looking to retain each of those micro versions, rather long term history of changes that are made live to the public. A configuration option to limit snapshots to minor versions would be useful.
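To illustrate the kind of filter I mean, here is a small Python sketch. It assumes version strings follow the major.minor.micro pattern seen in the API sample ("0.2.1") and that a version counts as a minor version when its micro component is 0. Both are my assumptions, and the function names are made up.

```python
def parse_version(version):
    """Split 'major.minor.micro' into three integers."""
    major, minor, micro = (int(part) for part in version.split("."))
    return major, minor, micro


def is_minor_version(version):
    """Assumed rule: a snapshot is worth keeping when the micro component is 0."""
    return parse_version(version)[2] == 0


# Made-up history for one page: lots of micro edits between publishes.
history = ["0.1.0", "0.1.1", "0.1.2", "0.2.0", "0.2.1"]
kept = [v for v in history if is_minor_version(v)]
print(kept)
```

With that rule, only the snapshots at 0.1.0 and 0.2.0 would be retained from the history above.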


(Bart Banda) #3

Thanks for the feedback guys, very useful.

Thanks for all of those detailed feedback items, JohnGill. We already have plans to address some of them, which is good.

John, out of your list, which would you say are the top 3 priorities to solve?


(John gill) #4
  1. Main one would be the ability to exempt by Asset Type (either configurable, or just a hardcoded exemption for Form Submissions). That way we could start collecting data without the concerns of it being tainted by sensitive form data.

  2. “Overstorage” is trickier, but if there are any changes to be made here then obviously the sooner the better. I suspect this one is a bit delicate, and the new hooks provided by 5.5.6.0 limit the options available to the package, but as it stands it feels like it’s erring too far towards the wasteful side.

  3. Since original Logs will complement Version History rather than be replaced by it, some improvements here would be great. Correlation_id to track what happened within the same request (and tie it to Version History snapshots) would be very useful, as would a speed improvement for getting the “Asset Listing Create” or “Asset Listing Delete” log types. Currently these do a wildcard LIKE query and are unusable in large systems. An index or schema change to bring these log categories into line with the others performance-wise would be excellent.

The rest are mostly view-concerns, so they could be added down the track without impacting the data being stored.