WordPress to Squiz Matrix migration


(Avi Miller) #1

Hey kids,

 

So, I've been out of the Matrix game for a while (as those of you who were around 5+ years ago would know). Aside from my regular day job, I volunteer for a community organisation here in Melbourne as their webmaster and we run what is a fairly large WordPress multisite installation. 

 

Fairly large in this context means:

 

~12,000 posts across 270 blogs

~120 registered authors

~200GB of data (mostly MP3s for podcasting)

 

Currently, we're hosted by Bulletproof, which is super peachy keen, because they're amazingly fast and very responsive. However, I think I've (ab)used WordPress about as far as it can go and am now considering whether it makes sense to switch away from WordPress to something else. Naturally, Squiz Matrix is an option.

 

Has anyone done a WordPress to Matrix migration? I know I could code something (or could 5 years ago. Couldn't have changed much since then, surely?), but I'm wondering what's possible "out-of-the-box".

 

Ta!


(Scott Hall) #2

Sorry Avi, just in shock and awe. Saw your post and this came to mind... https://www.youtube.com/watch?v=lBEn3a4TIUw

 

Will see if Mr Schoen has some wise words for you here, he is our expert on all things migration. And the Melbourne team have led the way in that space.


(Tbaatar) #3

Wow!

 

Hope you come back to the dark side again!

 

Good luck with the migration and please share your experience.


(David Schoen) #4

Hey Avi,

 

We have an internal tool that we call "Transform" that's designed to scrape an arbitrary HTTP/HTML site and provide a framework for munging it into Matrix. It doesn't mean you're writing no code to do the migration, but it does remove a fair few components of the work.

 

We've done fun things like:

 * update existing imports in place

 * identify multiple URLs that represent the same content and merge them (deduplication)

 * break apart general content from "body" content

 * scrape metadata from arbitrary parts of existing content (doesn't have to be one page mapping directly onto another) and use it as metadata in Matrix

 * slurp in 3rd party info not available on the site (e.g. there may be something in the WordPress DB that isn't available on the front end but can be exported as CSV or similar)
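
To make the deduplication item above a bit more concrete, here is a minimal sketch in Python. It is not Transform's actual code (that is internal and not shown in this thread); it only illustrates one plausible approach, assuming crawled pages are held as a URL-to-HTML mapping and that "same content" means identical body text after stripping tags and collapsing whitespace. The function names and sample URLs are made up for illustration.

```python
import hashlib
import re
from collections import defaultdict


def content_fingerprint(html: str) -> str:
    """Hash the page body after stripping tags and collapsing whitespace."""
    text = re.sub(r"<[^>]+>", " ", html)               # crude tag removal (assumption: good enough for a sketch)
    text = re.sub(r"\s+", " ", text).strip().lower()   # normalise whitespace and case
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def group_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Return groups of URLs whose normalised body content is identical."""
    groups = defaultdict(list)
    for url, html in pages.items():
        groups[content_fingerprint(html)].append(url)
    # Only groups with more than one URL are duplicates worth merging.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl results: URL -> scraped body HTML.
pages = {
    "http://example.org/?p=42": "<p>Hello   world</p>",
    "http://example.org/2014/05/hello-world/": "<p>Hello world</p>",
    "http://example.org/about/": "<p>About us</p>",
}
duplicates = group_duplicates(pages)
```

Each group in `duplicates` could then be merged into a single imported asset, with the other URLs kept as redirects or remaps. Exact hashing only catches identical content; near-duplicates would need a fuzzier comparison.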

 

At that sort of scale there may be a couple of hairy little edge conditions that need refactoring (it's currently a single-threaded crawler, and all crawled/indexed data is stored in extra tables under Matrix's DB), but it shouldn't cause much grief.

 

I'm not the one to comment on whether we could just hand you the code and some docs though, I'll see what I can find out.

 

Haven't personally done anything that solves a generic WordPress migration - but given there's often a little bit of specific functionality tied up in the theme and plugins, I don't think that's going to be feasible to do generically.

 

If you do go down the custom coding approach to importing stuff into Matrix, you'll at least be pleased to know any calls you were used to making 5+ years ago are still around - Transform has never had to be altered to change how it's talking to the internal Matrix PHP API (the initial versions have been around about 5 years). You should find at least a few calls that make life easier though :)

 

 

Cheers,

Dave


(Avi Miller) #5

Thanks David, that pretty much confirms what I had suspected: getting the content out of WordPress in a usable form is easy enough. I've written plenty of command-line scripts for WordPress to import/export data, so it's just a case of pulling the data out of WordPress into a suitable form for Matrix to import.
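
As a rough illustration of the "pull it out into a suitable form" step, here is a minimal Python sketch that reads a WordPress export file (WXR, the XML produced by Tools > Export) and turns published-or-not posts into plain records. It assumes the export uses the WXR 1.2 namespace; the output record layout and the function name are hypothetical, since the thread doesn't say what format the Matrix side would want.

```python
import json
import xml.etree.ElementTree as ET

# Namespaces used by WordPress WXR exports (assumes WXR version 1.2).
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "wp": "http://wordpress.org/export/1.2/",
}


def wxr_to_records(wxr_xml: str) -> list[dict]:
    """Extract posts from a WXR document as simple dicts (hypothetical layout)."""
    root = ET.fromstring(wxr_xml)
    records = []
    for item in root.iterfind("./channel/item"):
        # Skip attachments, pages, nav menu items and so on.
        if item.findtext("wp:post_type", default="", namespaces=NS) != "post":
            continue
        records.append({
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
            "author": item.findtext("dc:creator", default="", namespaces=NS),
            "status": item.findtext("wp:status", default="", namespaces=NS),
            "body": item.findtext("content:encoded", default="", namespaces=NS),
        })
    return records


# Tiny made-up export for demonstration purposes only.
SAMPLE = """<rss version="2.0"
  xmlns:content="http://purl.org/rss/1.0/modules/content/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <item>
      <title>Episode 1</title>
      <link>http://example.org/episode-1/</link>
      <dc:creator>avi</dc:creator>
      <content:encoded><![CDATA[<p>Show notes</p>]]></content:encoded>
      <wp:post_type>post</wp:post_type>
      <wp:status>publish</wp:status>
    </item>
    <item>
      <title>logo.png</title>
      <wp:post_type>attachment</wp:post_type>
    </item>
  </channel>
</rss>"""

records = wxr_to_records(SAMPLE)
print(json.dumps(records, indent=2))
```

For a multisite install each blog would be exported separately, so the same function could be run per blog and the results tagged with the blog they came from. Media such as MP3s are only referenced by URL in the export and would need to be copied across on their own.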