We have a HUGE number of MP3 files that we store on a separate hard drive on our server. This is NOT the server that Matrix is installed on, and I really don't want to bring thousands of MP3s into Matrix and add about 40GB to the system. What I DO want is to be able to search for these files and find them in the search results.
Now, I know that all other "files" are indexed by Matrix but served by Apache. Is it possible to have those files on another Apache server and have Matrix index them (while still being able to add metadata), while keeping them on that separate Apache server?
If this is NOT possible, what is my next best solution to this problem? Use the import script and import 2000 mp3 files and add 40 gigs to my site…I hope this is not my only option.
[quote]Is it possible to have those files on another Apache server and have Matrix index them (while still being able to add metadata), while keeping them on that separate Apache server?
[/quote]
No, it's not. Matrix only indexes assets in its own database. Your only solution is to use a 3rd-party search index tool that indexes both Matrix and the external MP3 files.
Could Matrix easily handle importing all these files? Or would this be something that I should do in stages?
Also, would this bloat my database, and cause my site, and searches to be slower?
The import should be fine.
[quote]Also, would this bloat my database, and cause my site, and searches to be slower?
[/quote]
Your database wouldn't get that much larger (it's only another 2,000-odd assets, and we have systems that have several hundred thousand assets). However, your backups will be affected, as you will now be backing up 40GB worth of MP3s.
Do you have another search engine at your company that you could use?
Unfortunately, no. Right now we are using Google to search our site, and we are in the process of building our new site using Matrix. At the moment those MP3 files can't be found in the Google search. That is why, when we launch our new Matrix site, I would like them to be in the search index.
[quote]That is why, when we launch our new Matrix site, I would like them to be in the search index.
[/quote]
If they’re web-accessible, why not create 2,000 Redirect Pages for them? Then, you can list them somewhere using the %redirect_url% so that Google can find them. This way, you can reference them inside Matrix, but not actually store the files in Matrix. Additionally, if you use the MP3 filename as the Redirect Page name, you can then search for them. 
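To give a rough idea of the prep work, here is a small Python sketch that turns a list of MP3 paths into (Redirect Page name, redirect URL) pairs. It only builds the list; it does not create anything in Matrix. The function name, the base URL and the sample path are all made up for illustration.

```python
import posixpath
from urllib.parse import quote


def redirect_entries(mp3_paths, base_url):
    """Map each MP3 path to a (page_name, redirect_url) pair.

    mp3_paths are paths relative to base_url (hypothetical layout).
    """
    entries = []
    for path in mp3_paths:
        # Use the MP3 filename as the page name so it can be searched for.
        name = posixpath.basename(path)
        # Percent-encode spaces and other unsafe characters; "/" is kept as-is.
        url = base_url.rstrip("/") + "/" + quote(path.lstrip("/"))
        entries.append((name, url))
    return entries


if __name__ == "__main__":
    paths = ["talks/2006/Opening Session.mp3"]  # placeholder path
    for name, url in redirect_entries(paths, "http://media.example.org/audio/"):
        print(name, url)
```

You would still need some way to feed that list into Matrix, which is outside what this sketch covers.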
EDIT: Avi beat me to it, but I'll post my suggestion anyway
You could use 'Redirect' assets to point to those files. This will allow you to assign metadata to them and all other stuff that you want, while the files themselves would be elsewhere.
I don't know how manageable that would be in the long run: Importing all data and keeping it synchronized. Someone else should advise here.
Having MP3 assets would be best as we could then extract metadata from files and do all kinds of magic with it. That, however, requires development. We extract metadata from JPEG files produced by Photoshop and make it available as keywords.
Another option is to put them all into a DB and then use DB Data Source asset to access and Asset Listing to list. Google would then index them as usual.
One more option is to produce an RSS feed of mp3s on the other server, then consume it with RSS Data Source Asset.
Or just use Remote Content to bring in whatever other server produces as a listing of those files. Google would still index it.
All of these options will not let Matrix index the files and do internal searches.
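For the RSS option, the other server only needs to publish a plain RSS 2.0 feed with one item per MP3. A minimal Python sketch of generating one is below; the function name, channel description and item fields are my own assumptions, and how the RSS Data Source asset maps the fields on the Matrix side is not shown.

```python
import xml.etree.ElementTree as ET


def build_mp3_feed(title, link, items):
    """Return an RSS 2.0 document (as a string) with one <item> per MP3.

    items is a list of dicts with the keys: title, url, length (bytes).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "MP3 files hosted on the media server"
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["url"]
        # <enclosure> is the RSS 2.0 element for attaching a media file.
        ET.SubElement(
            node,
            "enclosure",
            url=item["url"],
            length=str(item["length"]),
            type="audio/mpeg",
        )
    return ET.tostring(rss, encoding="unicode")
```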
I'm sure we can find a way to read ID3 tags if given an opportunity. :rolleyes:
[quote]I’m sure we can find a way to read ID3 tags if given an opportunity. :rolleyes:
[/quote]
Yeah, I can read ID3 tags, but do they store sample rate and time? Time is the important one for podcasting, IIRC.
We can always employ a command-line unix tool to help us out like we do with PDF indexing, etc. I didn’t look very hard and already found a little program that can help us out: http://www.ibiblio.org/mp3info/
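As a rough sketch of what wrapping that tool could look like, here is some Python that shells out to mp3info and pulls back the running time and sample rate. The `-p` format specifiers (`%S`, `%Q`, `%r`) are as I remember them from the mp3info man page, so treat them as an assumption and check `man mp3info` first. The function names are made up, and this is not how Matrix does its PDF indexing.

```python
import subprocess

# Format string for mp3info's -p option (assumed from its man page):
# %S = length in whole seconds, %Q = sampling frequency in Hz, %r = bitrate.
MP3INFO_FORMAT = "%S\t%Q\t%r"


def parse_mp3info_line(line):
    """Turn one tab-separated line of mp3info output into a dict."""
    seconds, sample_rate, bitrate = line.strip().split("\t")
    return {
        "seconds": int(seconds),
        "sample_rate_hz": int(sample_rate),
        # Kept as text because VBR files may not report a plain number.
        "bitrate": bitrate,
    }


def read_mp3_info(path):
    """Run mp3info on one file and return the parsed fields."""
    result = subprocess.run(
        ["mp3info", "-p", MP3INFO_FORMAT, path],
        capture_output=True,
        text=True,
        check=True,  # raise if mp3info exits with an error
    )
    return parse_mp3info_line(result.stdout)
```

Calling `read_mp3_info("/path/to/file.mp3")` needs mp3info installed on the box; the parsing half can be tried on its own with a hand-written line.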
[quote]We can always employ a command-line unix tool to help us out like we do with PDF indexing, etc. I didn’t look very hard and already found a little program that can help us out: http://www.ibiblio.org/mp3info/
[/quote]
Oh, sure - I hadn’t looked very hard. 
For what it is worth, I have worked up a media asset for another project, which I can send to Avi to morph into an 'external MP3' asset. It just has a bunch of meta fields and a URL or link section. You could get the metadata from the MP3s, put it into an XML file and import the data and URL into Matrix.
We do something very similar at RNZ, which means that the Matrix system simply holds the data about the audio; the audio itself is on a geographically distributed cluster of servers. But you wouldn't need to do that!
Or I could release the audio_item code (via Avi) and that could be thinned out for you - we have some Radio specific stuff in there and it allows us to use metafiles to activate streaming playback.
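For the XML route mentioned above (metadata from the MP3s plus a URL per file), a rough Python sketch of writing such a file might look like this. The element names (`audio_items`, `audio_item`, `field`) are invented for illustration; whatever import script is used would define the real format.

```python
import xml.etree.ElementTree as ET


def build_manifest(records):
    """Serialise MP3 records to XML (element names are hypothetical).

    Each record is a dict with a "url" string and a "metadata" dict.
    """
    root = ET.Element("audio_items")
    for record in records:
        item = ET.SubElement(root, "audio_item")
        ET.SubElement(item, "url").text = record["url"]
        meta = ET.SubElement(item, "metadata")
        # Sort the field names so the output is stable between runs.
        for field, value in sorted(record["metadata"].items()):
            ET.SubElement(meta, "field", name=field).text = str(value)
    return ET.tostring(root, encoding="unicode")
```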
cheers
R
Don't send them to me, send them to Greg, or at least speak to him about them first. They'd need to go through our unit testing frameworks and such before we could recommend them to other MySource Matrix users, particularly those who have an SLA or other support agreement with us.
Wow, thanks for all the great replies.
[quote]You could use 'Redirect' assets to point to those files. This will allow you to assign metadata to them and all other stuff that you want, while the files themselves would be elsewhere.[/quote]
This does sound like it would work. Although, I have set up our Matrix search to have a customized look for each asset found, and I wanted to customize the audio (mp3) assets also. (Squiz created me audio and video assets) This would be nice so that I can have a quick audio preview in the search results, along with the metadata for the recording (which I use to make the asset description). It sounds like there are a few ways to do this. I think that the redirect asset might work, I will start my plan of action with that in mind.
If you have a custom asset for audio, you could store the URL of the actual MP3 in metadata. In the listings you could then use the URL stored in metadata to point to the file. This way you have a custom format in listings and avoid the redirect assets.
This is a great idea. I think this might be my solution. Thanks guys.
Just my two cents on the redirects idea - we don't list redirects in our search results because if we did we would have quite a few duplicate results (where we use redirects to point to another asset in the tree, rather than to an external resource). Is there some other way (other than not including them at all) to get around this?