When I implemented Sprinkle, a client-side includes (CSI) system I came up with that doesn't use IFRAMEs, I kept running into the scenario where you want to fetch HTML from a web site other than your own. This is sort of what Web 2.0 is all about: being able to mash up the world with not just your crap but everyone else's crap as well.
I threw together a trivial solution. This is ASP.NET-only for now, though I might come up with a PHP-based equivalent. The idea is to implement a really simple proxy server on your own host, so the browser only ever talks to your site, and to cache the fetched data for a period of time. In this particular implementation, the cache lives directly in the web Application's in-memory collection.
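To make the caching idea concrete, here is a minimal sketch in JavaScript rather than ASP.NET. It is not the code from the download: `createProxyCache`, `fetchThroughCache`, `ttlMs` and `fetchRemote` are all hypothetical names, and the expiry rule is just one plausible way to "cache for a period of time".

```javascript
// Hypothetical in-memory cache with time-based expiry (illustrative names only).
// `now` is injectable so the expiry logic can be exercised without waiting.
function createProxyCache(ttlMs, now = () => Date.now()) {
  const entries = new Map(); // url -> { body, expiresAt }
  return {
    get(url) {
      const entry = entries.get(url);
      if (!entry) return undefined;
      if (now() >= entry.expiresAt) {
        entries.delete(url); // stale: drop it so the caller refetches
        return undefined;
      }
      return entry.body;
    },
    set(url, body) {
      entries.set(url, { body, expiresAt: now() + ttlMs });
    },
  };
}

// Return the cached copy if it is still fresh; otherwise fetch and store it.
// `fetchRemote` is an assumed async function that downloads the given URL.
async function fetchThroughCache(cache, url, fetchRemote) {
  const cached = cache.get(url);
  if (cached !== undefined) return cached;
  const body = await fetchRemote(url);
  cache.set(url, body);
  return body;
}
```

The trade-off is the usual one: a longer lifetime means fewer hits on the remote site but staler content on yours.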
Here's what using it might look like:
<%-- Client-side includes with server-side cross-site proxying --%>
<script type="text/javascript" src="http://sprinklejs.com/sprinkle.js"></script>
<div src="proxy.aspx?url=http://www.sprinklejs.com/info.html"></div>
<%-- Server-side includes with cross-site proxying --%>
<ssi:ProxyControl runat="server" ID="GoogleInsertion"
SourceUrl="http://www.google.com/"
DetectImposeBase="true"
BaseUrl="proxy.aspx?url=http://www.google.com/" />
In the server-side include implementation, the DetectImposeBase and BaseUrl properties are really just hacks: they force-inject the proxy URL into every src and href attribute in the fetched markup, so that images and links on the proxied page also go through the proxy.
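A rough sketch of that kind of attribute rewriting, in JavaScript rather than ASP.NET, might look like the following. It is an assumption about the general technique, not the control's actual code: `imposeProxyBase` and its parameters are made-up names, and unlike the BaseUrl example above it resolves each link against the source URL and URL-encodes the result before prefixing the proxy address.

```javascript
// Hypothetical helper: point every quoted src/href value at the proxy.
// A regex is a blunt tool for HTML (it would also touch things like data-src),
// which fits the "hack" spirit but is not a real parser.
function imposeProxyBase(html, proxyPrefix, sourceUrl) {
  return html.replace(
    /\b(src|href)\s*=\s*(["'])(.*?)\2/gi,
    (match, attr, quote, value) => {
      if (value.startsWith("#")) return match; // in-page anchors stay as they are
      let absolute;
      try {
        absolute = new URL(value, sourceUrl); // resolve relative links
      } catch {
        return match; // unparseable value: leave untouched
      }
      // Only proxy web links; skip mailto:, javascript:, and the like.
      if (absolute.protocol !== "http:" && absolute.protocol !== "https:") {
        return match;
      }
      const proxied = proxyPrefix + encodeURIComponent(absolute.href);
      return `${attr}=${quote}${proxied}${quote}`;
    }
  );
}

// Usage: rewrite a fragment fetched from http://www.google.com/
const rewritten = imposeProxyBase(
  '<img src="images/logo.gif">',
  "proxy.aspx?url=",
  "http://www.google.com/"
);
// rewritten is:
// <img src="proxy.aspx?url=http%3A%2F%2Fwww.google.com%2Fimages%2Flogo.gif">
```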
If you try to use the above-referenced proxy.aspx file from an external web site, it should fail. The proxy checks the referer header and only serves requests whose referer is on the same host as the proxy itself.
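The same-host check could be sketched like this, again in JavaScript and purely as an illustration. `isSameHostReferer` is a hypothetical name, and treating a missing header as a failure is my assumption about how such a check would behave.

```javascript
// Hypothetical check: allow the request only when the Referer header points
// at a page on the same host that is serving the proxy.
function isSameHostReferer(refererHeader, requestHost) {
  if (!refererHeader) return false; // assumption: no referer means no access
  try {
    // URL.host includes the port when it is not the scheme's default,
    // which matches the shape of an HTTP Host header.
    return new URL(refererHeader).host === requestHost.toLowerCase();
  } catch {
    return false; // malformed referer values are rejected
  }
}
```

Keep in mind that only browsers send this header honestly. Any other client can put whatever it likes in it, so this stops casual hotlinking of your proxy rather than determined abuse.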
If you try to reference a very large file, such as a big binary, it will also fail. A maximum file size is enforced so as not to overload the Application in-memory collection that hosts the proxy cache.
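One way to enforce such a limit is to count bytes while the response is still arriving and give up as soon as the total goes over. The sketch below shows that idea in JavaScript. `readWithLimit` and `maxBytes` are hypothetical names, and the actual limit and failure behavior of the ASP.NET version are not shown here.

```javascript
// Hypothetical reader: collect response chunks (Uint8Array) into one buffer,
// aborting as soon as the running total exceeds maxBytes.
function readWithLimit(chunks, maxBytes) {
  const kept = [];
  let total = 0;
  for (const chunk of chunks) {
    total += chunk.byteLength;
    if (total > maxBytes) {
      // Stop early so an oversized download never lands in the cache.
      throw new RangeError(`Response exceeds ${maxBytes} bytes`);
    }
    kept.push(chunk);
  }
  // Join the accepted chunks into a single contiguous body.
  const body = new Uint8Array(total);
  let offset = 0;
  for (const chunk of kept) {
    body.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return body;
}
```

Checking as the data streams in, instead of after the whole download finishes, means a huge file costs you at most the limit plus one chunk of memory.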
This implementation doesn't work flawlessly; it's sort of a prototype that only took about an hour to hack together (plus some time I spent struggling with Visual Studio puking on me). But anyway, here it is.
Download: http://sprinklejs.com/SSI_Proxy_ASPNET.7z