Sophie

Sophie

distrib > Mandriva > 2010.0 > i586 > media > contrib-release > by-pkgid > 9b42c473acaec696d4bb0584e17699ff > files > 39

python-clientcookie-1.3.0-4mdv2010.0.noarch.rpm

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
        "http://www.w3.org/TR/html4/strict.dtd">




<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
  <meta name="author" content="John J. Lee &lt;jjl@pobox.com&gt;">
  <meta name="date" content="2006-01-05">
  <meta name="keywords" content="FAQ,cookie,HTTP,HTML,form,table,Python,web,client,client-side,testing,sniffer,https,script,embedded">
  <title>Python web-client programming general FAQs</title>
  <style type="text/css" media="screen">@import "../styles/style.css";</style>
  <base href="http://wwwsearch.sourceforge.net/bits/clientx.html">
</head>
<body>

<div id="sf"><a href="http://sourceforge.net">
<img src="http://sourceforge.net/sflogo.php?group_id=48205&amp;type=2"
 width="125" height="37" alt="SourceForge.net Logo"></a></div>
<!--<img src="../images/sflogo.png"-->

<h1>Python web-client programming general FAQs</h1>

<div id="Content">
<ul>
  <li>Is there any example code?
     <p>There's (still!) a bit of a shortage of example code for ClientCookie
     and ClientForm &amp;co., because the stuff I've written tends to either
     require access to restricted-access sites, or is proprietary code (and the
     same goes for other people's code).
  <li>HTTPS on Windows?
     <p>Use this <a href="http://pypgsql.sourceforge.net/misc/python22-win32-ssl.zip">
     _socket.pyd</a>, or use Python 2.3.
  <li>I want to see what my web browser is doing, but standard network sniffers
     like <a href="http://www.ethereal.com/">ethereal</a> or netcat (nc) don't
     work for HTTPS.  How do I sniff HTTPS traffic?
  <p>Three good options:
  <ul>
    <li>Mozilla plugin: <a href="http://livehttpheaders.mozdev.org/">
     livehttpheaders</a>.
    <li><a href="http://www.blunck.info/iehttpheaders.html">ieHTTPHeaders</a>
     does the same for MSIE.
    <li>Use <a href="http://lynx.browser.org/">lynx</a> <code>-trace</code>,
     and filter out the junk with a script.
  </ul>
  <p>I'm told you can also use a proxy like <a
  href="http://www.proxomitron.info/">proxomitron</a> (never tried it
  myself).  There's also a commercial <a href="http://www.simtec.ltd.uk/">MSIE
  plugin</a>.
  <li>Embedded script is messing up my web-scraping.  What do I do?
     <p>It is possible to embed script in HTML pages (sandwiched between
     <code>&lt;SCRIPT&gt;here&lt;/SCRIPT&gt;</code> tags, and in
     <code>javascript:</code> URLs) - JavaScript / ECMAScript, VBScript, or
     even Python.  These scripts can do all sorts of things, including causing
     cookies to be set in a browser, submitting or filling in parts of forms in
     response to user actions, changing link colours as the mouse moves over a
     link, etc.

     <p>If you come across this in a page you want to automate, you
     have four options.  Here they are, roughly in order of simplicity.

     <ul>
       <li>Simply figure out what the embedded script is doing and emulate it
       in your Python code: for example, by manually adding cookies to your
       <code>CookieJar</code> instance, calling methods on
       <code>HTMLForm</code>s, calling <code>urlopen</code>, etc.

       <li>Dump ClientCookie and ClientForm and automate a browser instead.
       For example use MS Internet Explorer via its COM automation interfaces, using
       the <a href="http://starship.python.net/crew/mhammond/">Python for
       Windows extensions</a>, aka pywin32, aka win32all (eg.
       <a href="http://vsbabu.org/mt/archives/2003/06/13/ie_automation.html">simple
       function</a>, <a href="http://pamie.sourceforge.net/">pamie</a>;
       <a href="http://www.oreilly.com/catalog/pythonwin32/chapter/ch12.html">
       pywin32 chapter from the O'Reilly book</a>) or
       <a href="http://starship.python.net/crew/theller/ctypes/">ctypes</a>
       (<a href="http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/305273">
       example</a>: may be out of date, since <code>ctypes</code>' COM support is
       still evolving).
       <a href="http://www.brunningonline.net/simon/blog/archives/winGuiAuto.py.html">This</a>
       kind of thing may also come in useful on Windows for cases where the
       automation API is lacking.
       <a href="http://ftp.acc.umu.se/pub/GNOME/sources/pyphany/">pyphany</a>
       is a binding to the <a href="http://www.gnome.org/projects/epiphany/">
       epiphany web browser</a>, allowing both plugins and automation code to be
       written in Python.
       XXX Mozilla automation &amp; XPCOM / PyXPCOM, Konqueror &amp; DCOP / KParts / PyKDE).

       <li>Use Java's <a href="httpunit.sourceforge.net">httpunit</a> from
       Jython, since it knows some JavaScript.
       <li>Get ambitious and automatically delegate the work to an appropriate
       interpreter (Mozilla's JavaScript interpreter, for instance).  This
       approach is the one taken by <a href="../DOMForm">DOMForm</a> (the
       JavaScript support is &quot;very alpha&quot;, though!).
     </ul>
  <li>Misc links
     <ul>
       <li><a href="http://www.crummy.com/software/BeautifulSoup/">Beautiful
       Soup</a> is a widely recommended HTML-parsing module.
       <li><a href="http://linux.duke.edu/projects/urlgrabber/">urlgrabber</a>
       contains useful stuff like persistent connections, mirroring and
       throttling, and it looks like most or all of it is well-integrated with
       <code>urllib2</code> (originally part of the yum package manager, but
       now becoming a separate project).
       <li>Another Java thing: <a href="http://maxq.tigris.org/">maxq</a>,
       which provides a proxy to aid automatic generation of functional tests
       written in Jython using the standard library unittest module (PyUnit)
       and the &quot;Jakarta Commons&quot; HttpClient library.
       <li>A useful set Zope-oriented links on <a
       href="http://viii.dclxvi.org/bookmarks/tech/zope/test">tools for testing
       web applications</a>.
       <li>O'Reilly book: <a href="">Spidering Hacks</a>.  Very Perl-oriented.
       <li>Useful
       <a href="http://chrispederick.com/work/webdeveloper/"> Firefox extension
       </a> which, amongst other things, can display HTML form information and
       HTML table structure(thanks to Erno Kuusela for this link).
       <li>
       <a href="http://www.openqa.org/selenium/">Selenium</a>: In-browser web
       functional testing.
       <li><a href="http://www.opensourcetesting.org/functional.php">Open
       source functional testing tools</a>.  A nice list.
       <li><a href="http://www.rexx.com/~dkuhlman/quixote_htmlscraping.html">
       A HOWTO on web scraping</a> from Dave Kuhlman.
     </ul>
  <li>Will any of this code make its way into the Python standard library?

  <p>The request / response processing extensions to <code>urllib2</code> from
     ClientCookie have been merged into <code>urllib2</code> for Python 2.4.
     The cookie processing has been added, as module <code>cookielib</code>.
     Eventually, I'll submit patches to get the http-equiv, refresh, and
     robots.txt code in there too, and maybe <code>mechanize.UserAgent</code>
     too (but <em>not</em> <code>mechanize.Browser</code>).  The rest, probably
     not.

</ul>

<p>I prefer questions and comments to be sent to the <a
href="http://lists.sourceforge.net/lists/listinfo/wwwsearch-general">
mailing list</a> rather than direct to me.

<p><a href="mailto:jjl@pobox.com">John J. Lee</a>,
January 2006.

</div> <!--id="Content"-->

<div id="Menu">

<a href="..">Home</a><br>
<br>
<span class="thispage">General FAQs</span><br>
<br>
<a href="../mechanize/">mechanize</a><br>
<a href="../pullparser/">pullparser</a><br>
<a href="../ClientCookie/">ClientCookie</a><br>
<a href="../ClientCookie/doc.html"><span class="subpage">ClientCookie docs</span></a><br>
<a href="../ClientForm/">ClientForm</a><br>
<br>
<a href="../DOMForm/">DOMForm</a><br>
<a href="../python-spidermonkey/">python-spidermonkey</a><br>
<a href="../ClientTable/">ClientTable</a><br>
<a href="../bits/urllib2_152.py">1.5.2 urllib2.py</a><br>
<a href="../bits/urllib_152.py">1.5.2 urllib.py</a><br>

<br>

</body>
</html>