awffull-3.10.2-2mdv2010.0.i586.rpm

TODO and FIXME list for Awffull.

**************************************************************************
GOAL:
        Provide useful feedback (pretty!) to business owners on Website usage.
                Not just raw numbers.
**************************************************************************

Business Questions to try and answer!:
--------------------------------------

Q. A web site is typically more than just one (or more) HTTP server(s).
    How many other logs can/should we integrate? Mailman?
Q. Who's coming to my site?
        Unis vs ISPs vs Corporates etc
        Country
        Combos of both?
Q. How are they getting here?
Q. Why are they coming here?
Q. When are they coming here?
        Not just in my local time; but in _their_ local time.
Q. How long are they spending here?
Q. What are the most interesting pages on the Site?
        Why them and not others? Why???
Q. Here's a "business rule". How far along this process do they get before they bomb out?
        Shopping cart etc tracking
Q. How many New vs Repeat visitors?
Q. How often do Repeat visitors come?
        time delta - hours, days, months etc
Q. Do repeats look at the same pages? or New pages?
Q. What pages are new visitors looking at vs repeats?
        Why?
Q. How are repeats getting here? Bookmark/Direct? Search Engine?
Q. How busy is my local search engine?
Q. Is the local search engine useful to visitors?
        Are they leaving from a search?
        Or going to another internal page?
Q. What is the delta in site usage over the past <period>? Why?
Q. Web staff want detailed reports.
        Middle Managers want summarised detailed reports
        Senior/Exec Managers want highly summarised reports - ROI?
            Make the results flexible in the production of output
            Ease of incorporation into other reports
Q. Error tracking
        Is the site broken? Where? Why? How many impacted?
        Possible fixes suggested?
Q. Page responsiveness? Is the site performing well? Pages/Sec? Min, Max, Median, Avg?
        How fast are pages returning? (See blackbox logging)
        Also see: http://www.stedee.id.au/Apache_WebSite_Responsiveness
Q. Who is viewing my advert pages/images, and how many views are there?
        Is anyone filtering my adv pages/images?
            Who/How many??
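
For the page responsiveness question, a minimal sketch of the Min/Max/Median/Avg summary. It assumes the per-request service times have already been pulled out of the log as microseconds (the unit Apache's %D format directive records). The function name and the sample values are hypothetical, and the sketch is in Python rather than Awffull's own code.

```python
import statistics

def summarise_response_times(times_us):
    """Return min, max, median and average of per-request service times."""
    if not times_us:
        raise ValueError("no response times supplied")
    return {
        "min": min(times_us),
        "max": max(times_us),
        "median": statistics.median(times_us),
        "avg": statistics.mean(times_us),
    }

# Made-up sample: four requests, service time in microseconds.
sample = [1200, 800, 4000, 1000]
summary = summarise_response_times(sample)
```

The median is reported alongside the average because a single slow request (4000 above) drags the average up much more than the median.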

    Automate correlations between numbers as much as possible



Immediate tasks:
---------------------------------------------------------------------
*  Put complete legend underneath main summary graph 
*  Clean up command line options - too many - simplify and reduce. Add more obvious verbosity switch + levels of same.
       Do massive change with move to 4.0.0
*  Put a date generated on all graphs??? Optional?
*  Look at threading the analysis as per discussions with Tony.
       Master thread, keeps track of log lines that don't interfere
       Multiple children (2-10?) to do the work. Start with two?
       Overheads?
   - Is in progress
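
One possible reading of "log lines that don't interfere": lines from different client hosts touch separate per-visitor state, so the master can route every line from a given host to the same child. The sketch below works under that assumption only. It is Python rather than C, all names are hypothetical, and it shows the partitioning and merge steps, not a speed-up (Python threads will not run this CPU-bound work in parallel).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import zlib

def host_of(line):
    """First whitespace-separated field of a log line (the client host)."""
    return line.split(" ", 1)[0]

def partition(lines, n_children):
    """Master step: route each line to a bucket chosen by hashing its host."""
    buckets = [[] for _ in range(n_children)]
    for line in lines:
        idx = zlib.crc32(host_of(line).encode()) % n_children
        buckets[idx].append(line)
    return buckets

def count_hits(lines):
    """Child step: per-host hit counts for one bucket."""
    return Counter(host_of(line) for line in lines)

def analyse(lines, n_children=2):
    """Run the children and merge their partial results in the master."""
    totals = Counter()
    with ThreadPoolExecutor(max_workers=n_children) as pool:
        for partial in pool.map(count_hits, partition(lines, n_children)):
            totals.update(partial)
    return totals

# Made-up log lines; only the first field matters here.
log = [
    '10.0.0.1 - - "GET / HTTP/1.0" 200 512',
    '10.0.0.2 - - "GET /a.html HTTP/1.0" 200 1024',
    '10.0.0.1 - - "GET /b.html HTTP/1.0" 200 2048',
    '10.0.0.3 - - "GET / HTTP/1.0" 404 0',
]
totals = analyse(log)
```

Because a host always hashes to the same bucket, no two children ever update the same host's counters, so the merge is a plain sum.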


Long Term Tasks:
---------------------------------------------------------------------
*  do single graphs of each day
*  do a single summary graph with each day plotted individually
       Link off Monthly/Weekly Pages?
*  single graphs of each metric, vs combined (currently)
*  enhance with relax ideas
       Perhaps simply run relax separately as an option and parse output back in???
       After filters. Probably directly via pipeline, simultaneously.
*  visits/visitors - ip/browser combo
       see awstats
*  visits/visitors - add in mod_usertrack
       new/old
       These are both done in "visitors" - integrate logic
*  Page views breakdown - how many viewed once, etc
*  weekly summaries, not just monthly.
*  enhance browser/OS reporting
*  Page tracking - levels deep to/from
       define "top level" pages, where to from here?
       see page-referred-to.pl
*  graph search engine terms/phrases
*  graph search engine referrals
*  track visits vs visitors; new vs repeat
       May need to DBise for larger sites vs flat file
       see visitors - already done - integrate
*  Number pages accessed per visit
*  bring in pcre library? Make it easier to wield various log formats?? - Partially Complete
*  mod_gzip logging
*  Apache performance tracking (blackbox)
*  display a week by hours graph - break each day down in 15minute chunks?
       No. Probably best to leave in 1 hour chunks and base rest on that
*  localise times - logged at 10:15 -0500; want +1000.
       possibly set to Time X.
*  Parameter Changing
       ObjID=???
*  Filtering. Specify what fields to apply a PCRE'd filter to
*  Use PCRE to split logs up. Two Config Entries min. 1=PCRE 2=Definition
       2 can be either Apache style %'s or English version thereof
       Intent is to have _very_ flexible log styles
*  Multiple Logs
       Both multiple access & diff types - eg access & mod_gzip style
       Thus have LogFile=blah; LogFile2=blah2 etc
       Max of???
       Definition would follow same - LogFileDefn=blop; LogFileDefn2=blop2
*  Status backups. Create backup gzip'd? copies of any files to be modified if so config'd
*  functions. Use more in main loop to enhance code readability.
*  Separate into multiple programs and pipe results. Some duplication, but hopefully easier to manage etc.
       + merge logs - both in time (sort) and sideways (gzip and access)
           possibly split these two???
           use zmergelog as base. See dnshistory for mod_gzip merge.
       + dns translate - build a BDB of DNS rec's: IP and each translation across time - DONE dnshistory
       + parse logs and dump out. Format???
       + display graphs/results
*  Move to XML output with appropriate XSL conversion(s)???
       +  output as PDF see http://www.stillhq.com/panda/ ???
*  Templating system - related to CSS. Strip out the hardwired HTML.
       Separate stats from presentation better'er.
       Geeklog'ish???
*  Where From graph - phpmyvisites uses a world map to display countries - Nice!
*  Segmentation - how many pages by search engine, comparison
*  Define "Goals" - certain sequence of entries define a goal match
        Page A, Page B, Page C, File D == Success!
      Cope with funnel model?
*  External Referrals - sites from Page X sent to Pages A, B & C
        cf Relax.
*  Look at fixing all string handling - use of strl* functions from Linux kernel
*  Add next/previous options to the Monthly reports. Use history file/current to determine.
*  RSS Usage/tracking?
*  Examine possibilities with creating SVG graphs
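
The "Use PCRE to split logs up" item (a pattern plus a definition in Apache style %'s) might be sketched as follows. This is an illustration only: it uses Python's re module rather than PCRE, the directive table is a small hypothetical subset, and none of the names come from Awffull.

```python
import re

# Hypothetical subset of Apache LogFormat directives: (field name, pattern).
DIRECTIVES = {
    "%h": ("host", r"\S+"),
    "%l": ("ident", r"\S+"),
    "%u": ("user", r"\S+"),
    "%t": ("time", r"\[[^\]]+\]"),
    "%r": ("request", r'[^"]*'),
    "%>s": ("status", r"\d{3}"),
    "%b": ("bytes", r"\d+|-"),
}

def compile_log_format(definition):
    """Turn an Apache style definition into a regex with named groups."""
    token = re.compile(r"%>?[a-zA-Z]")
    parts = []
    pos = 0
    for m in token.finditer(definition):
        # Literal text between directives must match exactly.
        parts.append(re.escape(definition[pos:m.start()]))
        name, field_re = DIRECTIVES[m.group()]  # KeyError if unsupported
        parts.append(f"(?P<{name}>{field_re})")
        pos = m.end()
    parts.append(re.escape(definition[pos:]))
    return re.compile("^" + "".join(parts) + "$")

common = compile_log_format('%h %l %u %t "%r" %>s %b')
# Made-up log line in Common Log Format.
line = '192.0.2.7 - - [10/Oct/2009:13:55:36 -0500] "GET /index.html HTTP/1.0" 200 2326'
fields = common.match(line).groupdict()
```

Keeping the definition separate from the generated pattern means a second log style (for example a mod_gzip log) only needs its own definition string.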

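The "Goals" item (Page A, Page B, Page C, File D == Success!) can be read as an ordered subsequence match against the pages of one visit, where the number of steps reached gives the position in the funnel. A minimal sketch under the assumption that unrelated pages may appear between steps; the function name and paths are hypothetical.

```python
def funnel_progress(visit_pages, goal_steps):
    """Return how many goal steps, in order, this visit completed."""
    step = 0
    for page in visit_pages:
        if step < len(goal_steps) and page == goal_steps[step]:
            step += 1
    return step

# Hypothetical goal and two made-up visits.
goal = ["/a.html", "/b.html", "/c.html", "/d.pdf"]
completed = ["/a.html", "/news.html", "/b.html", "/c.html", "/d.pdf"]
abandoned = ["/a.html", "/b.html", "/index.html"]

reached_full = funnel_progress(completed, goal)
reached_part = funnel_progress(abandoned, goal)
```

A visit is a goal match when the result equals len(goal); anything lower shows the step at which the visitor bombed out.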

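For the "localise times" item (logged at 10:15 -0500; want +1000), a small sketch of the conversion using Python's datetime. It assumes the timestamp is in Apache's default access-log layout; the date used is made up, and a fixed offset is used rather than a named time zone, so daylight saving is not handled.

```python
from datetime import datetime, timedelta, timezone

def localise(apache_time, target_offset_hours):
    """Convert an Apache style timestamp to a fixed target UTC offset."""
    logged = datetime.strptime(apache_time, "%d/%b/%Y:%H:%M:%S %z")
    target = timezone(timedelta(hours=target_offset_hours))
    return logged.astimezone(target)

# Logged at 10:15 -0500, wanted at +1000 (hypothetical date).
converted = localise("10/Oct/2009:10:15:00 -0500", 10)
```

Note that the conversion can move a hit across a day boundary, which matters for any per-day totals built on top of it.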
DONE:
---------------------
*  extend summary to 15->18 Months
*  ratio graphs - settable size
       done base, index, monthly, dailies, pie
       BUG: Pie has some fill problems. Closing before it should.
*  Put in Y Axis measures against grid lines - get rid of max number
       done code and index, dailies, monthly
       BUG: Still showing 1000 not 1k
*  graph lines that are "easy" to read - eg every 100. or 1000 etc
       code done. Index done, monthly, dailies
*  Strip out all dns lookup code. Unnecessary complication, and not friendly over the long term holding of data.
*  DNS Lookups. Fork and thread _ahead_ of main processing. Prob can't read off STDIN, files only.
       Store results in Berkeley DB. Time first seen IP, Last Checked Time.
       Trying to sorta mimic dnstran, but store the results in a separate DB for later retrieval, rather than directly modifying the logs.
       Should _significantly_ cut down on space wastage.
       Done as "dnshistory"
*  Convert all volumes to unsigned long long
*  Add bookmarkings (estimated anyway...)
*  Add 404 summary page - better detail of errors
*  Allow config mods to countries. eg range 1.2.3.0 is country X
*  Ditto for nslookups eg bigpond.com is Aust.
*  CSS - Use styles throughout, instead of worrying about colours etc
*  Translations - integrate GNU gettext
*  Front Page: Summary Period - change to what is actually shown
*  Add volume to the hourly graph
*  Do both country code AND IP addresses.
       Look at possible dnstran replacer - DONE. dnshistory - integrate?
       Not great, but 1st cut is working
*  bookmarkings - pulled. MSIE7 changes behaviour. No longer accurate or relevant.
       Educated guesstimate
       Which pages bookmarked?
          Ignore for now - quite tricky
*  Single Access Page & Visits - related to Entry & Exit
       - Stickiness = 1 - ( SAP V's / Entry V's )   ref: WSMH p236.
*  Ratio of Page Entry:Exits
*  All Entry and Exit Pages, not just Top X
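
The Stickiness formula from the Single Access Page item, written out as a small worked example. It assumes "V's" means visits, so the inputs are single-access-page visits and entry visits for the same page; the function name and the figures are hypothetical.

```python
def stickiness(single_access_visits, entry_visits):
    """Stickiness = 1 - (single access page visits / entry visits)."""
    if entry_visits <= 0:
        raise ValueError("entry_visits must be positive")
    return 1 - single_access_visits / entry_visits

# Made-up figures: 100 visits entered on a page, 25 of them saw only that page.
value = stickiness(25, 100)
```

A value near 1 means most visitors who enter on the page go on to view more; a value near 0 means most leave straight away.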