From park.uvsc.edu!hamblin.math.byu.edu!sol.ctr.columbia.edu!howland.reston.ans.net!swrinde!elroy.jpl.nasa.gov!decwrl!pa.dec.com!pa.dec.com!not-for-mail Sat Aug 12 08:05:13 1995
Path: park.uvsc.edu!hamblin.math.byu.edu!sol.ctr.columbia.edu!howland.reston.ans.net!swrinde!elroy.jpl.nasa.gov!decwrl!pa.dec.com!pa.dec.com!not-for-mail
From: reid@decwrl.dec.com (Brian Reid)
Newsgroups: news.groups,news.lists,news.admin.misc
Subject: USENET READERSHIP SUMMARY REPORT FOR JUL 95
Date: 6 Aug 1995 17:16:33 -0700
Organization: DEC Network Systems Laboratory
Lines: 323
Sender: reid@pa.dec.com
Approved: reid@decwrl.dec.com
Message-ID: <403m11$bba@usenet.pa.dec.com>
NNTP-Posting-Host: usenet.pa.dec.com
Xref: park.uvsc.edu news.groups:87953 news.lists:1336 news.admin.misc:40032

USENET READERSHIP SUMMARY REPORT for Jul 95
--------------------------------------------------------------------------

This is the first article in a monthly posting series from the Network Measurement Project at Digital's Network Systems Laboratory in Palo Alto, California.

This survey is based on a sample of data taken from various USENET sites. At the end of this message there is a short explanation of the measurement techniques and the meaning of the various statistics. The messages that follow this one show survey data sorted by various criteria.

The newsgroup volume and article counts that I post are often different from the ones posted elsewhere, because other lists sometimes include the size of a crossposted article in every group to which it is posted, whereas we charge that size only to the first-named group.

The complete set of readership data (of which this is a summary) is posted in news.lists.
The software that will let your site participate in the survey is in comp.sources.d and news.admin.

        Brian Reid
        reid@pa.dec.com

OVERALL SUMMARY:

                                        This        Estimated
                                        Sample      for entire net
Sites:                                  357         330000
Fraction reporting:                     0.11%       100%
Users with accounts:                    164053      30329000
Netreaders:                             59683       11033000

Average readers per site:               167
Percent of users who are netreaders:    36.38%
Average traffic per day (megabytes):    586.411
Average traffic per day (messages):     127446
Traffic measurement interval:           last 28 days
Readership measurement interval:        last 75 days
Sites used to measure propagation:      357

Valid data received from these sites:

6sigma(5) a-k(2) actew.oz.au(863) adolfoien.vgs.no(0) airs.com(7) alanya.isar.muc.de(14) alchemy(409) alfred(6) alsys.com(114) andrew.cmu.edu(8806) animal.inescn.pt(279) anorad.com(124) apricot.co.uk(121) arakis.fdn.org(8) aramis.rutgers.edu(389) ascer(36) awful(17) aztec(146) badlands.nodak.edu(10819) barnard.manawatu.planet.co.nz(5) bat710.univ-lyon1.fr(504) bcstec.ca.boeing.com(964) beauty(24) belvedere(10) bgsuvax(1236) bigwheel(225) blackice(1) blkhole(15) boboc(59) bohemia(118) bsuvc.bsu.edu(7300) btoy1.rochester.ny.us(20) bvu.edu(445) caipfs.rutgers.edu(20) calvin(2) caribou.msfc.nasa.gov(15) cass.manawatu.planet.co.nz(3) ccs3(8) ceas.rochester.edu(1190) cerritos.edu(723) cgate.sait.ab.ca(634) cheops(246) chiark(9) chinaca(21) chuck.sycraft.com(2) ci.org(171) cigna(754) cis(66) cleo(3) clpgh.org(483) cnplss5(100) codewks(221) college(733) colossus.cse.psu.edu(897) coral(37) cpvax.cpses.tu.com(62) cradac(0) crash(800) cronus(51) csustan.csustan.edu(26) cub.kscorp.com(15) cuthbert(95) cutler.com(10) cvedg(5) cwis(242) datani.dk(15) dciem(210) deepthot(19) desc.dla.mil(262) dimacs.rutgers.edu(664) discg3.disc.dla.mil(1133) disunms(995) dorm.rutgers.edu(79) dove(139) dplace(0) dragon.com(53) drd(48) drum.msfc.nasa.gov(23) dsacg2.dsac.dla.mil(48) dsacg3.dsac.dla.mil(21) dsbc.icl.co.uk(227) dsinet(20) duke(563) dumbcat.sf.ca.us(12) dwerger(5)
earlgrey.exnet.com(1) ees1a0.engr.ccny.cuny.edu(9) egreen(855) elements.rpal.rockwell.com(91) elsie(17) ember(6) emf.emf.net(168) enterprise.cc.rochester.edu(1677) eonwe(86) eram.esi.com.au(76) erfsys01.iasl.ca.boeing.com(25) ernest(28) esatst(23) europa.com(28) fasterix.frmug.fr.net(8) fermat(254) flab(201) flatlin(10) franz.com(67) freenet-news(43392) gatepoly.manawatu.gen.nz(4) gateway.columbia.com(28) gcs.co.nz(45) geac(244) geovax.ed.ac.uk(211) gerdesas(14) getank(149) giga(285) gistdev(57) gmdtub(265) golem(3) goofy(354) gordius(59) grafex(32) grian(24) gtisqr(14) gwynedd.frmug.fr.net(70) gypsum.berkeley.edu(98) halcyon(10524) hammer.msfc.nasa.gov(18) hccompare.com(748) hhcs.gov.au(4) hhvo.sjoe.mil.no(2) hilbert.rutgers.edu(145) hiram.edu(689) hp400p(36) hugpar.actrix.gen.nz(3) humming(98) iamk4515(44) iclnet93.iclnet.org(28) ics.uci.edu(290) iecc(6) ifaedi.insa-lyon.fr(510) ifhamy.insa-lyon.fr(292) ifhpserv.insa-lyon.fr(86) ifi.uio.no(3027) imagelan.com(5) imladris(31) imperium(18) impreza(149) indyvax.iupui.edu(3925) inescca.inescc.pt(38) inet1.yelmtel.com(28) infodyn(18) infopro.infopro.com(7) investor(10) iris.claremont.edu(4) islabs(16) isys-hh(178) james(113) jawad.manawatu.planet.co.nz(5) jfwhome.funhouse.com(22) johnny5(3) jove(47) jtmiii.uucp(1) julian.uwo.ca(4818) jupiter(85) kaepk.ericsson.se(81) kaepk1.ericsson.se(94) kaepk3.ericsson.se(69) kaepk4.ericsson.se(122) kala(35) kb2ear.overleaf.com(59) keltia.frmug.fr.net(65) kofax(73) lakes(203) latour(9) ledger.co.forsyth.nc.us(142) lkbreth(50) lpi(14) lyxys(12) macdonal(220) magic.capsogeti.fr(144) manger.modeld.no(2) mantis.co.uk(25) marriott.clark.net(274) mars(277) math.berkeley.edu(5) math.rutgers.edu(482) mathstat(33) matrox(550) maya(60) mcmi(42) metasoft(31) mica.berkeley.edu(72) miclon(74) midas(42) missing(512) modus(39) monygmc(26) monymsys(2) morakot.nectec.or.th(242) mtdiablo(21) mtroyal.ab.ca(592) mts(15) muselab(1047) nad.com(266) nanovx(6) nasim(101) ncrlisl(111) netagw(4) 
netline-fddi(9) news-server.aa.cad.com(290) news.cis.ohio-state.edu(2523) news.loria.fr(558) news.pop.psu.edu(181) newton.isa.de(60) nicmad(386) nj8j(3) nmrdc1(20) nocusuhs(17) nri-e(95) nrlvx1.nrl.navy.mil(277) ntrs.com(7147) nttta(59) nyx10.nyx10.cs.du.edu(13142) obdient(23) ocean(91) orange(11) oslonett.no(6039) ovation(261) pasadena-dc.bofa.com(22) pbhya(100) pbhyc(283) pbhyf(140) pell(17) pentagon-ai(100) phage(525) piaggio(49) platon.transport.tih.no(1) plxsun(234) pmafire(105) practic.practic.com(8) primerd(85) prism1(35) pta(199) ptsfa(153) pute.cmhnet.org(11) pylon(12) qiclab.scn.rain.com(28) quack(45) qucdntri.ee.queensu.ca(24) quest(73) questrel(22) railnet(12) raindrop(8) rci(1477) redpoll(3) residents(8) resonex(36) rhi.hi.is(4632) robtoy.manawatu.planet.co.nz(4) rodan(18) rosedale(0) roselin(848) rsd0(26) rtxirl.rtxirl.ie(30) ruacad(901) rubb.rz.ruhr-uni-bochum.de(26) rucs(14) rucs2(204) rufus(430) rulcde.leidenuniv.nl(8) rulcvx(1) rutcor.rutgers.edu(173) rutgers.edu(60) rutgers.rutgers.edu(63) sadtler(31) saturn(57) sauron.msfc.nasa.gov(17) sausage.taranaki.planet.co.nz(12) sbase2(5) scicom(51) scrash(8) scrum(15) sdl(85) seanews(363) seer(39) sgfb(68) si.sintef.no(218) sis.stockell.com(20) slcl.lib.mo.us(63) snafu.muncca.fi(90) sol.ctr.columbia.edu(313) sonic(403) sooner.palo-alto.ca.us(2) spunky.redbrick.com(185) srchtec(28) stephsf.stephsf.com(20) summit(40) sunburn.stanford.edu(205) suned1(732) sunny.ci.hillsboro.or.us(28) sycraft.com(6) symbiosis.ahp.com(356) synercom(24) tarzan(285) tct(17) tellab5(1893) tembel(13) teslab(17) theseas(942) thomsoft.de(17) til(18) tintin.csl.sni.be(3) titan(474) tol-ed.com(47) torrie(13) totaltec.com(137) tower(2) tower.nullnet.fi(34) tram(5) ttsi(61) tukki(2130) ubaclu.unibas.ch(1450) ucbeh.san.uc.edu(4167) umd5.umd.edu(4090) uniwa(2086) unvax.union.edu(1782) ursa(1065) utdoe(22) utgpu(590) uunet(321) valinor.mythical.com(227) valnet(127) vanlib.fvrl.org(28) vela.acs.oakland.edu(8309) venus(159) vicstoy(3) 
vicuna(20) vikings(256) vms.ocom.okstate.edu(176) voodoo.ca.boeing.com(97) wb8apd(13) wcc.oz.au(7) webworm.berkeley.edu(438) wesel(37) wheaton.wheaton.edu(20) widow.berkeley.edu(132) wizvax(507) wms(16) wolf.berkeley.edu(50) worf(34) wshb(47) wurc.manawatu.planet.co.nz(5) wvcc.nwcs.com(12) wvml.jeslacs.bc.ca(22) wvus(0) xenitec(35) xopuk(1) xtree(13) xymph(1)

------------------------------------------------------------------------------

EXPLANATION OF THE MEASUREMENTS AND STATISTICS

Survey data is taken by having one person at each site run a program called "umpire", which looks at the news or notes files and determines the newsgroups that the user has read within a recent interval. To "read" a newsgroup means to have been presented with the opportunity to look at at least one message in it. Going through a newsgroup with the "n" key counts as reading it.

For a news site, "user X reads group Y" means that user X's .newsrc file has marked at least one unexpired message in Y. If there is no traffic in a newsgroup for the measurement period, then the survey will show that nobody reads the group.

For a notes site, "user X reads group Y" means that user X has been in the notesfile with the sequencer in the last 14 days. The "14 days" interval for notesfiles corresponds to "unexpired" for news.

The "umpire" program is periodically posted to comp.sources.d, or is available from me (decwrl!reid). The notesfiles version of the program should be available through standard notesfiles software distribution channels as well.

SITES SURVEYED IN THIS SAMPLE

"This Sample" means the set of sites that have sent in an umpire report within the past "Readership measurement interval" days. In every case the most recent report from each site is used. At the moment, some of the readership reports are several months old. In future postings those reports will have expired and will not be included.

The number in parentheses after the site name is the number of users that the site reported.
A value of (0) usually means that the software has been configured to use the wrong technique for counting users at that site; a report showing 0 users but 6 readers of rec.humor.funny is not statistically meaningful.

One might argue that the sample is self-selected, and therefore biased. It does in fact have a certain self-selection factor in it, because we only get data from sites at which someone participates in the survey. However, we do not require the participation of every user at a site, only one user. The survey program returns data for every user on the system on which it was run. Since there are an average of 30 people per site reading news, there is a certain amount of randomness introduced that way. Of course, the sample is biased in favor of large sites (they are more likely to have a user willing to run the survey program) and software-development-oriented sites (more likely to have a user *able* to run the survey program).

NETWORK SIZE

I determine the network size by looking at the set of sites that are mentioned in the Path lines of news articles arriving at decwrl. This number is consistently higher than the number of sites that posted a message (as measured and posted from uunet) because it includes passive sites that are on the paths between posting sites and decwrl. Each month I store the names of the hosts that are named that month, and for this report I used the past 12 months' worth of data.

There are 322651 different sites in the Path lines of articles that arrived at decwrl in the last 12 months. There are 19819 different sites in the comp.mail.maps data, but comp.mail.maps tends to include only one or two machines for each organization, leaving the rest unmentioned. Also, a large number of sites participate in USENET without participating in UUCP. I believe that 330000 is the best estimate for the size of USENET.
Because it is actually a measurement of the number of sites that have posted a message or that are on the path to a site that has posted a message, it will be slightly smaller than the number of sites that actually read netnews. Any site that believes it is not being counted can just ensure that it posts at least one message a year, so that it will be counted.

NUMBER OF USERS

The number of users at each site is determined in a site-specific fashion. Sometimes it is done by counting the number of user accounts that have shells and login directories. Sometimes it is done by counting the number of people who have logged in to the machine in some interval. Sometimes other techniques are used. This number is probably not very accurate--certainly not more accurate than to within a factor of two.

ESTIMATED TOTAL NUMBER OF PEOPLE WHO READ THIS GROUP, WORLDWIDE

There are two sources of error in this number. The number is computed by multiplying the number of people in the sample who actually read the group by the ratio of estimated network size to sample size. The estimated total can therefore be biased by errors in the network size estimate (see above) and also by errors in the determination of whether or not someone reads a group. Assuming that "reading a group" is roughly the same as "thumbing through a magazine", in that you don't necessarily have to read anything, but you have to browse through it and see what is there, then the measurement error will come primarily from inability to locate .newsrc files, which can either be protected or moved out of root directories. There is no way of measuring the effect on the measurements from unlocated .newsrc files, but it is not likely to be more than a few percent of the total news readers.

PROPAGATION: HOW MANY SITES RECEIVE THIS GROUP AT ALL

This number is the percent of the sites that are even receiving this newsgroup.
The information necessary to compute propagation was not generated by early versions of the umpire program, so the "basis" (number of sites) used to generate the Propagation figure may be smaller than the "Sites in this sample" figure. A site's data will be used to compute propagation if either (a) it reports zero readers for at least one group, or (b) it is using an umpire with an explicit version number that is high enough.

MESSAGES PER MONTH AND KILOBYTES PER MONTH

Traffic is measured at decwrl, in Palo Alto, California. If for some reason decwrl has not received any traffic in that newsgroup during the measurement period, this is indicated with dashes ("-") in the traffic columns. Any message that has arrived at decwrl within the last "Traffic measurement interval" days is counted, regardless of when it was posted. Monthly rates are computed by taking the total traffic, dividing by the number of days in the traffic measurement interval, and multiplying by 30.

By definition the message traffic values are correct, because they are an exact measurement, but they may differ from the traffic at your site because of differences in timing and propagation. Timing differences will be random, but will average out in the long run.

If a message is crossposted to several groups, it is charged only to the first-named group in the list. Note that this differs from the statistics posted from uunet every 2 weeks: the uunet data charge a message equally to every group that it is crossposted to.

CROSSPOSTING PERCENTAGE: WHAT FRACTION OF THE ARTICLES ARE CROSSPOSTED

"Crossposting" means to post the same article simultaneously in more than one newsgroup. In genuine "news" systems crossposting is implemented with Unix links and does not increase the storage or transmission cost, though in some other systems crossposted articles are unbundled and must be stored and transmitted separately.
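
The traffic accounting described under MESSAGES PER MONTH AND KILOBYTES PER MONTH (charge a crossposted article only to its first-named group, then scale the measured total to a 30-day month) can be sketched roughly as follows. This is only an illustration, not the measurement software used at decwrl: the function names, the input format, and the sample articles are all assumptions made up for this example.

```python
from collections import defaultdict


def monthly_rate(total, interval_days):
    """Scale a total observed over interval_days to a 30-day month."""
    return total / interval_days * 30


def charge_to_first_group(articles):
    """Tally messages and bytes per group, charging the first-named group only.

    articles: iterable of (newsgroups_header, size_in_bytes) pairs
    (a hypothetical input format, not the real log format).
    Returns a dict mapping group name to [message_count, byte_count].
    """
    totals = defaultdict(lambda: [0, 0])
    for newsgroups, size in articles:
        # The Newsgroups header is a comma-separated list; take the first name.
        first = newsgroups.split(",")[0].strip()
        totals[first][0] += 1
        totals[first][1] += size
    return dict(totals)


# Made-up articles: (Newsgroups header, size in bytes).
sample_articles = [
    ("news.groups,news.lists,news.admin.misc", 14000),
    ("news.lists", 6000),
    ("news.groups", 2000),
]
totals = charge_to_first_group(sample_articles)

# Scale 28 days of observed traffic to monthly figures.
INTERVAL_DAYS = 28
for group, (messages, size) in sorted(totals.items()):
    print(group,
          monthly_rate(messages, INTERVAL_DAYS),
          monthly_rate(size, INTERVAL_DAYS))
```

Under this rule the crossposted article contributes nothing to news.lists or news.admin.misc, which is why these counts can differ from lists that charge every group named in the header.
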
The "crossposting percentage" is the percentage of the articles in this group that are crossposted to at least one other group. If every article in this group is crossposted, the percentage will be 100%; if none is crossposted, then the percentage will be 0%. The crossposting percentage figure does not take the size of the article into account, only the number of articles. Crossposting a 50,000-byte article and crossposting a 50-byte article both add the same amount to the tally.

COST RATIO: DOLLARS PER MONTH PER READER

The most controversial field in the survey report is the "$US per month per reader". It is the estimated number of dollars that are being spent on behalf of each reader, worldwide, on telephone and computer costs to transmit this newsgroup. The rate of $.0025 per kilobyte is the same value used in the UUNET statistics reported biweekly. It is based on discussions among system administrators about the true cost of news transmission. The cost ratio is computed as follows:

    $US/month/reader   = ($USPerMonthPerSite * numberOfSites) / numberOfReaders
    $USPerMonthPerSite = KBytesTrafficPerMonth * $USPerKByte * PropagationFactor
    $USPerKByte        = 0.0025

Combining all these gives

    $USPerMonthPerSite = KBytesTrafficPerMonth * 0.0025 * PropagationFactor
                       = KBytesTrafficPerMonth * PropagationFactor / 400

Therefore:

    $US/month/reader = (KBytesTrafficPerMonth * PropagationFactor * numberOfSites) / (400 * numberOfReaders)

The accuracy of this number is in fact better than the accuracy of the participation ratio, because the source of error--the network size estimate--is present both in the numerator and the denominator, and therefore cancels out. The primary source of bias in this number comes from the bias in the "estimated number of readers, worldwide", which is described above.

SITE PARTICIPATION

I would like to receive data from every site on USENET. The umpire programs (posted to comp.sources.d along with this report) work on most news versions.

        Brian Reid
        Network Systems Laboratory, Digital Equipment Corporation, Palo Alto CA
        reid@pa.dec.com
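
As a worked illustration of the crossposting percentage and the cost ratio defined above, here is a minimal sketch. It is not the code that produces the report. The function names and the sample inputs are assumptions for this example; only the $0.0025 per kilobyte rate comes from the text. The propagation factor is kept as an explicit parameter, and leaving it at 1.0 gives the simplified KBytesTrafficPerMonth / 400 form.

```python
USD_PER_KBYTE = 0.0025  # rate quoted in the report


def crossposting_percentage(newsgroups_headers):
    """Percent of articles whose Newsgroups header names more than one group.

    Only the number of articles matters; article size is ignored.
    """
    headers = list(newsgroups_headers)
    if not headers:
        return 0.0
    crossposted = sum(
        1 for header in headers
        if len([g for g in header.split(",") if g.strip()]) > 1
    )
    return 100.0 * crossposted / len(headers)


def cost_per_reader(kbytes_per_month, number_of_sites, number_of_readers,
                    propagation=1.0):
    """Estimated $US per month per reader for one newsgroup.

    propagation is the fraction of sites receiving the group (0.0 to 1.0);
    the default of 1.0 is an assumption that matches the simplified formula.
    """
    if number_of_readers <= 0:
        raise ValueError("number_of_readers must be positive")
    usd_per_month_per_site = kbytes_per_month * USD_PER_KBYTE * propagation
    return usd_per_month_per_site * number_of_sites / number_of_readers


# Made-up inputs for illustration only; these are not survey results.
headers = [
    "news.groups,news.lists",
    "news.groups",
    "news.groups",
    "news.groups,news.admin.misc",
]
print(crossposting_percentage(headers))
print(cost_per_reader(kbytes_per_month=4000,
                      number_of_sites=330000,
                      number_of_readers=100000))
```

Because numberOfSites and numberOfReaders are both scaled from the same network size estimate, changing that estimate changes both arguments in proportion and leaves the ratio unchanged, which is the cancellation the text describes.
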