I have analyzed all data submitted over the last few years and from this analysis I have programmed a statistical "forged data rejector". This report for January 1995 is the first to exclude all forged data; as I look back through the historical reports, some sites started small-scale doctoring of the data in early 1994, but the practice did not become rampant until summer 1994.
Forgery detection is of course a cat-and-mouse game, and if these people are serious about disrupting the numbers, they will find a way to circumvent my forgery detector, and sooner or later the reports will degrade again.
This survey is based on a sample of data taken from various USENET sites. At the end of this message there is a short explanation of the measurement techniques and the meaning of the various statistics. The messages that follow this one show survey data sorted by various criteria.
The newsgroup volume and article counts that I post are often significantly different from the ones posted by Rick Adams, because he includes the size of a crossposted article in every group to which it is posted, whereas I charge that size only to the first-named group.
The complete set of readership data (of which this is a summary) is posted in news.lists. The software that will let your site participate in the survey is in comp.sources.d and news.admin.
Brian Reid
reid@pa.dec.com
OVERALL SUMMARY:
                                              This       Estimated
                                              Sample     for entire net
    Sites:                                       453         260000
    Fraction reporting:                        0.17%           100%
    Users with accounts:                      190664       47579000
    Netreaders:                                66123       16500000
    Average readers per site:                    146
    Percent of users who are netreaders:      34.68%
    Average traffic per day (megabytes):     242.204
    Average traffic per day (messages):        84719
    Traffic measurement interval:       last 28 days
    Readership measurement interval:    last 75 days
    Sites used to measure propagation:           453
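The ratios in the sample column can be rechecked from the raw counts. The following is a minimal sketch in Python, not the survey software itself; the variable names are invented for illustration. The "estimated for entire net" figures are not recomputed, because the extrapolation method is not spelled out in this summary.

```python
# Raw counts copied from the "This Sample" column of the summary.
sample_sites = 453
estimated_sites = 260000
sample_users = 190664
sample_readers = 66123

# Derived ratios, rounded the way the summary prints them.
summary = {
    "fraction_reporting_pct": round(100.0 * sample_sites / estimated_sites, 2),
    "readers_per_site": round(sample_readers / sample_sites),
    "netreaders_pct": round(100.0 * sample_readers / sample_users, 2),
}

for name, value in summary.items():
    print(name, value)
```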
Valid data received from these sites:
6sigma(5) actew.oz.au(811) adolfoien.vgs.no(2) aedi.insa-lyon.fr(510) airs.com(8) alanya.isar.muc.de(10) alchemy(371) alex(0) alfred(4) alsys.com(125) alsys.de(15) anakena.dcc.uchile.cl(7) angelo.healthchex.com(24) angus.mystery.com(35) animal.inescn.pt(247) anorad.com(119) apricot.co.uk(80) arakis.fdn.org(8) atfs0(174) awful(13) aztec(140) badlands.nodak.edu(8838) barnard.manawatu.planet.co.nz(5) bat710.univ-lyon1.fr(508) bcstec.ca.boeing.com(863) beauty(19) belvedere(7) bgsuvax(1110) bigwheel(200) blackice(1) blkhole(14) bohemia(85) bohr.phys.ksu.edu(286) boy(6) bsuvc.bsu.edu(11358) btoy1.rochester.ny.us(16) cabezon(201) caipfs.rutgers.edu(21) cam-orl.co.uk(113) caribou.msfc.nasa.gov(11) carver.wa.com(69) ccs3(1) cello(539) centre.univ-orleans.fr(191) cerritos.edu(987) cfctech(36) cgate.sait.ab.ca(580) chekov(6) chemeng.ed.ac.uk(80) cheops(231) cherry(33) chiark(7) chinaca(21) chinacat(20) chuck.sycraft.com(3) ci.org(173) cigna(708) cis(64) cleo(3) clpd-newsserver.clpd.kodak.com(744) clpgh.org(418) cnplss5(99) codewks(196) cognos(291) colossus(1233) coral(39) cpvax.cpses.tu.com(63) cradac(0) cronus(46) csdvax.csd.unsw.edu.au(2188) cspyr0(79) csustan.csustan.edu(27) cub.kscorp.com(13) cutler.com(10) cuugnet(832) cvedg(3) cwis(347) dante(21) dante.migsol.com(21) datani.dk(17) dciem(196) desc.dla.mil(7) devon(5) digi(1280) dimacs.rutgers.edu(587) discg2.disc.dla.mil(10) discg3.disc.dla.mil(1008) discg4.disc.dla.mil(77) disunms(1093) disuns2(972) dogface(1) dorm.rutgers.edu(249) dove(146) dplace(0) drager.com(275) dragon.com(42) drd(39) drum.msfc.nasa.gov(26) dsacg2.dsac.dla.mil(5) dsbc.icl.co.uk(135) dsinet(16) duke(550) dumbcat.sf.ca.us(11) dutiws.twi.tudelft.nl(424) earlgrey.exnet.com(1) ees1a0.engr.ccny.cuny.edu(9) egreen(733) eis.calstate.edu(6492) elements.rpal.rockwell.com(83) elmo(21) elsie(5) ember(3) eonwe(87) eos(174) eram.esi.com.au(72) ernest(26) ernie(15) esatst(19) esslemont.manawatu.planet.co.nz(3) europa(184) europa.com(28) fasterix.frmug.fr.net(8) 
fdmetd(7) fermat(274) filomen(0) flab(179) flatlin(13) franz.com(66) freedm(10) freenet-news(34438) gauss.rutgers.edu(216) gcc.edu(884) geac(234) geovax.ed.ac.uk(226) getank(52) giga(265) gistdev(57) gmdtub(251) golem(2) goofy(319) gordius(58) gouldnl(52) gozer(9) grafex(26) grian(20) gtisqr(17) gypsum.berkeley.edu(98) halcyon(7248) hammer.msfc.nasa.gov(24) hamnet(25) harrnl(23) hawkmoon(0) hccompare.com(726) hhcs.gov.au(5) hhvo.sjoe.mil.no(11) hilbert.rutgers.edu(135) hiram.edu(878) hodgson(2) hornet(1) hp400p(34) humming(98) iamk4515(44) iat.holonet.net(6350) iclnet93.iclnet.org(28) ics.uci.edu(422) iecc(10) iesd.auc.dk(405) ifens01.insa-lyon.fr(20) ifhamy.insa-lyon.fr(233) ifhpserv.insa-lyon.fr(69) ifi.uio.no(3353) iitmax(1386) imagelan.com(5) imladris(30) imperium(12) impreza(134) inescca.inescc.pt(37) infodyn(20) infopro.infopro.com(10) infotax(1) intrepid(18) investor(10) iris.claremont.edu(7) iris.mbvlab.wpafb.af.mil(143) islabs(16) isys-hh(147) ixi(4) jabba.ess.harris.com(128) james(79) jaws(225) jerrwood(1) jfwhome.funhouse.com(17) johnny5(2) jove(43) jtmiii.uucp(2) julian.uwo.ca(4150) jupiter(68) kaepk.ericsson.se(69) kaepk1.ericsson.se(81) kaepk3.ericsson.se(46) kaepk4.ericsson.se(91) kala(11) kalle.impab.se(2) kb2ear.overleaf.com(56) keltia.frmug.fr.net(50) khijol(43) kksys(104) kofax(69) krason(11) lakes(250) latour(8) ledger.co.forsyth.nc.us(142) lkbreth(50) loretta(32) lpi(59) m2xenix.psg.com(173) macdona(103) macdonal(242) magic.capsogeti.fr(130) mahavir(1) manger.modeld.no(1) mantis.co.uk(21) marriott.clark.net(260) mars(257) martex(10) math.berkeley.edu(5) math.rutgers.edu(465) mathstat(44) matrox(489) matrx(21) maya(50) mcmi(40) mcsiad(3) mdtvus.com(26) metasoft(34) mica.berkeley.edu(72) miclon(64) midas(39) missing(799) mnemosyne.cs.du.edu(210) modus(38) mole.hawkesbay.planet.co.nz(8) monygmc(21) monymsys(6) moonbase(32) mr-pibb(779) mtdiablo(19) mtroyal.ab.ca(733) mts(13) muselab(734) nad.com(285) nanovx(8) nasim(86) nate(2) ncoast(676) 
ncrlisl(134) neodata(1028) netagw(5) netline-fddi(9) news-server.aa.cad.com(287) news-server.aa.cad.slb.com(275) news.cis.ohio-state.edu(3638) news.ilx.com(181) news.loria.fr(527) newton.isa.de(58) nezsdc(4) nicmad(355) nj8j(9) nmc(1) nmrdc1(8) nocusuhs(17) nosun.west.sun.com(35) noweh.com(3) nri-e(67) nrlvx1.nrl.navy.mil(325) nrlvx2.nrl.navy.mil(321) nttta(56) numachi(16) nyx10.nyx10.cs.du.edu(18123) obdient(29) ocean(90) oslonett.no(4077) oucsace.cs.ohiou.edu(779) ovation(252) overload(2540) pasadena-dc.bofa.com(24) pbhya(105) pbhyb(239) pbhyc(279) pbhyd(132) pbhye(186) pbhyg(256) pentagon-ai(94) phage(521) pi19(103) piaggio(50) picasso(133) platon.transport.tih.no(3) plxsun(221) pmafire(180) practic.practic.com(8) presby.edu(254) primerd(102) prism1(31) pta(177) ptsfa(100) pute.cmhnet.org(11) pylon(9) pyramid(43) pyratl(41) qiclab.scn.rain.com(28) quando(214) qucdnee.ee.queensu.ca(37) qucdntri.ee.queensu.ca(25) quest(313) questrel(21) railnet(12) raindrop(6) raybed2(1172) rayleigh(103) rci(1068) rebel(4) redpoll(3) redshirt.cc.rochester.edu(24) residents(8) resonex(36) rhi.hi.is(4334) robohack(95) robtoy.manawatu.planet.co.nz(4) rochester(230) rosebud(2) rosedale(0) roselin(1051) rsd0(26) rtxirl.rtxirl.ie(38) ruacad(636) rubb.rz.ruhr-uni-bochum.de(20) rucs(18) rucs2(166) rufus(536) rulcde.leidenuniv.nl(14) rulcvx(0) rutcor.rutgers.edu(170) rutgers.rutgers.edu(60) sactoh0.sac.ca.us(107) sadtler(31) saturn(52) sauron.msfc.nasa.gov(20) sausage.manawatu.planet.co.nz(3) sausage.taranaki.planet.co.nz(6) scarboro(280) scfe.chinalake.navy.mil(577) scicom(53) scow(466) scrash(8) sdl(85) seanews(389) seer(41) sgfb(127) shiva.com(254) si.sintef.no(246) sis.stockell.com(20) skyking(16) slcl.lib.mo.us(69) sol.ctr.columbia.edu(278) sooner.palo-alto.ca.us(2) sparky(5) spatial.com(93) spock.retix.com(78) spunky.redbrick.com(129) srchtec(23) stat(39) stephsf.com(20) stephsf.stephsf.com(20) student(511) summit(34) sun19(37) sunburn.stanford.edu(227) suned1(731) sycraft.com(6) 
symbiosis.ahp.com(347) synercom(23) tachyon.com(11) tardis(145) tarzan(219) taylor.manawatu.planet.co.nz(3) tct(13) tellab5(1749) tembel(11) teslab(28) theseas(866) tijger.fys.ruu.nl(552) til(18) tintin.csl.sni.be(0) titan(414) tol-ed.com(43) torrie(13) totaltec.com(133) tower(1) tower.nullnet.fi(31) tram(4) troi.cc.rochester.edu(715) ttsi(63) tukki(2099) turtle.fisher.com(253) tut.msstate.edu(5415) twg(19) ubaclu.unibas.ch(1279) ucbeh.san.uc.edu(3666) uhura.cc.rochester.edu(4258) ukma(722) umd5.umd.edu(3713) uniwa(2099) unvax.union.edu(2195) ursa(923) urz.unibas.ch(1237) utdoe(21) utgpu(726) uunet(290) valinor.mythical.com(244) valnet(117) vanlib.fvrl.org(28) vela.acs.oakland.edu(8148) venus(162) vicuna(24) visicom(102) visual(34) vms.ocom.okstate.edu(197) voodoo.ca.boeing.com(94) warwick(14802) water.berkeley.edu(130) wb8apd(11) wcc(7) weaver.berkeley.edu(170) webworm.berkeley.edu(838) weitek.com(122) wesel(37) wetware(7) wheaton.wheaton.edu(17) whscdp.whs.edu(462) widow.berkeley.edu(260) wizvax(177) wofford.edu(687) wolf.berkeley.edu(173) wshb(43) wsrcc.com(5) wvml.jeslacs.bc.ca(25) wvus(0) xenitec(42) xopuk(0) xtree(6) yage(12) zorch(11)
The "arbitron" program is periodically posted to comp.sources.d, or is available from me (decwrl!reid). The notesfiles version of the program should be available through standard notesfiles software distribution channels as well.
The number in parentheses after the site name is the number of users that the site reported. A value of (0) usually means that the software has been configured to use the wrong technique for counting users at that site; even so, a report showing 0 users but 6 readers of rec.humor.funny is still statistically meaningful.
One might argue that the sample is self-selected, and thereby biased. It does in fact have a certain self-selection factor in it, because we only get data from sites at which someone participates in the survey. However, we do not require the participation of every user at a site, only one user. The survey program returns data for every user on the system on which it was run. Since there is an average of 146 people per site reading news in this sample, there is a certain amount of randomness introduced that way. Of course, the sample is biased in favor of large sites (they are more likely to have a user willing to run the survey program) and software-development-oriented sites (more likely to have a user *able* to run the survey program).
There are 257417 different sites in the Path lines of articles that arrived at decwrl in the last 13 months. There are 19296 different sites in the comp.mail.maps data, but comp.mail.maps tends to include only one or two machines for each organization, leaving the rest unmentioned. Also, a large number of sites participate in USENET without participating in UUCP.
I believe that 260000 is the best estimate for the size of USENET. Because it is actually a measurement of the number of sites that have posted a message or that are on the path to a site that has posted a message, it will be slightly smaller than the number of sites that actually read netnews. Any site that believes it is not being counted can just ensure that it posts at least one message a year, so that it will be counted.
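Counting the distinct sites named in Path lines can be sketched as follows. This is an illustration in Python, not the program actually run at decwrl. It assumes the usual bang-separated Path syntax, assumes the final component is the poster's name rather than a site, and folds case when comparing names; the function names and the sample paths are hypothetical.

```python
def sites_in_path(path_value):
    """Return the site names in one Path header value (bang-separated)."""
    hops = [hop for hop in path_value.strip().split("!") if hop]
    # Assumption: the last component names the poster, not a site.
    return hops[:-1]

def count_distinct_sites(path_values):
    """Count distinct site names over many Path values, ignoring case."""
    seen = set()
    for path_value in path_values:
        seen.update(site.lower() for site in sites_in_path(path_value))
    return len(seen)

# Hypothetical Path values; siteA, siteB and siteC are made-up names.
paths = [
    "decwrl!uunet!siteA!alice",
    "decwrl!siteB!siteA!bob",
    "decwrl!uunet!siteC!carol",
]
print(count_distinct_sites(paths))  # decwrl, uunet, sitea, siteb, sitec -> 5
```

A site that never posts and never relays a posted article appears in no Path line, which is why a count of this kind undercounts the sites that only read.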
Any message that has arrived at decwrl within the last "Traffic measurement interval" days is counted, regardless of when it was posted. Monthly rates are computed by taking the total traffic, dividing by the number of days in the traffic measurement interval, and multiplying by 30.
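As a small sketch of that normalization (Python, with an invented function name and made-up example numbers):

```python
def monthly_rate(total_traffic, interval_days):
    """Scale traffic observed over interval_days to a 30-day month."""
    if interval_days <= 0:
        raise ValueError("interval_days must be positive")
    return total_traffic / interval_days * 30

# Example with made-up numbers: 2800 units seen over a 28-day interval.
print(monthly_rate(2800, 28))  # 3000.0
```

The same function applies to megabytes or to message counts, since only the time scaling changes.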
By definition the message traffic values are correct, because they are an exact measurement, but they may differ from the traffic at your site because of differences in timing and propagation. Timing differences are random and will average out in the long run.
If a message is crossposted to several groups, it is charged only to the first-named group in the list. Note that this differs from the statistics posted from uunet every 2 weeks: the uunet data charge a message equally to every group that it is crossposted to.
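The difference between the two accounting rules can be illustrated with a short Python sketch. It is not either site's actual software; the function names and the sample articles are hypothetical, and each article is represented simply as a list of newsgroups plus a size in bytes.

```python
from collections import Counter

def charge_first_named(articles):
    """Charge each article's size only to its first-named group."""
    totals = Counter()
    for groups, size in articles:
        totals[groups[0]] += size
    return totals

def charge_every_group(articles):
    """Charge each article's full size to every group it is posted to."""
    totals = Counter()
    for groups, size in articles:
        for group in groups:
            totals[group] += size
    return totals

# Hypothetical traffic: one crossposted article and one ordinary article.
articles = [
    (["rec.humor", "misc.test"], 1000),
    (["misc.test"], 200),
]
print(dict(charge_first_named(articles)))  # {'rec.humor': 1000, 'misc.test': 200}
print(dict(charge_every_group(articles)))  # {'rec.humor': 1000, 'misc.test': 1200}
```

Under the first rule the per-group volumes sum to the true total traffic (1200 bytes here); under the second rule the sum counts crossposted articles more than once (2200 bytes here).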
The "crossposting percentage" is the percentage of the articles in this group that are crossposted to at least one other group. If every article in this group is crossposted, the percentage will be 100%; if none is crossposted, then the percentage will be 0%. The crossposting percentage figure does not take the size of the article into account, only the number of articles. Crossposting a 50,000-byte article or a 50-byte article both cause the same tally.
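A minimal Python sketch of that tally follows. The function name and sample data are invented; each article is represented as a (newsgroup list, size in bytes) pair, and the size is deliberately ignored.

```python
def crossposting_percentage(articles):
    """Percentage of articles that name more than one newsgroup.

    Article size is ignored: only the number of articles counts.
    """
    if not articles:
        return 0.0
    crossposted = sum(1 for groups, _size in articles if len(groups) > 1)
    return 100.0 * crossposted / len(articles)

# Hypothetical group with four articles, one of them crossposted.
articles_in_group = [
    (["comp.lang.c", "comp.unix.misc"], 50000),
    (["comp.lang.c"], 50),
    (["comp.lang.c"], 1200),
    (["comp.lang.c"], 800),
]
print(crossposting_percentage(articles_in_group))  # 25.0
```

Swapping the 50,000-byte and 50-byte sizes in the example leaves the result unchanged.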
The cost ratio is computed as follows:
    $US/month/reader = ($USPerMonthPerSite * numberOfSites) / numberOfReaders
    $USPerMonthPerSite = KBytesTrafficPerMonth * $USPerKByte * PropagationFactor
    $USPerKByte = 0.0025
Combining all these gives
    $USPerMonthPerSite =
        KBytesTrafficPerMonth * 0.0025 * PropagationFactor
        = KBytesTrafficPerMonth * PropagationFactor / 400
Therefore:
    $US/month/reader =
        (KBytesTrafficPerMonth * PropagationFactor * numberOfSites) / (400 * numberOfReaders)
The accuracy of this number is in fact better than the accuracy of the participation ratio, because the source of error--the network size estimate--is present both in the numerator and the denominator, and therefore cancels out. The primary source of bias in this number comes from the bias in the "estimated number of readers, worldwide", which is described above. Treat this value as being accurate to within about 25%.
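The cost ratio arithmetic can be sketched in Python as follows. This is an illustration, not the code that produces the posted figures. The function and parameter names are invented, the example inputs are made up, and the propagation factor from the per-site definition is exposed as a parameter that defaults to 1.0, in which case the result reduces to (KBytesTrafficPerMonth * numberOfSites) / (400 * numberOfReaders).

```python
US_DOLLARS_PER_KBYTE = 0.0025  # the $USPerKByte constant, i.e. 1/400

def cost_per_reader(kbytes_per_month, number_of_sites, number_of_readers,
                    propagation=1.0):
    """Return $US/month/reader for one newsgroup."""
    if number_of_readers <= 0:
        raise ValueError("number_of_readers must be positive")
    per_site = kbytes_per_month * US_DOLLARS_PER_KBYTE * propagation
    return per_site * number_of_sites / number_of_readers

# Made-up example: 4000 KB/month, reaching half of 260000 sites,
# with an estimated 100000 readers.
print(cost_per_reader(4000, 260000, 100000, propagation=0.5))
```

With these made-up inputs the per-site cost is $5 per month and the cost per reader comes to about $13 per month.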
Brian Reid
Network Systems Laboratory, Digital Equipment Corporation, Palo Alto CA
reid@pa.dec.com