资源说明:Implementation of the PageRank algorithm
WARNING ====== This is NOT a complete, nor even a very good, PageRank implementation. This was a learning implementation and, as such, should be treated as a simple learning project. Use at your own risk. PageRank ======== Timothy Asp (@timothyasp) Caleb Carlton (@ccarlton) Run Instructions: ================ NOTE: NetworkX is required for installation. Run sudo easy_install networkx For directed data, run: python pageRank.pydirected For undirected data, run: python pageRank.py undirected Implementation ============= Generates a directed or undirected graph of the data, then runs the PageRank algorithm, iterating over every node checking the neighbors (undirected) and out-edges (directed). We used NetworkX (http://networkx.lanl.gov/) for out graphs which is a great python library for creating and maintaining these graphs. For D, which is the probability that the user will randomly go to a different page, we chose .85 which is what the book had as the probability for D. The graph is iterated over 10 times, to calculate the rank best. State Borders ================ Analysis: The top ranked states are all high density area states, meaning they have lots of state borders, which makes sense and matches expected output. ``` MO :0.0315971844476 KY :0.0311331684216 TN :0.0310935756102 MA :0.0287725140929 PA :0.0269173607527 MD :0.0267222576413 GA :0.0262810787353 NY :0.0261652680999 SD :0.0259332059378 WY :0.0256935355929 IA :0.0255032593163 CO :0.0252932994762 VA :0.0250814382815 ID :0.0250734427792 AR :0.023278072507 OK :0.0223321781313 NV :0.0216708088742 NB :0.0214511655309 NE :0.0214511655309 OH :0.021122132435 WV :0.02097603798 TX :0.0203753104517 NH :0.0203612457806 UT :0.0201041909315 IL :0.0194874330536 OR :0.0186831020097 KS :0.0182382396929 AL :0.0180729949833 VT :0.0180524967929 NC :0.017964064708 CT :0.0177488522188 AZ :0.0175795169308 MS :0.0171642829093 IN :0.0170252621532 WI :0.0167143241873 MT :0.0164648690288 MN :0.0164414868817 NM :0.0163676118695 NJ :0.0154237955317 DE :0.0149027237957 CA :0.014328872014 MI :0.0137016010819 LA :0.0133453258507 ND :0.0130759232999 RI :0.0128664771188 FL :0.0104997265484 SC :0.0104816930419 WA :0.0104611056458 DC :0.0102730766081 MV :0.0100317563706 ME :0.00871019610841 ``` Political Blogs ============== ``` 855 :0.0123982365159 155 :0.0102073178341 963 :0.0085996555082 1051 :0.00779787367556 641 :0.00740159936356 55 :0.00708484611118 1245 :0.0059611211307 1153 :0.00569707706033 1437 :0.00562599374973 729 :0.00547134446998 1112 :0.00527363212751 1479 :0.00487418572979 1041 :0.00469773070457 454 :0.00469567338062 1101 :0.00464578664097 363 :0.00447834018621 323 :0.00435790083927 1000 :0.0038614062333 798 :0.00381999172434 642 :0.00370102086354 741 :0.0036547261639 979 :0.00362174829003 434 :0.0036180870679 1179 :0.00350056696695 493 :0.00346775923273 512 :0.0034349511101 880 :0.00342661996956 180 :0.00341641339595 996 :0.00340638815647 1270 :0.00340087077945 387 :0.00339655223689 1461 :0.00332533847476 878 :0.00324778609618 756 :0.0032359049905 1306 :0.00315517498621 144 :0.00312457924806 483 :0.00309610699993 535 :0.00308475291018 99 :0.00307278802084 1330 :0.00301907244338 189 :0.00301478979974 1209 :0.00300901159605 980 :0.00299508763393 170 :0.00298107564567 1463 :0.00283380952926 762 :0.00280848085965 856 :0.00279802241224 644 :0.00276932320703 1384 :0.00276752568959 547 :0.0027426663457 514 :0.00273565583007 72 :0.00273238941209 919 :0.00273076502953 664 :0.00272610535733 687 :0.00270819350753 150 :0.00269966673637 524 :0.00268026659323 826 :0.00266091439199 119 :0.00265691688035 297 :0.00264045909771 935 :0.00263408868513 892 :0.00261845274543 … trimmed ``` NCAA ===== Analysis: A quick look through the 2009 season results show that most of the teams ranked the highest we’re the division leaders and winning teams. ``` Boise State :0.00733262361219 Utah :0.00722435930833 Montana :0.00709947017054 Tulsa :0.00670055195551 Ball State :0.0063758368291 Texas :0.00636546801443 Rice :0.00636420852951 Alabama :0.00625447664103 Florida :0.00624166939591 TCU :0.00612222447911 Texas Tech :0.00602273860572 Penn State :0.00583129342948 Cincinnati :0.00575753270947 Oklahoma :0.0056904527237 Richmond :0.00547645775694 East Carolina :0.00529388406925 Missouri :0.00524749922454 Ohio State :0.00522310227913 Brigham Young :0.00514457541395 Weber State :0.00513022880521 Nebraska :0.00504668897937 USC :0.00499134298746 New Hampshire :0.00498805759706 Georgia :0.00496366497189 Michigan State :0.00492355290029 Iowa :0.00491360116607 Western Michigan :0.00488826180424 James Madison :0.00488409341977 Houston :0.00487147333886 Northwestern :0.00485804367807 Troy :0.00481302701871 Pittsburgh :0.00480537805717 Louisiana Tech :0.00476687389604 Air Force :0.00472795714934 Oklahoma State :0.00472453412691 Buffalo :0.00469183049703 West Virginia :0.00468412914986 Virginia Tech :0.00467792833048 Villanova :0.00461018702304 Mississippi :0.00460722692705 LSU :0.00458586681751 South Carolina State :0.00457632203178 Navy :0.00455469420055 Oregon :0.00455031908469 Boston College :0.00453703396878 Fresno State :0.00444681070456 Rutgers :0.00440682553538 Florida Atlantic :0.00439888351631 Connecticut :0.00437901660454 Florida A&M :0.00437700673944 Kansas :0.00437053073582 Southern Miss :0.00434165382392 Colorado State :0.00433892942647 Nevada :0.00432803496027 South Florida :0.00431247388258 Florida State :0.00430781518633 Appalachian State :0.0042855078864 Maine :0.00427719960866 San Jose State :0.00421017077414 Georgia Tech :0.00417210375934 Central Michigan :0.00416730758325 Wake Forest :0.00404176127288 Minnesota :0.00399261481722 Bethune-Cookman :0.00395345951099 Louisiana-Lafayette :0.00394932125891 Montana State :0.00392668285699 California :0.0038692041512 Wisconsin :0.00385875981679 Memphis :0.00382804688824 Maryland :0.00380882019124 North Carolina :0.0038077233882 Notre Dame :0.00380022455306 Kentucky :0.0037939751987 Vanderbilt :0.00378971204221 Arizona :0.0037893865312 Oregon State :0.00377213824036 Cal Poly :0.00374115053075 Hawaii :0.0037386301266 Eastern Washington :0.00369986078233 South Carolina :0.00366865310273 Arkansas State :0.0036312635554 Wofford :0.00362089389909 Miami (FL) :0.00361898361937 Northern Arizona :0.00356914337631 Bowling Green :0.00355155219836 UTEP :0.0034868669963 William & Mary :0.00345594533534 Colgate :0.00343346141159 Northern Iowa :0.00341207809157 Clemson :0.00340172846317 Central Arkansas :0.00339275926671 Kansas State :0.00336454093013 Florida International :0.00330978664396 Elon :0.00330964240795 Morgan State :0.00329645553639 Harvard :0.00324203805793 Grambling State :0.00319995545647 ``` Les Miserables ============= Analysis: The main characters are ranked the highest, which is as expected ``` Valjean :0.0758308901974 Myriel :0.0429505779458 Gavroche :0.0359160666477 Marius :0.0310172618335 Javert :0.0304815194381 Thenardier :0.0280510981415 Fantine :0.027191326266 Enjolras :0.0219890272928 Cosette :0.0207014337229 MmeThenardier :0.0196084909871 Bossuet :0.0190434224611 Courfeyrac :0.0186615760305 Eponine :0.0178862752016 Mabeuf :0.0175458955297 Joly :0.0172827540076 Bahorel :0.0172689775792 Babet :0.0167910676219 Gueulemer :0.0167728629274 Claquesous :0.0166377902786 MlleGillenormand :0.0163128064923 Feuilly :0.0159647347833 Combeferre :0.0159634323963 Tholomyes :0.0157308656902 Bamatabois :0.0156663859666 Montparnasse :0.0152477377714 Gillenormand :0.0150159848507 Grantaire :0.014524331992 Prouvaire :0.0131993961441 Favourite :0.0126897550776 Fameuil :0.0126870828847 Dahlia :0.0126820677133 Blacheville :0.0126796581368 Zephine :0.0126773324066 Listolier :0.0126750883243 Cochepaille :0.0125074398829 Chenildieu :0.0125030179402 Judge :0.0124920149995 Champmathieu :0.0124884594437 Brevet :0.0124851057688 Brujon :0.0119298962133 Fauchelevent :0.0116775047428 MmeHucheloup :0.0107290724039 MlleBaptistine :0.0103332200281 MmeMagloire :0.0103247490448 Simplice :0.00911030673281 LtGillenormand :0.00874898560852 MmeBurgon :0.00782036048117 Pontmercy :0.0073937575872 Toussaint :0.00686488659601 Woman2 :0.00686488659601 Anzelma :0.00633806764306 MotherInnocent :0.00621997328001 MmePontmercy :0.00602616566625 Child1 :0.0058066429878 Child2 :0.00580537558791 Champtercier :0.00560655179602 Geborand :0.00560655179602 Napoleon :0.00560655179602 Count :0.00560655179602 OldMan :0.00559885107345 Cravatte :0.00559885107345 CountessDeLo :0.00559885107345 Perpetue :0.00542796568708 Magnon :0.00528924458944 Marguerite :0.0052793453439 Jondrette :0.00527170515255 Woman1 :0.00526257949406 BaronessT :0.00516084129531 Gribier :0.0044295217059 MlleVaubois :0.00393126768321 Scaufflaire :0.00373850352216 Labarre :0.00373850352216 Isabeau :0.00373850352216 Gervais :0.00373850352216 MmeDeR :0.00373850352216 Boulatruelle :0.00343826653682 MotherPlutarch :0.00330387114808 ``` Karate ====== Analysis: Looking through the data, the top people the most connections, thus making them the highest ranked, so it’s as expected ``` 34 :0.100779659381 1 :0.097005456951 33 :0.071576888392 3 :0.0570213307483 2 :0.0528547897183 32 :0.037117301531 4 :0.0358484502685 24 :0.0314579138303 9 :0.0297284975846 14 :0.0295173382004 7 :0.0291367362044 6 :0.0291334696727 30 :0.0262477118839 28 :0.0256009815829 31 :0.0245624539154 8 :0.0244823644259 5 :0.0219971434271 11 :0.0219885658829 25 :0.0210407262397 26 :0.0209763705902 20 :0.0195946521061 29 :0.0195468185624 17 :0.0167941834548 27 :0.0150190551952 13 :0.0146437100611 22 :0.0145596783316 18 :0.0145570208576 19 :0.0145207772694 21 :0.0145118570135 23 :0.0145118570135 15 :0.0145118570135 16 :0.0145118570135 10 :0.014293551594 12 :0.0095651796064 ``` Dolphins ======== ``` Grin :0.032098769619 Jet :0.0318806892333 Trigger :0.0312662964391 Web :0.0302903715199 SN4 :0.0298427560105 Topless :0.0294801891168 Scabs :0.0283966864331 Patchback :0.0264319121028 Gallatin :0.0263223854408 Beescratch :0.0247553683412 Kringel :0.0246260790508 SN63 :0.0239144788977 Feather :0.0235998207139 SN9 :0.0219581962847 Upbang :0.0217432490668 Stripes :0.0216651111975 SN100 :0.0206241003106 DN21 :0.0201591774591 Haecksel :0.019855361239 Jonah :0.0193677489256 TR99 :0.019209987275 SN96 :0.0176136632619 TR77 :0.0173381999006 Number1 :0.0171934420037 Double :0.0170864915312 Beak :0.0169533238355 MN105 :0.0169216400508 MN83 :0.0168828868947 Hook :0.0166108587996 SN90 :0.0162182764868 Shmuddel :0.015898717753 DN63 :0.0156906891579 PL :0.0153202288699 Fish :0.0150996337083 Oscar :0.0148566792738 Zap :0.0147590831571 DN16 :0.0145192871451 Ripplefluke :0.0133626577398 Bumper :0.0133280233838 Knit :0.0129680844496 Thumper :0.0128173211756 TSN103 :0.0120603851553 Mus :0.0115560295913 Notch :0.0112467062498 Zipfel :0.0110318043671 MN60 :0.00985621187565 CCL :0.00962034429327 TR88 :0.00887000341 TR120 :0.00881850953712 Wave :0.00836058683041 TSN83 :0.00817461979539 SN89 :0.00778446718663 Vau :0.00748623884361 Zig :0.00620544119831 Quasi :0.0054370288389 MN23 :0.00543030882186 TR82 :0.00528011214892 Cross :0.00507523069316 Five :0.00507523069316 Whitetip :0.0049602682216 SMN5 :0.00491419391454 Fork :0.00483181375568 ``` Conclusion ================ PageRank worked well for most of the datasets, but it worked best for datasets with Directed graphs. The results seemed less relevant for undirected graphs because each edge specifies a two way relation between two nodes. Directed graphs are more specific in that a link to a page does not necessarily imply that page links back. For this reason we conclude that pageRank is much more useful on directed graphs. Also, at first we only iterated over the graph once, but realizing that we need to recalibrate the links after every run, we ran it over the graph multiple times, which yielded much better results.
本源码包内暂不包含可直接显示的源代码文件,请下载源码包。