CHANGES.txt
- 77. HADOOP-1515. Add MultiFileInputFormat, which can pack multiple,
- typically small, input files into each split. (Enis Soztutar via cutting)
- 78. HADOOP-1514. Make reducers report progress while waiting for map
- outputs, so they're not killed. (Vivek Ratan via cutting)
- 79. HADOOP-1508. Add an Ant task for FsShell operations. Also add
- new FsShell commands "touchz", "test" and "stat".
- (Chris Douglas via cutting)
- 80. HADOOP-1028. Add log messages for server startup and shutdown.
- (Tsz Wo Sze via cutting)
- 81. HADOOP-1485. Add metrics for monitoring shuffle.
- (Devaraj Das via cutting)
- 82. HADOOP-1536. Remove file locks from libhdfs tests.
- (Dhruba Borthakur via nigel)
- 83. HADOOP-1520. Add appropriate synchronization to FSEditsLog.
- (Dhruba Borthakur via nigel)
- 84. HADOOP-1513. Fix a race condition in directory creation.
- (Devaraj via omalley)
- 85. HADOOP-1546. Remove spurious column from HDFS web UI.
- (Dhruba Borthakur via cutting)
- 86. HADOOP-1556. Make LocalJobRunner delete working files at end of
- job run. (Devaraj Das via tomwhite)
- 87. HADOOP-1571. Add contrib lib directories to root build.xml
- javadoc classpath. (Michael Stack via tomwhite)
- 88. HADOOP-1554. Log killed tasks to the job history and display them on the
- web/ui. (Devaraj Das via omalley)
- 89. HADOOP-1533. Add persistent error logging for distcp. The logs are stored
- into a specified hdfs directory. (Senthil Subramanian via omalley)
- 90. HADOOP-1286. Add support to HDFS for distributed upgrades, which
- permits coordinated upgrade of datanode data.
- (Konstantin Shvachko via cutting)
- 91. HADOOP-1580. Improve contrib/streaming so that subprocess exit
- status is displayed for errors. (John Heidemann via cutting)
- 92. HADOOP-1448. In HDFS, randomize lists of non-local block
- locations returned to client, so that load is better balanced.
- (Hairong Kuang via cutting)
- 93. HADOOP-1578. Fix datanode to send its storage id to namenode
- during registration. (Konstantin Shvachko via cutting)
- 94. HADOOP-1584. Fix a bug in GenericWritable which limited it to
- 128 types instead of 256. (Espen Amble Kolstad via cutting)
- 95. HADOOP-1473. Make job ids unique across jobtracker restarts.
- (omalley via cutting)
- 96. HADOOP-1582. Fix libhdfs to return 0 instead of -1 at
- end-of-file, per C conventions. (Christian Kunz via cutting)
- 97. HADOOP-911. Fix a multithreading bug in libhdfs.
- (Christian Kunz)
- 98. HADOOP-1486. Fix so that fatal exceptions in namenode cause it
- to exit. (Dhruba Borthakur via cutting)
- 99. HADOOP-1470. Factor checksum generation and validation out of
- ChecksumFileSystem so that it can be reused by FileSystems with
- built-in checksumming. (Hairong Kuang via cutting)
- 100. HADOOP-1590. Use relative urls in jobtracker jsp pages, so that
- webapp can be used in non-root contexts. (Thomas Friol via cutting)
- 101. HADOOP-1596. Fix the parsing of taskids by streaming and improve the
- error reporting. (omalley)
- 102. HADOOP-1535. Fix the user-controlled grouping to the reduce function.
- (Vivek Ratan via omalley)
- 103. HADOOP-1585. Modify GenericWritable to declare the classes as subtypes
- of Writable (Espen Amble Kolstad via omalley)
- 104. HADOOP-1576. Fix errors in count of completed tasks when
- speculative execution is enabled. (Arun C Murthy via cutting)
- 105. HADOOP-1598. Fix license headers: adding missing; updating old.
- (Enis Soztutar via cutting)
- 106. HADOOP-1547. Provide examples for aggregate library.
- (Runping Qi via tomwhite)
- 107. HADOOP-1570. Permit jobs to enable and disable the use of
- hadoop's native library. (Arun C Murthy via cutting)
- 108. HADOOP-1433. Add job priority. (Johan Oskarsson via tomwhite)
- 109. HADOOP-1597. Add status reports and post-upgrade options to HDFS
- distributed upgrade. (Konstantin Shvachko via cutting)
- 110. HADOOP-1524. Permit user task logs to appear as they're
- created. (Michael Bieniosek via cutting)
- 111. HADOOP-1599. Fix distcp bug on Windows. (Senthil Subramanian via cutting)
- 112. HADOOP-1562. Add JVM metrics, including GC and logging stats.
- (David Bowen via cutting)
- 113. HADOOP-1613. Fix "DFS Health" page to display correct time of
- last contact. (Dhruba Borthakur via cutting)
- 114. HADOOP-1134. Add optimized checksum support to HDFS. Checksums
- are now stored with each block, rather than as parallel files.
- This reduces the namenode's memory requirements and increases
- data integrity. (Raghu Angadi via cutting)
- 115. HADOOP-1400. Make JobClient retry requests, so that clients can
- survive jobtracker problems. (omalley via cutting)
- 116. HADOOP-1564. Add unit tests for HDFS block-level checksums.
- (Dhruba Borthakur via cutting)
- 117. HADOOP-1620. Reduce the number of abstract FileSystem methods,
- simplifying implementations. (cutting)
- 118. HADOOP-1625. Fix a "could not move files" exception in datanode.
- (Raghu Angadi via cutting)
- 119. HADOOP-1624. Fix an infinite loop in datanode. (Raghu Angadi via cutting)
- 120. HADOOP-1084. Switch mapred file cache to use file modification
- time instead of checksum to detect file changes, as checksums are
- no longer easily accessed. (Arun C Murthy via cutting)
- 130. HADOOP-1623. Fix an infinite loop when copying directories.
- (Dhruba Borthakur via cutting)
- 131. HADOOP-1603. Fix a bug in namenode initialization where
- default replication is sometimes reset to one on restart.
- (Raghu Angadi via cutting)
- 132. HADOOP-1635. Remove hardcoded keypair name and fix launch-hadoop-cluster
- to support later versions of ec2-api-tools. (Stu Hood via tomwhite)
- 133. HADOOP-1638. Fix contrib EC2 scripts to support NAT addressing.
- (Stu Hood via tomwhite)
- 134. HADOOP-1632. Fix an IllegalArgumentException in fsck.
- (Hairong Kuang via cutting)
- 135. HADOOP-1619. Fix FSInputChecker to not attempt to read past EOF.
- (Hairong Kuang via cutting)
- 136. HADOOP-1640. Fix TestDecommission on Windows.
- (Dhruba Borthakur via cutting)
- 137. HADOOP-1587. Fix TestSymLink to get required system properties.
- (Devaraj Das via omalley)
- 138. HADOOP-1628. Add block CRC protocol unit tests. (Raghu Angadi via omalley)
- 139. HADOOP-1653. FSDirectory code-cleanups. FSDirectory.INode
- becomes a static class. (Christophe Taton via dhruba)
- 140. HADOOP-1066. Restructure documentation to make more user
- friendly. (Connie Kleinjans and Jeff Hammerbacher via cutting)
- 141. HADOOP-1551. libhdfs supports setting replication factor and
- retrieving modification time of files. (Sameer Paranjpye via dhruba)
- 141. HADOOP-1647. FileSystem.getFileStatus returns valid values for "/".
- (Dhruba Borthakur via dhruba)
- 142. HADOOP-1657. Fix NNBench to ensure that the block size is a
- multiple of bytes.per.checksum. (Raghu Angadi via dhruba)
- 143. HADOOP-1553. Replace user task output and log capture code to use shell
- redirection instead of copier threads in the TaskTracker. Capping the
- size of the output is now done via tail in memory and thus should not be
- large. The output of the tasklog servlet is not forced into UTF8 and is
- not buffered entirely in memory. (omalley)
- Configuration changes to hadoop-default.xml:
- remove mapred.userlog.num.splits
- remove mapred.userlog.purge.splits
- change default mapred.userlog.limit.kb to 0 (no limit)
- change default mapred.userlog.retain.hours to 24
- Configuration changes to log4j.properties:
- remove log4j.appender.TLA.noKeepSplits
- remove log4j.appender.TLA.purgeLogSplits
- remove log4j.appender.TLA.logsRetainHours
- URL changes:
- http://<tasktracker>/tasklog.jsp -> http://<tasktracker>/tasklog with
- parameters limited to start and end, which may be positive (from
- start) or negative (from end).
- Environment:
- require bash (v2 or later) and tail
- 144. HADOOP-1659. Fix a job id/job name mixup. (Arun C. Murthy via omalley)
- 145. HADOOP-1665. With HDFS Trash enabled, if the same file is created
- and deleted more than once, the succeeding deletions create Trash item
- names suffixed with an integer. (Dhruba Borthakur via dhruba)
- 146. HADOOP-1666. FsShell object can be used for multiple fs commands.
- (Dhruba Borthakur via dhruba)
- 147. HADOOP-1654. Remove performance regression introduced by Block CRC.
- (Raghu Angadi via dhruba)
- 148. HADOOP-1680. Improvements to Block CRC upgrade messages.
- (Raghu Angadi via dhruba)
- 149. HADOOP-71. Allow Text and SequenceFile Map/Reduce inputs from non-default
- filesystems. (omalley)
- 150. HADOOP-1568. Expose HDFS as xml/http filesystem to provide cross-version
- compatibility. (Chris Douglas via omalley)
- 151. HADOOP-1668. Added an INCOMPATIBILITY section to CHANGES.txt. (nigel)
- 152. HADOOP-1629. Added an upgrade test for HADOOP-1134.
- (Raghu Angadi via nigel)
- 153. HADOOP-1698. Fix performance problems on map output sorting for jobs
- with large numbers of reduces. (Devaraj Das via omalley)
- 154. HADOOP-1716. Fix a Pipes wordcount example to remove the 'file:'
- scheme from its output path. (omalley via cutting)
- 155. HADOOP-1714. Fix TestDFSUpgradeFromImage to work on Windows.
- (Raghu Angadi via nigel)
- 156. HADOOP-1663. Return a non-zero exit code if streaming fails. (Lohit Renu
- via omalley)
- 157. HADOOP-1712. Fix an unhandled exception on datanode during block
- CRC upgrade. (Raghu Angadi via cutting)
- 158. HADOOP-1717. Fix TestDFSUpgradeFromImage to work on Solaris.
- (nigel via cutting)
- 159. HADOOP-1437. Add Eclipse plugin in contrib.
- (Eugene Hung and Christophe Taton via cutting)
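Illustration for entry 143 (HADOOP-1553): a minimal sketch, in Python, of how
signed start and end parameters like those of the tasklog URL could be mapped
onto byte positions. The function name resolve_range and the exact convention
(a negative offset counts back from the end, with -1 meaning the end of the
log itself, and out-of-range values clamped) are assumptions for illustration,
not taken from the servlet's source.

```python
def resolve_range(length, start, end):
    """Map signed start/end offsets onto absolute positions in a log.

    Assumed convention: a non-negative offset counts from the beginning;
    a negative offset counts back from the end, so -1 is the end itself.
    """
    def absolute(offset):
        if offset < 0:
            offset += length + 1
        # Clamp so that oversized offsets never leave the log.
        return max(0, min(offset, length))

    lo = absolute(start)
    hi = absolute(end)
    # Never return an inverted range.
    return lo, max(lo, hi)


# Whole log, last 10 bytes, and an explicit window of a 100-byte log.
whole = resolve_range(100, 0, -1)
tail = resolve_range(100, -11, -1)
window = resolve_range(100, 10, 20)
```

Under these assumptions the three calls cover the whole log, its last 10
bytes, and bytes 10 to 20 respectively.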
- Release 0.13.0 - 2007-06-08
- 1. HADOOP-1047. Fix TestReplication to succeed more reliably.
- (Hairong Kuang via cutting)
- 2. HADOOP-1063. Fix a race condition in MiniDFSCluster test code.
- (Hairong Kuang via cutting)
- 3. HADOOP-1101. In web ui, split shuffle statistics from reduce
- statistics, and add some task averages. (Devaraj Das via cutting)
- 4. HADOOP-1071. Improve handling of protocol version mismatch in
- JobTracker. (Tahir Hashmi via cutting)
- 5. HADOOP-1116. Increase heap size used for contrib unit tests.
- (Philippe Gassmann via cutting)
- 6. HADOOP-1120. Add contrib/data_join, tools to simplify joining
- data from multiple sources using MapReduce. (Runping Qi via cutting)
- 7. HADOOP-1064. Reduce log level of some DFSClient messages.
- (Dhruba Borthakur via cutting)
- 8. HADOOP-1137. Fix StatusHttpServer to work correctly when
- resources are in a jar file. (Benjamin Reed via cutting)
- 9. HADOOP-1094. Optimize generated Writable implementations for
- records to not allocate a new BinaryOutputArchive or
- BinaryInputArchive per call. (Milind Bhandarkar via cutting)
- 10. HADOOP-1068. Improve error message for clusters with 0 datanodes.
- (Dhruba Borthakur via tomwhite)
- 11. HADOOP-1122. Fix divide-by-zero exception in FSNamesystem
- chooseTarget method. (Dhruba Borthakur via tomwhite)
- 12. HADOOP-1131. Add a closeAll() static method to FileSystem.
- (Philippe Gassmann via tomwhite)
- 13. HADOOP-1085. Improve port selection in HDFS and MapReduce test
- code. Ports are now selected by the OS during testing rather than
- by probing for free ports, improving test reliability.
- (Arun C Murthy via cutting)
- 14. HADOOP-1153. Fix HDFS daemons to correctly stop their threads.
- (Konstantin Shvachko via cutting)
- 15. HADOOP-1146. Add a counter for reduce input keys and rename the
- "reduce input records" counter to be "reduce input groups".
- (David Bowen via cutting)
- 16. HADOOP-1165. In records, replace identical generated toString
- methods with a method on the base class. (Milind Bhandarkar via cutting)
- 17. HADOOP-1164. Fix TestReplicationPolicy to specify port zero, so
- that a free port is automatically selected. (omalley via cutting)
- 18. HADOOP-1166. Add a NullOutputFormat and use it in the
- RandomWriter example. (omalley via cutting)
- 19. HADOOP-1169. Fix a cut/paste error in CopyFiles utility so that
- S3-based source files are correctly copied. (Michael Stack via cutting)
- 20. HADOOP-1167. Remove extra synchronization in InMemoryFileSystem.
- (omalley via cutting)
- 21. HADOOP-1110. Fix an off-by-one error counting map inputs.
- (David Bowen via cutting)
- 22. HADOOP-1178. Fix a NullPointerException during namenode startup.
- (Dhruba Borthakur via cutting)
- 23. HADOOP-1011. Fix a ConcurrentModificationException when viewing
- job history. (Tahir Hashmi via cutting)
- 24. HADOOP-672. Improve help for fs shell commands.
- (Dhruba Borthakur via cutting)
- 25. HADOOP-1170. Improve datanode performance by removing device
- checks from common operations. (Igor Bolotin via cutting)
- 26. HADOOP-1090. Fix SortValidator's detection of whether the input
- file belongs to the sort-input or sort-output directory.
- (Arun C Murthy via tomwhite)
- 27. HADOOP-1081. Fix bin/hadoop on Darwin. (Michael Bieniosek via cutting)
- 28. HADOOP-1045. Add contrib/hbase, a BigTable-like online database.
- (Jim Kellerman via cutting)
- 29. HADOOP-1156. Fix a NullPointerException in MiniDFSCluster.
- (Hairong Kuang via cutting)
- 30. HADOOP-702. Add tools to help automate HDFS upgrades.
- (Konstantin Shvachko via cutting)
- 31. HADOOP-1163. Fix ganglia metrics to aggregate metrics from different
- hosts properly. (Michael Bieniosek via tomwhite)
- 32. HADOOP-1194. Make compression style record level for map output
- compression. (Arun C Murthy via tomwhite)
- 33. HADOOP-1187. Improve DFS Scalability: avoid scanning entire list of
- datanodes in getAdditionalBlocks. (Dhruba Borthakur via tomwhite)
- 34. HADOOP-1133. Add tool to analyze and debug namenode on a production
- cluster. (Dhruba Borthakur via tomwhite)
- 35. HADOOP-1151. Remove spurious printing to stderr in streaming
- PipeMapRed. (Koji Noguchi via tomwhite)
- 36. HADOOP-988. Change namenode to use a single map of blocks to metadata.
- (Raghu Angadi via tomwhite)
- 37. HADOOP-1203. Change UpgradeUtilities used by DFS tests to use
- MiniDFSCluster to start and stop NameNode/DataNodes.
- (Nigel Daley via tomwhite)
- 38. HADOOP-1217. Add test.timeout property to build.xml, so that
- long-running unit tests may be automatically terminated.
- (Nigel Daley via cutting)
- 39. HADOOP-1149. Improve DFS Scalability: make
- processOverReplicatedBlock() a no-op if blocks are not
- over-replicated. (Raghu Angadi via tomwhite)
- 40. HADOOP-1149. Improve DFS Scalability: optimize getDistance(),
- contains(), and isOnSameRack() in NetworkTopology.
- (Hairong Kuang via tomwhite)
- 41. HADOOP-1218. Make synchronization on TaskTracker's RunningJob
- object consistent. (Devaraj Das via tomwhite)
- 42. HADOOP-1219. Ignore progress report once a task has reported as
- 'done'. (Devaraj Das via tomwhite)
- 43. HADOOP-1114. Permit user to specify additional CLASSPATH elements
- with a HADOOP_CLASSPATH environment variable. (cutting)
- 44. HADOOP-1198. Remove ipc.client.timeout parameter override from
- unit test configuration. Using the default is more robust and
- has almost the same run time. (Arun C Murthy via tomwhite)
- 45. HADOOP-1211. Remove deprecated constructor and unused static
- members in DataNode class. (Konstantin Shvachko via tomwhite)
- 46. HADOOP-1136. Fix ArrayIndexOutOfBoundsException in
- FSNamesystem$UnderReplicatedBlocks add() method.
- (Hairong Kuang via tomwhite)
- 47. HADOOP-978. Add the client name and the address of the node that
- previously started to create the file to the description of
- AlreadyBeingCreatedException. (Konstantin Shvachko via tomwhite)
- 48. HADOOP-1001. Check the type of keys and values generated by the
- mapper against the types specified in JobConf.
- (Tahir Hashmi via tomwhite)
- 49. HADOOP-971. Improve DFS Scalability: Improve name node performance
- by adding a hostname to datanodes map. (Hairong Kuang via tomwhite)
- 50. HADOOP-1189. Fix 'No space left on device' exceptions on datanodes.
- (Raghu Angadi via tomwhite)
- 51. HADOOP-819. Change LineRecordWriter to not insert a tab between
- key and value when either is null, and to print nothing when both
- are null. (Runping Qi via cutting)
- 52. HADOOP-1204. Rename InputFormatBase to be FileInputFormat, and
- deprecate InputFormatBase. Also make LineRecordReader easier to
- extend. (Runping Qi via cutting)
- 53. HADOOP-1213. Improve logging of errors by IPC server, to
- consistently include the service name and the call. (cutting)
- 54. HADOOP-1238. Fix metrics reporting by TaskTracker to correctly
- track maps_running and reduces_running.
- (Michael Bieniosek via cutting)
- 55. HADOOP-1093. Fix a race condition in HDFS where blocks were
- sometimes erased before they were reported written.
- (Dhruba Borthakur via cutting)
- 56. HADOOP-1239. Add a package name to some testjar test classes.
- (Jim Kellerman via cutting)
- 57. HADOOP-1241. Fix NullPointerException in processReport when
- namenode is restarted. (Dhruba Borthakur via tomwhite)
- 58. HADOOP-1244. Fix stop-dfs.sh to no longer incorrectly specify
- slaves file for stopping datanode.
- (Michael Bieniosek via tomwhite)
- 59. HADOOP-1253. Fix ConcurrentModificationException and
- NullPointerException in JobControl.
- (Johan Oskarsson via tomwhite)
- 60. HADOOP-1256. Fix NameNode so that multiple DataNodeDescriptors
- can no longer be created on startup. (Hairong Kuang via cutting)
- 61. HADOOP-1214. Replace streaming classes with new counterparts
- from Hadoop core. (Runping Qi via tomwhite)
- 62. HADOOP-1250. Move a chmod utility from streaming to FileUtil.
- (omalley via cutting)
- 63. HADOOP-1258. Fix TestCheckpoint test case to wait for
- MiniDFSCluster to be active. (Nigel Daley via tomwhite)
- 64. HADOOP-1148. Re-indent all Java source code to consistently use
- two spaces per indent level. (cutting)
- 65. HADOOP-1251. Add a method to Reporter to get the map InputSplit.
- (omalley via cutting)
- 66. HADOOP-1224. Fix "Browse the filesystem" link to no longer point
- to dead datanodes. (Enis Soztutar via tomwhite)
- 67. HADOOP-1154. Fail a streaming task if the threads reading from or
- writing to the streaming process fail. (Koji Noguchi via tomwhite)
- 68. HADOOP-968. Move shuffle and sort to run in reduce's child JVM,
- rather than in TaskTracker. (Devaraj Das via cutting)
- 69. HADOOP-1111. Add support for client notification of job
- completion. If the job configuration has a job.end.notification.url
- property it will make a HTTP GET request to the specified URL.
- The number of retries and the interval between retries is also
- configurable. (Alejandro Abdelnur via tomwhite)
- 70. HADOOP-1275. Fix misspelled job notification property in
- hadoop-default.xml. (Alejandro Abdelnur via tomwhite)
- 71. HADOOP-1152. Fix race condition in MapOutputCopier.copyOutput file
- rename causing possible reduce task hang.
- (Tahir Hashmi via tomwhite)
- 72. HADOOP-1050. Distinguish between failed and killed tasks so as to
- not count a lost tasktracker against the job.
- (Arun C Murthy via tomwhite)
- 73. HADOOP-1271. Fix StreamBaseRecordReader to be able to log record
- data that's not UTF-8. (Arun C Murthy via tomwhite)
- 74. HADOOP-1190. Fix unchecked warnings in main Hadoop code.
- (tomwhite)
- 75. HADOOP-1127. Fix AlreadyBeingCreatedException in namenode for
- jobs run with speculative execution.
- (Arun C Murthy via tomwhite)
- 76. HADOOP-1282. Omnibus HBase patch. Improved tests & configuration.
- (Jim Kellerman via cutting)
- 77. HADOOP-1262. Make dfs client try to read from a different replica
- of the checksum file when a checksum error is detected.
- (Hairong Kuang via tomwhite)
- 78. HADOOP-1279. Fix JobTracker to maintain list of recently
- completed jobs by order of completion, not submission.
- (Arun C Murthy via cutting)
- 79. HADOOP-1284. In contrib/streaming, permit flexible specification
- of field delimiter and fields for partitioning and sorting.
- (Runping Qi via cutting)
- 80. HADOOP-1176. Fix a bug where reduce would hang when a map had
- more than 2GB of output for it. (Arun C Murthy via cutting)
- 81. HADOOP-1293. Fix contrib/streaming to print more than the first
- twenty lines of standard error. (Koji Noguchi via cutting)
- 82. HADOOP-1297. Fix datanode so that requests to remove blocks that
- do not exist no longer cause block reports to be re-sent every
- second. (Dhruba Borthakur via cutting)
- 83. HADOOP-1216. Change MapReduce so that, when numReduceTasks is
- zero, map outputs are written directly as final output, skipping
- shuffle, sort and reduce. Use this to implement reduce=NONE
- option in contrib/streaming. (Runping Qi via cutting)
- 84. HADOOP-1294. Fix unchecked warnings in main Hadoop code under
- Java 6. (tomwhite)
- 85. HADOOP-1299. Fix so that RPC will restart after RPC.stopClient()
- has been called. (Michael Stack via cutting)
- 86. HADOOP-1278. Improve blacklisting of TaskTrackers by JobTracker,
- to reduce false positives. (Arun C Murthy via cutting)
- 87. HADOOP-1290. Move contrib/abacus into mapred/lib/aggregate.
- (Runping Qi via cutting)
- 88. HADOOP-1272. Extract inner classes from FSNamesystem into separate
- classes. (Dhruba Borthakur via tomwhite)
- 89. HADOOP-1247. Add support to contrib/streaming for aggregate
- package, formerly called Abacus. (Runping Qi via cutting)
- 90. HADOOP-1061. Fix bug in listing files in the S3 filesystem.
- NOTE: this change is not backwards compatible! You should use the
- MigrationTool supplied to migrate existing S3 filesystem data to
- the new format. Please back up your data before upgrading
- (using 'hadoop distcp' for example). (tomwhite)
- 91. HADOOP-1304. Make configurable the maximum number of task
- attempts before a job fails. (Devaraj Das via cutting)
- 92. HADOOP-1308. Use generics to restrict types when classes are
- passed as parameters to JobConf methods. (Michael Bieniosek via cutting)
- 93. HADOOP-1312. Fix a ConcurrentModificationException in NameNode
- that killed the heartbeat monitoring thread.
- (Dhruba Borthakur via cutting)
- 94. HADOOP-1315. Clean up contrib/streaming, switching it to use core
- classes more and removing unused code. (Runping Qi via cutting)
- 95. HADOOP-485. Allow a different comparator for grouping keys in
- calls to reduce. (Tahir Hashmi via cutting)
- 96. HADOOP-1322. Fix TaskTracker blacklisting to work correctly in
- one- and two-node clusters. (Arun C Murthy via cutting)
- 97. HADOOP-1144. Permit one to specify a maximum percentage of tasks
- that can fail before a job is aborted. The default is zero.
- (Arun C Murthy via cutting)
- 98. HADOOP-1184. Fix HDFS decommissioning to complete when the only
- copy of a block is on a decommissioned node. (Dhruba Borthakur via cutting)
- 99. HADOOP-1263. Change DFSClient to retry certain namenode calls
- with a random, exponentially increasing backoff time, to avoid
- overloading the namenode on, e.g., job start. (Hairong Kuang via cutting)
- 100. HADOOP-1325. First complete, functioning version of HBase.
- (Jim Kellerman via cutting)
- 101. HADOOP-1276. Make tasktracker expiry interval configurable.
- (Arun C Murthy via cutting)
- 102. HADOOP-1326. Change JobClient#runJob() to return the job.
- (omalley via cutting)
- 103. HADOOP-1270. Randomize the fetch of map outputs, speeding the
- shuffle. (Arun C Murthy via cutting)
- 104. HADOOP-1200. Restore disk checking lost in HADOOP-1170.
- (Hairong Kuang via cutting)
- 105. HADOOP-1252. Changed MapReduce's allocation of local files to
- use round-robin among available devices, rather than a hashcode.
- More care is also taken to not allocate files on full or offline
- drives. (Devaraj Das via cutting)
- 106. HADOOP-1324. Change so that an FSError kills only the task that
- generates it rather than the entire task tracker.
- (Arun C Murthy via cutting)
- 107. HADOOP-1310. Fix unchecked warnings in aggregate code. (tomwhite)
- 108. HADOOP-1255. Fix a bug where the namenode falls into an infinite
- loop trying to remove a dead node. (Hairong Kuang via cutting)
- 109. HADOOP-1160. Fix DistributedFileSystem.close() to close the
- underlying FileSystem, correctly aborting files being written.
- (Hairong Kuang via cutting)
- 110. HADOOP-1341. Fix intermittent failures in HBase unit tests
- caused by deadlock. (Jim Kellerman via cutting)
- 111. HADOOP-1350. Fix shuffle performance problem caused by forcing
- chunked encoding of map outputs. (Devaraj Das via cutting)
- 112. HADOOP-1345. Fix HDFS to correctly retry another replica when a
- checksum error is encountered. (Hairong Kuang via cutting)
- 113. HADOOP-1205. Improve synchronization around HDFS block map.
- (Hairong Kuang via cutting)
- 114. HADOOP-1353. Fix a potential NullPointerException in namenode.
- (Dhruba Borthakur via cutting)
- 115. HADOOP-1354. Fix a potential NullPointerException in FsShell.
- (Hairong Kuang via cutting)
- 116. HADOOP-1358. Fix a potential bug when DFSClient calls skipBytes.
- (Hairong Kuang via cutting)
- 117. HADOOP-1356. Fix a bug in ValueHistogram. (Runping Qi via cutting)
- 118. HADOOP-1363. Fix locking bug in JobClient#waitForCompletion().
- (omalley via cutting)
- 119. HADOOP-1368. Fix inconsistent synchronization in JobInProgress.
- (omalley via cutting)
- 120. HADOOP-1369. Fix inconsistent synchronization in TaskTracker.
- (omalley via cutting)
- 121. HADOOP-1361. Fix various calls to skipBytes() to check return
- value. (Hairong Kuang via cutting)
- 122. HADOOP-1388. Fix a potential NullPointerException in web ui.
- (Devaraj Das via cutting)
- 123. HADOOP-1385. Fix MD5Hash#hashCode() to generally hash to more
- than 256 values. (omalley via cutting)
- 124. HADOOP-1386. Fix Path to not permit the empty string as a
- path, as this has led to accidental file deletion. Instead
- force applications to use "." to name the default directory.
- (Hairong Kuang via cutting)
- 125. HADOOP-1407. Fix integer division bug in JobInProgress which
- meant failed tasks didn't cause the job to fail.
- (Arun C Murthy via tomwhite)
- 126. HADOOP-1427. Fix a typo that caused GzipCodec to incorrectly use
- a very small input buffer. (Espen Amble Kolstad via cutting)
- 127. HADOOP-1435. Fix globbing code to no longer use the empty string
- to indicate the default directory, per HADOOP-1386.
- (Hairong Kuang via cutting)
- 128. HADOOP-1411. Make task retry framework handle
- AlreadyBeingCreatedException when wrapped as a RemoteException.
- (Hairong Kuang via tomwhite)
- 129. HADOOP-1242. Improve handling of DFS upgrades.
- (Konstantin Shvachko via cutting)
- 130. HADOOP-1332. Fix so that TaskTracker exits reliably during unit
- tests on Windows. (omalley via cutting)
- 131. HADOOP-1431. Fix so that sort progress reporting during map runs
- only while sorting, so that stuck maps are correctly terminated.
- (Devaraj Das and Arun C Murthy via cutting)
- 132. HADOOP-1452. Change TaskTracker.MapOutputServlet.doGet.totalRead
- to a long, permitting map outputs to exceed 2^31 bytes.
- (omalley via cutting)
- 133. HADOOP-1443. Fix a bug opening zero-length files in HDFS.
- (Konstantin Shvachko via cutting)
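Illustration for entry 99 (HADOOP-1263): a minimal sketch of a random,
exponentially increasing backoff. The helper name backoff_delays, the base
delay, the doubling factor and the choice to draw each delay uniformly below
a growing ceiling are all assumptions; DFSClient's actual retry policy may
differ.

```python
import random


def backoff_delays(base_ms, retries, rng=None):
    """Return one randomized delay (in ms) per retry attempt.

    The ceiling doubles with every attempt; each delay is drawn uniformly
    between 0 and that ceiling so that many clients retrying at once do
    not hit the server in lockstep.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(retries):
        ceiling = base_ms * (2 ** attempt)
        delays.append(rng.uniform(0, ceiling))
    return delays


# Five attempts starting from a hypothetical 100 ms base.
delays = backoff_delays(100, 5, random.Random(0))
```

The randomization is what spreads the load: without it, every client that
failed at job start would retry at the same instants.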
- Release 0.12.3 - 2007-04-06
- 1. HADOOP-1162. Fix bug in record CSV and XML serialization of
- binary values. (Milind Bhandarkar via cutting)
- 2. HADOOP-1123. Fix NullPointerException in LocalFileSystem when
- trying to recover from a checksum error.
- (Hairong Kuang & Nigel Daley via tomwhite)
- 3. HADOOP-1177. Fix bug where IOException in MapOutputLocation.getFile
- was not being logged. (Devaraj Das via tomwhite)
- 4. HADOOP-1175. Fix bugs in JSP for displaying a task's log messages.
- (Arun C Murthy via cutting)
- 5. HADOOP-1191. Fix map tasks to wait until sort progress thread has
- stopped before reporting the task done. (Devaraj Das via cutting)
- 6. HADOOP-1192. Fix an integer overflow bug in FsShell's 'dus'
- command and a performance problem in HDFS's implementation of it.
- (Hairong Kuang via cutting)
- 7. HADOOP-1105. Fix reducers to make "progress" while iterating
- through values. (Devaraj Das & Owen O'Malley via tomwhite)
- 8. HADOOP-1179. Make Task Tracker close index file as soon as the read
- is done when serving get-map-output requests.
- (Devaraj Das via tomwhite)
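Illustration for entry 6 (HADOOP-1192): a sketch of how summing file sizes in
a 32-bit signed accumulator can wrap negative once the total passes 2^31 - 1
bytes, and how a wider accumulator avoids it. That the 'dus' bug had exactly
this shape is an assumption; the helpers to_int32 and sum_sizes_int32 are
hypothetical and only emulate 32-bit arithmetic in Python, whose own integers
do not overflow.

```python
def to_int32(value):
    """Wrap an integer the way a 32-bit signed two's-complement int would."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def sum_sizes_int32(sizes):
    """Sum sizes with a simulated 32-bit signed accumulator."""
    total = 0
    for size in sizes:
        total = to_int32(total + size)
    return total


# Two 1.5 GB files: the 32-bit total wraps, the unbounded total does not.
sizes = [1_500_000_000, 1_500_000_000]
wrapped = sum_sizes_int32(sizes)
correct = sum(sizes)
```

Here wrapped is negative while correct is 3000000000, which is why a disk
usage total of this kind needs a 64-bit accumulator.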
- Release 0.12.2 - 2007-03-23
- 1. HADOOP-1135. Fix bug in block report processing which may cause
- the namenode to delete blocks. (Dhruba Borthakur via tomwhite)
- 2. HADOOP-1145. Make XML serializer and deserializer classes public
- in record package. (Milind Bhandarkar via cutting)
- 3. HADOOP-1140. Fix a deadlock in metrics. (David Bowen via cutting)
- 4. HADOOP-1150. Fix streaming -reducer and -mapper to give them
- defaults. (Owen O'Malley via tomwhite)
- Release 0.12.1 - 2007-03-17
- 1. HADOOP-1035. Fix a StackOverflowError in FSDataSet.
- (Raghu Angadi via cutting)
- 2. HADOOP-1053. Fix VInt representation of negative values. Also
- remove references in generated record code to methods outside of
- the record package and improve some record documentation.
- (Milind Bhandarkar via cutting)
- 3. HADOOP-1067. Compile fails if Checkstyle jar is present in lib
- directory. Also remove dependency on a particular Checkstyle
- version number. (tomwhite)
- 4. HADOOP-1060. Fix an IndexOutOfBoundsException in the JobTracker
- that could cause jobs to hang. (Arun C Murthy via cutting)
- 5. HADOOP-1077. Fix a race condition fetching map outputs that could
- hang reduces. (Devaraj Das via cutting)
- 6. HADOOP-1083. Fix so that when a cluster restarts with a missing
- datanode, its blocks are replicated. (Hairong Kuang via cutting)
- 7. HADOOP-1082. Fix a NullPointerException in ChecksumFileSystem.
- (Hairong Kuang via cutting)
- 8. HADOOP-1088. Fix record serialization of negative values.
- (Milind Bhandarkar via cutting)
- 9. HADOOP-1080. Fix bug in bin/hadoop on Windows when native
- libraries are present. (ab via cutting)
- 10. HADOOP-1091. Fix a NullPointerException in MetricsRecord.
- (David Bowen via tomwhite)
- 11. HADOOP-1092. Fix a NullPointerException in HeartbeatMonitor
- thread. (Hairong Kuang via tomwhite)
- 12. HADOOP-1112. Fix a race condition in Hadoop metrics.
- (David Bowen via tomwhite)
- 13. HADOOP-1108. Checksummed file system should retry reading if a
- different replica is found when handling ChecksumException.
- (Hairong Kuang via tomwhite)
- 14. HADOOP-1070. Fix a problem with number of racks and datanodes
- temporarily doubling. (Konstantin Shvachko via tomwhite)
- 15. HADOOP-1099. Fix NullPointerException in JobInProgress.
- (Gautam Kowshik via tomwhite)
- 16. HADOOP-1115. Fix bug where FsShell copyToLocal doesn't
- copy directories. (Hairong Kuang via tomwhite)
- 17. HADOOP-1109. Fix NullPointerException in StreamInputFormat.
- (Koji Noguchi via tomwhite)
- 18. HADOOP-1117. Fix DFS scalability: when the namenode is
- restarted it consumes 80% CPU. (Dhruba Borthakur via
- tomwhite)
- 19. HADOOP-1089. Make the C++ version of write and read v-int
- agree with the Java versions. (Milind Bhandarkar via
- tomwhite)
- 20. HADOOP-1096. Rename InputArchive and OutputArchive and
- make them public. (Milind Bhandarkar via tomwhite)
- 21. HADOOP-1128. Fix missing progress information in map tasks.
- (Espen Amble Kolstad, Andrzej Bialecki, and Owen O'Malley
- via tomwhite)
- 22. HADOOP-1129. Fix DFSClient to not hide IOExceptions in
- flush method. (Hairong Kuang via tomwhite)
- 23. HADOOP-1126. Optimize CPU usage for under replicated blocks
- when cluster restarts. (Hairong Kuang via tomwhite)
- Release 0.12.0 - 2007-03-02
- 1. HADOOP-975. Separate stdout and stderr from tasks.
- (Arun C Murthy via cutting)
- 2. HADOOP-982. Add some setters and a toString() method to
- BytesWritable. (omalley via cutting)
- 3. HADOOP-858. Move contrib/smallJobsBenchmark to src/test, removing
- obsolete bits. (Nigel Daley via cutting)
- 4. HADOOP-992. Fix MiniMR unit tests to use MiniDFS when specified,
- rather than the local FS. (omalley via cutting)
- 5. HADOOP-954. Change use of metrics to use callback mechanism.
- Also rename utility class Metrics to MetricsUtil.
- (David Bowen & Nigel Daley via cutting)
- 6. HADOOP-893. Improve HDFS client's handling of dead datanodes.
- The set is no longer reset with each block, but rather is now
- maintained for the life of an open file. (Raghu Angadi via cutting)
- 7. HADOOP-882. Upgrade to jets3t version 0.5, used by the S3
- FileSystem. This version supports retries. (Michael Stack via cutting)
- 8. HADOOP-977. Send task's stdout and stderr to JobClient's stdout
- and stderr respectively, with each line tagged by the task's name.
- (Arun C Murthy via cutting)
- 9. HADOOP-761. Change unit tests to not use /tmp. (Nigel Daley via cutting)
- 10. HADOOP-1007. Make names of metrics used in Hadoop unique.
- (Nigel Daley via cutting)
- 11. HADOOP-491. Change mapred.task.timeout to be per-job, and make a
- value of zero mean no timeout. Also change contrib/streaming to
- disable task timeouts. (Arun C Murthy via cutting)
- 12. HADOOP-1010. Add Reporter.NULL, a Reporter implementation that
- does nothing. (Runping Qi via cutting)
- 13. HADOOP-923. In HDFS NameNode, move replication computation to a
- separate thread, to improve heartbeat processing time.
- (Dhruba Borthakur via cutting)
- 14. HADOOP-476. Rewrite contrib/streaming command-line processing,
- improving parameter validation. (Sanjay Dahiya via cutting)
- 15. HADOOP-973. Improve error messages in Namenode. This should help
- to track down a problem that was appearing as a
- NullPointerException. (Dhruba Borthakur via cutting)
- 16. HADOOP-649. Fix so that jobs with no tasks are not lost.
- (Thomas Friol via cutting)
- 17. HADOOP-803. Reduce memory use by HDFS namenode, phase I.
- (Raghu Angadi via cutting)
- 18. HADOOP-1021. Fix MRCaching-based unit tests on Windows.
- (Nigel Daley via cutting)
- 19. HADOOP-889. Remove duplicate code from HDFS unit tests.
- (Milind Bhandarkar via cutting)
- 20. HADOOP-943. Improve HDFS's fsck command to display the filename
- for under-replicated blocks. (Dhruba Borthakur via cutting)
- 21. HADOOP-333. Add validator for sort benchmark output.
- (Arun C Murthy via cutting)
- 22. HADOOP-947. Improve performance of datanode decommissioning.
- (Dhruba Borthakur via cutting)
- 23. HADOOP-442. Permit one to specify hosts allowed to connect to
- namenode and jobtracker with include and exclude files. (Wendy
- Chien via cutting)
- 24. HADOOP-1017. Cache constructors, for improved performance.
- (Ron Bodkin via cutting)
- 25. HADOOP-867. Move split creation out of JobTracker to client.
- Splits are now saved in a separate file, read by task processes
- directly, so that user code is no longer required in the
- JobTracker. (omalley via cutting)
- 26. HADOOP-1006. Remove obsolete '-local' option from test code.
- (Gautam Kowshik via cutting)
- 27. HADOOP-952. Create a public (shared) Hadoop EC2 AMI.
- The EC2 scripts now support launch of public AMIs.
- (tomwhite)
- 28. HADOOP-1025. Remove some obsolete code in ipc.Server. (cutting)
- 29. HADOOP-997. Implement S3 retry mechanism for failed block
- transfers. This includes a generic retry mechanism for use
- elsewhere in Hadoop. (tomwhite)
- 30. HADOOP-990. Improve HDFS support for full datanode volumes.
- (Raghu Angadi via cutting)
- 31. HADOOP-564. Replace uses of "dfs://" URIs with the more standard
- "hdfs://". (Wendy Chien via cutting)
- 32. HADOOP-1030. In unit tests, unify setting of ipc.client.timeout.
- Also increase the value used from one to two seconds, in hopes of
- making tests complete more reliably. (cutting)
- 33. HADOOP-654. Stop assigning tasks to a tasktracker if it has
- failed more than a specified number in the job.
- (Arun C Murthy via cutting)
- 34. HADOOP-985. Change HDFS to identify nodes by IP address rather
- than by DNS hostname. (Raghu Angadi via cutting)
- 35. HADOOP-248. Optimize location of map outputs to not use random
- probes. (Devaraj Das via cutting)
- 36. HADOOP-1029. Fix streaming's input format to correctly seek to
- the start of splits. (Arun C Murthy via cutting)
- 37. HADOOP-492. Add per-job and per-task counters. These are
- incremented via the Reporter interface and available through the
- web ui and the JobClient API. The mapreduce framework maintains a
- few basic counters, and applications may add their own. Counters
- are also passed to the metrics system.
- (David Bowen via cutting)
- 38. HADOOP-1034. Fix datanode to better log exceptions.
- (Philippe Gassmann via cutting)
- 39. HADOOP-878. In contrib/streaming, fix reducer=NONE to work with
- multiple maps. (Arun C Murthy via cutting)
- 40. HADOOP-1039. In HDFS's TestCheckpoint, avoid restarting
- MiniDFSCluster so often, speeding this test. (Dhruba Borthakur via cutting)
- 41. HADOOP-1040. Update RandomWriter example to use counters and
- user-defined input and output formats. (omalley via cutting)
- 42. HADOOP-1027. Fix problems with in-memory merging during shuffle
- and re-enable this optimization. (Devaraj Das via cutting)
- 43. HADOOP-1036. Fix exception handling in TaskTracker to keep tasks
- from being lost. (Arun C Murthy via cutting)
- 44. HADOOP-1042. Improve the handling of failed map output fetches.
- (Devaraj Das via cutting)
- 45. HADOOP-928. Make checksums optional per FileSystem.
- (Hairong Kuang via cutting)
- 46. HADOOP-1044. Fix HDFS's TestDecommission to not spuriously fail.
- (Wendy Chien via cutting)
- 47. HADOOP-972. Optimize HDFS's rack-aware block placement algorithm.
- (Hairong Kuang via cutting)
- 48. HADOOP-1043. Optimize shuffle, increasing parallelism.
- (Devaraj Das via cutting)
- 49. HADOOP-940. Improve HDFS's replication scheduling.
- (Dhruba Borthakur via cutting)
- 50. HADOOP-1020. Fix a bug in Path resolution, and a problem with unit tests
- on Windows. (cutting)
- 51. HADOOP-941. Enhance record facility.
- (Milind Bhandarkar via cutting)
- 52. HADOOP-1000. Fix so that log messages in task subprocesses are
- not written to a task's standard error. (Arun C Murthy via cutting)
- 53. HADOOP-1037. Fix bin/slaves.sh, which currently only works with
- /bin/bash, to specify /bin/bash rather than /bin/sh. (cutting)
- 54. HADOOP-1046. Clean up tmp from partially received stale block files. (ab)
- 55. HADOOP-1041. Optimize mapred counter implementation. Also group
- counters by their declaring Enum. (David Bowen via cutting)
- 56. HADOOP-1032. Permit one to specify jars that will be cached
- across multiple jobs. (Gautam Kowshik via cutting)
- 57. HADOOP-1051. Add optional checkstyle task to build.xml. To use
- this, developers must download the (LGPL'd) checkstyle jar
- themselves. (tomwhite via cutting)
- 58. HADOOP-1049. Fix a race condition in IPC client.
- (Devaraj Das via cutting)
- 60. HADOOP-1056. Check HDFS include/exclude node lists with both IP
- address and hostname. (Wendy Chien via cutting)
- 61. HADOOP-994. In HDFS, limit the number of blocks invalidated at
- once. Large lists were causing datanodes to time out.
- (Dhruba Borthakur via cutting)
- 62. HADOOP-432. Add a trash feature, disabled by default. When
- enabled, the FsShell 'rm' command will move things to a trash
- directory in the filesystem. In HDFS, a thread periodically
- checkpoints the trash and removes old checkpoints. (cutting)
- Release 0.11.2 - 2007-02-16
- 1. HADOOP-1009. Fix an infinite loop in the HDFS namenode.
- (Dhruba Borthakur via cutting)
- 2. HADOOP-1014. Disable in-memory merging during shuffle, as this is
- causing data corruption. (Devaraj Das via cutting)
- Release 0.11.1 - 2007-02-09
- 1. HADOOP-976. Make SequenceFile.Metadata public. (Runping Qi via cutting)
- 2. HADOOP-917. Fix a NullPointerException in SequenceFile's merger
- with large map outputs. (omalley via cutting)
- 3. HADOOP-984. Fix a bug in shuffle error handling introduced by
- HADOOP-331. If a map output is unavailable, the job tracker is
- once more informed. (Arun C Murthy via cutting)
- 4. HADOOP-987. Fix a problem in HDFS where blocks were not removed
- from neededReplications after a replication target was selected.
- (Hairong Kuang via cutting)
- Release 0.11.0 - 2007-02-02
- 1. HADOOP-781. Remove methods deprecated in 0.10 that are no longer
- widely used. (cutting)
- 2. HADOOP-842. Change HDFS protocol so that the open() method is
- passed the client hostname, to permit the namenode to order block
- locations on the basis of network topology.
- (Hairong Kuang via cutting)
- 3. HADOOP-852. Add an ant task to compile record definitions, and
- use it to compile record unit tests. (Milind Bhandarkar via cutting)
- 4. HADOOP-757. Fix "Bad File Descriptor" exception in HDFS client
- when an output file is closed twice. (Raghu Angadi via cutting)
- 5. [ intentionally blank ]
- 6. HADOOP-890. Replace dashes in metric names with underscores,
- for better compatibility with some monitoring systems.
- (Nigel Daley via cutting)
- 7. HADOOP-801. Add to jobtracker a log of task completion events.
- (Sanjay Dahiya via cutting)
- 8. HADOOP-855. In HDFS, try to repair files with checksum errors.
- An exception is still thrown, but corrupt blocks are now removed
- when they have replicas. (Wendy Chien via cutting)
- 9. HADOOP-886. Reduce number of timer threads created by metrics API
- by pooling contexts. (Nigel Daley via cutting)
- 10. HADOOP-897. Add a "javac.args" property to build.xml that permits
- one to pass arbitrary options to javac. (Milind Bhandarkar via cutting)
- 11. HADOOP-899. Update libhdfs for changes in HADOOP-871.
- (Sameer Paranjpye via cutting)
- 12. HADOOP-905. Remove some dead code from JobClient. (cutting)
- 13. HADOOP-902. Fix a NullPointerException in HDFS client when
- closing output streams. (Raghu Angadi via cutting)
- 14. HADOOP-735. Switch generated record code to use BytesWritable to
- represent fields of type 'buffer'. (Milind Bhandarkar via cutting)
- 15. HADOOP-830. Improve mapreduce merge performance by buffering and
- merging multiple map outputs as they arrive at reduce nodes before
- they're written to disk. (Devaraj Das via cutting)
- 16. HADOOP-908. Add a new contrib package, Abacus, that simplifies
- counting and aggregation, built on MapReduce. (Runping Qi via cutting)
- 17. HADOOP-901. Add support for recursive renaming to the S3 filesystem.
- (Tom White via cutting)
- 18. HADOOP-912. Fix a bug in TaskTracker.isIdle() that was
- sporadically causing unit test failures. (Arun C Murthy via cutting)
- 19. HADOOP-909. Fix the 'du' command to correctly compute the size of
- FileSystem directory trees. (Hairong Kuang via cutting)
- 20. HADOOP-731. When a checksum error is encountered on a file stored
- in HDFS, try another replica of the data, if any.
- (Wendy Chien via cutting)
- 21. HADOOP-732. Add support to SequenceFile for arbitrary metadata,
- as a set of attribute value pairs. (Runping Qi via cutting)
- 22. HADOOP-929. Fix PhasedFileSystem to pass configuration to
- underlying FileSystem. (Sanjay Dahiya via cutting)
- 23. HADOOP-935. Fix contrib/abacus to not delete pre-existing output
- files, but rather to fail in this case. (Runping Qi via cutting)
- 24. HADOOP-936. More metric renamings, as in HADOOP-890.
- (Nigel Daley via cutting)
- 25. HADOOP-856. Fix HDFS's fsck command to not report that
- non-existent filesystems are healthy. (Milind Bhandarkar via cutting)
- 26. HADOOP-602. Remove the dependency on Lucene's PriorityQueue
- utility, by copying it into Hadoop. This facilitates using Hadoop
- with different versions of Lucene without worrying about CLASSPATH
- order. (Milind Bhandarkar via cutting)
- 27. [ intentionally blank ]
- 28. HADOOP-227. Add support for backup namenodes, which periodically
- get snapshots of the namenode state. (Dhruba Borthakur via cutting)
- 29. HADOOP-884. Add scripts in contrib/ec2 to facilitate running
- Hadoop on an Amazon EC2 cluster. (Tom White via cutting)
- 30. HADOOP-937. Change the namenode to request re-registration of
- datanodes in more circumstances. (Hairong Kuang via cutting)
- 31. HADOOP-922. Optimize small forward seeks in HDFS. If data is
- likely already in flight, skip ahead rather than re-opening the
- block. (Dhruba Borthakur via cutting)
- 32. HADOOP-961. Add a 'job -events' sub-command that prints job
- events, including task completions and failures. (omalley via cutting)
- 33. HADOOP-959. Fix namenode snapshot code added in HADOOP-227 to
- work on Windows. (Dhruba Borthakur via cutting)
- 34. HADOOP-934. Fix TaskTracker to catch metrics exceptions that were
- causing heartbeats to fail. (Arun Murthy via cutting)
- 35. HADOOP-881. Fix JobTracker web interface to display the correct
- number of task failures. (Sanjay Dahiya via cutting)
- 36. HADOOP-788. Change contrib/streaming to subclass TextInputFormat,
- permitting it to take advantage of native compression facilities.
- (Sanjay Dahiya via cutting)
- 37. HADOOP-962. In contrib/ec2: make scripts executable in tar file;
- add a README; make the environment file use a template.
- (Tom White via cutting)
- 38. HADOOP-549. Fix a NullPointerException in TaskReport's
- serialization. (omalley via cutting)
- 39. HADOOP-963. Fix remote exceptions to have the stack trace of the
- caller thread, not the IPC listener thread. (omalley via cutting)
- 40. HADOOP-967. Change RPC clients to start sending a version header.
- (omalley via cutting)
- 41. HADOOP-964. Fix a bug introduced by HADOOP-830 where jobs failed
- whose comparators and/or i/o types were in the job's jar.
- (Dennis Kubes via cutting)
- 42. HADOOP-969. Fix a deadlock in JobTracker. (omalley via cutting)
- 43. HADOOP-862. Add support for the S3 FileSystem to the CopyFiles
- tool. (Michael Stack via cutting)
- 44. HADOOP-965. Fix IsolationRunner so that job's jar can be found.
- (Dennis Kubes via cutting)
- 45. HADOOP-309. Fix two NullPointerExceptions in StatusHttpServer.
- (navychen via cutting)
- 46. HADOOP-692. Add rack awareness to HDFS's placement of blocks.
- (Hairong Kuang via cutting)
- Release 0.10.1 - 2007-01-10
- 1. HADOOP-857. Fix S3 FileSystem implementation to permit its use
- for MapReduce input and output. (Tom White via cutting)
- 2. HADOOP-863. Reduce logging verbosity introduced by HADOOP-813.
- (Devaraj Das via cutting)
- 3. HADOOP-815. Fix memory leaks in JobTracker. (Arun C Murthy via cutting)
- 4. HADOOP-600. Fix a race condition in JobTracker.
- (Arun C Murthy via cutting)
- 5. HADOOP-864. Fix 'bin/hadoop -jar' to operate correctly when
- hadoop.tmp.dir does not yet exist. (omalley via cutting)
- 6. HADOOP-866. Fix 'dfs -get' command to remove existing crc files,
- if any. (Milind Bhandarkar via cutting)
- 7. HADOOP-871. Fix a bug in bin/hadoop setting JAVA_LIBRARY_PATH.
- (Arun C Murthy via cutting)
- 8. HADOOP-868. Decrease the number of open files during map,
- respecting io.sort.factor. (Devaraj Das via cutting)
- 9. HADOOP-865. Fix S3 FileSystem so that partially created files can
- be deleted. (Tom White via cutting)
- 10. HADOOP-873. Pass java.library.path correctly to child processes.
- (omalley via cutting)
- 11. HADOOP-851. Add support for the LZO codec. This is much faster
- than the default, zlib-based compression, but it is only available
- when the native library is built. (Arun C Murthy via cutting)
- 12. HADOOP-880. Fix S3 FileSystem to remove directories.
- (Tom White via cutting)
- 13. HADOOP-879. Fix InputFormatBase to handle output generated by
- MapFileOutputFormat. (cutting)
- 14. HADOOP-659. In HDFS, prioritize replication of blocks based on
- current replication level. Blocks which are severely
- under-replicated should be further replicated before blocks which
- are less under-replicated. (Hairong Kuang via cutting)
- 15. HADOOP-726. Deprecate FileSystem locking methods. They are not
- currently usable. Locking should eventually be provided as an
- independent service. (Raghu Angadi via cutting)
- 16. HADOOP-758. Fix exception handling during reduce so that root
- exceptions are not masked by exceptions in cleanups.
- (Raghu Angadi via cutting)
- Release 0.10.0 - 2007-01-05
- 1. HADOOP-763. Change DFS namenode benchmark to not use MapReduce.
- (Nigel Daley via cutting)
- 2. HADOOP-777. Use fully-qualified hostnames for tasktrackers and
- datanodes. (Mahadev Konar via cutting)
- 3. HADOOP-621. Change 'dfs -cat' to exit sooner when output has been
- closed. (Dhruba Borthakur via cutting)
- 4. HADOOP-752. Rationalize some synchronization in DFS namenode.
- (Dhruba Borthakur via cutting)
- 5. HADOOP-629. Fix RPC services to better check the protocol name and
- version. (omalley via cutting)
- 6. HADOOP-774. Limit the number of invalid blocks returned with
- heartbeats by the namenode to datanodes. Transmitting and
- processing very large invalid block lists can tie up both the
- namenode and datanode for too long. (Dhruba Borthakur via cutting)
- 7. HADOOP-738. Change 'dfs -get' command to not create CRC files by
- default, adding a -crc option to force their creation.
- (Milind Bhandarkar via cutting)
- 8. HADOOP-676. Improve exceptions and error messages for common job
- input specification errors. (Sanjay Dahiya via cutting)
- 9. [Included in 0.9.2 release]
- 10. HADOOP-756. Add new dfsadmin option to wait for filesystem to be
- operational. (Dhruba Borthakur via cutting)
- 11. HADOOP-770. Fix jobtracker web interface to display, on restart,
- jobs that were running when it was last stopped.
- (Sanjay Dahiya via cutting)
- 12. HADOOP-331. Write all map outputs to a single file with an index,
- rather than to a separate file per reduce task. This should both
- speed the shuffle and make things more scalable.
- (Devaraj Das via cutting)
- 13. HADOOP-818. Fix contrib unit tests to not depend on core unit
- tests. (omalley via cutting)
- 14. HADOOP-786. Log common exception at debug level.
- (Sanjay Dahiya via cutting)
- 15. HADOOP-796. Provide more convenient access to failed task
- information in the web interface. (Sanjay Dahiya via cutting)
- 16. HADOOP-764. Reduce some memory allocations in namenode.
- (Dhruba Borthakur via cutting)
- 17. HADOOP-802. Update description of mapred.speculative.execution to
- mention reduces. (Nigel Daley via cutting)
- 18. HADOOP-806. Include link to datanodes on front page of namenode
- web interface. (Raghu Angadi via cutting)
- 19. HADOOP-618. Make JobSubmissionProtocol public.
- (Arun C Murthy via cutting)
- 20. HADOOP-782. Fully remove killed tasks. (Arun C Murthy via cutting)
- 21. HADOOP-792. Fix 'dfs -mv' to return correct status.
- (Dhruba Borthakur via cutting)
- 22. HADOOP-673. Give each task its own working directory again.
- (Mahadev Konar via cutting)
- 23. HADOOP-571. Extend the syntax of Path to be a URI; to be
- optionally qualified with a scheme and authority. The scheme
- determines the FileSystem implementation, while the authority
- determines the FileSystem instance. New FileSystem
- implementations may be provided by defining an fs.<scheme>.impl
- property, naming the FileSystem implementation class. This
- permits easy integration of new FileSystem implementations.
- (cutting)
- 24. HADOOP-720. Add an HDFS white paper to website.
- (Dhruba Borthakur via cutting)
- 25. HADOOP-794. Fix a divide-by-zero exception when a job specifies
- zero map tasks. (omalley via cutting)
- 26. HADOOP-454. Add a 'dfs -dus' command that provides summary disk
- usage. (Hairong Kuang via cutting)
- 27. HADOOP-574. Add an Amazon S3 implementation of FileSystem. To
- use this, one need only specify paths of the form
- s3://id:secret@bucket/. Alternately, the AWS access key id and
- secret can be specified in your config, with the properties
- fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey.
- (Tom White via cutting)
- 28. HADOOP-824. Rename DFSShell to be FsShell, since it applies
- generically to all FileSystem implementations. (cutting)
- 29. HADOOP-813. Fix map output sorting to report progress, so that
- sorts which take longer than the task timeout do not fail.
- (Devaraj Das via cutting)
- 30. HADOOP-825. Fix HDFS daemons when configured with new URI syntax.
- (omalley via cutting)
- 31. HADOOP-596. Fix a bug in phase reporting during reduce.
- (Sanjay Dahiya via cutting)
- 32. HADOOP-811. Add a utility, MultithreadedMapRunner.
- (Alejandro Abdelnur via cutting)
- 33. HADOOP-829. Within HDFS, clearly separate three different
- representations for datanodes: one for RPCs, one for
- namenode-internal use, and one for namespace persistence.
- (Dhruba Borthakur via cutting)
- 34. HADOOP-823. Fix problem starting datanode when not all configured
- data directories exist. (Bryan Pendleton via cutting)
- 35. HADOOP-451. Add a Split interface. CAUTION: This incompatibly
- changes the InputFormat and RecordReader interfaces. Not only is
- FileSplit replaced with Split, but a FileSystem parameter is no
- longer passed in several methods, input validation has changed,
- etc. (omalley via cutting)
- 36. HADOOP-814. Optimize locking in namenode. (Dhruba Borthakur via cutting)
- 37. HADOOP-738. Change 'fs -put' and 'fs -get' commands to accept
- standard input and output, respectively. Standard i/o is
- specified by a file named '-'. (Wendy Chien via cutting)
- 38. HADOOP-835. Fix a NullPointerException reading record-compressed
- SequenceFiles. (Hairong Kuang via cutting)
- 39. HADOOP-836. Fix a MapReduce bug on Windows, where the wrong
- FileSystem was used. Also add a static FileSystem.getLocal()
- method and better Path checking in HDFS, to help avoid such issues
- in the future. (omalley via cutting)
- 40. HADOOP-837. Improve RunJar utility to unpack jar file
- to hadoop.tmp.dir, rather than the system temporary directory.
- (Hairong Kuang via cutting)
- 41. HADOOP-841. Fix native library to build 32-bit version even when
- on a 64-bit host, if a 32-bit JVM is used. (Arun C Murthy via cutting)
- 42. HADOOP-838. Fix tasktracker to pass java.library.path to
- sub-processes, so that libhadoop.a is found.
- (Arun C Murthy via cutting)
- 43. HADOOP-844. Send metrics messages on a fixed-delay schedule
- instead of a fixed-rate schedule. (David Bowen via cutting)
- 44. HADOOP-849. Fix OutOfMemory exceptions in TaskTracker due to a
- file handle leak in SequenceFile. (Devaraj Das via cutting)
- 45. HADOOP-745. Fix a synchronization bug in the HDFS namenode.
- (Dhruba Borthakur via cutting)
- 46. HADOOP-850. Add Writable implementations for variable-length
- integers. (ab via cutting)
- 47. HADOOP-525. Add raw comparators to record types. This greatly
- improves record sort performance. (Milind Bhandarkar via cutting)
- 48. HADOOP-628. Fix a problem with 'fs -cat' command, where some
- characters were replaced with question marks. (Wendy Chien via cutting)
- 49. HADOOP-804. Reduce verbosity of MapReduce logging.
- (Sanjay Dahiya via cutting)
- 50. HADOOP-853. Rename 'site' to 'docs', in preparation for inclusion
- in releases. (cutting)
- 51. HADOOP-371. Include contrib jars and site documentation in
- distributions. Also add contrib and example documentation to
- distributed javadoc, in separate sections. (Nigel Daley via cutting)
- 52. HADOOP-846. Report progress during entire map, as sorting of
- intermediate outputs may happen at any time, potentially causing
- task timeouts. (Devaraj Das via cutting)
- 53. HADOOP-840. In task tracker, queue task cleanups and perform them
- in a separate thread. (omalley & Mahadev Konar via cutting)
- 54. HADOOP-681. Add to HDFS the ability to decommission nodes. This
- causes their blocks to be re-replicated on other nodes, so that
- they may be removed from a cluster. (Dhruba Borthakur via cutting)
- 55. HADOOP-470. In HDFS web ui, list the datanodes containing each
- copy of a block. (Hairong Kuang via cutting)
- 56. HADOOP-700. Change bin/hadoop to only include core jar file on
- classpath, not example, test, etc. Also rename core jar to
- hadoop-${version}-core.jar so that it can be more easily
- identified. (Nigel Daley via cutting)
- 57. HADOOP-619. Extend InputFormatBase to accept individual files and
- glob patterns as MapReduce inputs, not just directories. Also
- change contrib/streaming to use this. (Sanjay Dahiya via cutting)
- Release 0.9.2 - 2006-12-15
- 1. HADOOP-639. Restructure InterTrackerProtocol to make task
- accounting more reliable. (Arun C Murthy via cutting)
- 2. HADOOP-827. Turn off speculative execution by default, since it's
- currently broken. (omalley via cutting)
- 3. HADOOP-791. Fix a deadlock in the task tracker.
- (Mahadev Konar via cutting)
- Release 0.9.1 - 2006-12-06
- 1. HADOOP-780. Use ReflectionUtils to instantiate key and value
- objects. (ab)
- 2. HADOOP-779. Fix contrib/streaming to work correctly with gzipped
- input files. (Hairong Kuang via cutting)
- Release 0.9.0 - 2006-12-01
- 1. HADOOP-655. Remove most deprecated code. A few deprecated things
- remain, notably UTF8 and some methods that are still required.
- Also cleaned up constructors for SequenceFile, MapFile, SetFile,
- and ArrayFile a bit. (cutting)
- 2. HADOOP-565. Upgrade to Jetty version 6. (Sanjay Dahiya via cutting)
- 3. HADOOP-682. Fix DFS format command to work correctly when
- configured with a non-existent directory. (Sanjay Dahiya via cutting)
- 4. HADOOP-645. Fix a bug in contrib/streaming when -reducer is NONE.
- (Dhruba Borthakur via cutting)
- 5. HADOOP-687. Fix a classpath bug in bin/hadoop that blocked the
- servers from starting. (Sameer Paranjpye via omalley)
- 6. HADOOP-683. Remove a script dependency on bash, so it works with
- dash, the new default for /bin/sh on Ubuntu. (James Todd via cutting)
- 7. HADOOP-382. Extend unit tests to run multiple datanodes.
- (Milind Bhandarkar via cutting)
- 8. HADOOP-604. Fix some synchronization issues and a
- NullPointerException in DFS datanode. (Raghu Angadi via cutting)
- 9. HADOOP-459. Fix memory leaks and a host of other issues with
- libhdfs. (Sameer Paranjpye via cutting)
- 10. HADOOP-694. Fix a NullPointerException in jobtracker.
- (Mahadev Konar via cutting)
- 11. HADOOP-637. Fix a memory leak in the IPC server. Direct buffers
- are not collected like normal buffers, and provided little
- advantage. (Raghu Angadi via cutting)
- 12. HADOOP-696. Fix TestTextInputFormat unit test to not rely on the
- order of directory listings. (Sameer Paranjpye via cutting)
- 13. HADOOP-611. Add support for iterator-based merging to
- SequenceFile. (Devaraj Das via cutting)
- 14. HADOOP-688. Move DFS administrative commands to a separate
- command named 'dfsadmin'. (Dhruba Borthakur via cutting)
- 15. HADOOP-708. Fix test-libhdfs to return the correct status, so
- that failures will break the build. (Nigel Daley via cutting)
- 16. HADOOP-646. Fix namenode to handle edits files larger than 2GB.
- (Milind Bhandarkar via cutting)
- 17. HADOOP-705. Fix a bug in the JobTracker when failed jobs were
- not completely cleaned up. (Mahadev Konar via cutting)
- 18. HADOOP-613. Perform final merge while reducing. This removes one
- sort pass over the data and should consequently significantly
- decrease overall processing time. (Devaraj Das via cutting)
- 19. HADOOP-661. Make each job's configuration visible through the web
- ui. (Arun C Murthy via cutting)
- 20. HADOOP-489. In MapReduce, separate user logs from system logs.
- Each task's log output is now available through the web ui. (Arun
- C Murthy via cutting)
- 21. HADOOP-712. Fix record io's xml serialization to correctly handle
- control-characters. (Milind Bhandarkar via cutting)
- 22. HADOOP-668. Improvements to the web-based DFS browser.
- (Hairong Kuang via cutting)
- 23. HADOOP-715. Fix build.xml so that test logs are written in build
- directory, rather than in CWD. (Arun C Murthy via cutting)
- 24. HADOOP-538. Add support for building an optional native library,
- libhadoop.so, that improves the performance of zlib-based
- compression. To build this, specify -Dcompile.native to Ant.
- (Arun C Murthy via cutting)
- 25. HADOOP-610. Fix a problem when the DFS block size is configured
- to be smaller than the buffer size, typically only when debugging.
- (Milind Bhandarkar via cutting)
- 26. HADOOP-695. Fix a NullPointerException in contrib/streaming.
- (Hairong Kuang via cutting)
- 27. HADOOP-652. In DFS, when a file is deleted, the block count is
- now decremented. (Vladimir Krokhmalyov via cutting)
- 28. HADOOP-725. In DFS, optimize block placement algorithm,
- previously a performance bottleneck. (Milind Bhandarkar via cutting)
- 29. HADOOP-723. In MapReduce, fix a race condition during the
- shuffle, which resulted in FileNotFoundExceptions. (omalley via cutting)
- 30. HADOOP-447. In DFS, fix getBlockSize(Path) to work with relative
- paths. (Raghu Angadi via cutting)
- 31. HADOOP-733. Make exit codes in DFSShell consistent and add a unit
- test. (Dhruba Borthakur via cutting)
- 32. HADOOP-709. Fix contrib/streaming to work with commands that
- contain control characters. (Dhruba Borthakur via cutting)
- 33. HADOOP-677. In IPC, permit a version header to be transmitted
- when connections are established. This will permit us to change
- the format of IPC requests back-compatibly in subsequent releases.
- (omalley via cutting)
- 34. HADOOP-699. Fix DFS web interface so that filesystem browsing
- works correctly, using the right port number. Also add support
- for sorting datanode list by various columns.
- (Raghu Angadi via cutting)
- 35. HADOOP-76. Implement speculative reduce. Now when a job is
- configured for speculative execution, both maps and reduces will
- execute speculatively. Reduce outputs are written to a temporary
- location and moved to the final location when reduce is complete.
- (Sanjay Dahiya via cutting)
- 36. HADOOP-736. Roll back to Jetty 5.1.4, due to performance problems
- with Jetty 6.0.1.
- 37. HADOOP-739. Fix TestIPC to use different port number, making it
- more reliable. (Nigel Daley via cutting)
- 38. HADOOP-749. Fix a NullPointerException in jobfailures.jsp.
- (omalley via cutting)
- 39. HADOOP-747. Fix record serialization to work correctly when
- records are embedded in Maps. (Milind Bhandarkar via cutting)
- 40. HADOOP-698. Fix HDFS client not to retry the same datanode on
- read failures. (Milind Bhandarkar via cutting)
- 41. HADOOP-689. Add GenericWritable, to facilitate polymorphism in
- MapReduce, SequenceFile, etc. (Feng Jiang via cutting)
- 42. HADOOP-430. Stop datanode's HTTP server when registration with
- namenode fails. (Wendy Chien via cutting)
- 43. HADOOP-750. Fix a potential race condition during mapreduce
- shuffle. (omalley via cutting)
- 44. HADOOP-728. Fix contrib/streaming-related issues, including
- '-reducer NONE'. (Sanjay Dahiya via cutting)
- Release 0.8.0 - 2006-11-03
- 1. HADOOP-477. Extend contrib/streaming to scan the PATH environment
- variable when resolving executable program names.
- (Dhruba Borthakur via cutting)
- 2. HADOOP-583. In DFSClient, reduce the log level of re-connect
- attempts from 'info' to 'debug', so they are not normally shown.
- (Konstantin Shvachko via cutting)
- 3. HADOOP-498. Re-implement DFS integrity checker to run server-side,
- for much improved performance. (Milind Bhandarkar via cutting)
- 4. HADOOP-586. Use the jar name for otherwise un-named jobs.
- (Sanjay Dahiya via cutting)
- 5. HADOOP-514. Make DFS heartbeat interval configurable.
- (Milind Bhandarkar via cutting)
- 6. HADOOP-588. Fix logging and accounting of failed tasks.
- (Sanjay Dahiya via cutting)
- 7. HADOOP-462. Improve command line parsing in DFSShell, so that
- incorrect numbers of arguments result in informative errors rather
- than ArrayIndexOutOfBoundsException. (Dhruba Borthakur via cutting)
- 8. HADOOP-561. Fix DFS so that one replica of each block is written
- locally, if possible. This was the intent, but there was a bug.
- (Dhruba Borthakur via cutting)
- 9. HADOOP-610. Fix TaskTracker to survive more exceptions, keeping
- tasks from becoming lost. (omalley via cutting)
- 10. HADOOP-625. Add a servlet to all http daemons that displays a
- stack dump, useful for debugging. (omalley via cutting)
- 11. HADOOP-554. Fix DFSShell to return -1 for errors.
- (Dhruba Borthakur via cutting)
- 12. HADOOP-626. Correct the documentation in the NNBench example
- code, and also remove a mistaken call there.
- (Nigel Daley via cutting)
- 13. HADOOP-634. Add missing license to many files.
- (Nigel Daley via cutting)
- 14. HADOOP-627. Fix some synchronization problems in MiniMRCluster
- that sometimes caused unit tests to fail. (Nigel Daley via cutting)
- 15. HADOOP-563. Improve the NameNode's lease policy so that leases
- are held for one hour without renewal (instead of one minute).
- However, another attempt to create the same file will still succeed
- if the lease has not been renewed within a minute. This prevents
- communication or scheduling problems from causing a write to fail
- for up to an hour, barring some other process trying to create the
- same file. (Dhruba Borthakur via cutting)
- 16. HADOOP-635. In DFSShell, permit specification of multiple files
- as the source for file copy and move commands.
- (Dhruba Borthakur via cutting)
- 17. HADOOP-641. Change NameNode to request a fresh block report from
- a re-discovered DataNode, so that no-longer-needed replications
- are stopped promptly. (Konstantin Shvachko via cutting)
- 18. HADOOP-642. Change IPC client to specify an explicit connect
- timeout. (Konstantin Shvachko via cutting)
- 19. HADOOP-638. Fix an unsynchronized access to TaskTracker's
- internal state. (Nigel Daley via cutting)
- 20. HADOOP-624. Fix servlet path to stop a Jetty warning on startup.
- (omalley via cutting)
- 21. HADOOP-578. Failed tasks are no longer placed at the end of the
- task queue. This was originally done to work around other
- problems that have now been fixed. Re-executing failed tasks
- sooner causes buggy jobs to fail faster. (Sanjay Dahiya via cutting)
- 22. HADOOP-658. Update source file headers per Apache policy. (cutting)
- 23. HADOOP-636. Add MapFile & ArrayFile constructors which accept a
- Progressable, and pass it down to SequenceFile. This permits
- reduce tasks which use MapFile to still report progress while
- writing blocks to the filesystem. (cutting)
- 24. HADOOP-576. Enable contrib/streaming to use the file cache. Also
- extend the cache to permit symbolic links to cached items, rather
- than local file copies. (Mahadev Konar via cutting)
- 25. HADOOP-482. Fix unit tests to work when a cluster is running on
- the same machine, removing port conflicts. (Wendy Chien via cutting)
- 26. HADOOP-90. Permit dfs.name.dir to list multiple directories,
- where namenode data is to be replicated. (Milind Bhandarkar via cutting)
- 27. HADOOP-651. Fix DFSCk to correctly pass parameters to the servlet
- on the namenode. (Milind Bhandarkar via cutting)
- 28. HADOOP-553. Change main() routines of DataNode and NameNode to
- log exceptions rather than letting the JVM print them to standard
- error. Also, change the hadoop-daemon.sh script to rotate
- standard i/o log files. (Raghu Angadi via cutting)
- 29. HADOOP-399. Fix javadoc warnings. (Nigel Daley via cutting)
- 30. HADOOP-599. Fix web ui and command line to correctly report DFS
- filesystem size statistics. Also improve web layout.
- (Raghu Angadi via cutting)
- 31. HADOOP-660. Permit specification of junit test output format.
- (Nigel Daley via cutting)
- 32. HADOOP-663. Fix a few unit test issues. (Mahadev Konar via cutting)
- 33. HADOOP-664. Cause entire build to fail if libhdfs tests fail.
- (Nigel Daley via cutting)
- 34. HADOOP-633. Keep jobtracker from dying when job initialization
- throws exceptions. Also improve exception handling in a few other
- places and add more informative thread names.
- (omalley via cutting)
- 35. HADOOP-669. Fix a problem introduced by HADOOP-90 that can cause
- DFS to lose files. (Milind Bhandarkar via cutting)
- 36. HADOOP-373. Consistently check the value returned by
- FileSystem.mkdirs(). (Wendy Chien via cutting)
- 37. HADOOP-670. Code cleanups in some DFS internals: use generic
- types, replace Vector with ArrayList, etc.
- (Konstantin Shvachko via cutting)
- 38. HADOOP-647. Permit map outputs to use a different compression
- type than the job output. (omalley via cutting)
- 39. HADOOP-671. Fix file cache to check for pre-existence before
- creating. (Mahadev Konar via cutting)
- 40. HADOOP-665. Extend many DFSShell commands to accept multiple
- arguments. Now commands like "ls", "rm", etc. will operate on
- multiple files. (Dhruba Borthakur via cutting)
- Release 0.7.2 - 2006-10-18
- 1. HADOOP-607. Fix a bug where classes included in job jars were not
- found by tasks. (Mahadev Konar via cutting)
- 2. HADOOP-609. Add a unit test that checks that classes in job jars
- can be found by tasks. Also modify unit tests to specify multiple
- local directories. (Mahadev Konar via cutting)
- Release 0.7.1 - 2006-10-11
- 1. HADOOP-593. Fix a NullPointerException in the JobTracker.
- (omalley via cutting)
- 2. HADOOP-592. Fix a NullPointerException in the IPC Server. Also
- consistently log when stale calls are discarded. (omalley via cutting)
- 3. HADOOP-594. Increase the DFS safe-mode threshold from .95 to
- .999, so that nearly all blocks must be reported before filesystem
- modifications are permitted. (Konstantin Shvachko via cutting)
- 4. HADOOP-598. Fix tasks to retry when reporting completion, so that
- a single RPC timeout won't fail a task. (omalley via cutting)
- 5. HADOOP-597. Fix TaskTracker to not discard map outputs for errors
- in transmitting them to reduce nodes. (omalley via cutting)
- Release 0.7.0 - 2006-10-06
- 1. HADOOP-243. Fix rounding in the display of task and job progress
- so that things are not shown to be 100% complete until they are in
- fact finished. (omalley via cutting)
- 2. HADOOP-438. Limit the length of absolute paths in DFS, since the
- file format used to store pathnames has some limitations.
- (Wendy Chien via cutting)
- 3. HADOOP-530. Improve error messages in SequenceFile when keys or
- values are of the wrong type. (Hairong Kuang via cutting)
- 4. HADOOP-288. Add a file caching system and use it in MapReduce to
- cache job jar files on slave nodes. (Mahadev Konar via cutting)
- 5. HADOOP-533. Fix unit test to not modify conf directory.
- (Hairong Kuang via cutting)
- 6. HADOOP-527. Permit specification of the local address that various
- Hadoop daemons should bind to. (Philippe Gassmann via cutting)
- 7. HADOOP-542. Updates to contrib/streaming: reformatted source code,
- on-the-fly merge sort, a fix for HADOOP-540, etc.
- (Michel Tourn via cutting)
- 8. HADOOP-545. Remove an unused config file parameter.
- (Philippe Gassmann via cutting)
- 9. HADOOP-548. Add an Ant property "test.output" to build.xml that
- causes test output to be logged to the console. (omalley via cutting)
- 10. HADOOP-261. Record an error message when map output is lost.
- (omalley via cutting)
- 11. HADOOP-293. Report the full list of task error messages in the
- web ui, not just the most recent. (omalley via cutting)
- 12. HADOOP-551. Restore JobClient's console printouts to only include
- a maximum of one update per one percent of progress.
- (omalley via cutting)
- 13. HADOOP-306. Add a "safe" mode to DFS. The name node enters this
- when less than a specified percentage of file data is complete.
- Currently safe mode is only used on startup, but eventually it
- will also be entered when datanodes disconnect and file data
- becomes incomplete. While in safe mode no filesystem
- modifications are permitted and block replication is inhibited.
- (Konstantin Shvachko via cutting)
- 14. HADOOP-431. Change 'dfs -rm' to not operate recursively and add a
- new command, 'dfs -rmr' which operates recursively.
- (Sameer Paranjpye via cutting)
- 15. HADOOP-263. Include timestamps for job transitions. The web
- interface now displays the start and end times of tasks and the
- start times of sorting and reducing for reduce tasks. Also,
- extend ObjectWritable to handle enums, so that they can be passed
- as RPC parameters. (Sanjay Dahiya via cutting)
- 16. HADOOP-556. Contrib/streaming: send keep-alive reports to task
- tracker every 10 seconds rather than every 100 records, to avoid
- task timeouts. (Michel Tourn via cutting)
- 17. HADOOP-547. Fix reduce tasks to ping tasktracker while copying
- data, rather than only between copies, avoiding task timeouts.
- (Sanjay Dahiya via cutting)
- 18. HADOOP-537. Fix src/c++/libhdfs build process to create files in
- build/, no longer modifying the source tree.
- (Arun C Murthy via cutting)
- 19. HADOOP-487. Throw a more informative exception for unknown RPC
- hosts. (Sameer Paranjpye via cutting)
- 20. HADOOP-559. Add file name globbing (pattern matching) support to
- the FileSystem API, and use it in DFSShell ('bin/hadoop dfs')
- commands. (Hairong Kuang via cutting)
- 21. HADOOP-508. Fix a bug in FSDataInputStream. Incorrect data was
- returned after seeking to a random location.
- (Milind Bhandarkar via cutting)
- 22. HADOOP-560. Add a "killed" task state. This can be used to
- distinguish kills from other failures. Task state has also been
- converted to use an enum type instead of an int, uncovering a bug
- elsewhere. The web interface is also updated to display killed
- tasks. (omalley via cutting)
- 23. HADOOP-423. Normalize Paths containing directories named "." and
- "..", using the standard, unix interpretation. Also add checks in
- DFS, prohibiting the use of "." or ".." as directory or file
- names. (Wendy Chien via cutting)
- 24. HADOOP-513. Replace map output handling with a servlet, rather
- than a JSP page. This fixes an issue where
- IllegalStateExceptions were logged, sets content-length
- correctly, and better handles some errors. (omalley via cutting)
- 25. HADOOP-552. Improved error checking when copying map output files
- to reduce nodes. (omalley via cutting)
- 26. HADOOP-566. Fix scripts to work correctly when accessed through
- relative symbolic links. (Lee Faris via cutting)
- 27. HADOOP-519. Add positioned read methods to FSInputStream. These
- permit one to read from a stream without moving its position, and
- can hence be performed by multiple threads at once on a single
- stream. Implement an optimized version for DFS and local FS.
- (Milind Bhandarkar via cutting)
- 28. HADOOP-522. Permit block compression with MapFile and SetFile.
- Since these formats are always sorted, block compression can
- provide a big advantage. (cutting)
- 29. HADOOP-567. Record version and revision information in builds. A
- package manifest is added to the generated jar file containing
- version information, and a VersionInfo utility is added that
- includes further information, including the build date and user,
- and the subversion revision and repository. A 'bin/hadoop
- version' command is added to show this information, and it is also
- added to various web interfaces. (omalley via cutting)
- 30. HADOOP-568. Fix so that errors while initializing tasks on a
- tasktracker correctly report the task as failed to the jobtracker,
- so that it will be rescheduled. (omalley via cutting)
- 31. HADOOP-550. Disable automatic UTF-8 validation in Text. This
- permits, e.g., TextInputFormat to again operate on non-UTF-8 data.
- (Hairong and Mahadev via cutting)
- 32. HADOOP-343. Fix mapred copying so that a failed tasktracker
- doesn't cause other copies to slow. (Sameer Paranjpye via cutting)
- 33. HADOOP-239. Add a persistent job history mechanism, so that basic
- job statistics are not lost after 24 hours and/or when the
- jobtracker is restarted. (Sanjay Dahiya via cutting)
- 34. HADOOP-506. Ignore heartbeats from stale task trackers.
- (Sanjay Dahiya via cutting)
- 35. HADOOP-255. Discard stale, queued IPC calls. Do not process
- calls whose clients will likely time out before they receive a
- response. When the queue is full, new calls are now received and
- queued, and the oldest calls are discarded, so that, when servers
- get bogged down, they no longer develop a backlog on the socket.
- This should improve some DFS namenode failure modes.
- (omalley via cutting)
- 36. HADOOP-581. Fix datanode to not reset itself on communications
- errors with the namenode. If a request to the namenode fails, the
- datanode should retry, not restart. This reduces the load on the
- namenode, since restarts cause a resend of the block report.
- (omalley via cutting)
- Release 0.6.2 - 2006-09-18
- 1. HADOOP-532. Fix a bug reading value-compressed sequence files,
- where an exception was thrown reporting that the full value had not
- been read. (omalley via cutting)
- 2. HADOOP-534. Change the default value class in JobConf to be Text
- instead of the now-deprecated UTF8. This fixes the Grep example
- program, which was updated to use Text, but relies on this
- default. (Hairong Kuang via cutting)
- Release 0.6.1 - 2006-09-13
- 1. HADOOP-520. Fix a bug in libhdfs, where write failures were not
- correctly returning error codes. (Arun C Murthy via cutting)
- 2. HADOOP-523. Fix a NullPointerException when TextInputFormat is
- explicitly specified. Also add a test case for this.
- (omalley via cutting)
- 3. HADOOP-521. Fix another NullPointerException finding the
- ClassLoader when using libhdfs. (omalley via cutting)
- 4. HADOOP-526. Fix a NullPointerException when attempting to start
- two datanodes in the same directory. (Milind Bhandarkar via cutting)
- 5. HADOOP-529. Fix a NullPointerException when opening
- value-compressed sequence files generated by pre-0.6.0 Hadoop.
- (omalley via cutting)
- Release 0.6.0 - 2006-09-08
- 1. HADOOP-427. Replace some uses of DatanodeDescriptor in the DFS
- web UI code with DatanodeInfo, the preferred public class.
- (Devaraj Das via cutting)
- 2. HADOOP-426. Fix streaming contrib module to work correctly on
- Solaris. This was causing nightly builds to fail.
- (Michel Tourn via cutting)
- 3. HADOOP-400. Improvements to task assignment. Tasks are no longer
- re-run on nodes where they have failed (unless no other node is
- available). Also, tasks are better load-balanced among nodes.
- (omalley via cutting)
- 4. HADOOP-324. Fix datanode to not exit when a disk is full, but
- rather simply to fail writes. (Wendy Chien via cutting)
- 5. HADOOP-434. Change smallJobsBenchmark to use standard Hadoop
- scripts. (Sanjay Dahiya via cutting)
- 6. HADOOP-453. Fix a bug in Text.setCapacity(). (siren via cutting)
- 7. HADOOP-450. Change so that input types are determined by the
- RecordReader rather than specified directly in the JobConf. This
- facilitates jobs with a variety of input types.
- WARNING: This contains incompatible API changes! The RecordReader
- interface has two new methods that all user-defined InputFormats
- must now define. Also, the values returned by TextInputFormat are
- no longer of class UTF8, but now of class Text.
- 8. HADOOP-436. Fix an error-handling bug in the web ui.
- (Devaraj Das via cutting)
- 9. HADOOP-455. Fix a bug in Text, where DEL was not permitted.
- (Hairong Kuang via cutting)
- 10. HADOOP-456. Change the DFS namenode to keep a persistent record
- of the set of known datanodes. This will be used to implement a
- "safe mode" where filesystem changes are prohibited when a
- critical percentage of the datanodes are unavailable.
- (Konstantin Shvachko via cutting)
- 11. HADOOP-322. Add a job control utility. This permits one to
- specify job interdependencies. Each job is submitted only after
- the jobs it depends on have successfully completed.
- (Runping Qi via cutting)
- 12. HADOOP-176. Fix a bug in IntWritable.Comparator.
- (Dick King via cutting)
- 13. HADOOP-421. Replace uses of String in recordio package with Text
- class, for improved handling of UTF-8 data.
- (Milind Bhandarkar via cutting)
- 14. HADOOP-464. Improved error message when job jar not found.
- (Michel Tourn via cutting)
- 15. HADOOP-469. Fix /bin/bash specifics that have crept into our
- /bin/sh scripts since HADOOP-352.
- (Jean-Baptiste Quenot via cutting)
- 16. HADOOP-468. Add HADOOP_NICENESS environment variable to set
- scheduling priority for daemons. (Vetle Roeim via cutting)
- 17. HADOOP-473. Fix TextInputFormat to correctly handle more EOL
- formats. Things now work correctly with CR, LF or CRLF.
- (Dennis Kubes & James White via cutting)
- 18. HADOOP-461. Make Java 1.5 an explicit requirement. (cutting)
- 19. HADOOP-54. Add block compression to SequenceFile. One may now
- specify that blocks of keys and values are compressed together,
- improving compression for small keys and values.
- SequenceFile.Writer's constructor is now deprecated and replaced
- with a factory method. (Arun C Murthy via cutting)
- 20. HADOOP-281. Prohibit DFS files that are also directories.
- (Wendy Chien via cutting)
- 21. HADOOP-486. Add the job username to JobStatus instances returned
- by JobClient. (Mahadev Konar via cutting)
- 22. HADOOP-437. contrib/streaming: Add support for gzipped inputs.
- (Michel Tourn via cutting)
- 23. HADOOP-463. Add variable expansion to config files.
- Configuration property values may now contain variable
- expressions. A variable is referenced with the syntax
- '${variable}'. Variable values are found first in the
- configuration, and then in Java system properties. The default
- configuration is modified so that temporary directories are now
- under ${hadoop.tmp.dir}, which is, by default,
- /tmp/hadoop-${user.name}. (Michel Tourn via cutting)
- 24. HADOOP-419. Fix a NullPointerException finding the ClassLoader
- when using libhdfs. (omalley via cutting)
- 25. HADOOP-460. Fix contrib/smallJobsBenchmark to use Text instead of
- UTF8. (Sanjay Dahiya via cutting)
- 26. HADOOP-196. Fix Configuration(Configuration) constructor to work
- correctly. (Sami Siren via cutting)
- 27. HADOOP-501. Fix Configuration.toString() to handle URL resources.
- (Thomas Friol via cutting)
- 28. HADOOP-499. Reduce the use of Strings in contrib/streaming,
- replacing them with Text for better performance.
- (Hairong Kuang via cutting)
- 29. HADOOP-64. Manage multiple volumes with a single DataNode.
- Previously DataNode would create a separate daemon per configured
- volume, each with its own connection to the NameNode. Now all
- volumes are handled by a single DataNode daemon, reducing the load
- on the NameNode. (Milind Bhandarkar via cutting)
- 30. HADOOP-424. Fix MapReduce so that jobs which generate zero splits
- do not fail. (Frédéric Bertin via cutting)
- 31. HADOOP-408. Adjust some timeouts and remove some others so that
- unit tests run faster. (cutting)
- 32. HADOOP-507. Fix an IllegalAccessException in DFS.
- (omalley via cutting)
- 33. HADOOP-320. Fix so that checksum files are correctly copied when
- the destination of a file copy is a directory.
- (Hairong Kuang via cutting)
- 34. HADOOP-286. In DFSClient, avoid pinging the NameNode with
- renewLease() calls when no files are being written.
- (Konstantin Shvachko via cutting)
- 35. HADOOP-312. Close idle IPC connections. All IPC connections were
- cached forever. Now, after a connection has been idle for more
- than a configurable amount of time (one second by default), the
- connection is closed, conserving resources on both client and
- server. (Devaraj Das via cutting)
- 36. HADOOP-497. Permit the specification of the network interface and
- nameserver to be used when determining the local hostname
- advertised by datanodes and tasktrackers.
- (Lorenzo Thione via cutting)
- 37. HADOOP-441. Add a compression codec API and extend SequenceFile
- to use it. This will permit the use of alternate compression
- codecs in SequenceFile. (Arun C Murthy via cutting)
- 38. HADOOP-483. Improvements to libhdfs build and documentation.
- (Arun C Murthy via cutting)
- 39. HADOOP-458. Fix a memory corruption bug in libhdfs.
- (Arun C Murthy via cutting)
- 40. HADOOP-517. Fix a contrib/streaming bug in end-of-line detection.
- (Hairong Kuang via cutting)
- 41. HADOOP-474. Add CompressionCodecFactory, and use it in
- TextInputFormat and TextOutputFormat. Compressed input files are
- automatically decompressed when they have the correct extension.
- Output files will, when output compression is specified, be
- generated with an appropriate extension. Also add a gzip codec and
- fix problems with UTF8 text inputs. (omalley via cutting)
- Release 0.5.0 - 2006-08-04
- 1. HADOOP-352. Fix shell scripts to use /bin/sh instead of
- /bin/bash, for better portability.
- (Jean-Baptiste Quenot via cutting)
- 2. HADOOP-313. Permit task state to be saved so that single tasks
- may be manually re-executed when debugging. (omalley via cutting)
- 3. HADOOP-339. Add method to JobClient API listing jobs that are
- not yet complete, i.e., that are queued or running.
- (Mahadev Konar via cutting)
- 4. HADOOP-355. Updates to the streaming contrib module, including
- API fixes, making reduce optional, and adding an input type for
- StreamSequenceRecordReader. (Michel Tourn via cutting)
- 5. HADOOP-358. Fix a NPE bug in Path.equals().
- (Frédéric Bertin via cutting)
- 6. HADOOP-327. Fix ToolBase to not call System.exit() when
- exceptions are thrown. (Hairong Kuang via cutting)
- 7. HADOOP-359. Permit map output to be compressed.
- (omalley via cutting)
- 8. HADOOP-341. Permit input URI to CopyFiles to use the HTTP
- protocol. This lets one, e.g., more easily copy log files into
- DFS. (Arun C Murthy via cutting)
- 9. HADOOP-361. Remove unix dependencies from streaming contrib
- module tests, making them pure java. (Michel Tourn via cutting)
- 10. HADOOP-354. Make public methods to stop DFS daemons.
- (Barry Kaplan via cutting)
- 11. HADOOP-252. Add versioning to RPC protocols.
- (Milind Bhandarkar via cutting)
- 12. HADOOP-356. Add contrib to "compile" and "test" build targets, so
- that this code is better maintained. (Michel Tourn via cutting)
- 13. HADOOP-307. Add smallJobsBenchmark contrib module. This runs
- lots of small jobs, in order to determine per-task overheads.
- (Sanjay Dahiya via cutting)
- 14. HADOOP-342. Add a tool for log analysis: Logalyzer.
- (Arun C Murthy via cutting)
- 15. HADOOP-347. Add web-based browsing of DFS content. The namenode
- redirects browsing requests to datanodes. Content requests are
- redirected to datanodes where the data is local when possible.
- (Devaraj Das via cutting)
- 16. HADOOP-351. Make Hadoop IPC kernel independent of Jetty.
- (Devaraj Das via cutting)
- 17. HADOOP-237. Add metric reporting to DFS and MapReduce. With only
- minor configuration changes, one can now monitor many Hadoop
- system statistics using Ganglia or other monitoring systems.
- (Milind Bhandarkar via cutting)
- 18. HADOOP-376. Fix datanode's HTTP server to scan for a free port.
- (omalley via cutting)
- 19. HADOOP-260. Add --config option to shell scripts, specifying an
- alternate configuration directory. (Milind Bhandarkar via cutting)
- 20. HADOOP-381. Permit developers to save the temporary files for
- tasks whose names match a regular expression, to facilitate
- debugging. (omalley via cutting)
- 21. HADOOP-344. Fix some Windows-related problems with DF.
- (Konstantin Shvachko via cutting)
- 22. HADOOP-380. Fix reduce tasks to poll less frequently for map
- outputs. (Mahadev Konar via cutting)
- 23. HADOOP-321. Refactor DatanodeInfo, in preparation for
- HADOOP-306. (Konstantin Shvachko & omalley via cutting)
- 24. HADOOP-385. Fix some bugs in record io code generation.
- (Milind Bhandarkar via cutting)
- 25. HADOOP-302. Add new Text class to replace UTF8, removing
- limitations of that class. Also refactor utility methods for
- writing zero-compressed integers (VInts and VLongs).
- (Hairong Kuang via cutting)
- 26. HADOOP-335. Refactor DFS namespace/transaction logging in
- namenode. (Konstantin Shvachko via cutting)
- 27. HADOOP-375. Fix handling of the datanode HTTP daemon's port so
- that multiple datanodes can be run on a single host.
- (Devaraj Das via cutting)
- 28. HADOOP-386. When removing excess DFS block replicas, remove those
- on nodes with the least free space first.
- (Johan Oskarson via cutting)
- 29. HADOOP-389. Fix intermittent failures of mapreduce unit tests.
- Also fix some build dependencies.
- (Mahadev & Konstantin via cutting)
- 30. HADOOP-362. Fix a problem where jobs hang when status messages
- are received out-of-order. (omalley via cutting)
- 31. HADOOP-394. Change order of DFS shutdown in unit tests to
- minimize errors logged. (Konstantin Shvachko via cutting)
- 32. HADOOP-396. Make DatanodeID implement Writable.
- (Konstantin Shvachko via cutting)
- 33. HADOOP-377. Permit one to add URL resources to a Configuration.
- (Jean-Baptiste Quenot via cutting)
- 34. HADOOP-345. Permit iteration over Configuration key/value pairs.
- (Michel Tourn via cutting)
- 35. HADOOP-409. Streaming contrib module: make configuration
- properties available to commands as environment variables.
- (Michel Tourn via cutting)
- 36. HADOOP-369. Add -getmerge option to dfs command that appends all
- files in a directory into a single local file.
- (Johan Oskarson via cutting)
- 37. HADOOP-410. Replace some TreeMaps with HashMaps in DFS, for
- a 17% performance improvement. (Milind Bhandarkar via cutting)
- 38. HADOOP-411. Add unit tests for command line parser.
- (Hairong Kuang via cutting)
- 39. HADOOP-412. Add MapReduce input formats that support filtering
- of SequenceFile data, including sampling and regex matching.
- Also, move JobConf.newInstance() to a new utility class.
- (Hairong Kuang via cutting)
- 40. HADOOP-226. Fix fsck command to properly consider replication
- counts, now that these can vary per file. (Bryan Pendleton via cutting)
- 41. HADOOP-425. Add a Python MapReduce example, using Jython.
- (omalley via cutting)
- Release 0.4.0 - 2006-06-28
- 1. HADOOP-298. Improved progress reports for CopyFiles utility, the
- distributed file copier. (omalley via cutting)
- 2. HADOOP-299. Fix the task tracker, permitting multiple jobs to
- more easily execute at the same time. (omalley via cutting)
- 3. HADOOP-250. Add an HTTP user interface to the namenode, running
- on port 50070. (Devaraj Das via cutting)
- 4. HADOOP-123. Add MapReduce unit tests that run a jobtracker and
- tasktracker, greatly increasing code coverage.
- (Milind Bhandarkar via cutting)
- 5. HADOOP-271. Add links from jobtracker's web ui to tasktracker's
- web ui. Also attempt to log a thread dump of child processes
- before they're killed. (omalley via cutting)
- 6. HADOOP-210. Change RPC server to use a selector instead of a
- thread per connection. This should make it easier to scale to
- larger clusters. Note that this incompatibly changes the RPC
- protocol: clients and servers must both be upgraded to the new
- version to ensure correct operation. (Devaraj Das via cutting)
- 7. HADOOP-311. Change DFS client to retry failed reads, so that a
- single read failure will not alone cause failure of a task.
- (omalley via cutting)
- 8. HADOOP-314. Remove the "append" phase when reducing. Map output
- files are now directly passed to the sorter, without first
- appending them into a single file. Now, the first third of reduce
- progress is "copy" (transferring map output to reduce nodes), the
- middle third is "sort" (sorting map output) and the last third is
- "reduce" (generating output). Long-term, the "sort" phase will
- also be removed. (omalley via cutting)
- 9. HADOOP-316. Fix a potential deadlock in the jobtracker.
- (omalley via cutting)
- 10. HADOOP-319. Fix FileSystem.close() to remove the FileSystem
- instance from the cache. (Hairong Kuang via cutting)
- 11. HADOOP-135. Fix potential deadlock in JobTracker by acquiring
- locks in a consistent order. (omalley via cutting)
- 12. HADOOP-278. Check for existence of input directories before
- starting MapReduce jobs, making it easier to debug this common
- error. (omalley via cutting)
- 13. HADOOP-304. Improve error message for
- UnregisteredDatanodeException to include expected node name.
- (Konstantin Shvachko via cutting)
- 14. HADOOP-305. Fix TaskTracker to ask for new tasks as soon as a
- task is finished, rather than waiting for the next heartbeat.
- This improves performance when tasks are short.
- (Mahadev Konar via cutting)
- 15. HADOOP-59. Add support for generic command line options. One may
- now specify the filesystem (-fs), the MapReduce jobtracker (-jt),
- a config file (-conf) or any configuration property (-D). The
- "dfs", "fsck", "job", and "distcp" commands currently support
- this, with more to be added. (Hairong Kuang via cutting)
- 16. HADOOP-296. Permit specification of the amount of reserved space
- on a DFS datanode. One may specify both the percentage free and
- the number of bytes. (Johan Oskarson via cutting)
- 17. HADOOP-325. Fix a problem initializing RPC parameter classes, and
- remove the workaround used to initialize classes.
- (omalley via cutting)
- 18. HADOOP-328. Add an option to the "distcp" command to ignore read
- errors while copying. (omalley via cutting)
- 19. HADOOP-27. Don't allocate tasks to trackers whose local free
- space is too low. (Johan Oskarson via cutting)
- 20. HADOOP-318. Keep slow DFS output from causing task timeouts.
- This incompatibly changes some public interfaces, adding a
- parameter to OutputFormat.getRecordWriter() and the new method
- Reporter.progress(), but it makes lots of tasks succeed that were
- previously failing. (Milind Bhandarkar via cutting)
- Release 0.3.2 - 2006-06-09
- 1. HADOOP-275. Update the streaming contrib module to use log4j for
- its logging. (Michel Tourn via cutting)
- 2. HADOOP-279. Provide defaults for log4j logging parameters, so
- that things still work reasonably when Hadoop-specific system
- properties are not provided. (omalley via cutting)
- 3. HADOOP-280. Fix a typo in AllTestDriver which caused the wrong
- test to be run when "DistributedFSCheck" was specified.
- (Konstantin Shvachko via cutting)
- 4. HADOOP-240. DFS's mkdirs() implementation no longer logs a warning
- when the directory already exists. (Hairong Kuang via cutting)
- 5. HADOOP-285. Fix DFS datanodes to be able to re-join the cluster
- after the connection to the namenode is lost. (omalley via cutting)
- 6. HADOOP-277. Fix a race condition when creating directories.
- (Sameer Paranjpye via cutting)
- 7. HADOOP-289. Improved exception handling in DFS datanode.
- (Konstantin Shvachko via cutting)
- 8. HADOOP-292. Fix client-side logging to go to standard error
- rather than standard output, so that it can be distinguished from
- application output. (omalley via cutting)
- 9. HADOOP-294. Fixed bug where conditions for retrying after errors
- in the DFS client were reversed. (omalley via cutting)
- Release 0.3.1 - 2006-06-05
- 1. HADOOP-272. Fix a bug in bin/hadoop setting log
- parameters. (omalley & cutting)
- 2. HADOOP-274. Change applications to log to standard output rather
- than to a rolling log file like daemons. (omalley via cutting)
- 3. HADOOP-262. Fix reduce tasks to report progress while they're
- waiting for map outputs, so that they do not time out.
- (Mahadev Konar via cutting)
- 4. HADOOP-245 and HADOOP-246. Improvements to record io package.
- (Mahadev Konar via cutting)
- 5. HADOOP-276. Add logging config files to jar file so that they're
- always found. (omalley via cutting)
- Release 0.3.0 - 2006-06-02
- 1. HADOOP-208. Enhance MapReduce web interface, adding new pages
- for failed tasks, and tasktrackers. (omalley via cutting)
- 2. HADOOP-204. Tweaks to metrics package. (David Bowen via cutting)
- 3. HADOOP-209. Add a MapReduce-based file copier. This will
- copy files within or between file systems in parallel.
- (Milind Bhandarkar via cutting)
- 4. HADOOP-146. Fix DFS to check when randomly generating a new block
- id that no existing blocks already have that id.
- (Milind Bhandarkar via cutting)
- 5. HADOOP-180. Make a daemon thread that does the actual task clean ups, so
- that the main offerService thread in the taskTracker doesn't get stuck
- and miss its heartbeat window. This was killing many task trackers as
- big jobs finished (300+ tasks / node). (omalley via cutting)
- 6. HADOOP-200. Avoid transmitting entire list of map task names to
- reduce tasks. Instead just transmit the number of map tasks and
- henceforth refer to them by number when collecting map output.
- (omalley via cutting)
- 7. HADOOP-219. Fix a NullPointerException when handling a checksum
- exception under SequenceFile.Sorter.sort(). (cutting & stack)
- 8. HADOOP-212. Permit alteration of the file block size in DFS. The
- default block size for new files may now be specified in the
- configuration with the dfs.block.size property. The block size
- may also be specified when files are opened.
- (omalley via cutting)
- 9. HADOOP-218. Avoid accessing configuration while looping through
- tasks in JobTracker. (Mahadev Konar via cutting)
- 10. HADOOP-161. Add hashCode() method to DFS's Block.
- (Milind Bhandarkar via cutting)
- 11. HADOOP-115. Map output types may now be specified. These are also
- used as reduce input types, thus permitting reduce input types to
- differ from reduce output types. (Runping Qi via cutting)
- 12. HADOOP-216. Add task progress to task status page.
- (Bryan Pendelton via cutting)
- 13. HADOOP-233. Add web server to task tracker that shows running
- tasks and logs. Also add log access to job tracker web interface.
- (omalley via cutting)
- 14. HADOOP-205. Incorporate pending tasks into tasktracker load
- calculations. (Mahadev Konar via cutting)
- 15. HADOOP-247. Fix sort progress to better handle exceptions.
- (Mahadev Konar via cutting)
- 16. HADOOP-195. Improve performance of the transfer of map outputs to
- reduce nodes by performing multiple transfers in parallel, each on
- a separate socket. (Sameer Paranjpye via cutting)
- 17. HADOOP-251. Fix task processes to be tolerant of failed progress
- reports to their parent process. (omalley via cutting)
- 18. HADOOP-325. Improve the FileNotFound exceptions thrown by
- LocalFileSystem to include the name of the file.
- (Benjamin Reed via cutting)
- 19. HADOOP-254. Use HTTP to transfer map output data to reduce
- nodes. This, together with HADOOP-195, greatly improves the
- performance of these transfers. (omalley via cutting)
- 20. HADOOP-163. Cause datanodes that are unable to either read or
- write data to exit, so that the namenode will no longer target
- them for new blocks and will replicate their data on other nodes.
- (Hairong Kuang via cutting)
- 21. HADOOP-222. Add a -setrep option to the dfs commands that alters
- file replication levels. (Johan Oskarson via cutting)
- 22. HADOOP-75. In DFS, only check for a complete file when the file
- is closed, rather than as each block is written.
- (Milind Bhandarkar via cutting)
- 23. HADOOP-124. Change DFS so that datanodes are identified by a
- persistent ID rather than by host and port. This solves a number
- of filesystem integrity problems, when, e.g., datanodes are
- restarted. (Konstantin Shvachko via cutting)
- 24. HADOOP-256. Add a C API for DFS. (Arun C Murthy via cutting)
- 25. HADOOP-211. Switch to use the Jakarta Commons logging internally,
- configured to use log4j by default. (Arun C Murthy and cutting)
- 26. HADOOP-265. Tasktracker now fails to start if it does not have a
- writable local directory for temporary files. In this case, it
- logs a message to the JobTracker and exits. (Hairong Kuang via cutting)
- 27. HADOOP-270. Fix potential deadlock in datanode shutdown.
- (Hairong Kuang via cutting)
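The check described in HADOOP-146 above (item 4 of Release 0.3.0) can be illustrated with a minimal sketch. It is written in Python for brevity and is not Hadoop's Java code. The function name new_block_id, the 64-bit id width, and the use of an in-memory set for the existing ids are all assumptions made for illustration.

```python
import random


def new_block_id(existing_ids, rng=random):
    """Return a random id that is not already used by an existing block.

    existing_ids: a set of ids already in use (hypothetical stand-in for
                  the namenode's block map).
    rng:          any object with getrandbits(); defaults to the random module.
    """
    while True:
        candidate = rng.getrandbits(64)  # assumed 64-bit id space
        if candidate not in existing_ids:
            return candidate
        # Collision: draw again rather than reuse an existing block's id.
```

With a 64-bit id space a collision is rare, so the loop almost always exits on the first draw. The check only matters in the unlikely case that the random draw repeats an id.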
- Release 0.2.1 - 2006-05-12
- 1. HADOOP-199. Fix reduce progress (broken by HADOOP-182).
- (omalley via cutting)
- 2. HADOOP-201. Fix 'bin/hadoop dfs -report'. (cutting)
- 3. HADOOP-207. Fix JDK 1.4 incompatibility introduced by HADOOP-96.
- System.getenv() does not work in JDK 1.4. (Hairong Kuang via cutting)
- Release 0.2.0 - 2006-05-05
- 1. Fix HADOOP-126. 'bin/hadoop dfs -cp' now correctly copies .crc
- files. (Konstantin Shvachko via cutting)
- 2. Fix HADOOP-51. Change DFS to support per-file replication counts.
- (Konstantin Shvachko via cutting)
- 3. Fix HADOOP-131. Add scripts to start/stop dfs and mapred daemons.
- Use these in start/stop-all scripts. (Chris Mattmann via cutting)
- 4. Stop using ssh options by default that are not yet in widely used
- versions of ssh. Folks can still enable their use by uncommenting
- a line in conf/hadoop-env.sh. (cutting)
- 5. Fix HADOOP-92. Show information about all attempts to run each
- task in the web ui. (Mahadev Konar via cutting)
- 6. Fix HADOOP-128. Improved DFS error handling. (Owen O'Malley via cutting)
- 7. Fix HADOOP-129. Replace uses of java.io.File with new class named
- Path. This fixes bugs where java.io.File methods were called
- directly when FileSystem methods were desired, and reduces the
- likelihood of such bugs in the future. It also makes the handling
- of pathnames more consistent between local and dfs FileSystems and
- between Windows and Unix. java.io.File-based methods are still
- available for back-compatibility, but are deprecated and will be
- removed once 0.2 is released. (cutting)
- 8. Change dfs.data.dir and mapred.local.dir to be comma-separated
- lists of directories, rather than space-separated. This fixes
- several bugs on Windows. (cutting)
- 9. Fix HADOOP-144. Use mapred task id for dfs client id, to
- facilitate debugging. (omalley via cutting)
- 10. Fix HADOOP-143. Do not line-wrap stack-traces in web ui.
- (omalley via cutting)
- 11. Fix HADOOP-118. In DFS, improve clean up of abandoned file
- creations. (omalley via cutting)
- 12. Fix HADOOP-138. Stop multiple tasks in a single heartbeat, rather
- than one per heartbeat. (Stefan via cutting)
- 13. Fix HADOOP-139. Remove a potential deadlock in
- LocalFileSystem.lock(). (Igor Bolotin via cutting)
- 14. Fix HADOOP-134. Don't hang jobs when the tasktracker is
- misconfigured to use an un-writable local directory. (omalley via cutting)
- 15. Fix HADOOP-115. Correct an error message. (Stack via cutting)
- 16. Fix HADOOP-133. Retry pings from child to parent, in case of
- (local) communication problems. Also log exit status, so that one
- can distinguish patricide from other deaths. (omalley via cutting)
- 17. Fix HADOOP-142. Avoid re-running a task on a host where it has
- previously failed. (omalley via cutting)
- 18. Fix HADOOP-148. Maintain a task failure count for each
- tasktracker and display it in the web ui. (omalley via cutting)
- 19. Fix HADOOP-151. Close a potential socket leak, where new IPC
- connection pools were created per configuration instance that RPCs
- use. Now a global RPC connection pool is used again, as
- originally intended. (cutting)
- 20. Fix HADOOP-69. Don't throw a NullPointerException when getting
- hints for non-existing file split. (Bryan Pendelton via cutting)
- 21. Fix HADOOP-157. When a task that writes dfs files (e.g., a reduce
- task) failed and was retried, it would fail again and again,
- eventually failing the job. The problem was that dfs did not yet
- know that the failed task had abandoned the files, and would not
- yet let another task create files with the same names. Dfs now
- retries when creating a file long enough for locks on abandoned
- files to expire. (omalley via cutting)
- 22. Fix HADOOP-150. Improved task names that include job
- names. (omalley via cutting)
- 23. Fix HADOOP-162. Fix ConcurrentModificationException when
- releasing file locks. (omalley via cutting)
- 24. Fix HADOOP-132. Initial check-in of new Metrics API, including
- implementations for writing metric data to a file and for sending
- it to Ganglia. (David Bowen via cutting)
- 25. Fix HADOOP-160. Remove some unneeded synchronization around
- time-consuming operations in the TaskTracker. (omalley via cutting)
- 26. Fix HADOOP-166. RPCs failed when passed subclasses of a declared
- parameter type. This is fixed by changing ObjectWritable to store
- both the declared type and the instance type for Writables. Note
- that this incompatibly changes the format of ObjectWritable and
- will render unreadable any ObjectWritables stored in files.
- Nutch only uses ObjectWritable in intermediate files, so this
- should not be a problem for Nutch. (Stefan & cutting)
- 27. Fix HADOOP-168. MapReduce RPC protocol methods should all declare
- IOException, so that timeouts are handled appropriately.
- (omalley via cutting)
- 28. Fix HADOOP-169. Don't fail a reduce task if a call to the
- jobtracker to locate map outputs fails. (omalley via cutting)
- 29. Fix HADOOP-170. Permit FileSystem clients to examine and modify
- the replication count of individual files. Also fix a few
- replication-related bugs. (Konstantin Shvachko via cutting)
- 30. Permit specification of higher replication levels for job
- submission files (job.xml and job.jar). This helps with large
- clusters, since these files are read by every node. (cutting)
- 31. HADOOP-173. Optimize allocation of tasks with local data. (cutting)
- 32. HADOOP-167. Reduce number of Configurations and JobConf's
- created. (omalley via cutting)
- 33. NUTCH-256. Change FileSystem#createNewFile() to create a .crc
- file. The lack of a .crc file was causing warnings. (cutting)
- 34. HADOOP-174. Change JobClient to not abort job until it has failed
- to contact the job tracker for five attempts, not just one as
- before. (omalley via cutting)
- 35. HADOOP-177. Change MapReduce web interface to page through tasks.
- Previously, when jobs had more than a few thousand tasks they
- could crash web browsers. (Mahadev Konar via cutting)
- 36. HADOOP-178. In DFS, piggyback blockwork requests from datanodes
- on heartbeat responses from namenode. This reduces the volume of
- RPC traffic. Also move startup delay in blockwork from datanode
- to namenode. This fixes a problem where restarting the namenode
- triggered a lot of unneeded replication. (Hairong Kuang via cutting)
- 37. HADOOP-183. If the DFS namenode is restarted with different
- minimum and/or maximum replication counts, existing files'
- replication counts are now automatically adjusted to be within the
- newly configured bounds. (Hairong Kuang via cutting)
- 38. HADOOP-186. Better error handling in TaskTracker's top-level
- loop. Also improve calculation of time to send next heartbeat.
- (omalley via cutting)
- 39. HADOOP-187. Add two MapReduce examples/benchmarks. One creates
- files containing random data. The second sorts the output of the
- first. (omalley via cutting)
- 40. HADOOP-185. Fix so that, when a task tracker times out making the
- RPC asking for a new task to run, the job tracker does not think
- that it is actually running the task returned. (omalley via cutting)
- 41. HADOOP-190. If a child process hangs after it has reported
- completion, its output should not be lost. (Stack via cutting)
- 42. HADOOP-184. Re-structure some test code to better support testing
- on a cluster. (Mahadev Konar via cutting)
- 43. HADOOP-191. Add streaming package, Hadoop's first contrib module.
- This permits folks to easily submit MapReduce jobs whose map and
- reduce functions are implemented by shell commands. Use
- 'bin/hadoop jar build/hadoop-streaming.jar' to get details.
- (Michel Tourn via cutting)
- 44. HADOOP-189. Fix MapReduce in standalone configuration to
- correctly handle job jar files that contain a lib directory with
- nested jar files. (cutting)
- 45. HADOOP-65. Initial version of record I/O framework that enables
- the specification of record types and generates marshalling code
- in both Java and C++. Generated Java code implements
- WritableComparable, but is not yet otherwise used by
- Hadoop. (Milind Bhandarkar via cutting)
- 46. HADOOP-193. Add a MapReduce-based FileSystem benchmark.
- (Konstantin Shvachko via cutting)
- 47. HADOOP-194. Add a MapReduce-based FileSystem checker. This reads
- every block in every file in the filesystem. (Konstantin Shvachko
- via cutting)
- 48. HADOOP-182. Fix so that lost task trackers do not change the
- status of reduce tasks or completed jobs. Also fixes the progress
- meter so that failed tasks are subtracted. (omalley via cutting)
- 49. HADOOP-96. Logging improvements. Log files are now separate from
- standard output and standard error files. Logs are now rolled.
- Logging of all DFS state changes can be enabled, to facilitate
- debugging. (Hairong Kuang via cutting)
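The retry behaviour described in HADOOP-174 above (item 34 of Release 0.2.0) can be sketched as follows. This is an illustrative Python sketch, not the JobClient implementation. The names wait_for_job and poll_status, the "done" status string, and the choice to count consecutive failures (resetting after a successful contact) are assumptions; only the limit of five attempts comes from the entry.

```python
MAX_FAILED_CONTACTS = 5  # from the entry: five attempts, not just one


def wait_for_job(poll_status, max_failures=MAX_FAILED_CONTACTS):
    """Poll until the job reports "done".

    poll_status: hypothetical callable that contacts the job tracker and
                 returns a status string, or raises IOError if unreachable.
    Gives up (re-raises) only after max_failures failed contacts in a row.
    """
    failures = 0
    while True:
        try:
            status = poll_status()
        except IOError:
            failures += 1
            if failures >= max_failures:
                raise  # the job tracker looks unreachable; abort
            continue
        failures = 0  # assumed: a successful contact resets the count
        if status == "done":
            return status
```

A real client would also sleep between polls. The pause is left out here so that the sketch stays short.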
- Release 0.1.1 - 2006-04-08
- 1. Added CHANGES.txt, logging all significant changes to Hadoop. (cutting)
- 2. Fix MapReduceBase.close() to throw IOException, as declared in the
- Closeable interface. This permits subclasses which override this
- method to throw that exception. (cutting)
- 3. Fix HADOOP-117. Pathnames were mistakenly transposed in
- JobConf.getLocalFile() causing many mapred temporary files to not
- be removed. (Raghavendra Prabhu via cutting)
- 4. Fix HADOOP-116. Clean up job submission files when jobs complete.
- (cutting)
- 5. Fix HADOOP-125. Fix handling of absolute paths on Windows. (cutting)
- Release 0.1.0 - 2006-04-01
- 1. The first release of Hadoop.