CHANGES.txt
- Hadoop Change Log
- Release 0.20.0 - 2009-04-15
- INCOMPATIBLE CHANGES
- HADOOP-4210. Fix findbugs warnings for equals implementations of mapred ID
- classes. Removed public, static ID::read and ID::forName; made ID an
- abstract class. (Suresh Srinivas via cdouglas)
- HADOOP-4253. Fix various warnings generated by findbugs.
- The following deprecated methods in RawLocalFileSystem are removed:
- public String getName()
- public void lock(Path p, boolean shared)
- public void release(Path p)
- (Suresh Srinivas via johan)
- HADOOP-4618. Move http server from FSNamesystem into NameNode.
- FSNamesystem.getNameNodeInfoPort() is removed.
- FSNamesystem.getDFSNameNodeMachine() and FSNamesystem.getDFSNameNodePort()
- are replaced by FSNamesystem.getDFSNameNodeAddress().
- NameNode(bindAddress, conf) is removed.
- (shv)
- HADOOP-4567. GetFileBlockLocations returns the NetworkTopology
- information of the machines where the blocks reside. (dhruba)
- HADOOP-4435. The JobTracker WebUI displays the amount of heap memory
- in use. (dhruba)
- HADOOP-4628. Move Hive into a standalone subproject. (omalley)
- HADOOP-4188. Removes task's dependency on concrete filesystems.
- (Sharad Agarwal via ddas)
- HADOOP-1650. Upgrade to Jetty 6. (cdouglas)
- HADOOP-3986. Remove static Configuration from JobClient. (Amareshwari
- Sriramadasu via cdouglas)
- JobClient::setCommandLineConfig is removed
- JobClient::getCommandLineConfig is removed
- JobShell, TestJobShell classes are removed
- HADOOP-4422. S3 file systems should not create bucket.
- (David Phillips via tomwhite)
- HADOOP-4035. Support memory based scheduling in capacity scheduler.
- (Vinod Kumar Vavilapalli via yhemanth)
- HADOOP-3497. Fix bug in overly restrictive file globbing with a
- PathFilter. (tomwhite)
- HADOOP-4445. Replace running task counts with running task
- percentage in capacity scheduler UI. (Sreekanth Ramakrishnan via
- yhemanth)
- HADOOP-4631. Splits the configuration into three parts - one for core,
- one for mapred and the last one for HDFS. (Sharad Agarwal via cdouglas)
- HADOOP-3344. Fix libhdfs build to use autoconf and build the same
- architecture (32 vs 64 bit) of the JVM running Ant. The libraries for
- pipes, utils, and libhdfs are now all in c++/<os_osarch_jvmdatamodel>/lib.
- (Giridharan Kesavan via nigel)
- HADOOP-4874. Remove LZO codec because of licensing issues. (omalley)
- HADOOP-4970. The full path name of a file is preserved inside Trash.
- (Prasad Chakka via dhruba)
- HADOOP-4103. NameNode keeps a count of missing blocks. It shows a warning on
- the WebUI if there are such blocks. '-report' and '-metaSave' have extra
- info to track such blocks. (Raghu Angadi)
- HADOOP-4783. Change permissions on history files on the jobtracker
- to be only group readable instead of world readable.
- (Amareshwari Sriramadasu via yhemanth)
- HADOOP-5531. Removed Chukwa from Hadoop 0.20.0. (nigel)
- NEW FEATURES
- HADOOP-4575. Add a proxy service for relaying HsftpFileSystem requests.
- Includes client authentication via user certificates and config-based
- access control. (Kan Zhang via cdouglas)
- HADOOP-4661. Add DistCh, a new tool for distributed ch{mod,own,grp}.
- (szetszwo)
- HADOOP-4709. Add several new features and bug fixes to Chukwa.
- Added Hadoop Infrastructure Care Center (a UI for visualizing data
- collected by Chukwa)
- Added FileAdaptor for streaming a small file in one chunk
- Added compression to archive and demux output
- Added unit tests and validation for the agent, collector, and demux
- map/reduce job
- Added a database loader for loading demux output (sequence files) into
- a JDBC-connected database
- Added an algorithm to distribute collector load more evenly
- (Jerome Boulon, Eric Yang, Andy Konwinski, Ariel Rabkin via cdouglas)
- HADOOP-4179. Add Vaidya tool to analyze map/reduce job logs for performance
- problems. (Suhas Gogate via omalley)
- HADOOP-4029. Add NameNode storage information to the dfshealth page and
- move DataNode information to a separate page. (Boris Shkolnik via
- szetszwo)
- HADOOP-4348. Add service-level authorization for Hadoop. (acmurthy)
- HADOOP-4826. Introduce admin command saveNamespace. (shv)
- HADOOP-3063. BloomMapFile, a fail-fast version of MapFile for sparsely
- populated key spaces. (Andrzej Bialecki via stack)
- HADOOP-1230. Add new map/reduce API and deprecate the old one. Generally,
- old code should continue to work without problems. The new API is in
- org.apache.hadoop.mapreduce, and the old classes in org.apache.hadoop.mapred
- are deprecated. Differences in the new API:
- 1. All of the methods take Context objects that allow us to add new
- methods without breaking compatibility.
- 2. Mapper and Reducer now have a "run" method that is called once and
- contains the control loop for the task, which lets applications
- replace it.
- 3. By default, Mapper and Reducer are the identity mapper and reducer.
- 4. The FileOutputFormats use part-r-00000 for the output of reduce 0 and
- part-m-00000 for the output of map 0.
- 5. The reduce grouping comparator now uses the raw compare instead of
- object compare.
- 6. The number of maps in FileInputFormat is controlled by min and max
- split size rather than min size and the desired number of maps.
- (omalley)
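The HADOOP-1230 differences can be modelled in plain Java. This is a minimal, self-contained sketch, not the real org.apache.hadoop.mapreduce code: NewApiSketch, SketchContext, SketchMapper, partName, and computeSplitSize are hypothetical names. It shows a run() method that owns the task's control loop (item 2), an identity map() default (item 3), the part-r-00000/part-m-00000 naming (item 4), and a split size bounded by min and max (item 6). The max(minSize, min(maxSize, blockSize)) rule is an assumption about how such bounds are typically applied.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of the HADOOP-1230 control flow; not the real Hadoop classes.
final class NewApiSketch {

    /** Hypothetical context: hands out input records and collects output lines. */
    public static final class SketchContext<K, V> {
        private final Iterator<Map.Entry<K, V>> input;
        private final List<String> output = new ArrayList<>();
        private Map.Entry<K, V> current;

        public SketchContext(Iterator<Map.Entry<K, V>> input) {
            this.input = input;
        }

        public boolean nextKeyValue() {
            if (!input.hasNext()) {
                return false;
            }
            current = input.next();
            return true;
        }

        public K getCurrentKey() { return current.getKey(); }

        public V getCurrentValue() { return current.getValue(); }

        public void write(Object key, Object value) { output.add(key + "\t" + value); }

        public List<String> output() { return output; }
    }

    /** Base mapper: map() is the identity by default and run() owns the loop. */
    public static class SketchMapper<K, V> {
        protected void setup(SketchContext<K, V> context) { }

        protected void map(K key, V value, SketchContext<K, V> context) {
            context.write(key, value);  // identity: pass the record through unchanged
        }

        protected void cleanup(SketchContext<K, V> context) { }

        /** Called once per task; a subclass may override it to replace the loop. */
        public void run(SketchContext<K, V> context) {
            setup(context);
            while (context.nextKeyValue()) {
                map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
            cleanup(context);
        }
    }

    /** Output file name: type 'r' for reduce output, 'm' for map output. */
    public static String partName(char type, int partition) {
        return String.format(Locale.ROOT, "part-%c-%05d", type, partition);
    }

    /** Assumed rule: block size capped by maxSize, then raised to at least minSize. */
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }
}
```

Because run() is an ordinary overridable method, a subclass can replace the whole loop, for example to process records in batches, without any change to the framework.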
-
- HADOOP-3305. Use Ivy to manage dependencies. (Giridharan Kesavan
- and Steve Loughran via cutting)
- IMPROVEMENTS
- HADOOP-4565. Added CombineFileInputFormat to use data locality information
- to create splits. (dhruba via zshao)
- HADOOP-4749. Added a new counter REDUCE_INPUT_BYTES. (Yongqiang He via
- zshao)
- HADOOP-4234. Fix KFS "glue" layer to allow applications to interface
- with multiple KFS metaservers. (Sriram Rao via lohit)
- HADOOP-4245. Update to latest version of KFS "glue" library jar.
- (Sriram Rao via lohit)
- HADOOP-4244. Change test-patch.sh to check the Eclipse classpath whether
- or not it is run by Hudson. (szetszwo)
- HADOOP-3180. Add name of missing class to WritableName.getClass
- IOException. (Pete Wyckoff via omalley)
- HADOOP-4178. Make the capacity scheduler's default values configurable.
- (Sreekanth Ramakrishnan via omalley)
- HADOOP-4262. Generate better error message when client exception has null
- message. (stevel via omalley)
- HADOOP-4226. Refactor and document LineReader to make it more readily
- understandable. (Yuri Pradkin via cdouglas)
-
- HADOOP-4238. When listing jobs, if scheduling information isn't available
- print NA instead of empty output. (Sreekanth Ramakrishnan via johan)
- HADOOP-4284. Support filters that apply to all requests, or global filters,
- to HttpServer. (Kan Zhang via cdouglas)
-
- HADOOP-4276. Improve the hashing functions and deserialization of the
- mapred ID classes. (omalley)
- HADOOP-4485. Add a compile-native ant task, as a shorthand. (enis)
- HADOOP-4454. Allow # comments in slaves file. (Rama Ramasamy via omalley)
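The slaves file is read by Hadoop's shell scripts, so the Java below only models the rule, as a sketch under an assumption: a '#' starts a comment that runs to the end of the line, and lines left blank after stripping are skipped. SlavesFileSketch and parseHosts are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of reading a slaves file that tolerates '#' comments.
final class SlavesFileSketch {
    /** Drops text from '#' to end of line, trims whitespace, and skips empty lines. */
    public static List<String> parseHosts(List<String> lines) {
        List<String> hosts = new ArrayList<>();
        for (String line : lines) {
            int hash = line.indexOf('#');
            String withoutComment = hash >= 0 ? line.substring(0, hash) : line;
            String host = withoutComment.trim();
            if (!host.isEmpty()) {
                hosts.add(host);
            }
        }
        return hosts;
    }
}
```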
- HADOOP-3461. Remove hdfs.StringBytesWritable. (szetszwo)
- HADOOP-4437. Use Halton sequence instead of java.util.Random in
- PiEstimator. (szetszwo)
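A Halton sequence is a low-discrepancy (quasi-random) sequence: coordinate d of point k is the radical inverse of k in the d-th prime base. The sketch below estimates pi with a 2-D Halton sequence in bases 2 and 3 by counting points inside the quarter unit circle. It is an illustration of the technique under these assumptions, not PiEstimator's code, whose geometry and bases may differ; HaltonPiSketch is a hypothetical name.

```java
// Hypothetical quasi-Monte Carlo pi estimator using a 2-D Halton sequence.
final class HaltonPiSketch {
    /** Radical inverse: mirrors the base-b digits of index about the radix point. */
    public static double radicalInverse(long index, int base) {
        double result = 0.0;
        double fraction = 1.0 / base;
        long i = index;
        while (i > 0) {
            result += fraction * (i % base);
            i /= base;
            fraction /= base;
        }
        return result;
    }

    /** Counts Halton points (bases 2 and 3) inside the quarter circle; pi ~ 4 * inside / n. */
    public static double estimatePi(int n) {
        long inside = 0;
        for (long k = 1; k <= n; k++) {  // start at 1 to skip the point (0, 0)
            double x = radicalInverse(k, 2);
            double y = radicalInverse(k, 3);
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        return 4.0 * inside / n;
    }
}
```

Unlike java.util.Random, the points fill the unit square evenly by construction, so the estimate converges faster for the same number of samples.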
- HADOOP-4572. Change INode and its sub-classes to package private.
- (szetszwo)
- HADOOP-4187. Does a runtime lookup for JobConf/JobConfigurable, and if
- found, invokes the appropriate configure method. (Sharad Agarwal via ddas)
- HADOOP-4453. Improve ssl configuration and handling in HsftpFileSystem,
- particularly when used with DistCp. (Kan Zhang via cdouglas)
- HADOOP-4583. Several code optimizations in HDFS. (Suresh Srinivas via
- szetszwo)
- HADOOP-3923. Remove org.apache.hadoop.mapred.StatusHttpServer. (szetszwo)
-
- HADOOP-4622. Explicitly specify the interpreter for non-native
- pipes binaries. (Fredrik Hedberg via johan)
-
- HADOOP-4505. Add a unit test to test faulty setup task and cleanup
- task killing the job. (Amareshwari Sriramadasu via johan)
- HADOOP-4608. Don't print a stack trace when the example driver gets an
- unknown program to run. (Edward Yoon via omalley)
- HADOOP-4645. Package HdfsProxy contrib project without the extra level
- of directories. (Kan Zhang via omalley)
- HADOOP-4126. Allow access to HDFS web UI on EC2 (tomwhite via omalley)
- HADOOP-4612. Removes RunJar's dependency on JobClient.
- (Sharad Agarwal via ddas)
- HADOOP-4185. Adds setVerifyChecksum() method to FileSystem.
- (Sharad Agarwal via ddas)
- HADOOP-4523. Prevent too many tasks scheduled on a node from bringing
- it down by monitoring for cumulative memory usage across tasks.
- (Vinod Kumar Vavilapalli via yhemanth)
- HADOOP-4640. Adds an input format that can split lzo compressed
- text files. (johan)
-
- HADOOP-4666. Launch reduces only after a few maps have run in the
- Fair Scheduler. (Matei Zaharia via johan)
- HADOOP-4339. Remove redundant calls from FileSystem/FsShell when
- generating/processing ContentSummary. (David Phillips via cdouglas)
- HADOOP-2774. Add counters tracking records spilled to disk in MapTask and
- ReduceTask. (Ravi Gummadi via cdouglas)
- HADOOP-4513. Initialize jobs asynchronously in the capacity scheduler.
- (Sreekanth Ramakrishnan via yhemanth)
- HADOOP-4649. Improve abstraction for spill indices. (cdouglas)
- HADOOP-3770. Add gridmix2, an iteration on the gridmix benchmark. (Runping
- Qi via cdouglas)
- HADOOP-4708. Add support for dfsadmin commands in TestCLI. (Boris Shkolnik
- via cdouglas)
- HADOOP-4758. Add a splitter for metrics contexts to support more than one
- type of collector. (cdouglas)
- HADOOP-4722. Add tests for dfsadmin quota error messages. (Boris Shkolnik
- via cdouglas)
- HADOOP-4690. fuse-dfs: create source file/function + utils + config +
- main source files. (Pete Wyckoff via mahadev)
- HADOOP-3750. Fix and enforce module dependencies. (Sharad Agarwal via
- tomwhite)
- HADOOP-4747. Speed up FsShell::ls by removing redundant calls to the
- filesystem. (David Phillips via cdouglas)
- HADOOP-4305. Improves the blacklisting strategy, whereby tasktrackers
- that are blacklisted are not given tasks to run from other jobs, subject
- to the following conditions (all must be met):
- 1) The TaskTracker has been blacklisted by at least 4 jobs (configurable)
- 2) The TaskTracker has been blacklisted 50% more times than
- the average (configurable)
- 3) Fewer than 50% of the trackers in the cluster are blacklisted
- Once every 24 hours, a TaskTracker blacklisted for all jobs is given a chance.
- Restarting the TaskTracker moves it out of the blacklist.
- (Amareshwari Sriramadasu via ddas)
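The three HADOOP-4305 conditions can be read as a single predicate. This is a sketch under stated assumptions: the entry gives neither the configuration keys nor the exact comparisons, so we read "50% more times than the average" as a strict greater-than, and take blacklistedTrackers to mean trackers already blacklisted across jobs. BlacklistSketch and its parameters are hypothetical.

```java
// Hypothetical predicate for the HADOOP-4305 cross-job blacklisting conditions.
final class BlacklistSketch {
    static final int MIN_JOBS = 4;                   // condition 1 (configurable in Hadoop)
    static final double ABOVE_AVERAGE_FACTOR = 1.5;  // condition 2: 50% above average
    static final double MAX_CLUSTER_FRACTION = 0.5;  // condition 3: under half the cluster

    public static boolean blacklistAcrossJobs(int jobsBlacklistingTracker,
                                              double averageAcrossTrackers,
                                              int blacklistedTrackers,
                                              int totalTrackers) {
        boolean enoughJobs = jobsBlacklistingTracker >= MIN_JOBS;
        boolean wellAboveAverage =
            jobsBlacklistingTracker > ABOVE_AVERAGE_FACTOR * averageAcrossTrackers;
        boolean clusterHealthy = blacklistedTrackers < MAX_CLUSTER_FRACTION * totalTrackers;
        return enoughJobs && wellAboveAverage && clusterHealthy;  // all must hold
    }
}
```

Condition 3 acts as a safety valve: if many trackers look bad at once, the problem is more likely the jobs than the machines, so no tracker is blacklisted cluster-wide.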
- HADOOP-4688. Modify the MiniMRDFSSort unit test to spill multiple times,
- exercising the map-side merge code. (cdouglas)
- HADOOP-4737. Adds the KILLED notification when jobs get killed.
- (Amareshwari Sriramadasu via ddas)
- HADOOP-4728. Add a test exercising different namenode configurations.
- (Boris Shkolnik via cdouglas)
- HADOOP-4807. Adds JobClient commands to get the active/blacklisted tracker
- names. Also adds commands to display running/completed task attempt IDs.
- (ddas)
- HADOOP-4699. Remove checksum validation from map output servlet. (cdouglas)
- HADOOP-4838. Added a registry to automate metrics and mbeans management.
- (Sanjay Radia via acmurthy)
- HADOOP-3136. Fixed the default scheduler to assign multiple tasks to each
- tasktracker per heartbeat, when feasible. To ensure locality isn't hurt
- too badly, the scheduler will not assign more than one off-switch task per
- heartbeat. The heartbeat interval is also halved since the task-tracker is
- fixed to no longer send out heartbeats on each task completion. A
- slow-start for scheduling reduces is introduced to ensure that reduces
- aren't started until a sufficient number of maps are done; otherwise, reduces
- of jobs whose maps aren't scheduled might swamp the cluster.
- Configuration changes to mapred-default.xml:
- add mapred.reduce.slowstart.completed.maps
- (acmurthy)
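The reduce slow-start amounts to a threshold check. As a sketch, we assume mapred.reduce.slowstart.completed.maps is a fraction of the job's map tasks and that the required count is rounded up; SlowStartSketch and canScheduleReduces are hypothetical names.

```java
// Hypothetical check for the reduce slow-start introduced in HADOOP-3136.
final class SlowStartSketch {
    /**
     * Returns true once enough maps have finished for reduces to be scheduled.
     * slowStartFraction stands for mapred.reduce.slowstart.completed.maps.
     */
    public static boolean canScheduleReduces(int finishedMaps, int totalMaps,
                                             double slowStartFraction) {
        if (totalMaps == 0) {
            return true;  // a job without maps has nothing to wait for
        }
        int required = (int) Math.ceil(slowStartFraction * totalMaps);
        return finishedMaps >= required;
    }
}
```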
- HADOOP-4545. Add example and test case of secondary sort for the reduce.
- (omalley)
- HADOOP-4753. Refactor gridmix2 to reduce code duplication. (cdouglas)
- HADOOP-4909. Fix Javadoc and make parts of the API more consistent in their
- use of JobContext instead of Configuration. (omalley)
- HADOOP-4830. Add end-to-end test cases for testing queue capacities.
- (Vinod Kumar Vavilapalli via yhemanth)
- HADOOP-4980. Improve code layout of capacity scheduler to make it
- easier to fix some blocker bugs. (Vivek Ratan via yhemanth)
- HADOOP-4916. Make user/location of Chukwa installation configurable by an
- external properties file. (Eric Yang via cdouglas)
- HADOOP-4950. Make the CompressorStream, DecompressorStream,
- BlockCompressorStream, and BlockDecompressorStream public to facilitate
- non-Hadoop codecs. (omalley)
- HADOOP-4843. Collect job history and configuration in Chukwa. (Eric Yang
- via cdouglas)
- HADOOP-5030. Build Chukwa RPM to install into configured directory. (Eric
- Yang via cdouglas)
-
- HADOOP-4828. Updates documentation related to configuration (HADOOP-4631).
- (Sharad Agarwal via ddas)
- HADOOP-4939. Adds a test that would inject random failures for tasks in
- large jobs and would also inject TaskTracker failures. (ddas)
- HADOOP-4920. Stop storing Forrest output in Subversion. (cutting)
- HADOOP-4944. A configuration file can include other configuration
- files. (Rama Ramasamy via dhruba)
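One standard way to let an XML configuration file include others is XInclude, which the JDK's DOM parser supports through DocumentBuilderFactory.setXIncludeAware. The sketch below assumes that mechanism; it is not Hadoop's Configuration code, and XIncludeSketch is a hypothetical name. The catch block covers parsers that lack XInclude support, the situation HADOOP-5254 (later in this log) addresses.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical parser for a configuration file that may contain <xi:include href="..."/>.
final class XIncludeSketch {
    public static Document parse(File file) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringComments(true);
        factory.setNamespaceAware(true);  // XInclude processing needs namespace awareness
        try {
            factory.setXIncludeAware(true);
        } catch (UnsupportedOperationException e) {
            // Parser without XInclude support: parse anyway, leaving includes unexpanded.
        }
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(file);  // relative hrefs resolve against this file's location
    }
}
```

An including file declares the namespace, for example `<configuration xmlns:xi="http://www.w3.org/2001/XInclude">`, and places `<xi:include href="other.xml"/>` where the included properties should appear.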
- HADOOP-4804. Provide Forrest documentation for the Fair Scheduler.
- (Sreekanth Ramakrishnan via yhemanth)
- HADOOP-5248. Adds a testcase that checks whether the job directory still
- exists after the job completes; the test fails if it does. (ddas)
- HADOOP-4664. Introduces multiple job initialization threads, where the
- number of threads is configurable via mapred.jobinit.threads.
- (Matei Zaharia and Jothi Padmanabhan via ddas)
- HADOOP-4191. Adds a testcase for JobHistory. (Ravi Gummadi via ddas)
- HADOOP-5466. Change documentation CSS style for headers and code. (Corinne
- Chandel via szetszwo)
- HADOOP-5275. Add ivy directory and files to built tar.
- (Giridharan Kesavan via nigel)
- HADOOP-5468. Add sub-menus to forrest documentation and make some minor
- edits. (Corinne Chandel via szetszwo)
- HADOOP-5437. Fix TestMiniMRDFSSort to properly test jvm-reuse. (omalley)
- HADOOP-5521. Removes dependency of TestJobInProgress on RESTART_COUNT
- JobHistory tag. (Ravi Gummadi via ddas)
- OPTIMIZATIONS
- HADOOP-3293. Fixes FileInputFormat to provide locations for splits
- based on the rack/host that has the largest number of bytes.
- (Jothi Padmanabhan via ddas)
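The host-selection idea can be sketched as ranking hosts by the bytes of the split they hold. This simplified sketch assumes per-host byte counts are already known and ignores the rack level that HADOOP-3293 also considers; SplitHostsSketch and topHosts are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical ranking of hosts by how many bytes of a split each one holds.
final class SplitHostsSketch {
    public static List<String> topHosts(Map<String, Long> bytesPerHost, int limit) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(bytesPerHost.entrySet());
        // Largest byte count first; ties broken by host name so the order is deterministic.
        entries.sort((a, b) -> {
            int byBytes = Long.compare(b.getValue(), a.getValue());
            return byBytes != 0 ? byBytes : a.getKey().compareTo(b.getKey());
        });
        List<String> hosts = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, entries.size()); i++) {
            hosts.add(entries.get(i).getKey());
        }
        return hosts;
    }
}
```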
- HADOOP-4683. Fixes Reduce shuffle scheduler to invoke
- getMapCompletionEvents in a separate thread. (Jothi Padmanabhan
- via ddas)
- BUG FIXES
- HADOOP-5379. CBZip2InputStream to throw IOException on data crc error.
- (Rodrigo Schmidt via zshao)
- HADOOP-5326. Fixes CBZip2OutputStream data corruption problem.
- (Rodrigo Schmidt via zshao)
- HADOOP-4204. Fix findbugs warnings related to unused variables, naive
- Number subclass instantiation, Map iteration, and badly scoped inner
- classes. (Suresh Srinivas via cdouglas)
- HADOOP-4207. Update the Derby jar file to release 10.4.2.
- (Prasad Chakka via dhruba)
- HADOOP-4325. SocketInputStream.read() should return -1 on EOF.
- (Raghu Angadi)
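The java.io.InputStream contract requires the single-byte read() to return a value from 0 to 255, or -1 at end of stream. A common way to honor it is to delegate to the bulk read and mask the byte, sketched below with a hypothetical wrapper class; this illustrates the contract, not the actual SocketInputStream patch.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper showing a contract-compliant single-byte read().
final class SingleByteReadSketch extends InputStream {
    private final InputStream in;
    private final byte[] oneByte = new byte[1];

    SingleByteReadSketch(InputStream in) {
        this.in = in;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return in.read(b, off, len);
    }

    @Override
    public int read() throws IOException {
        int n = read(oneByte, 0, 1);
        // Mask with 0xff so a data byte of 0xFF yields 255 rather than -1 (which means EOF).
        return n > 0 ? (oneByte[0] & 0xff) : -1;
    }
}
```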
- HADOOP-4408. FsAction functions need not create new objects. (cdouglas)
- HADOOP-4440. TestJobInProgressListener tests for jobs killed in queued
- state (Amar Kamat via ddas)
- HADOOP-4346. Implement blocking connect so that Hadoop is not affected
- by selector problem with JDK default implementation. (Raghu Angadi)
- HADOOP-4388. If there are invalid blocks in the transfer list, Datanode
- should handle them and keep transferring the remaining blocks. (Suresh
- Srinivas via szetszwo)
- HADOOP-4587. Fix a typo in Mapper javadoc. (Koji Noguchi via szetszwo)
- HADOOP-4530. In fsck, HttpServletResponse sendError fails with
- IllegalStateException. (hairong)
- HADOOP-4377. Fix a race condition in directory creation in
- NativeS3FileSystem. (David Phillips via cdouglas)
- HADOOP-4621. Fix javadoc warnings caused by duplicate jars. (Kan Zhang via
- cdouglas)
- HADOOP-4566. Deploy new hive code to support more types.
- (Zheng Shao via dhruba)
- HADOOP-4571. Add chukwa conf files to svn:ignore list. (Eric Yang via
- szetszwo)
- HADOOP-4589. Correct PiEstimator output messages and improve the code
- readability. (szetszwo)
- HADOOP-4650. Correct a mismatch between the default value of
- local.cache.size in the config and the source. (Jeff Hammerbacher via
- cdouglas)
- HADOOP-4606. Fix cygpath error if the log directory does not exist.
- (szetszwo via omalley)
- HADOOP-4141. Fix bug in ScriptBasedMapping causing potential infinite
- loop on misconfigured hadoop-site. (Aaron Kimball via tomwhite)
- HADOOP-4691. Correct a link in the javadoc of IndexedSortable. (szetszwo)
- HADOOP-4598. '-setrep' command skips under-replicated blocks. (hairong)
- HADOOP-4429. Set defaults for user, group in UnixUserGroupInformation so
- login fails more predictably when misconfigured. (Alex Loddengaard via
- cdouglas)
- HADOOP-4676. Fix broken URL in blacklisted tasktrackers page. (Amareshwari
- Sriramadasu via cdouglas)
- HADOOP-3422. Ganglia counter metrics are all reported with the metric
- name "value", so the counter values cannot be seen. (Jason Attributor
- and Brian Bockelman via stack)
- HADOOP-4704. Fix javadoc typos "the the". (szetszwo)
- HADOOP-4677. Fix semantics of FileSystem::getBlockLocations to return
- meaningful values. (Hong Tang via cdouglas)
- HADOOP-4669. Use correct operator when evaluating whether access time is
- enabled (Dhruba Borthakur via cdouglas)
- HADOOP-4732. Pass connection and read timeouts in the correct order when
- setting up fetch in reduce. (Amareshwari Sriramadasu via cdouglas)
- HADOOP-4558. Fix capacity reclamation in capacity scheduler.
- (Amar Kamat via yhemanth)
- HADOOP-4770. Fix rungridmix_2 script to work with RunJar. (cdouglas)
- HADOOP-4738. When using git, the saveVersion script will use only the
- commit hash for the version and not the message, which requires escaping.
- (cdouglas)
- HADOOP-4576. Show pending job count instead of task count in the UI per
- queue in capacity scheduler. (Sreekanth Ramakrishnan via yhemanth)
- HADOOP-4623. Maintain running tasks even if speculative execution is off.
- (Amar Kamat via yhemanth)
- HADOOP-4786. Fix a compilation error in
- TestTrackerBlacklistAcrossJobs. (yhemanth)
- HADOOP-4785. Fixes the JobTracker heartbeat to not make two calls to
- System.currentTimeMillis(). (Amareshwari Sriramadasu via ddas)
- HADOOP-4792. Add generated Chukwa configuration files to version control
- ignore lists. (cdouglas)
- HADOOP-4796. Fix Chukwa test configuration, remove unused components. (Eric
- Yang via cdouglas)
- HADOOP-4708. Add binaries missed in the initial checkin for Chukwa. (Eric
- Yang via cdouglas)
- HADOOP-4805. Remove black list collector from Chukwa Agent HTTP Sender.
- (Eric Yang via cdouglas)
- HADOOP-4837. Move HADOOP_CONF_DIR configuration to chukwa-env.sh (Jerome
- Boulon via cdouglas)
- HADOOP-4825. Use ps instead of jps for querying process status in Chukwa.
- (Eric Yang via cdouglas)
- HADOOP-4844. Fixed javadoc for
- org.apache.hadoop.fs.permission.AccessControlException to document that
- it's deprecated in favour of
- org.apache.hadoop.security.AccessControlException. (acmurthy)
- HADOOP-4706. Close the underlying output stream in
- IFileOutputStream::close. (Jothi Padmanabhan via cdouglas)
- HADOOP-4855. Fixed command-specific help messages for refreshServiceAcl in
- DFSAdmin and MRAdmin. (acmurthy)
- HADOOP-4820. Remove unused method FSNamesystem::deleteInSafeMode. (Suresh
- Srinivas via cdouglas)
- HADOOP-4698. Lower io.sort.mb to 10 in the tests and raise the junit memory
- limit to 512m from 256m. (Nigel Daley via cdouglas)
- HADOOP-4860. Split TestFileTailingAdapters into three separate tests to
- avoid contention. (Eric Yang via cdouglas)
- HADOOP-3921. Fixed clover (code coverage) target to work with JDK 6.
- (tomwhite via nigel)
- HADOOP-4845. Modify the reduce input byte counter to record only the
- compressed size and add a human-readable label. (Yongqiang He via cdouglas)
- HADOOP-4458. Add a test creating symlinks in the working directory.
- (Amareshwari Sriramadasu via cdouglas)
- HADOOP-4879. Fix org.apache.hadoop.mapred.Counters to correctly define
- Object.equals rather than depend on contentEquals api. (omalley via
- acmurthy)
- HADOOP-4791. Fix rpm build process for Chukwa. (Eric Yang via cdouglas)
- HADOOP-4771. Correct initialization of the file count for directories
- with quotas. (Ruyue Ma via shv)
- HADOOP-4878. Fix eclipse plugin classpath file to point to ivy's resolved
- lib directory and add the same to test-patch.sh. (Giridharan Kesavan via
- acmurthy)
- HADOOP-4774. Fix default values of some capacity scheduler configuration
- items which would otherwise not work on a fresh checkout.
- (Sreekanth Ramakrishnan via yhemanth)
- HADOOP-4876. Fix capacity scheduler reclamation by updating count of
- pending tasks correctly. (Sreekanth Ramakrishnan via yhemanth)
- HADOOP-4849. Documentation for Service Level Authorization implemented in
- HADOOP-4348. (acmurthy)
- HADOOP-4827. Replace Consolidator with Aggregator macros in Chukwa (Eric
- Yang via cdouglas)
- HADOOP-4894. Correctly parse ps output in Chukwa jettyCollector.sh. (Ari
- Rabkin via cdouglas)
- HADOOP-4892. Close fds out of Chukwa ExecPlugin. (Ari Rabkin via cdouglas)
- HADOOP-4889. Fix permissions in RPM packaging. (Eric Yang via cdouglas)
- HADOOP-4869. Fixes the TT-JT heartbeat to have an explicit flag for
- restart apart from the initialContact flag that there was earlier.
- (Amareshwari Sriramadasu via ddas)
- HADOOP-4716. Fixes ReduceTask.java to clear out the mapping between
- hosts and MapOutputLocation upon a JT restart (Amar Kamat via ddas)
- HADOOP-4880. Removes an unnecessary testcase from TestJobTrackerRestart.
- (Amar Kamat via ddas)
- HADOOP-4924. Fixes a race condition in TaskTracker re-init. (ddas)
- HADOOP-4854. Read reclaim capacity interval from capacity scheduler
- configuration. (Sreekanth Ramakrishnan via yhemanth)
- HADOOP-4896. HDFS Fsck does not load HDFS configuration. (Raghu Angadi)
- HADOOP-4956. Creates TaskStatus for failed tasks with an empty Counters
- object instead of null. (ddas)
- HADOOP-4979. Fix capacity scheduler to block cluster for failed high
- RAM requirements across task types. (Vivek Ratan via yhemanth)
- HADOOP-4949. Fix native compilation. (Chris Douglas via acmurthy)
- HADOOP-4787. Fixes the testcase TestTrackerBlacklistAcrossJobs which was
- earlier failing randomly. (Amareshwari Sriramadasu via ddas)
- HADOOP-4914. Add description fields to Chukwa init.d scripts (Eric Yang via
- cdouglas)
- HADOOP-4884. Make tool tip date format match standard HICC format. (Eric
- Yang via cdouglas)
- HADOOP-4925. Make Chukwa sender properties configurable. (Ari Rabkin via
- cdouglas)
- HADOOP-4947. Make Chukwa command parsing more forgiving of whitespace. (Ari
- Rabkin via cdouglas)
- HADOOP-5026. Make chukwa/bin scripts executable in repository. (Andy
- Konwinski via cdouglas)
- HADOOP-4977. Fix a deadlock between the reclaimCapacity and assignTasks
- in capacity scheduler. (Vivek Ratan via yhemanth)
- HADOOP-4988. Fix reclaim capacity to work even when there are queues with
- no capacity. (Vivek Ratan via yhemanth)
- HADOOP-5065. Remove generic parameters from argument to
- setIn/OutputFormatClass so that it works with SequenceIn/OutputFormat.
- (cdouglas via omalley)
- HADOOP-4818. Pass user config to instrumentation API. (Eric Yang via
- cdouglas)
- HADOOP-4993. Fix Chukwa agent configuration and startup to make it both
- more modular and testable. (Ari Rabkin via cdouglas)
- HADOOP-5048. Fix capacity scheduler to correctly cleanup jobs that are
- killed after initialization, but before running.
- (Sreekanth Ramakrishnan via yhemanth)
- HADOOP-4671. Mark loop control variables shared between threads as
- volatile. (cdouglas)
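The entry does not name the affected loops, so the sketch below only shows the general pattern: a loop-control flag written by one thread and read by another must be volatile, or the reading thread may never observe the update. VolatileFlagSketch is a hypothetical name.

```java
// Hypothetical worker whose loop is stopped by another thread through a volatile flag.
final class VolatileFlagSketch implements Runnable {
    // volatile guarantees the worker thread sees the write made by stop().
    private volatile boolean running = true;

    @Override
    public void run() {
        while (running) {
            Thread.yield();  // placeholder for real work
        }
    }

    public void stop() {
        running = false;
    }
}
```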
- HADOOP-5079. HashFunction inadvertently destroys some randomness
- (Jonathan Ellis via stack)
- HADOOP-4999. A failure to write to FsEditsLog results in
- IndexOutOfBounds exception. (Boris Shkolnik via rangadi)
- HADOOP-5139. Catch IllegalArgumentException during metrics registration
- in RPC. (Hairong Kuang via szetszwo)
- HADOOP-5085. Copying a file to local with Crc throws an exception.
- (hairong)
- HADOOP-4759. Removes temporary output directory for failed and
- killed tasks by launching special CLEANUP tasks for the same.
- (Amareshwari Sriramadasu via ddas)
- HADOOP-5211. Fix check for job completion in TestSetupAndCleanupFailure.
- (enis)
- HADOOP-5254. The Configuration class should be able to work with XML
- parsers that do not support xmlinclude. (Steve Loughran via dhruba)
- HADOOP-4692. Namenode in infinite loop for replicating/deleting corrupt
- blocks. (hairong)
- HADOOP-5255. Fix use of Math.abs to avoid overflow. (Jonathan Ellis via
- cdouglas)
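The overflow behind entries like HADOOP-5255 is that Math.abs(Integer.MIN_VALUE) returns Integer.MIN_VALUE, which is still negative, so Math.abs(hash) % n can produce a negative bucket. Clearing the sign bit is one standard fix; the actual patch may differ. BucketSketch is a hypothetical name.

```java
// Hypothetical hash-to-bucket helpers contrasting the Math.abs bug with a safe variant.
final class BucketSketch {
    /** Buggy: for hash == Integer.MIN_VALUE, Math.abs overflows and the result is negative. */
    public static int naiveBucket(int hash, int buckets) {
        return Math.abs(hash) % buckets;
    }

    /** Safe: masking off the sign bit always yields a non-negative value. */
    public static int safeBucket(int hash, int buckets) {
        return (hash & Integer.MAX_VALUE) % buckets;
    }
}
```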
- HADOOP-5269. Fixes a problem to do with tasktracker holding on to
- FAILED_UNCLEAN or KILLED_UNCLEAN tasks forever. (Amareshwari Sriramadasu
- via ddas)
- HADOOP-5214. Fixes a ConcurrentModificationException while the Fairshare
- Scheduler accesses the tasktrackers stored by the JobTracker.
- (Rahul Kumar Singh via yhemanth)
- HADOOP-5233. Addresses three issues: a race condition in updating
- status, an NPE in TaskTracker task localization when the conf file is
- missing (HADOOP-5234), and an NPE in handling KillTaskAction of a cleanup
- task (HADOOP-5235). (Amareshwari Sriramadasu via ddas)
- HADOOP-5247. Introduces a broadcast of KillJobAction to all trackers when
- a job finishes. This fixes a bunch of problems to do with NPE when a
- completed job is not in memory and a tasktracker comes to the jobtracker
- with a status report of a task belonging to that job. (Amar Kamat via ddas)
- HADOOP-5282. Fixed job history logs for task attempts that are
- failed by the JobTracker, say due to lost task trackers. (Amar
- Kamat via yhemanth)
- HADOOP-4963. Fixes logging related to getting the location of the
- map output file. (Amareshwari Sriramadasu via ddas)
-
- HADOOP-5292. Fix NPE in KFS::getBlockLocations. (Sriram Rao via lohit)
- HADOOP-5241. Fixes a bug in disk-space resource estimation. Makes
- the estimation formula linear where blowUp =
- Total-Output/Total-Input. (Sharad Agarwal via ddas)
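The linear estimate in HADOOP-5241 can be written as estimatedOutput = blowUp * taskInput, with blowUp = totalOutput / totalInput observed so far. The sketch below assumes those totals come from completed tasks and rounds up; returning 0 before any input has been seen is our own hypothetical choice. OutputEstimateSketch is a hypothetical name.

```java
// Hypothetical linear disk-space estimate based on the observed output/input ratio.
final class OutputEstimateSketch {
    public static long estimateOutputBytes(long totalOutputSoFar, long totalInputSoFar,
                                           long taskInputBytes) {
        if (totalInputSoFar <= 0) {
            return 0L;  // hypothetical: no completed input yet, so no basis for an estimate
        }
        double blowUp = (double) totalOutputSoFar / totalInputSoFar;
        return (long) Math.ceil(blowUp * taskInputBytes);
    }
}
```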
- HADOOP-5142. Fix MapWritable#putAll to store key/value classes.
- (Doğacan Güney via enis)
- HADOOP-4744. Workaround for jetty6 returning -1 when getLocalPort
- is invoked on the connector. The workaround patch retries a few
- times before failing. (Jothi Padmanabhan via yhemanth)
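The retry workaround follows a bounded-retry pattern. In this sketch the port lookup is passed in as an IntSupplier standing in for the connector's getLocalPort; the attempt count, sleep interval, and PortRetrySketch itself are hypothetical.

```java
import java.util.function.IntSupplier;

// Hypothetical bounded retry around a port lookup that can transiently return -1.
final class PortRetrySketch {
    public static int localPortWithRetry(IntSupplier getLocalPort, int maxAttempts,
                                         long sleepMillis) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int port = getLocalPort.getAsInt();
            if (port >= 0) {
                return port;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(sleepMillis);  // give the connector time to finish binding
            }
        }
        throw new IllegalStateException("local port still -1 after " + maxAttempts + " attempts");
    }
}
```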
- HADOOP-5280. Adds a check to prevent a task state transition from
- FAILED to any of UNASSIGNED, RUNNING, COMMIT_PENDING or
- SUCCEEDED. (ddas)
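The HADOOP-5280 check can be sketched as a guard on state transitions. The enum below is a hypothetical subset of task states, and only the rule stated in the entry is modelled; every other transition is allowed here.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical guard rejecting transitions out of FAILED into the listed states.
final class TaskStateSketch {
    public enum State { UNASSIGNED, RUNNING, COMMIT_PENDING, SUCCEEDED, FAILED, KILLED }

    private static final Set<State> NOT_REACHABLE_FROM_FAILED =
        EnumSet.of(State.UNASSIGNED, State.RUNNING, State.COMMIT_PENDING, State.SUCCEEDED);

    public static boolean isAllowed(State from, State to) {
        return !(from == State.FAILED && NOT_REACHABLE_FROM_FAILED.contains(to));
    }
}
```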
- HADOOP-5272. Fixes a problem to do with detecting whether an
- attempt is the first attempt of a Task. This affects JobTracker
- restart. (Amar Kamat via ddas)
- HADOOP-5306. Fixes a problem to do with logging/parsing the http port of a
- lost tracker. Affects JobTracker restart. (Amar Kamat via ddas)
- HADOOP-5111. Fix Job::set* methods to work with generics. (cdouglas)
- HADOOP-5274. Fix gridmix2 dependency on wordcount example. (cdouglas)
- HADOOP-5145. Balancer sometimes runs out of memory after running
- days or weeks. (hairong)
- HADOOP-5338. Fix jobtracker restart to clear task completion
- events cached by tasktrackers forcing them to fetch all events
- afresh, thus avoiding missed task completion events on the
- tasktrackers. (Amar Kamat via yhemanth)
- HADOOP-4695. Change TestGlobalFilter so that it allows a web page to be
- filtered more than once for a single access. (Kan Zhang via szetszwo)
- HADOOP-5298. Change TestServletFilter so that it allows a web page to be
- filtered more than once for a single access. (szetszwo)
- HADOOP-5432. Disable ssl during unit tests in hdfsproxy, as it is unused
- and causes failures. (cdouglas)
- HADOOP-5416. Correct the shell command "fs -test" forrest doc description.
- (Ravi Phulari via szetszwo)
- HADOOP-5327. Fixed job tracker to remove files from system directory on
- ACL check failures and also check ACLs on restart.
- (Amar Kamat via yhemanth)
- HADOOP-5395. Change the exception message when a job is submitted to an
- invalid queue. (Rahul Kumar Singh via yhemanth)
- HADOOP-5276. Fixes a problem to do with updating the start time of
- a task when the tracker that ran the task is lost. (Amar Kamat via
- ddas)
- HADOOP-5278. Fixes a problem to do with logging the finish time of
- a task during recovery (after a JobTracker restart). (Amar Kamat
- via ddas)
- HADOOP-5490. Fixes a synchronization problem in the
- EagerTaskInitializationListener class. (Jothi Padmanabhan via
- ddas)
- HADOOP-5493. The shuffle copier threads return the codecs back to
- the pool when the shuffle completes. (Jothi Padmanabhan via ddas)
- HADOOP-5505. Fix JspHelper initialization in the context of
- MiniDFSCluster. (Raghu Angadi)
- HADOOP-5414. Fixes an IOException while executing hadoop fs -touchz
- fileName by making sure that the lease renewal thread exits before the
- DFS client exits. (hairong)
- HADOOP-5103. FileInputFormat now reuses the clusterMap network
- topology object and that brings down the log messages in the
- JobClient to do with NetworkTopology.add significantly. (Jothi
- Padmanabhan via ddas)
- HADOOP-5483. Fixes a problem in the Directory Cleanup Thread due to which
- TestMiniMRWithDFS sometimes used to fail. (ddas)
- HADOOP-5281. Prevent sharing incompatible ZlibCompressor instances between
- GzipCodec and DefaultCodec. (cdouglas)
- HADOOP-5463. Balancer throws "Not a host:port pair" unless port is
- specified in fs.default.name. (Stuart White via hairong)
- HADOOP-5514. Fix JobTracker metrics and add metrics for waiting, failed
- tasks. (cdouglas)
- HADOOP-5516. Fix a NullPointerException in TaskMemoryManagerThread
- that occurs when monitored processes disappear while the thread is
- running. (Vinod Kumar Vavilapalli via yhemanth)
- HADOOP-5382. Support combiners in the new context object API. (omalley)
- HADOOP-5471. Fixes a problem to do with updating the log.index file in the
- case where a cleanup task is run. (Amareshwari Sriramadasu via ddas)
- HADOOP-5534. Fixed a deadlock in Fair scheduler's servlet.
- (Rahul Kumar Singh via yhemanth)
- HADOOP-5328. Fixes a problem in the renaming of job history files during
- job recovery. (Amar Kamat via ddas)
- HADOOP-5417. Don't ignore InterruptedExceptions that happen when calling
- into rpc. (omalley)
- HADOOP-5320. Add a close() in TestMapReduceLocal. (Jothi Padmanabhan
- via szetszwo)
- HADOOP-5520. Fix a typo in disk quota help message. (Ravi Phulari
- via szetszwo)
- HADOOP-5519. Remove claims from mapred-default.xml that prime numbers
- of tasks are helpful. (Owen O'Malley via szetszwo)
- HADOOP-5484. TestRecoveryManager fails with FileAlreadyExistsException.
- (Amar Kamat via hairong)
- HADOOP-5564. Limit the JVM heap size in the java command for initializing
- JAVA_PLATFORM. (Suresh Srinivas via szetszwo)
- HADOOP-5565. Add API for failing/finalized jobs to the JT metrics
- instrumentation. (Jerome Boulon via cdouglas)
- HADOOP-5390. Remove duplicate jars from tarball, src from binary tarball
- added by hdfsproxy. (Zhiyong Zhang via cdouglas)
- HADOOP-5066. Building binary tarball should not build docs/javadocs, copy
- src, or run jdiff. (Giridharan Kesavan via cdouglas)
- HADOOP-5459. Fix undetected CRC errors where intermediate output is closed
- before it has been completely consumed. (cdouglas)
- HADOOP-5571. Remove widening primitive conversion in TupleWritable mask
- manipulation. (Jingkei Ly via cdouglas)
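The entry does not describe the TupleWritable bug itself; the sketch below shows a general pitfall with widening primitive conversion in bit-mask code. An int shift such as 1 << i uses only the low 5 bits of i and is widened to long (with sign extension) only afterwards, so bits 31 to 63 of a long mask cannot be set correctly. BitMaskSketch is a hypothetical name.

```java
// Hypothetical helpers contrasting an int shift widened to long with a true long shift.
final class BitMaskSketch {
    /** Buggy: 1 << i is computed as an int, then widened (and sign-extended) to long. */
    public static long setBitBuggy(long mask, int i) {
        return mask | (1 << i);
    }

    /** Correct: 1L << i shifts a long, so every bit from 0 to 63 is reachable. */
    public static long setBit(long mask, int i) {
        return mask | (1L << i);
    }
}
```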
- HADOOP-5588. Remove an unnecessary call to listStatus(..) in
- FileSystem.globStatusInternal(..). (Hairong Kuang via szetszwo)
- HADOOP-5473. Solves a race condition in killing a task - the state is KILLED
- if there is a user request pending to kill the task and the TT reported
- the state as SUCCESS. (Amareshwari Sriramadasu via ddas)
- HADOOP-5576. Fix LocalRunner to work with the new context object API in
- mapreduce. (Tom White via omalley)
- HADOOP-4374. Installs a shutdown hook in the Task JVM so that log.index is
- updated before the JVM exits. Also makes the update to log.index atomic.
- (Ravi Gummadi via ddas)
- HADOOP-5577. Add a verbose flag to mapreduce.Job.waitForCompletion to get
- the running job's information printed to the user's stdout as it runs.
- (omalley)
- HADOOP-5607. Fix NPE in TestCapacityScheduler. (cdouglas)
- HADOOP-5605. All the replicas incorrectly got marked as corrupt. (hairong)
- HADOOP-5337. JobTracker, upon restart, now waits for the TaskTrackers to
- join back before scheduling new tasks. This fixes race conditions associated
- with greedy scheduling as was the case earlier. (Amar Kamat via ddas)
- HADOOP-5227. Fix distcp so -update and -delete can be meaningfully
- combined. (Tsz Wo (Nicholas), SZE via cdouglas)
- HADOOP-5305. Increase number of files and print debug messages in
- TestCopyFiles. (szetszwo)
- HADOOP-5548. Add synchronization for JobTracker methods in RecoveryManager.
- (Amareshwari Sriramadasu via sharad)
- HADOOP-3810. NameNode seems unstable on a cluster with little space left.
- (hairong)
- HADOOP-5068. Fix NPE in TestCapacityScheduler. (Vinod Kumar Vavilapalli
- via szetszwo)
- HADOOP-5585. Clear FileSystem statistics between tasks when jvm-reuse
- is enabled. (omalley)
- HADOOP-5394. JobTracker might schedule 2 attempts of the same task
- with the same attempt id across restarts. (Amar Kamat via sharad)
- HADOOP-5645. After HADOOP-4920 we need a place to checkin
- releasenotes.html. (nigel)
- Release 0.19.2 - Unreleased
- BUG FIXES
- HADOOP-5154. Fixes a deadlock in the fairshare scheduler.
- (Matei Zaharia via yhemanth)
- HADOOP-5146. Fixes a race condition that causes LocalDirAllocator to miss
- files. (Devaraj Das via yhemanth)
- HADOOP-4638. Fixes job recovery to not crash the job tracker for problems
- with a single job file. (Amar Kamat via yhemanth)
- HADOOP-5384. Fix a problem that DataNodeCluster creates blocks with
- generationStamp == 1. (szetszwo)
- HADOOP-5376. Fixes the code handling lost tasktrackers to set the task state
- to KILLED_UNCLEAN only for relevant type of tasks.
- (Amareshwari Sriramadasu via yhemanth)
- HADOOP-5285. Fixes the issues - (1) obtainTaskCleanupTask checks whether job is
- inited before trying to lock the JobInProgress (2) Moves the CleanupQueue class
- outside the TaskTracker and makes it a generic class that is used by the
- JobTracker also for deleting the paths on the job's output fs. (3) Moves the
- references to completedJobStore outside the block where the JobTracker is locked.
- (ddas)
- HADOOP-5392. Fixes a problem to do with JT crashing during recovery when
- the job files are garbled. (Amar Kamat via ddas)
- HADOOP-5332. Appending to files is not allowed (by default) unless
- dfs.support.append is set to true. (dhruba)
- HADOOP-5333. libhdfs supports appending to files. (dhruba)
- HADOOP-3998. Fix dfsclient exception when JVM is shutdown. (dhruba)
- HADOOP-5440. Fixes a problem to do with removing a taskId from the list
- of taskIds that the TaskTracker's TaskMemoryManager manages.
- (Amareshwari Sriramadasu via ddas)
- HADOOP-5446. Restore TaskTracker metrics. (cdouglas)
- HADOOP-5449. Fixes the history cleaner thread.
- (Amareshwari Sriramadasu via ddas)
- HADOOP-5479. NameNode should not send empty block replication request to
- DataNode. (hairong)
- HADOOP-5259. Job with output hdfs:/user/<username>/outputpath (no
- authority) fails with Wrong FS. (Doug Cutting via hairong)
- HADOOP-5522. Documents the setup/cleanup tasks in the mapred tutorial.
- (Amareshwari Sriramadasu via ddas)
- HADOOP-5549. ReplicationMonitor should schedule both replication and
- deletion work in one iteration. (hairong)
- HADOOP-5554. DataNodeCluster and CreateEditsLog should create blocks with
- the same generation stamp value. (hairong via szetszwo)
- HADOOP-5231. Clones the TaskStatus before passing it to the JobInProgress.
- (Amareshwari Sriramadasu via ddas)
- HADOOP-4719. Fix documentation of 'ls' format for FsShell. (Ravi Phulari
- via cdouglas)
- HADOOP-5374. Fixes a NPE problem in getTasksToSave method.
- (Amareshwari Sriramadasu via ddas)
- HADOOP-4780. Cache the size of directories in DistributedCache, avoiding
- long delays in recalculating it. (He Yongqiang via cdouglas)
- HADOOP-5551. Prevent directory destruction on file create.
- (Brian Bockelman via shv)
- Release 0.19.1 - 2009-02-23
- IMPROVEMENTS
- HADOOP-4739. Fix spelling and grammar, improve phrasing of some sections in
- mapred tutorial. (Vivek Ratan via cdouglas)
- HADOOP-3894. DFSClient logging improvements. (Steve Loughran via shv)
- HADOOP-5126. Remove empty file BlocksWithLocations.java. (shv)
- HADOOP-5127. Remove public methods in FSDirectory. (Jakob Homan via shv)
- BUG FIXES
- HADOOP-4697. Fix getBlockLocations in KosmosFileSystem to handle multiple
- blocks correctly. (Sriram Rao via cdouglas)
- HADOOP-4420. Add null checks for job, caused by invalid job IDs.
- (Aaron Kimball via tomwhite)
- HADOOP-4632. Fix TestJobHistoryVersion to use test.build.dir instead of the
- current working directory for scratch space. (Amar Kamat via cdouglas)
- HADOOP-4508. Fix FSDataOutputStream.getPos() for append. (dhruba via
- szetszwo)
- HADOOP-4727. Fix a group checking bug in fill_stat_structure(...) in
- fuse-dfs. (Brian Bockelman via szetszwo)
- HADOOP-4836. Correct typos in mapred related documentation. (Jordà Polo
- via szetszwo)
- HADOOP-4821. Usage description in the Quotas guide documentation is
- incorrect. (Boris Shkolnik via hairong)
- HADOOP-4847. Moves the loading of OutputCommitter to the Task.
- (Amareshwari Sriramadasu via ddas)
- HADOOP-4966. Marks completed setup tasks for removal.
- (Amareshwari Sriramadasu via ddas)
- HADOOP-4982. TestFsck should run in Eclipse. (shv)
- HADOOP-5008. TestReplication#testPendingReplicationRetry leaves an opened
- fd unclosed. (hairong)
- HADOOP-4906. Fix TaskTracker OOM by keeping a shallow copy of JobConf in
- TaskTracker.TaskInProgress. (Sharad Agarwal via acmurthy)
- HADOOP-4918. Fix bzip2 compression to work with Sequence Files.
- (Zheng Shao via dhruba).
- HADOOP-4965. TestFileAppend3 should close FileSystem. (shv)
- HADOOP-4967. Fixes a race condition in the JvmManager to do with killing
- tasks. (ddas)
- HADOOP-5009. DataNode#shutdown sometimes leaves data block scanner
- verification log unclosed. (hairong)
- HADOOP-5086. Use the appropriate FileSystem for trash URIs. (cdouglas)
- HADOOP-4955. Make DBOutputFormat use column names from setOutput().
- (Kevin Peterson via enis)
- HADOOP-4862. Minor: HADOOP-3678 did not remove all the cases of
- spurious IOExceptions logged by DataNode. (Raghu Angadi)
- HADOOP-5034. NameNode should send both replication and deletion requests
- to DataNode in one reply to a heartbeat. (hairong)
- HADOOP-5156. TestHeartbeatHandling uses MiniDFSCluster.getNamesystem()
- which does not exist in branch 0.19 and 0.20. (hairong)
- HADOOP-5161. Accepted sockets do not get placed in
- DataXceiverServer#childSockets. (hairong)
- HADOOP-5193. Correct calculation of edits modification time. (shv)
- HADOOP-4494. Allow libhdfs to append to files.
- (Pete Wyckoff via dhruba)
- HADOOP-5166. Fix JobTracker restart to work when ACLs are configured
- for the JobTracker. (Amar Kamat via yhemanth).
- HADOOP-5067. Fixes TaskInProgress.java to keep track of count of failed and
- killed tasks correctly. (Amareshwari Sriramadasu via ddas)
- HADOOP-4760. HDFS streams should not throw exceptions when closed twice.
- (enis)
- Release 0.19.0 - 2008-11-18
- INCOMPATIBLE CHANGES
- HADOOP-3595. Remove deprecated methods for mapred.combine.once
- functionality, which was necessary for providing backwards
- compatible combiner semantics for 0.18. (cdouglas via omalley)
- HADOOP-3667. Remove the following deprecated methods from JobConf:
- addInputPath(Path)
- getInputPaths()
- getMapOutputCompressionType()
- getOutputPath()
- getSystemDir()
- setInputPath(Path)
- setMapOutputCompressionType(CompressionType style)
- setOutputPath(Path)
- (Amareshwari Sriramadasu via omalley)
- HADOOP-3652. Remove deprecated class OutputFormatBase.
- (Amareshwari Sriramadasu via cdouglas)
- HADOOP-2885. Break the hadoop.dfs package into separate packages under
- hadoop.hdfs that reflect whether they are client, server, protocol,
- etc. DistributedFileSystem and DFSClient have moved and are now
- considered package private. (Sanjay Radia via omalley)
- HADOOP-2325. Require Java 6. (cutting)
- HADOOP-372. Add support for multiple input paths with a different
- InputFormat and Mapper for each path. (Chris Smith via tomwhite)
- HADOOP-1700. Support appending to file in HDFS. (dhruba)
- HADOOP-3792. Make FsShell -test consistent with unix semantics, returning
- zero for true and non-zero for false. (Ben Slusky via cdouglas)
- HADOOP-3664. Remove the deprecated method InputFormat.validateInput,
- which is no longer needed. (tomwhite via omalley)
- HADOOP-3549. Give more meaningful errno's in libhdfs. In particular,
- EACCES is returned for permission problems. (Ben Slusky via omalley)
- HADOOP-4036. ResourceStatus was added to TaskTrackerStatus by HADOOP-3759,
- so increment the InterTrackerProtocol version. (Hemanth Yamijala via
- omalley)
- HADOOP-3150. Moves task promotion to tasks. Defines a new interface for
- committing output files. Moves job setup to jobclient, and moves jobcleanup
- to a separate task. (Amareshwari Sriramadasu via ddas)
- HADOOP-3446. Keep map outputs in memory during the reduce. Remove
- fs.inmemory.size.mb and replace with properties defining in memory map
- output retention during the shuffle and reduce relative to maximum heap
- usage. (cdouglas)
- HADOOP-3245. Adds the feature for supporting JobTracker restart. Running
- jobs can be recovered from the history file. The history file format has
- been modified to support recovery. The task attempt ID now has the
- JobTracker start time to distinguish attempts of the same TIP across
- restarts. (Amar Ramesh Kamat via ddas)
- HADOOP-4007. REMOVE DFSFileInfo - FileStatus is sufficient.
- (Sanjay Radia via hairong)
- HADOOP-3722. Fixed Hadoop Streaming and Hadoop Pipes to use the Tool
- interface and GenericOptionsParser. (Enis Soztutar via acmurthy)
- HADOOP-2816. Cluster summary at name node web reports the space
- utilization as:
- Configured Capacity: capacity of all the data directories - Reserved space
- Present Capacity: Space available for dfs, i.e. remaining+used space
- DFS Used%: DFS used space/Present Capacity
- (Suresh Srinivas via hairong)
- HADOOP-3938. Disk space quotas for HDFS. This is similar to namespace
- quotas in 0.18. (rangadi)
- HADOOP-4293. Make Configuration Writable and remove unreleased
- WritableJobConf. Configuration.write is renamed to writeXml. (omalley)
- HADOOP-4281. Change dfsadmin to report available disk space in a format
- consistent with the web interface as defined in HADOOP-2816. (Suresh
- Srinivas via cdouglas)
- HADOOP-4430. Further change the cluster summary at name node web that was
- changed in HADOOP-2816:
- Non DFS Used - This indicates the disk space taken by non-DFS files from
- the Configured Capacity
- DFS Used % - DFS Used % of Configured Capacity
- DFS Remaining % - Remaining % of Configured Capacity available for DFS use
- DFS command line report reflects the same change. Config parameter
- dfs.datanode.du.pct is no longer used and is removed from the
- hadoop-default.xml. (Suresh Srinivas via hairong)
- HADOOP-4116. Balancer should provide better resource management. (hairong)
- HADOOP-4599. BlocksMap and BlockInfo made package private. (shv)
- NEW FEATURES
- HADOOP-3341. Allow streaming jobs to specify the field separator for map
- and reduce input and output. The new configuration values are:
- stream.map.input.field.separator
- stream.map.output.field.separator
- stream.reduce.input.field.separator
- stream.reduce.output.field.separator
- All of them default to "\t". (Zheng Shao via omalley)
- HADOOP-3479. Defines the configuration file for the resource manager in
- Hadoop. You can configure various parameters related to scheduling, such
- as queues and queue properties here. The properties for a queue follow a
- naming convention, such as hadoop.rm.queue.queue-name.property-name.
- (Hemanth Yamijala via ddas)
- HADOOP-3149. Adds a way in which map/reduce tasks can create multiple
- outputs. (Alejandro Abdelnur via ddas)
- HADOOP-3714. Add a new contrib, bash-tab-completion, which enables
- bash tab completion for the bin/hadoop script. See the README file
- in the contrib directory for the installation. (Chris Smith via enis)
- HADOOP-3730. Adds a new JobConf constructor that disables loading
- default configurations. (Alejandro Abdelnur via ddas)
- HADOOP-3772. Add a new Hadoop Instrumentation api for the JobTracker and
- the TaskTracker, refactor Hadoop Metrics as an implementation of the api.
- (Ari Rabkin via acmurthy)
- HADOOP-2302. Provides a comparator for numerical sorting of key fields.
- (ddas)
- HADOOP-153. Provides a way to skip bad records. (Sharad Agarwal via ddas)
- HADOOP-657. Free disk space should be modelled and used by the scheduler
- to make scheduling decisions. (Ari Rabkin via omalley)
- HADOOP-3719. Initial checkin of Chukwa, which is a data collection and
- analysis framework. (Jerome Boulon, Andy Konwinski, Ari Rabkin,
- and Eric Yang)
- HADOOP-3873. Add -filelimit and -sizelimit options to distcp to cap the
- number of files/bytes copied in a particular run to support incremental
- updates and mirroring. (TszWo (Nicholas), SZE via cdouglas)
- HADOOP-3585. FailMon package for hardware failure monitoring and
- analysis of anomalies. (Ioannis Koltsidas via dhruba)
- HADOOP-1480. Add counters to the C++ Pipes API. (acmurthy via omalley)
- HADOOP-3854. Add support for pluggable servlet filters in the HttpServers.
- (Tsz Wo (Nicholas) Sze via omalley)
- HADOOP-3759. Provides ability to run memory intensive jobs without
- affecting other running tasks on the nodes. (Hemanth Yamijala via ddas)
- HADOOP-3746. Add a fair share scheduler. (Matei Zaharia via omalley)
- HADOOP-3754. Add a thrift interface to access HDFS. (dhruba via omalley)
- HADOOP-3828. Provides a way to write skipped records to DFS.
- (Sharad Agarwal via ddas)
- HADOOP-3948. Separate name-node edits and fsimage directories.
- (Lohit Vijayarenu via shv)
- HADOOP-3939. Add an option to DistCp to delete files at the destination
- not present at the source. (Tsz Wo (Nicholas) Sze via cdouglas)
- HADOOP-3601. Add a new contrib module for Hive, which is a sql-like
- query processing tool that uses map/reduce. (Ashish Thusoo via omalley)
- HADOOP-3866. Added sort and multi-job updates in the JobTracker web ui.
- (Craig Weisenfluh via omalley)
- HADOOP-3698. Add access control to control who is allowed to submit or
- modify jobs in the JobTracker. (Hemanth Yamijala via omalley)
- HADOOP-1869. Support access times for HDFS files. (dhruba)
- HADOOP-3941. Extend FileSystem API to return file-checksums.
- (szetszwo)
- HADOOP-3581. Prevents memory intensive user tasks from taking down
- nodes. (Vinod K V via ddas)
- HADOOP-3970. Provides a way to recover counters written to JobHistory.
- (Amar Kamat via ddas)
- HADOOP-3702. Adds ChainMapper and ChainReducer classes that allow composing
- chains of Maps and Reduces in a single Map/Reduce job, something like
- MAP+ / REDUCE MAP*. (Alejandro Abdelnur via ddas)
- HADOOP-3445. Add capacity scheduler that provides guaranteed capacities to
- queues as a percentage of the cluster. (Vivek Ratan via omalley)
- HADOOP-3992. Add a synthetic load generation facility to the test
- directory. (hairong via szetszwo)
- HADOOP-3981. Implement a distributed file checksum algorithm in HDFS
- and change DistCp to use file checksum for comparing src and dst files
- (szetszwo)
- HADOOP-3829. Narrow down skipped records based on user acceptable value.
- (Sharad Agarwal via ddas)
- HADOOP-3930. Add common interfaces for the pluggable schedulers and the
- cli & gui clients. (Sreekanth Ramakrishnan via omalley)
- HADOOP-4176. Implement getFileChecksum(Path) in HftpFileSystem. (szetszwo)
- HADOOP-249. Reuse JVMs across Map-Reduce Tasks.
- Configuration changes to hadoop-default.xml:
- add mapred.job.reuse.jvm.num.tasks
- (Devaraj Das via acmurthy)
- HADOOP-4070. Provide a mechanism in Hive for registering UDFs from the
- query language. (tomwhite)
- HADOOP-2536. Implement JDBC-based database input and output formats to
- allow Map-Reduce applications to work with databases. (Fredrik Hedberg and
- Enis Soztutar via acmurthy)
- HADOOP-3019. A new library to support total order partitions.
- (cdouglas via omalley)
- HADOOP-3924. Added a 'KILLED' job status. (Subramaniam Krishnan via
- acmurthy)
- IMPROVEMENTS
- HADOOP-4205. hive: metastore and ql to use the refactored SerDe library.
- (zshao)
- HADOOP-4106. libhdfs: add time, permission and user attribute support
- (part 2). (Pete Wyckoff through zshao)
- HADOOP-4104. libhdfs: add time, permission and user attribute support.
- (Pete Wyckoff through zshao)
- HADOOP-3908. libhdfs: better error message if libhdfs.so doesn't exist.
- (Pete Wyckoff through zshao)
- HADOOP-3732. Delay initialization of datanode block verification till
- the verification thread is started. (rangadi)
- HADOOP-1627. Various small improvements to 'dfsadmin -report' output.
- (rangadi)
- HADOOP-3577. Tools to inject blocks into name node and simulated
- data nodes for testing. (Sanjay Radia via hairong)
- HADOOP-2664. Add a lzop compatible codec, so that files compressed by lzop
- may be processed by map/reduce. (cdouglas via omalley)
- HADOOP-3655. Add additional ant properties to control junit. (Steve
- Loughran via omalley)
- HADOOP-3543. Update the copyright year to 2008. (cdouglas via omalley)
- HADOOP-3587. Add a unit test for the contrib/data_join framework.
- (cdouglas)
- HADOOP-3402. Add terasort example program. (omalley)
- HADOOP-3660. Add replication factor for injecting blocks in simulated
- datanodes. (Sanjay Radia via cdouglas)
- HADOOP-3684. Add a cloning function to the contrib/data_join framework
- permitting users to define a more efficient method for cloning values from
- the reduce than serialization/deserialization. (Runping Qi via cdouglas)
- HADOOP-3478. Improves the handling of map output fetching. Now the
- randomization is by the hosts (and not the map outputs themselves).
- (Jothi Padmanabhan via ddas)
- HADOOP-3617. Removed redundant checks of accounting space in MapTask and
- makes the spill thread persistent so as to avoid creating a new one for
- each spill. (Chris Douglas via acmurthy)
- HADOOP-3412. Factor the scheduler out of the JobTracker and make
- it pluggable. (Tom White and Brice Arnould via omalley)
- HADOOP-3756. Minor. Remove unused dfs.client.buffer.dir from
- hadoop-default.xml. (rangadi)
- HADOOP-3747. Adds counter support for MultipleOutputs.
- (Alejandro Abdelnur via ddas)
- HADOOP-3169. LeaseChecker daemon should not be started in DFSClient
- constructor. (TszWo (Nicholas), SZE via hairong)
- HADOOP-3824. Move base functionality of StatusHttpServer to a core
- package. (TszWo (Nicholas), SZE via cdouglas)
- HADOOP-3646. Add a bzip2 compatible codec, so bzip compressed data
- may be processed by map/reduce. (Abdul Qadeer via cdouglas)
- HADOOP-3861. MapFile.Reader and Writer should implement Closeable.
- (tomwhite via omalley)
- HADOOP-3791. Introduce generics into ReflectionUtils. (Chris Smith via
- cdouglas)
- HADOOP-3694. Improve unit test performance by changing
- MiniDFSCluster to listen only on 127.0.0.1. (cutting)
- HADOOP-3620. Namenode should synchronously resolve a datanode's network
- location when the datanode registers. (hairong)
- HADOOP-3860. NNThroughputBenchmark is extended with rename and delete
- benchmarks. (shv)
- HADOOP-3892. Include unix group name in JobConf. (Matei Zaharia via johan)
- HADOOP-3875. Change the time period between heartbeats to be relative to
- the end of the heartbeat rpc, rather than the start. This causes better
- behavior if the JobTracker is overloaded. (acmurthy via omalley)
- HADOOP-3853. Move multiple input format (HADOOP-372) extension to
- library package. (tomwhite via johan)
- HADOOP-9. Use roulette scheduling for temporary space when the size
- is not known. (Ari Rabkin via omalley)
- HADOOP-3202. Use recursive delete rather than FileUtil.fullyDelete.
- (Amareshwari Sriramadasu via omalley)
- HADOOP-3368. Remove common-logging.properties from conf. (Steve Loughran
- via omalley)
- HADOOP-3851. Fix spelling mistake in FSNamesystemMetrics. (Steve Loughran
- via omalley)
- HADOOP-3780. Remove asynchronous resolution of network topology in the
- JobTracker. (Amar Kamat via omalley)
- HADOOP-3852. Add ShellCommandExecutor.toString method to make nicer
- error messages. (Steve Loughran via omalley)
- HADOOP-3844. Include message of local exception in RPC client failures.
- (Steve Loughran via omalley)
- HADOOP-3935. Split out inner classes from DataNode.java. (johan)
- HADOOP-3905. Create generic interfaces for edit log streams. (shv)
- HADOOP-3062. Add metrics to DataNode and TaskTracker to record network
- traffic for HDFS reads/writes and MR shuffling. (cdouglas)
- HADOOP-3742. Remove HDFS from public java doc and add javadoc-dev for
- generating javadoc for developers. (Sanjay Radia via omalley)
- HADOOP-3944. Improve documentation for public TupleWritable class in
- join package. (Chris Douglas via enis)
- HADOOP-2330. Preallocate HDFS transaction log to improve performance.
- (dhruba and hairong)
- HADOOP-3965. Convert DataBlockScanner into a package private class. (shv)
- HADOOP-3488. Prevent hadoop-daemon from rsync'ing log files. (Stefan
- Groshupf and Craig Macdonald via omalley)
- HADOOP-3342. Change the kill task actions to require http post instead of
- get to prevent accidental crawls from triggering it. (enis via omalley)
- HADOOP-3937. Limit the job name in the job history filename to 50
- characters. (Matei Zaharia via omalley)
- HADOOP-3943. Remove unnecessary synchronization in
- NetworkTopology.pseudoSortByDistance. (hairong via omalley)
- HADOOP-3498. File globbing alternation should be able to span path
- components. (tomwhite)
- HADOOP-3361. Implement renames for NativeS3FileSystem.
- (Albert Chern via tomwhite)
- HADOOP-3605. Make EC2 scripts show an error message if AWS_ACCOUNT_ID is
- unset. (Al Hoang via tomwhite)
- HADOOP-4147. Remove unused class JobWithTaskContext from class
- JobInProgress. (Amareshwari Sriramadasu via johan)
- HADOOP-4151. Add a byte-comparable interface that both Text and
- BytesWritable implement. (cdouglas via omalley)
- HADOOP-4174. Move fs image/edit log methods from ClientProtocol to
- NamenodeProtocol. (shv via szetszwo)
- HADOOP-4181. Include a .gitignore and saveVersion.sh change to support
- developing under git. (omalley)
- HADOOP-4186. Factor LineReader out of LineRecordReader. (tomwhite via
- omalley)
- HADOOP-4184. Break the module dependencies between core, hdfs, and
- mapred. (tomwhite via omalley)
- HADOOP-4075. test-patch.sh now spits out ant commands that it runs.
- (Ramya R via nigel)
- HADOOP-4117. Improve configurability of Hadoop EC2 instances.
- (tomwhite)
- HADOOP-2411. Add support for larger CPU EC2 instance types.
- (Chris K Wensel via tomwhite)
- HADOOP-4083. Changed the configuration attribute queue.name to
- mapred.job.queue.name. (Hemanth Yamijala via acmurthy)
- HADOOP-4194. Added the JobConf and JobID to job-related methods in
- JobTrackerInstrumentation for better metrics. (Mac Yang via acmurthy)
- HADOOP-3975. Change test-patch script to report the working dir
- modifications preventing the suite from being run. (Ramya R via cdouglas)
- HADOOP-4124. Added a command-line switch to allow users to set job
- priorities, also allow it to be manipulated via the web-ui. (Hemanth
- Yamijala via acmurthy)
- HADOOP-2165. Augmented JobHistory to include the URIs to the tasks'
- userlogs. (Vinod Kumar Vavilapalli via acmurthy)
- HADOOP-4062. Remove the synchronization on the output stream when a
- connection is closed and also remove an undesirable exception when
- a client is stopped while there is no pending RPC request. (hairong)
- HADOOP-4227. Remove the deprecated class org.apache.hadoop.fs.ShellCommand.
- (szetszwo)
- HADOOP-4006. Clean up FSConstants and move some of the constants to
- better places. (Sanjay Radia via rangadi)
- HADOOP-4279. Trace the seeds of random sequences in append unit tests to
- make intermittent failures reproducible. (szetszwo via cdouglas)
- HADOOP-4209. Remove the change to the format of task attempt id by
- incrementing the task attempt numbers by 1000 when the job restarts.
- (Amar Kamat via omalley)
- HADOOP-4301. Adds forrest doc for the skip bad records feature.
- (Sharad Agarwal via ddas)
- HADOOP-4354. Separate TestDatanodeDeath.testDatanodeDeath() into 4 tests.
- (szetszwo)
- HADOOP-3790. Add more unit tests for testing HDFS file append. (szetszwo)
- HADOOP-4321. Include documentation for the capacity scheduler. (Hemanth
- Yamijala via omalley)
- HADOOP-4424. Change menu layout for Hadoop documentation (Boris Shkolnik
- via cdouglas).
- HADOOP-4438. Update forrest documentation to include missing FsShell
- commands. (Suresh Srinivas via cdouglas)
- HADOOP-4105. Add forrest documentation for libhdfs.
- (Pete Wyckoff via cutting)
- HADOOP-4510. Make getTaskOutputPath public. (Chris Wensel via omalley)
- OPTIMIZATIONS
- HADOOP-3556. Removed lock contention in MD5Hash by replacing the
- singleton MessageDigest with an instance per Thread using
- ThreadLocal. (Iván de Prado via omalley)
- HADOOP-3328. When client is writing data to DFS, only the last
- datanode in the pipeline needs to verify the checksum. Saves around
- 30% CPU on intermediate datanodes. (rangadi)
- HADOOP-3863. Use a thread-local string encoder rather than a static one
- that is protected by a lock. (acmurthy via omalley)
- HADOOP-3864. Prevent the JobTracker from locking up when a job is being
- initialized. (acmurthy via omalley)
- HADOOP-3816. Faster directory listing in KFS. (Sriram Rao via omalley)
- HADOOP-2130. Pipes submit job should have both blocking and non-blocking
- versions. (acmurthy via omalley)
- HADOOP-3769. Make the SampleMapper and SampleReducer from
- GenericMRLoadGenerator public, so they can be used in other contexts.
- (Lingyun Yang via omalley)
- HADOOP-3514. Inline the CRCs in intermediate files as opposed to reading
- them from a separate .crc file. (Jothi Padmanabhan via ddas)
- HADOOP-3638. Caches the iFile index files in memory to reduce seeks.
- (Jothi Padmanabhan via ddas)
- HADOOP-4225. FSEditLog.logOpenFile() should persist accessTime
- rather than modificationTime. (shv)
- HADOOP-4380. Made several new classes (Child, JVMId,
- JobTrackerInstrumentation, QueueManager, ResourceEstimator,
- TaskTrackerInstrumentation, and TaskTrackerMetricsInst) in
- org.apache.hadoop.mapred package private instead of public. (omalley)
- BUG FIXES
- HADOOP-3563. Refactor the distributed upgrade code so that it is
- easier to identify datanode and namenode related code. (dhruba)
- HADOOP-3640. Fix the read method in the NativeS3InputStream. (tomwhite via
- omalley)
- HADOOP-3711. Fixes the Streaming input parsing to properly find the
- separator. (Amareshwari Sriramadasu via ddas)
- HADOOP-3725. Prevent TestMiniMRMapDebugScript from swallowing exceptions.
- (Steve Loughran via cdouglas)
- HADOOP-3726. Throw exceptions from TestCLI setup and teardown instead of
- swallowing them. (Steve Loughran via cdouglas)
- HADOOP-3721. Refactor CompositeRecordReader and related mapred.join classes
- to make them clearer. (cdouglas)
- HADOOP-3720. Re-read the config file when dfsadmin -refreshNodes is invoked
- so dfs.hosts and dfs.hosts.exclude are observed. (lohit vijayarenu via
- cdouglas)
- HADOOP-3485. Allow writing to files over fuse.
- (Pete Wyckoff via dhruba)
- HADOOP-3723. The flags to the libhdfs.create call can be treated as
- a bitmask. (Pete Wyckoff via dhruba)
- HADOOP-3643. Filter out completed tasks when asking for running tasks in
- the JobTracker web/ui. (Amar Kamat via omalley)
- HADOOP-3777. Ensure that Lzo compressors/decompressors correctly handle the
- case where native libraries aren't available. (Chris Douglas via acmurthy)
- HADOOP-3728. Fix SleepJob so that it doesn't depend on temporary files,
- this ensures we can now run more than one instance of SleepJob
- simultaneously. (Chris Douglas via acmurthy)
- HADOOP-3795. Fix saving image files on Namenode with different checkpoint
- stamps. (Lohit Vijayarenu via mahadev)
- HADOOP-3624. Improving createeditslog to create tree directory structure.
- (Lohit Vijayarenu via mahadev)
- HADOOP-3778. DFSInputStream.seek() did not retry in case of some errors.
- (LN via rangadi)
- HADOOP-3661. The handling of moving files deleted through fuse-dfs to
- Trash made similar to the behaviour of the dfs shell.
- (Pete Wyckoff via dhruba)
- HADOOP-3819. Unset LANG and LC_CTYPE in saveVersion.sh to make it
- compatible with non-English locales. (Rong-En Fan via cdouglas)
- HADOOP-3848. Cache calls to getSystemDir in the TaskTracker instead of
- calling it for each task start. (acmurthy via omalley)
- HADOOP-3131. Fix reduce progress reporting for compressed intermediate
- data. (Matei Zaharia via acmurthy)
- HADOOP-3796. fuse-dfs configuration is implemented as file system
- mount options. (Pete Wyckoff via dhruba)
- HADOOP-3836. Fix TestMultipleOutputs to correctly clean up. (Alejandro
- Abdelnur via acmurthy)
- HADOOP-3805. Improve fuse-dfs write performance.
- (Pete Wyckoff via zshao)
- HADOOP-3846. Fix unit test CreateEditsLog to generate paths correctly.
- (Lohit Vijayarenu via cdouglas)
- HADOOP-3904. Fix unit tests using the old dfs package name.
- (TszWo (Nicholas), SZE via johan)
- HADOOP-3319. Fix some HOD error messages to go stderr instead of
- stdout. (Vinod Kumar Vavilapalli via omalley)
- HADOOP-3907. Move INodeDirectoryWithQuota to its own .java file.
- (Tsz Wo (Nicholas), SZE via hairong)
- HADOOP-3919. Fix attribute name in hadoop-default for
- mapred.jobtracker.instrumentation. (Ari Rabkin via omalley)
- HADOOP-3903. Change the package name for the servlets to be hdfs instead of
- dfs. (Tsz Wo (Nicholas) Sze via omalley)
- HADOOP-3773. Change Pipes to set the default map output key and value
- types correctly. (Koji Noguchi via omalley)
- HADOOP-3952. Fix compilation error in TestDataJoin referencing dfs package.
- (omalley)
- HADOOP-3951. Fix package name for FSNamesystem logs and modify other
- hard-coded Logs to use the class name. (cdouglas)
- HADOOP-3889. Improve error reporting from HftpFileSystem, handling in
- DistCp. (Tsz Wo (Nicholas), SZE via cdouglas)
- HADOOP-3946. Fix TestMapRed after hadoop-3664. (tomwhite via omalley)
- HADOOP-3949. Remove duplicate jars from Chukwa. (Jerome Boulon via omalley)
- HADOOP-3933. DataNode sometimes sends up to io.bytes.per.checksum bytes
- more than required to client. (Ning Li via rangadi)
- HADOOP-3962. Shell command "fs -count" should support paths with different
- file systems. (Tsz Wo (Nicholas), SZE via mahadev)
- HADOOP-3957. Fix javac warnings in DistCp and TestCopyFiles. (Tsz Wo
- (Nicholas), SZE via cdouglas)
- HADOOP-3958. Fix TestMapRed to check the success of test-job. (omalley via
- acmurthy)
- HADOOP-3985. Fix TestHDFSServerPorts to use random ports. (Hairong Kuang
- via omalley)
- HADOOP-3964. Fix javadoc warnings introduced by FailMon. (dhruba)
- HADOOP-3785. Fix FileSystem cache to be case-insensitive for scheme and
- authority. (Bill de hOra via cdouglas)
- HADOOP-3506. Fix a rare NPE caused by error handling in S3. (Tom White via
- cdouglas)
- HADOOP-3705. Fix mapred.join parser to accept InputFormats named with
- underscore and static, inner classes. (cdouglas)
- HADOOP-4023. Fix javadoc warnings introduced when the HDFS javadoc was
- made private. (omalley)
- HADOOP-4030. Remove lzop from the default list of codecs. (Arun Murthy via
- cdouglas)
- HADOOP-3961. Fix task disk space requirement estimates for virtual
- input jobs. Delays limiting task placement until after 10% of the maps
- have finished. (Ari Rabkin via omalley)
- HADOOP-2168. Fix a problem with the C++ record reader's progress not being
- reported to the framework. (acmurthy via omalley)
- HADOOP-3966. Copy findbugs generated output files to PATCH_DIR while
- running test-patch. (Ramya R via lohit)
- HADOOP-4037. Fix the eclipse plugin for versions of kfs and log4j. (nigel
- via omalley)
- HADOOP-3950. Cause the Mini MR cluster to wait for task trackers to
- register before continuing. (enis via omalley)
- HADOOP-3910. Remove unused ClusterTestDFSNamespaceLogging and
- ClusterTestDFS. (Tsz Wo (Nicholas), SZE via cdouglas)
- HADOOP-3954. Disable record skipping by default. (Sharad Agarwal via
- cdouglas)
- HADOOP-4050. Fix TestFairScheduler to use absolute paths for the work
- directory. (Matei Zaharia via omalley)
- HADOOP-4069. Keep temporary test files from TestKosmosFileSystem under
- test.build.data instead of /tmp. (lohit via omalley)
- HADOOP-4078. Create test files for TestKosmosFileSystem in separate
- directory under test.build.data. (lohit)
- HADOOP-3968. Fix getFileBlockLocations calls to use FileStatus instead
- of Path reflecting the new API. (Pete Wyckoff via lohit)
- HADOOP-3963. libhdfs no longer exits on its own; instead it returns an error
- to the caller and behaves as a true library. (Pete Wyckoff via dhruba)
- HADOOP-4100. Removes the cleanupTask scheduling from the Scheduler
- implementations and moves it to the JobTracker.
- (Amareshwari Sriramadasu via ddas)
- HADOOP-4097. Make hive work well with speculative execution turned on.
- (Joydeep Sen Sarma via dhruba)
- HADOOP-4113. Change libhdfs not to exit on its own but to return
- an error code to the caller. (Pete Wyckoff via dhruba)
- HADOOP-4054. Remove duplicate lease removal during edit log loading.
- (hairong)
- HADOOP-4071. FSNamesystem.isReplicationInProgress should add an
- under-replicated block to the neededReplication queue using method
- "add", not "update". (hairong)
- HADOOP-4154. Fix type warnings in WritableUtils. (szetszwo via omalley)
- HADOOP-4133. Log files generated by Hive should reside in the
- build directory. (Prasad Chakka via dhruba)
- HADOOP-4094. Hive now has hive-default.xml and hive-site.xml similar
- to core hadoop. (Prasad Chakka via dhruba)
- HADOOP-4112. Handles cleanupTask in JobHistory.
- (Amareshwari Sriramadasu via ddas)
- HADOOP-3831. Very slow reading clients sometimes failed while reading.
- (rangadi)
- HADOOP-4155. Use JobTracker's start time while initializing JobHistory's
- JobTracker Unique String. (lohit)
- HADOOP-4099. Fix null pointer when using HFTP from an 0.18 server.
- (dhruba via omalley)
- HADOOP-3570. Includes user-specified libjar files in the client-side
- classpath. (Sharad Agarwal via ddas)
- HADOOP-4129. Changed memory limits of TaskTracker and Tasks to be in
- kilobytes rather than bytes. (Vinod Kumar Vavilapalli via acmurthy)
- HADOOP-4139. Optimize Hive multi group-by.
- (Namin Jain via dhruba)
- HADOOP-3911. Add a check to fsck options to make sure -files is not
- the first option, to resolve conflicts with GenericOptionsParser.
- (lohit)
- HADOOP-3623. Refactor LeaseManager. (szetszwo)
- HADOOP-4125. Handles the Reduce cleanup TIP on the web UI.
- (Amareshwari Sriramadasu via ddas)
- HADOOP-4087. Hive Metastore API for php and python clients.
- (Prasad Chakka via dhruba)
- HADOOP-4197. Update DATA_TRANSFER_VERSION for HADOOP-3981. (szetszwo)
- HADOOP-4138. Refactor the Hive SerDe library to better structure
- the interfaces to the serializer and de-serializer.
- (Zheng Shao via dhruba)
- HADOOP-4195. Close compressor before returning to codec pool.
- (acmurthy via omalley)
- HADOOP-2403. Escapes some special characters before logging to
- history files. (Amareshwari Sriramadasu via ddas)
- HADOOP-4200. Fix a bug in the test-patch.sh script.
- (Ramya R via nigel)
- HADOOP-4084. Add explain plan capabilities to Hive Query Language.
- (Ashish Thusoo via dhruba)
- HADOOP-4121. Preserve cause for exception if the initialization of
- HistoryViewer for JobHistory fails. (Amareshwari Sri Ramadasu via
- acmurthy)
- HADOOP-4213. Fixes NPE in TestLimitTasksPerJobTaskScheduler.
- (Sreekanth Ramakrishnan via ddas)
- HADOOP-4077. Setting access and modification time for a file
- requires write permissions on the file. (dhruba)
- HADOOP-3592. Fix a couple of possible file leaks in FileUtil.
- (Bill de hOra via rangadi)
- HADOOP-4120. Hive interactive shell records the time taken by a
- query. (Raghotham Murthy via dhruba)
- HADOOP-4090. The Hive scripts pick up Hadoop from HADOOP_HOME first
- and then from the PATH. (Raghotham Murthy via dhruba)
- HADOOP-4242. Remove extra ";" in FSDirectory that blocks compilation
- in some IDEs. (szetszwo via omalley)
- HADOOP-4249. Fix eclipse path to include the hsqldb.jar. (szetszwo via
- omalley)
- HADOOP-4247. Move InputSampler into org.apache.hadoop.mapred.lib, so that
- examples.jar doesn't depend on tools.jar. (omalley)
- HADOOP-4269. Fix the deprecation of LineReader by extending the new class
- into the old name and deprecating it. Also update the tests to test the
- new class. (cdouglas via omalley)
- HADOOP-4280. Fix conversions between seconds in C and milliseconds in
- Java for access times for files. (Pete Wyckoff via rangadi)
- HADOOP-4254. The -setSpaceQuota command does not convert the "TB" extension
- to terabytes properly. The implementation now uses StringUtils for parsing this.
- (Raghu Angadi)
- HADOOP-4259. Findbugs should run over tools.jar also. (cdouglas via
- omalley)
- HADOOP-4275. Move public method isJobValidName from JobID to a private
- method in JobTracker. (omalley)
- HADOOP-4173. Fix failures in TestProcfsBasedProcessTree and
- TestTaskTrackerMemoryManager tests. ProcfsBasedProcessTree and
- memory management in TaskTracker are disabled on Windows.
- (Vinod K V via rangadi)
- HADOOP-4189. Fixes the history blocksize & intertracker protocol version
- issues introduced as part of HADOOP-3245. (Amar Kamat via ddas)
- HADOOP-4190. Fixes the backward compatibility issue with Job History
- introduced by HADOOP-3245 and HADOOP-2403. (Amar Kamat via ddas)
- HADOOP-4237. Fixes the TestStreamingBadRecords.testNarrowDown testcase.
- (Sharad Agarwal via ddas)
- HADOOP-4274. Capacity scheduler accidentally modifies the underlying
- data structures when browsing the job lists. (Hemanth Yamijala via omalley)
- HADOOP-4309. Fix eclipse-plugin compilation. (cdouglas)
- HADOOP-4232. Fix race condition in JVM reuse when multiple slots become
- free. (ddas via acmurthy)
- HADOOP-4302. Fix a race condition in TestReduceFetch that can yield false
- negatives. (cdouglas)
- HADOOP-3942. Update distcp documentation to include features introduced in
- HADOOP-3873, HADOOP-3939. (Tsz Wo (Nicholas), SZE via cdouglas)
- HADOOP-4319. The fuse-dfs dfs_read function returns as many bytes as it is
- told to read unless end-of-file is reached. (Pete Wyckoff via dhruba)
- HADOOP-4246. Ensure we have the correct lower bound on the number of
- retries for fetching map-outputs; also fixed, for small jobs, the case where
- the reducer automatically kills itself when too many unique map-outputs
- could not be fetched. (Amareshwari Sri Ramadasu via acmurthy)
- HADOOP-4163. Report FSErrors from map output fetch threads instead of
- merely logging them. (Sharad Agarwal via cdouglas)
- HADOOP-4261. Adds a setup task for jobs. This is required so that we
- don't set up jobs that haven't been initialized yet (since init could lead
- to job failure). Only after the init has successfully happened do we
- launch the setupJob task. (Amareshwari Sriramadasu via ddas)
- HADOOP-4256. Removes Completed and Failed Job tables from
- jobqueue_details.jsp. (Sreekanth Ramakrishnan via ddas)
- HADOOP-4267. Occasional exceptions during HSQLDB shutdown are logged
- but not rethrown. (enis)
- HADOOP-4018. The number of tasks for a single job cannot exceed a
- pre-configured maximum value. (dhruba)
- HADOOP-4288. Fixes an NPE problem in CapacityScheduler.
- (Amar Kamat via ddas)
- HADOOP-4014. Create hard links with 'fsutil hardlink' on Windows. (shv)
- HADOOP-4393. Merged org.apache.hadoop.fs.permission.AccessControlException
- and org.apache.hadoop.security.AccessControlIOException into a single
- class hadoop.security.AccessControlException. (omalley via acmurthy)
- HADOOP-4287. Fixes an issue to do with maintaining counts of running/pending
- maps/reduces. (Sreekanth Ramakrishnan via ddas)
- HADOOP-4361. Makes sure that jobs killed from command line are killed
- fast (i.e., there is a slot to run the cleanup task soon).
- (Amareshwari Sriramadasu via ddas)
- HADOOP-4400. Add "hdfs://" to fs.default.name on quickstart.html.
- (Jeff Hammerbacher via omalley)
- HADOOP-4378. Fix TestJobQueueInformation to use SleepJob rather than
- WordCount via TestMiniMRWithDFS. (Sreekanth Ramakrishnan via acmurthy)
- HADOOP-4376. Fix formatting in hadoop-default.xml for
- hadoop.http.filter.initializers. (Enis Soztutar via acmurthy)
- HADOOP-4410. Adds an extra arg to the API FileUtil.makeShellPath to
- determine whether to canonicalize file paths or not.
- (Amareshwari Sriramadasu via ddas)
- HADOOP-4236. Ensure un-initialized jobs are killed correctly on
- user-demand. (Sharad Agarwal via acmurthy)
- HADOOP-4373. Fix calculation of Guaranteed Capacity for the
- capacity-scheduler. (Hemanth Yamijala via acmurthy)
- HADOOP-4053. Schedulers must be notified when jobs complete. (Amar Kamat via omalley)
- HADOOP-4335. Fix FsShell -ls for filesystems without owners/groups. (David
- Phillips via cdouglas)
- HADOOP-4426. TestCapacityScheduler broke due to the two commits HADOOP-4053
- and HADOOP-4373. This patch fixes that. (Hemanth Yamijala via ddas)
- HADOOP-4418. Updates documentation in forrest for Mapred, streaming and pipes.
- (Amareshwari Sriramadasu via ddas)
- HADOOP-3155. Ensure that there is only one thread fetching
- TaskCompletionEvents on TaskTracker re-init. (Dhruba Borthakur via
- acmurthy)
- HADOOP-4425. Fix EditLogInputStream to overload the bulk read method.
- (cdouglas)
- HADOOP-4427. Adds the new queue/job commands to the manual.
- (Sreekanth Ramakrishnan via ddas)
- HADOOP-4278. Increase debug logging for unit test TestDatanodeDeath.
- Fix the case when primary is dead. (dhruba via szetszwo)
- HADOOP-4423. Keep block length when the block recovery is triggered by
- append. (szetszwo)
- HADOOP-4449. Fix dfsadmin usage. (Raghu Angadi via cdouglas)
- HADOOP-4455. Added TestSerDe so that unit tests can run successfully.
- (Ashish Thusoo via dhruba)
- HADOOP-4457. Fixes an input split logging problem introduced by
- HADOOP-3245. (Amareshwari Sriramadasu via ddas)
- HADOOP-4464. Separate out TestFileCreationClient from TestFileCreation.
- (Tsz Wo (Nicholas), SZE via cdouglas)
- HADOOP-4404. saveFSImage() removes files from a storage directory that do
- not correspond to its type. (shv)
- HADOOP-4149. Fix handling of updates to the job priority, by changing the
- list of jobs to be keyed by the priority, submit time, and job tracker id.
- (Amar Kamat via omalley)
- HADOOP-4296. Fix job client failures by not retiring a job as soon as it
- is finished. (dhruba)
- HADOOP-4439. Remove configuration variables that aren't usable yet, in
- particular mapred.tasktracker.tasks.maxmemory and mapred.task.max.memory.
- (Hemanth Yamijala via omalley)
- HADOOP-4230. Fix for serde2 interface, limit operator, select * operator,
- UDF trim functions and sampling. (Ashish Thusoo via dhruba)
- HADOOP-4358. No need to truncate access time in INode. Also fixes NPE
- in CreateEditsLog. (Raghu Angadi)
- HADOOP-4387. TestHDFSFileSystemContract fails on windows nightly builds.
- (Raghu Angadi)
- HADOOP-4466. Ensure that SequenceFileOutputFormat isn't tied to Writables
- and can be used with other Serialization frameworks. (Chris Wensel via
- acmurthy)
- HADOOP-4525. Fix ipc.server.ipcnodelay originally missed in HADOOP-2232.
- (cdouglas via Clint Morgan)
- HADOOP-4498. Ensure that JobHistory correctly escapes the job name so that
- regex patterns work. (Chris Wensel via acmurthy)
- HADOOP-4446. Modify guaranteed capacity labels in capacity scheduler's UI
- to reflect the information being displayed. (Sreekanth Ramakrishnan via
- yhemanth)
- HADOOP-4282. Some user facing URLs are not filtered by user filters.
- (szetszwo)
- HADOOP-4595. Fixes two race conditions - one to do with updating free slot count,
- and another to do with starting the MapEventsFetcher thread. (ddas)
- HADOOP-4552. Fix a deadlock in RPC server. (Raghu Angadi)
- HADOOP-4471. Sort running jobs by priority in the capacity scheduler.
- (Amar Kamat via yhemanth)
- HADOOP-4500. Fix MultiFileSplit to get the FileSystem from the relevant
- path rather than the JobClient. (Joydeep Sen Sarma via cdouglas)
- Release 0.18.4 - Unreleased
- BUG FIXES
- HADOOP-5114. Remove timeout for accept() in DataNode. This makes accept()
- fail in JDK on Windows and causes many tests to fail. (Raghu Angadi)
- HADOOP-5192. Block receiver should not remove a block that's created or
- being written by other threads. (hairong)
- HADOOP-5134. FSNamesystem#commitBlockSynchronization adds under-construction
- block locations to blocksMap. (Dhruba Borthakur via hairong)
- HADOOP-5412. Simulated DataNode should not write to a block that's being
- written by another thread. (hairong)
- HADOOP-5465. Fix the problem of blocks remaining under-replicated by
- providing synchronized modification to the counter xmitsInProgress in
- DataNode. (hairong)
- HADOOP-5557. Fixes some minor problems in TestOverReplicatedBlocks.
- (szetszwo)
- Release 0.18.3 - 2009-01-27
- IMPROVEMENTS
- HADOOP-4150. Include librecordio in hadoop releases. (Giridharan Kesavan
- via acmurthy)
- HADOOP-4668. Improve documentation for setCombinerClass to clarify the
- restrictions on combiners. (omalley)
- BUG FIXES
- HADOOP-4499. DFSClient should invoke checksumOk only once. (Raghu Angadi)
- HADOOP-4597. Calculate mis-replicated blocks when safe-mode is turned
- off manually. (shv)
- HADOOP-3121. lsr should keep listing the remaining items rather than
- terminate if it encounters an IOException. (szetszwo)
- HADOOP-4610. Always calculate mis-replicated blocks when safe-mode is
- turned off. (shv)
- HADOOP-3883. Limit namenode to assign at most one generation stamp for
- a particular block within a short period. (szetszwo)
- HADOOP-4556. Block went missing. (hairong)
- HADOOP-4643. NameNode should exclude excessive replicas when counting
- live replicas for a block. (hairong)
- HADOOP-4703. Should not wait for proxy forever in lease recovery.
- (szetszwo)
- HADOOP-4647. NamenodeFsck should close the DFSClient it has created.
- (szetszwo)
- HADOOP-4616. Fuse-dfs can handle bad values from FileSystem.read call.
- (Pete Wyckoff via dhruba)
- HADOOP-4061. Throttle Datanode decommission monitoring in Namenode.
- (szetszwo)
- HADOOP-4659. Root cause of connection failure is being lost to code that
- uses it for delaying startup. (Steve Loughran and Hairong via hairong)
- HADOOP-4614. Lazily open segments when merging map spills to avoid using
- too many file descriptors. (Yuri Pradkin via cdouglas)
- HADOOP-4257. The DFS client should pick only one datanode as the candidate
- to initiate lease recovery. (Tsz Wo (Nicholas), SZE via dhruba)
- HADOOP-4713. Fix librecordio to handle records larger than 64k. (Christian
- Kunz via cdouglas)
- HADOOP-4635. Fix a memory leak in fuse-dfs. (Pete Wyckoff via mahadev)
- HADOOP-4714. Report status between merges and make the number of records
- between progress reports configurable. (Jothi Padmanabhan via cdouglas)
- HADOOP-4726. Fix documentation typos "the the". (Edward J. Yoon via
- szetszwo)
- HADOOP-4679. Datanode prints tons of log messages: waiting for threadgroup
- to exit, active threads is XX. (hairong)
- HADOOP-4746. Job output directory should be normalized. (hairong)
- HADOOP-4717. Removal of the default port# in NameNode.getUri() causes a
- map/reduce job to fail to promote temporary output. (hairong)
- HADOOP-4778. Check for zero size block meta file when updating a block.
- (szetszwo)
- HADOOP-4742. Replica gets deleted by mistake. (Wang Xu via hairong)
- HADOOP-4702. Failed block replication leaves an incomplete block in
- receiver's tmp data directory. (hairong)
- HADOOP-4613. Fix block browsing on Web UI. (Johan Oskarsson via shv)
- HADOOP-4806. HDFS rename should not use src path as a regular expression.
- (szetszwo)
- HADOOP-4795. Prevent lease monitor getting into an infinite loop when
- leases and the namespace tree do not match. (szetszwo)
- HADOOP-4620. Fixes Streaming to handle well the cases of map/reduce with empty
- input/output. (Ravi Gummadi via ddas)
- HADOOP-4857. Fixes TestUlimit to have exactly 1 map in the jobs spawned.
- (Ravi Gummadi via ddas)
- HADOOP-4810. Data lost at cluster startup time. (hairong)
- HADOOP-4797. Improve how RPC server reads and writes large buffers. Avoids
- soft-leak of direct buffers and excess copies in NIO layer. (Raghu Angadi)
- HADOOP-4840. TestNodeCount sometimes fails with NullPointerException.
- (hairong)
- HADOOP-4904. Fix deadlock while leaving safe mode. (shv)
- HADOOP-1980. 'dfsadmin -safemode enter' should prevent the namenode from
- leaving safemode automatically. (shv)
- HADOOP-4951. Lease monitor should acquire the LeaseManager lock but not the
- Monitor lock. (szetszwo)
- HADOOP-4935. processMisReplicatedBlocks() should not clear
- excessReplicateMap. (shv)
- HADOOP-4961. Fix ConcurrentModificationException in lease recovery
- of empty files. (shv)
- HADOOP-4971. A long (unexpected) delay at datanodes could cause subsequent
- block reports from many datanodes to arrive at the same time. (Raghu Angadi)
- HADOOP-4910. NameNode should exclude replicas when choosing excessive
- replicas to delete to avoid data loss. (hairong)
- HADOOP-4983. Fixes a problem in updating Counters in the status reporting.
- (Amareshwari Sriramadasu via ddas)
- Release 0.18.2 - 2008-11-03
- BUG FIXES
- HADOOP-3614. Fix a bug where the Datanode may use an old GenerationStamp to
- get the meta file. (szetszwo)
- HADOOP-4314. Simulated datanodes should not include blocks that are still
- being written in their block report. (Raghu Angadi)
- HADOOP-4228. dfs datanode metrics, bytes_read and bytes_written, overflow
- due to an incorrect type being used. (hairong)
- HADOOP-4395. The FSEditLog loading is incorrect for the case OP_SET_OWNER.
- (szetszwo)
- HADOOP-4351. FSNamesystem.getBlockLocationsInternal throws
- ArrayIndexOutOfBoundsException. (hairong)
- HADOOP-4403. Make TestLeaseRecovery and TestFileCreation more robust.
- (szetszwo)
- HADOOP-4292. Do not support append() for LocalFileSystem. (hairong)
- HADOOP-4399. Make fuse-dfs multi-thread access safe.
- (Pete Wyckoff via dhruba)
- HADOOP-4369. Use setMetric(...) instead of incrMetric(...) for metrics
- averages. (Brian Bockelman via szetszwo)
- HADOOP-4469. Rename and add the ant task jar file to the tar file. (nigel)
- HADOOP-3914. DFSClient sends Checksum Ok only once for a block.
- (Christian Kunz via hairong)
- HADOOP-4467. SerializationFactory now uses the current context ClassLoader
- allowing for user supplied Serialization instances. (Chris Wensel via
- acmurthy)
- HADOOP-4517. Release FSDataset lock before joining ongoing create threads.
- (szetszwo)
- HADOOP-4526. fsck failing with NullPointerException. (hairong)
- HADOOP-4483. Honor the max parameter in DatanodeDescriptor.getBlockArray(..).
- (Ahad Rana and Hairong Kuang via szetszwo)
- HADOOP-4340. Correctly set the exit code from JobShell.main so that the
- 'hadoop jar' command returns the right code to the user. (acmurthy)
- NEW FEATURES
- HADOOP-2421. Add jdiff output to documentation, listing all API
- changes from the prior release. (cutting)
- Release 0.18.1 - 2008-09-17
- IMPROVEMENTS
- HADOOP-3934. Upgrade log4j to 1.2.15. (omalley)
- BUG FIXES
- HADOOP-3995. In case of quota failure on HDFS, rename does not restore
- source filename. (rangadi)
- HADOOP-3821. Prevent SequenceFile and IFile from duplicating codecs in
- CodecPool when closed more than once. (Arun Murthy via cdouglas)
- HADOOP-4040. Remove coded default of the IPC idle connection timeout
- from the TaskTracker, which was causing HDFS client connections to not be
- collected. (ddas via omalley)
- HADOOP-4046. Made WritableComparator's constructor protected instead of
- private to re-enable class derivation. (cdouglas via omalley)
- HADOOP-3940. Fix in-memory merge condition to wait when there are no map
- outputs or when the final map outputs are being fetched without contention.
- (cdouglas)
- Release 0.18.0 - 2008-08-19
- INCOMPATIBLE CHANGES
- HADOOP-2703. The default options to fsck skip checking files
- that are being written to. The output of fsck is incompatible
- with the previous release. (lohit vijayarenu via dhruba)
- HADOOP-2865. FsShell.ls() printout format changed to print file names
- at the end of the line. (Edward J. Yoon via shv)
- HADOOP-3283. The Datanode has an RPC server. It currently supports
- two RPCs: the first RPC retrieves the metadata about a block and the
- second RPC sets the generation stamp of an existing block.
- (Tsz Wo (Nicholas), SZE via dhruba)
- HADOOP-2797. Code related to upgrading to 0.14 (Block CRCs) is
- removed. As a result, upgrade to 0.18 or later from 0.13 or earlier
- is not supported. If upgrading from 0.13 or earlier is required,
- please upgrade to an intermediate version (0.14-0.17) and then
- to this version. (rangadi)
- HADOOP-544. This issue introduces new classes JobID, TaskID and
- TaskAttemptID, which should be used instead of their string counterparts.
- Functions in JobClient, TaskReport, RunningJob, jobcontrol.Job and
- TaskCompletionEvent that use string arguments are deprecated in favor
- of the corresponding ones that use ID objects. Applications can use
- xxxID.toString() and xxxID.forName() methods to convert/restore objects
- to/from strings. (Enis Soztutar via ddas)
- HADOOP-2188. RPC client sends a ping rather than throw timeouts.
- RPC server does not throw away old RPCs. If clients and the server are on
- different versions, they are not able to function well. In addition,
- the property ipc.client.timeout is removed from the default hadoop
- configuration. It also removes the metric RpcOpsDiscardedOPsNum. (hairong)
- HADOOP-2181. This issue adds logging for input splits in Jobtracker log
- and jobHistory log. Also adds web UI for viewing input splits in job UI
- and history UI. (Amareshwari Sriramadasu via ddas)
- HADOOP-3226. Run combiners multiple times over map outputs as they
- are merged in both the map and the reduce tasks. (cdouglas via omalley)
- HADOOP-3329. DatanodeDescriptor objects should not be stored in the
- fsimage. (dhruba)
- HADOOP-2656. The Block object has a generation stamp inside it.
- Existing blocks get a generation stamp of 0. This is needed to support
- appends. (dhruba)
- HADOOP-3390. Removed deprecated ClientProtocol.abandonFileInProgress().
- (Tsz Wo (Nicholas), SZE via rangadi)
- HADOOP-3405. Made some map/reduce internal classes non-public:
- MapTaskStatus, ReduceTaskStatus, JobSubmissionProtocol,
- CompletedJobStatusStore. (enis via omalley)
- HADOOP-3265. Removed deprecated API getFileCacheHints().
- (Lohit Vijayarenu via rangadi)
- HADOOP-3310. The namenode instructs the primary datanode to do lease
- recovery. The block gets a new generation stamp.
- (Tsz Wo (Nicholas), SZE via dhruba)
- HADOOP-2909. Improve IPC idle connection management. Property
- ipc.client.maxidletime is removed from the default configuration,
- instead it is defined as twice the ipc.client.connection.maxidletime.
- A connection with outstanding requests won't be treated as idle.
- (hairong)
- HADOOP-3459. Change in the output format of dfs -ls to more closely match
- /bin/ls. New format is: perm repl owner group size date name
- (Mukund Madhugiri via omalley)
- HADOOP-3113. An fsync invoked on a HDFS file really really
- persists data! The datanode moves blocks in the tmp directory to
- the real block directory on a datanode-restart. (dhruba)
- HADOOP-3452. Change fsck to return non-zero status for a corrupt
- FileSystem. (lohit vijayarenu via cdouglas)
- HADOOP-3193. Include the address of the client that found the corrupted
- block in the log. Also include a CorruptedBlocks metric to track the size
- of the corrupted block map. (cdouglas)
- HADOOP-3512. Separate out the tools into a tools jar. (omalley)
- HADOOP-3598. Ensure that temporary task-output directories are not created
- if they are not necessary e.g. for Maps with no side-effect files.
- (acmurthy)
- HADOOP-3665. Modify WritableComparator so that it only creates instances
- of the keytype if the type does not define a WritableComparator. Calling
- the superclass compare will throw a NullPointerException. Also define
- a RawComparator for NullWritable and permit it to be written as a key
- to SequenceFiles. (cdouglas)
- HADOOP-3673. Avoid deadlock caused by DataNode RPC recoverBlock().
- (Tsz Wo (Nicholas), SZE via rangadi)
- NEW FEATURES
- HADOOP-3074. Provides a UrlStreamHandler for DFS and other FS,
- relying on FileSystem. (taton)
- HADOOP-2585. Name-node imports namespace data from a recent checkpoint
- accessible via a NFS mount. (shv)
- HADOOP-3061. Writable types for doubles and bytes. (Andrzej
- Bialecki via omalley)
- HADOOP-2857. Allow libhdfs to set jvm options. (Craig Macdonald
- via omalley)
- HADOOP-3317. Add default port for HDFS namenode. The port in
- "hdfs:" URIs now defaults to 8020, so that one may simply use URIs
- of the form "hdfs://example.com/dir/file". (cutting)
- HADOOP-2019. Adds support for .tar, .tgz and .tar.gz files in
- DistributedCache. (Amareshwari Sriramadasu via ddas)
- HADOOP-3058. Add FSNamesystem status metrics.
- (Lohit Vijayarenu via rangadi)
- HADOOP-1915. Allow users to specify counters via strings instead
- of enumerations. (tomwhite via omalley)
- HADOOP-2065. Delay invalidating corrupt replicas of a block until it
- is removed from the under-replicated state. If all replicas are found to
- be corrupt, retain all copies and mark the block as corrupt.
- (Lohit Vijayarenu via rangadi)
- HADOOP-3221. Adds org.apache.hadoop.mapred.lib.NLineInputFormat, which
- splits files into splits each of N lines. N can be specified by
- configuration property "mapred.line.input.format.linespermap", which
- defaults to 1. (Amareshwari Sriramadasu via ddas)
- HADOOP-3336. Direct a subset of annotated FSNamesystem calls for audit
- logging. (cdouglas)
- HADOOP-3400. A new API FileSystem.deleteOnExit() that facilitates
- handling of temporary files in HDFS. (dhruba)
- HADOOP-4. Add fuse-dfs to contrib, permitting one to mount an
- HDFS filesystem on systems that support FUSE, e.g., Linux.
- (Pete Wyckoff via cutting)
- HADOOP-3246. Add FTPFileSystem. (Ankur Goel via cutting)
- HADOOP-3250. Extend FileSystem API to allow appending to files.
- (Tsz Wo (Nicholas), SZE via cdouglas)
- HADOOP-3177. Implement Syncable interface for FileSystem.
- (Tsz Wo (Nicholas), SZE via dhruba)
- HADOOP-1328. Implement user counters in streaming. (tomwhite via
- omalley)
- HADOOP-3187. Quotas for namespace management. (Hairong Kuang via ddas)
- HADOOP-3307. Support for Archives in Hadoop. (Mahadev Konar via ddas)
- HADOOP-3460. Add SequenceFileAsBinaryOutputFormat to permit direct
- writes of serialized data. (Koji Noguchi via cdouglas)
- HADOOP-3230. Add ability to get counter values from command
- line. (tomwhite via omalley)
- HADOOP-930. Add support for native S3 files. (tomwhite via cutting)
- HADOOP-3502. Quota API needs documentation in Forrest. (hairong)
- HADOOP-3413. Allow SequenceFile.Reader to use serialization
- framework. (tomwhite via omalley)
- HADOOP-3541. Import of the namespace from a checkpoint documented
- in hadoop user guide. (shv)
- IMPROVEMENTS
- HADOOP-3677. Simplify generation stamp upgrade by making it a
- local upgrade on datanodes. Deleted distributed upgrade.
- (rangadi)
- HADOOP-2928. Remove deprecated FileSystem.getContentLength().
- (Lohit Vijayarenu via rangadi)
- HADOOP-3130. Make the connect timeout smaller for getFile.
- (Amar Ramesh Kamat via ddas)
- HADOOP-3160. Remove deprecated exists() from ClientProtocol and
- FSNamesystem. (Lohit Vijayarenu via rangadi)
- HADOOP-2910. Throttle IPC Clients during bursts of requests or
- server slowdown. Clients retry connection for up to 15 minutes
- when socket connection times out. (hairong)
- HADOOP-3295. Allow TextOutputFormat to use configurable separators.
- (Zheng Shao via cdouglas)
- HADOOP-3308. Improve QuickSort by excluding values equal to the pivot from
- the partition. (cdouglas)
- HADOOP-2461. Trim property names in configuration.
- (Tsz Wo (Nicholas), SZE via shv)
- HADOOP-2799. Deprecate o.a.h.io.Closeable in favor of java.io.Closeable.
- (Tsz Wo (Nicholas), SZE via cdouglas)
- HADOOP-3345. Enhance the hudson-test-patch target to cleanup messages,
- fix minor defects, and add eclipse plugin and python unit tests. (nigel)
- HADOOP-3144. Improve robustness of LineRecordReader by defining a maximum
- line length (mapred.linerecordreader.maxlength), thereby avoiding reading
- too far into the following split. (Zheng Shao via cdouglas)
- HADOOP-3334. Move lease handling from FSNamesystem into a separate class.
- (Tsz Wo (Nicholas), SZE via rangadi)
- HADOOP-3332. Reduces the amount of logging in Reducer's shuffle phase.
- (Devaraj Das)
- HADOOP-3355. Enhances Configuration class to accept hex numbers for getInt
- and getLong. (Amareshwari Sriramadasu via ddas)
- HADOOP-3350. Add an argument to distcp to permit the user to limit the
- number of maps. (cdouglas)
- HADOOP-3013. Add corrupt block reporting to fsck.
- (lohit vijayarenu via cdouglas)
- HADOOP-3377. Remove TaskRunner::replaceAll and replace with equivalent
- String::replace. (Brice Arnould via cdouglas)
- HADOOP-3398. Minor improvement to a utility function that participates
- in backoff calculation. (cdouglas)
- HADOOP-3381. Clear references when directories are deleted so that the
- effect of memory leaks is not multiplied. (rangadi)
- HADOOP-2867. Adds the task's CWD to its LD_LIBRARY_PATH.
- (Amareshwari Sriramadasu via ddas)
- HADOOP-3232. The DU class runs the 'du' command in a separate thread so
- that it does not block the user. Otherwise the DataNode misses heartbeats
- on large nodes. (Johan Oskarsson via rangadi)
- HADOOP-3035. During block transfers between datanodes, the receiving
- datanode can now report corrupt replicas received from the source node to
- the namenode. (Lohit Vijayarenu via rangadi)
- HADOOP-3434. Retain the cause of the bind failure in Server::bind.
- (Steve Loughran via cdouglas)
- HADOOP-3429. Increases the size of the buffers used for the communication
- for Streaming jobs. (Amareshwari Sriramadasu via ddas)
- HADOOP-3486. Change default for initial block report to 0 seconds
- and document it. (Sanjay Radia via omalley)
- HADOOP-3448. Improve the text in the assertion making sure the
- layout versions are consistent in the data node. (Steve Loughran
- via omalley)
- HADOOP-2095. Improve the Map-Reduce shuffle/merge by cutting down
- buffer-copies; changed intermediate sort/merge to use the new IFile format
- rather than SequenceFiles and compression of map-outputs is now
- implemented by compressing the entire file rather than SequenceFile
- compression. Shuffle also has been changed to use a simple byte-buffer
- manager rather than the InMemoryFileSystem.
- Configuration changes to hadoop-default.xml:
- deprecated mapred.map.output.compression.type
- (acmurthy)
- HADOOP-236. JobTracker now refuses connection from a task tracker with a
- different version number. (Sharad Agarwal via ddas)
- HADOOP-3427. Improves the shuffle scheduler. It now waits for notifications
- from shuffle threads when it has scheduled enough, before scheduling more.
- (ddas)
- HADOOP-2393. Moves the handling of dir deletions in the tasktracker to
- a separate thread. (Amareshwari Sriramadasu via ddas)
- HADOOP-3501. Deprecate InMemoryFileSystem. (cutting via omalley)
- HADOOP-3366. Stall the shuffle while in-memory merge is in progress.
- (acmurthy)
- HADOOP-2916. Refactor src structure, but leave package structure alone.
- (Raghu Angadi via mukund)
- HADOOP-3492. Add forrest documentation for user archives.
- (Mahadev Konar via hairong)
- HADOOP-3467. Improve documentation for FileSystem::deleteOnExit.
- (Tsz Wo (Nicholas), SZE via cdouglas)
- HADOOP-3379. Documents stream.non.zero.exit.status.is.failure for Streaming.
- (Amareshwari Sriramadasu via ddas)
- HADOOP-3096. Improves documentation about the Task Execution Environment in
- the Map-Reduce tutorial. (Amareshwari Sriramadasu via ddas)
- HADOOP-2984. Add forrest documentation for DistCp. (cdouglas)
- HADOOP-3406. Add forrest documentation for Profiling.
- (Amareshwari Sriramadasu via ddas)
- HADOOP-2762. Add forrest documentation for controls of memory limits on
- hadoop daemons and Map-Reduce tasks. (Amareshwari Sriramadasu via ddas)
- HADOOP-3535. Fix documentation and name of IOUtils.close to
- reflect that it should only be used in cleanup contexts. (omalley)
- HADOOP-3593. Updates the mapred tutorial. (ddas)
- HADOOP-3547. Documents the way in which native libraries can be distributed
- via the DistributedCache. (Amareshwari Sriramadasu via ddas)
- HADOOP-3606. Updates the Streaming doc. (Amareshwari Sriramadasu via ddas)
- HADOOP-3532. Add jdiff reports to the build scripts. (omalley)
- HADOOP-3100. Develop tests to test the DFS command line interface. (mukund)
- HADOOP-3688. Fix up HDFS docs. (Robert Chansler via hairong)
- OPTIMIZATIONS
- HADOOP-3274. The default constructor of BytesWritable creates an empty
- byte array. (Tsz Wo (Nicholas), SZE via shv)
- HADOOP-3272. Remove redundant copy of Block object in BlocksMap.
- (Lohit Vijayarenu via shv)
- HADOOP-3164. Reduce DataNode CPU usage by using FileChannel.transferTo().
- On Linux, the DataNode uses 5 times less CPU while serving data. Results
- may vary on other platforms. (rangadi)
- HADOOP-3248. Optimization of saveFSImage. (Dhruba via shv)
- HADOOP-3297. Fetch more task completion events from the job
- tracker and task tracker. (ddas via omalley)
- HADOOP-3364. Faster image and log edits loading. (shv)
- HADOOP-3369. Fast block processing during name-node startup. (shv)
- HADOOP-1702. Reduce buffer copies when data is written to DFS.
- DataNodes use 30% less CPU while writing data. (rangadi)
- HADOOP-3095. Speed up split generation in the FileInputSplit,
- especially for non-HDFS file systems. Deprecates
- InputFormat.validateInput. (tomwhite via omalley)
- HADOOP-3552. Add forrest documentation for Hadoop commands.
- (Sharad Agarwal via cdouglas)
- BUG FIXES
- HADOOP-2905. 'fsck -move' triggers NPE in NameNode.
- (Lohit Vijayarenu via rangadi)
- Increment ClientProtocol.versionID missed by HADOOP-2585. (shv)
- HADOOP-3254. Restructure internal namenode methods that process
- heartbeats to use well-defined BlockCommand object(s) instead of
- using the base java Object. (Tsz Wo (Nicholas), SZE via dhruba)
- HADOOP-3176. Change lease record when an open-for-write file
- gets renamed. (dhruba)
- HADOOP-3269. Fix a case when namenode fails to restart
- while processing a lease record. (Tsz Wo (Nicholas), SZE via dhruba)
- HADOOP-3282. Port issues in TestCheckpoint resolved. (shv)
- HADOOP-3268. file:// URLs issue in TestUrlStreamHandler under Windows.
- (taton)
- HADOOP-3127. Deleting files in trash should really remove them.
- (Brice Arnould via omalley)
- HADOOP-3300. Fix locking of explicit locks in NetworkTopology.
- (tomwhite via omalley)
- HADOOP-3270. Constant DatanodeCommands are stored in static final
- immutable variables for better code clarity.
- (Tsz Wo (Nicholas), SZE via dhruba)
- HADOOP-2793. Fix broken links for worst performing shuffle tasks in
- the job history page. (Amareshwari Sriramadasu via ddas)
- HADOOP-3313. Avoid unnecessary calls to System.currentTimeMillis
- in RPC::Invoker. (cdouglas)
- HADOOP-3318. Recognize "Darwin" as an alias for "Mac OS X" to
- support Soylatte. (Sam Pullara via omalley)
- HADOOP-3301. Fix misleading error message when S3 URI hostname
- contains an underscore. (tomwhite via omalley)
- HADOOP-3338. Fix Eclipse plugin to compile after HADOOP-544 was
- committed. Updated all references to use the new JobID representation.
- (taton via nigel)
- HADOOP-3337. Loading FSEditLog was broken by HADOOP-3283 since it
- changed Writable serialization of DatanodeInfo. This patch handles it.
- (Tsz Wo (Nicholas), SZE via rangadi)
- HADOOP-3101. Prevent JobClient from throwing an exception when printing
- usage. (Edward J. Yoon via cdouglas)
- HADOOP-3119. Update javadoc for Text::getBytes to better describe its
- behavior. (Tim Nelson via cdouglas)
- HADOOP-2294. Fix documentation in libhdfs to refer to the correct free
- function. (Craig Macdonald via cdouglas)
- HADOOP-3335. Prevent the libhdfs build from deleting the wrong
- files on make clean. (cutting via omalley)
- HADOOP-2930. Make {start,stop}-balancer.sh work even if hadoop-daemon.sh
- is not in the PATH. (Spiros Papadimitriou via hairong)
- HADOOP-3085. Catch Exception in metrics util classes to ensure that
- misconfigured metrics don't prevent others from updating. (cdouglas)
- HADOOP-3299. CompositeInputFormat should configure the sub-input
- formats. (cdouglas via omalley)
- HADOOP-3309. Lower io.sort.mb and fs.inmemory.size.mb for MiniMRDFSSort
- unit test so it passes on Windows. (lohit vijayarenu via cdouglas)
- HADOOP-3348. TestUrlStreamHandler should set URLStreamFactory after
- DataNodes are initialized. (Lohit Vijayarenu via rangadi)
- HADOOP-3371. Ignore InstanceAlreadyExistsException from
- MBeanUtil::registerMBean. (lohit vijayarenu via cdouglas)
- HADOOP-3349. A file rename was incorrectly changing the name inside a
- lease record. (Tsz Wo (Nicholas), SZE via dhruba)
- HADOOP-3365. Removes an unnecessary copy of the key from SegmentDescriptor
- to MergeQueue. (Devaraj Das)
- HADOOP-3388. Fix for TestDatanodeBlockScanner to handle blocks with
- generation stamps in them. (dhruba)
- HADOOP-3203. Fixes TaskTracker::localizeJob to pass correct file sizes
- for the jarfile and the jobfile. (Amareshwari Sriramadasu via ddas)
- HADOOP-3391. Fix a findbugs warning introduced by HADOOP-3248 (rangadi)
- HADOOP-3393. Fix datanode shutdown to call DataBlockScanner::shutdown and
- close its log, even if the scanner thread is not running. (lohit vijayarenu
- via cdouglas)
- HADOOP-3399. A debug message was logged at info level. (rangadi)
- HADOOP-3396. TestDatanodeBlockScanner occasionally fails.
- (Lohit Vijayarenu via rangadi)
- HADOOP-3339. Some failures on the third datanode in the DFS write pipeline
- are not detected properly. This could lead to hard failure of the client's
- write operation. (rangadi)
- HADOOP-3409. Namenode should save the root inode into fsimage. (hairong)
- HADOOP-3296. Fix task cache to work for more than two levels in the cache
- hierarchy. This also adds a new counter to track cache hits at levels
- greater than two. (Amar Kamat via cdouglas)
- HADOOP-3375. Lease paths were sometimes not removed from
- LeaseManager.sortedLeasesByPath. (Tsz Wo (Nicholas), SZE via dhruba)
- HADOOP-3424. Values returned by getPartition should be checked to
- make sure they are in the range 0 to #reduces - 1. (cdouglas via
- omalley)
- HADOOP-3408. Change FSNamesystem to send its metrics as integers to
- accommodate collectors that don't support long values. (lohit vijayarenu
- via cdouglas)
- HADOOP-3403. Fixes a problem in the JobTracker to do with handling of lost
- tasktrackers. (Arun Murthy via ddas)
- HADOOP-1318. Completed maps are not failed if the number of reducers is
- zero. (Amareshwari Sriramadasu via ddas)
- HADOOP-3351. Fixes the history viewer tool to not do huge StringBuffer
- allocations. (Amareshwari Sriramadasu via ddas)
- HADOOP-3419. Fixes TestFsck to wait for updates to happen before
- checking results to make the test more reliable. (Lohit Vijaya
- Renu via omalley)
- HADOOP-3259. Makes failure to read system properties due to a
- security manager non-fatal. (Edward Yoon via omalley)
- HADOOP-3451. Update libhdfs to use FileSystem::getFileBlockLocations
- instead of removed getFileCacheHints. (lohit vijayarenu via cdouglas)
- HADOOP-3401. Update FileBench to set the new
- "mapred.work.output.dir" property to work post-3041. (cdouglas via omalley)
- HADOOP-2669. DFSClient locks pendingCreates appropriately. (dhruba)
- HADOOP-3410. Fix KFS implementation to return correct file
- modification time. (Sriram Rao via cutting)
- HADOOP-3340. Fix DFS metrics for BlocksReplicated, HeartbeatsNum, and
- BlockReportsAverageTime. (lohit vijayarenu via cdouglas)
- HADOOP-3435. Remove the assumption in the scripts that bash is at
- /bin/bash and fix the test patch to require bash instead of sh.
- (Brice Arnould via omalley)
- HADOOP-3471. Fix spurious errors from TestIndexedSort and add additional
- logging to let failures be reproducible. (cdouglas)