HadoopFileLinesReader
HadoopFileLinesReader is a Scala Iterator over Apache Hadoop’s org.apache.hadoop.io.Text values, one value per line of the file it reads.
HadoopFileLinesReader is created to access datasets in the following data sources:
- SimpleTextSource
- LibSVMFileFormat
- TextInputCSVDataSource
- TextInputJsonDataSource
HadoopFileLinesReader uses the internal iterator (a RecordReaderIterator[Text]) to read files through Hadoop’s FileSystem API.
iterator Internal Property
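When created, HadoopFileLinesReader creates the internal iterator. iterator starts with a Hadoop org.apache.hadoop.mapreduce.lib.input.FileSplit for the input file, built from the file’s path (as a Hadoop org.apache.hadoop.fs.Path) together with the file’s start offset and length.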
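A FileSplit only describes a byte range (start, length), so a line reader has to decide which lines belong to which range. The function below is a simplified model, not Hadoop’s code, of the convention a reader in the style of LineRecordReader follows: a split that does not begin at offset 0 skips the (possibly partial) line it starts in, and a split keeps reading while the next line begins at or before its end, so that splits which tile the file read every line exactly once. linesInSplit is a hypothetical name, and the model assumes single-byte characters and '\n' line endings, ignoring compression and custom delimiters.

```scala
// Hypothetical, simplified model of reading whole lines from a byte-range split.
// Assumes single-byte characters and '\n' line endings only.
def linesInSplit(content: String, start: Int, length: Int): List[String] = {
  val end = start + length

  // Index just past the line that begins at `from`: past its '\n', or end of content.
  def afterLine(from: Int): Int = {
    val nl = content.indexOf('\n', from)
    if (nl < 0) content.length else nl + 1
  }

  // A split that does not start at 0 skips the line it starts in;
  // the previous split is responsible for that line.
  var pos = if (start == 0) 0 else afterLine(start)
  val out = List.newBuilder[String]

  // Keep reading while the next line begins at or before `end`,
  // so the last line may extend past the split boundary.
  while (pos <= end && pos < content.length) {
    val next = afterLine(pos)
    val lineEnd = if (content.charAt(next - 1) == '\n') next - 1 else next
    out += content.substring(pos, lineEnd)
    pos = next
  }
  out.result()
}
```

For example, with "ab\ncd\nef" cut into the splits (0, 4) and (4, 4), the first split yields ab and cd (cd starts inside the first split and is finished past its end), and the second split skips the rest of cd and yields only ef.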
iterator creates Hadoop’s TaskAttemptID, TaskAttemptContextImpl and LineRecordReader.
Finally, iterator initializes the LineRecordReader with the FileSplit and the TaskAttemptContextImpl, and wraps the reader in a RecordReaderIterator.
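The wrapping step is where a pull-style record reader becomes a Scala Iterator. The sketch below models that adapter in plain Scala so it runs without Hadoop; it is a simplified analog, not Spark’s RecordReaderIterator. SimpleRecordReader (a hypothetical stand-in for Hadoop’s RecordReader, reduced to nextKeyValue, getCurrentValue and close), StringLineRecordReader (a hypothetical in-memory stand-in for LineRecordReader) and SketchRecordReaderIterator are illustrative names. The design point is that a record reader only offers "advance, then read the current value", so hasNext must advance at most once and remember that a record is pending; otherwise calling hasNext twice would silently skip a line.

```scala
import java.io.Closeable

// Hypothetical stand-in for Hadoop's RecordReader: advance, then read the current value.
trait SimpleRecordReader[T] extends Closeable {
  def nextKeyValue(): Boolean
  def getCurrentValue: T
}

// Hypothetical stand-in for LineRecordReader over an in-memory string.
// String.split drops trailing empty strings, so "a\n" yields just "a".
final class StringLineRecordReader(content: String) extends SimpleRecordReader[String] {
  private val lines: Iterator[String] =
    if (content.isEmpty) Iterator.empty else content.split("\n").iterator
  private var current: String = null
  private var closed = false

  override def nextKeyValue(): Boolean =
    if (!closed && lines.hasNext) { current = lines.next(); true } else false

  override def getCurrentValue: String = current

  override def close(): Unit = closed = true
}

// Adapts a pull-style reader to a Scala Iterator.
final class SketchRecordReaderIterator[T](reader: SimpleRecordReader[T])
    extends Iterator[T] with Closeable {
  private var havePair = false // a record was read but not yet returned by next()
  private var finished = false // the reader reported end of input

  override def hasNext: Boolean = {
    // Advance the reader at most once per pending record.
    if (!finished && !havePair) {
      finished = !reader.nextKeyValue()
      if (finished) close() // release the reader as soon as input is exhausted
      havePair = !finished
    }
    !finished
  }

  override def next(): T = {
    if (!hasNext) throw new NoSuchElementException("End of stream")
    havePair = false
    reader.getCurrentValue
  }

  override def close(): Unit = reader.close()
}
```

For instance, new SketchRecordReaderIterator(new StringLineRecordReader("a\nb\nc")).toList gives List(a, b, c).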
Note: HadoopFileLinesReader delegates the Iterator-specific methods hasNext and next, as well as close, to iterator.
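The delegation in the note can be sketched in plain Scala. DelegatingLinesReader is a hypothetical name, and scala.io.Source.fromString(...).getLines() stands in for the Hadoop-backed internal iterator; only the shape is meant to match: the public class extends Iterator and Closeable and forwards hasNext, next and close to private state built once, at construction time.

```scala
import java.io.Closeable
import scala.io.Source

// Hypothetical analog of HadoopFileLinesReader: the outer class exposes the Iterator API
// and forwards every call to an internal iterator created when the instance is constructed.
final class DelegatingLinesReader(content: String) extends Iterator[String] with Closeable {
  // Stand-in for the Hadoop-backed internal iterator (no Hadoop involved here).
  private val source = Source.fromString(content)
  private val iterator: Iterator[String] = source.getLines()

  override def hasNext: Boolean = iterator.hasNext
  override def next(): String = iterator.next()
  override def close(): Unit = source.close()
}
```

Keeping the reader behind a private val means callers only ever see a plain Iterator, while the class still owns the underlying resource and can release it in close.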