How to Read a 100 GB File, Continued - Part 307

Contents

1. Reading a large file by splitting it

2. Reading a large file with multiple threads on the same file

3. Wuqian's summary

 

 

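1. Reading a large file by splitting it

The core idea of this approach is simple: the file is too big, so why not split it into several smaller files and read those with multiple threads? The steps are:

(1) Split the file into several smaller files.

(2) Let each thread work on its own file, so that no two threads ever touch the same file.

(3) Read each file line by line.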

 

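1.1 Splitting the file

Both macOS and Linux ship with a file-splitting command:

split -b 1024m test2.txt /data/tmp/my/test.txt.

Notes:

(1) split: the split command.

(2) -b 1024m: the size, in bytes, of each piece. The suffixes k and m are supported. Here a 6.5 GB file is cut into 1 GB pieces, which gives 7 files.

(3) test2.txt: the file to split.

(4) test.txt.: the prefix of the output file names. split appends a suffix to the prefix automatically (aa, ab, ac, ... by default).

Other options:

(1) -l <lines>: start a new piece every given number of lines.

(2) -C <bytes>: like -b, but tries to keep each line intact. With -b alone, a line can be cut in half at a piece boundary, which matters if the pieces are later read line by line.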
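Where the split command is not available, the behaviour of -l can be approximated in Java. The following is only a minimal sketch under stated assumptions: the class LineSplitter and the method splitByLines are hypothetical names made up for illustration, the text is assumed to be UTF-8, and the pieces get numeric suffixes (000, 001, ...) rather than split's aa, ab, ...

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the article): splits a text file into pieces
// of at most linesPerFile lines, roughly like `split -l`.
class LineSplitter {

    static List<Path> splitByLines(Path source, Path outDir, String prefix, int linesPerFile)
            throws IOException {
        if (linesPerFile <= 0) {
            throw new IllegalArgumentException("linesPerFile must be positive");
        }
        Files.createDirectories(outDir);
        List<Path> parts = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            BufferedWriter writer = null;
            try {
                String line;
                int linesInPart = 0;
                while ((line = reader.readLine()) != null) {
                    // Open the first piece, or roll over to a new piece when the current one is full.
                    if (writer == null || linesInPart == linesPerFile) {
                        if (writer != null) {
                            writer.close();
                        }
                        Path part = outDir.resolve(String.format("%s%03d", prefix, parts.size()));
                        parts.add(part);
                        writer = Files.newBufferedWriter(part, StandardCharsets.UTF_8);
                        linesInPart = 0;
                    }
                    writer.write(line);
                    writer.newLine();
                    linesInPart++;
                }
            } finally {
                if (writer != null) {
                    writer.close();
                }
            }
        }
        return parts;
    }
}
```

Because the source is read line by line, no line is ever cut across two pieces, which is the property that -l and -C give you on the command line.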

After a successful split, the files look like this:

[Figure: listing of the split files]

1.2 Reading the split files with multiple threads

We read the split files with multiple threads, starting one thread per file:

    // Requires: java.io.File, java.util.ArrayList, java.util.List
    public void readFileBySplitFile(String pathname) {
        // pathname is a directory, not a single file, e.g. /data/tmp/my
        File dir = new File(pathname);
        File[] files = dir.listFiles();
        if (files == null) {
            // pathname is not a directory, or it could not be read
            return;
        }
        List<MyThread> threads = new ArrayList<>();
        for (File f : files) {
            MyThread thread = new MyThread(f.getPath());
            threads.add(thread);
            thread.start();
        }
        // Wait until every worker has finished.
        for (MyThread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    private class MyThread extends Thread {
        private final String pathname;

        public MyThread(String pathname) {
            this.pathname = pathname;
        }

        @Override
        public void run() {
            // readFileFileChannel is the FileChannel-based reader (defined outside this excerpt).
            readFileFileChannel(pathname);
        }
    }

 

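Notes:

(1) List all the split files in the given directory.

(2) Iterate over the file paths and hand each one to a thread. Each thread's run method calls readFileFileChannel to read its file.

(3) join: makes the main thread wait until each worker thread has finished before it continues. If this is unclear, see the earlier article 《悟纤和师傅去女儿国「线程并行变为串行,Thread你好牛」》.

Test: 6.5 GB, elapsed time: 4 seconds.

In theory, the larger the file, the more this multithreaded approach pays off. As for the number of threads, here it simply equals the number of files. Can you do that in practice? Definitely not: with hundreds of pieces you would start hundreds of threads. You probably already know how to improve this, so it is not covered in detail here.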
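One common way to cap the number of threads is a fixed-size thread pool. The sketch below is an illustration rather than the author's implementation: the class PooledSplitFileReader and its methods are hypothetical names, and counting lines stands in for whatever per-file processing you actually need.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class: processes the split files with a bounded pool.
class PooledSplitFileReader {

    // Reads every regular file in dir with at most poolSize threads at a time
    // and returns the total number of lines (a stand-in for real processing).
    static long countLines(File dir, int poolSize)
            throws IOException, InterruptedException, ExecutionException {
        File[] files = dir.listFiles(File::isFile);
        if (files == null) {
            throw new IOException("Not a readable directory: " + dir);
        }
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // One task per file; the pool decides how many run concurrently.
            List<Future<Long>> results = new ArrayList<>();
            for (File f : files) {
                results.add(pool.submit(() -> countLinesInFile(f)));
            }
            long total = 0;
            for (Future<Long> r : results) {
                total += r.get(); // blocks until that task is done
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    private static long countLinesInFile(File f) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(f.toPath(), StandardCharsets.UTF_8)) {
            long lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }
}
```

A reasonable starting point for poolSize is Runtime.getRuntime().availableProcessors(), tuned from there by measurement, since the best value also depends on the disk.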

 

 

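2. Reading a large file with multiple threads on the same file

2.1 Multithreading, version 1.0

Now let's look at reading one and the same file with multiple threads. The idea is to divide the file into segments and read each segment from its own starting position. The class that makes this possible is RandomAccessFile, because its seek method lets you choose where reading starts.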

    // Requires: java.io.IOException, java.util.ArrayList, java.util.List
    public void readFileByMutiThread(String pathname, int threadCount) {
        // Start (inclusive) and end (exclusive) offset of each thread's segment.
        long[] beginIndexs = new long[threadCount];
        long[] endIndexs = new long[threadCount];
        // BufferedRandomAccessFile is the buffered subclass listed in section 2.2;
        // a plain RandomAccessFile works here as well.
        try (BufferedRandomAccessFile randomAccessFile = new BufferedRandomAccessFile(pathname, "r")) {
            // Use the file length to size the segments.
            long fileTotalLength = randomAccessFile.length();
            // Nominal size of each segment.
            long gap = fileTotalLength / threadCount;
            // Start offset of the next segment.
            long nextStartIndex = 0;
            // Find the start and end of every segment.
            for (int n = 0; n < threadCount; n++) {
                beginIndexs[n] = nextStartIndex;
                // The last thread takes whatever is left.
                if (n + 1 == threadCount) {
                    endIndexs[n] = fileTotalLength;
                    break;
                }
                // (1) The nominal end is the start plus gap, clamped to the file length.
                nextStartIndex = Math.min(nextStartIndex + gap, fileTotalLength);
                // (2) That offset is probably in the middle of a line, so move there and
                //     skip forward past the line terminator (\n, \r or \r\n).
                randomAccessFile.seek(nextStartIndex);
                randomAccessFile.readLine();
                nextStartIndex = randomAccessFile.getFilePointer();
                endIndexs[n] = nextStartIndex;
            }
        } catch (IOException e) {
            e.printStackTrace();
            return;
        }
        // Start the threads.
        List<MyThread2> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            MyThread2 thread = new MyThread2(pathname, beginIndexs[i], endIndexs[i]);
            threads.add(thread);
            thread.start();
        }
        // Wait for all threads to finish.
        for (MyThread2 t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

Note: this method divides the file into position ranges according to the number of threads, and each thread processes the data in one range.
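To see what line-aligned boundaries look like in isolation, here is a small self-contained sketch. It is an assumption-laden illustration, not the author's code: the class SegmentBoundaries and the method compute are hypothetical names, and the approach relies on RandomAccessFile.readLine() to skip the remainder of the line at each nominal boundary.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper: computes line-aligned segment offsets for a file.
class SegmentBoundaries {

    // Returns segments + 1 offsets; segment i covers [offsets[i], offsets[i + 1]).
    static long[] compute(String pathname, int segments) throws IOException {
        if (segments <= 0) {
            throw new IllegalArgumentException("segments must be positive");
        }
        long[] offsets = new long[segments + 1]; // offsets[0] is 0
        try (RandomAccessFile raf = new RandomAccessFile(pathname, "r")) {
            long length = raf.length();
            long gap = length / segments;
            for (int i = 1; i < segments; i++) {
                // Nominal boundary, never beyond the end of the file.
                long candidate = Math.min(offsets[i - 1] + gap, length);
                raf.seek(candidate);
                raf.readLine(); // skip the rest of the line we landed in
                offsets[i] = raf.getFilePointer();
            }
            offsets[segments] = length;
        }
        return offsets;
    }
}
```

Every interior offset ends up just after a line terminator, so each segment begins at the start of a line and every line belongs to exactly one segment. Trailing segments can be empty when the file is small relative to the segment count.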

 



Now let's look at the worker thread itself:

    // Requires: java.io.IOException, java.io.RandomAccessFile
    private class MyThread2 extends Thread {
        private final long begin;
        private final long end;
        private final String pathname;

        public MyThread2(String pathname, long begin, long end) {
            this.pathname = pathname;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public void run() {
            // try-with-resources closes the file when the thread is done.
            try (RandomAccessFile randomAccessFile = new RandomAccessFile(pathname, "r")) {
                // Jump to the start of this thread's segment.
                randomAccessFile.seek(begin);
                StringBuilder buffer = new StringBuilder();
                String str;
                // Stop at the end of the segment. The file pointer is used instead of
                // str.length() + 1 so that \r\n line endings are counted correctly.
                while (randomAccessFile.getFilePointer() < end
                        && (str = randomAccessFile.readLine()) != null) {
                    // Process the line here; the whole line is not kept in memory.
                    // As a simple stand-in, keep only its first character.
                    if (!str.isEmpty()) {
                        buffer.append(str.charAt(0));
                    }
                }
                System.out.println("buffer.length:" + buffer.length() + "--" + Thread.currentThread().getName());
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

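Note: this thread's job is to read the data in the region between the file positions begin and end.

Let's run it. On the 6.5 GB file it ran for so long, with no end in sight, that I gave up waiting and killed it.

Why is it so slow? Wasn't this approach supposed to be great? Why hurt my fragile little heart?

Let's analyze. The earlier method readFileByRandomAccessFile was also very slow when we tested it, so the threads are not what makes this slow. Then what is?

Let's look at RandomAccessFile's readLine() method: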

    public final String readLine() throws IOException {
        StringBuffer input = new StringBuffer();
        int c = -1;
        boolean eol = false;
        while (!eol) {
            switch (c = read()) {
            case -1:
            case '\n':
                eol = true;
                break;
            case '\r':
                eol = true;
                long cur = getFilePointer();
                if ((read()) != '\n') {
                    seek(cur);
                }
                break;
            default:
                input.append((char)c);
                break;
            }
        }
        if ((c == -1) && (input.length() == 0)) {
            return null;
        }
        return input.toString();
    }

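The method works like this: a while loop keeps reading one byte at a time, and when it hits \n or \r, readLine stops and returns the line. So the method that really matters is read():

    public int read() throws IOException {
        return read0();
    }
    private native int read0() throws IOException;

It calls straight into a native method. What does that method do? The Javadoc tells us:

    * Reads a byte of data from this file. The byte is returned as an
    * integer in the range 0 to 255 ({@code 0x00-0x0ff}). This
    * method blocks if no input is yet available.

So read() fetches a single byte from the file, returns it as an integer between 0 and 255, and blocks if no input is available yet.

By now you can probably see why this is so slow: RandomAccessFile does no buffering, so every read() is a native call that fetches one byte, and reading a multi-gigabyte file one byte at a time is bound to be slow.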
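The difference between the two access patterns can be tried out with a small experiment. This is a sketch for illustration only: the class ReadCostDemo and its method names are hypothetical, and counting newlines is just a convenient task that touches every byte. Both methods return the same result; only the number of native read calls differs.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical demo class comparing two ways of reading the same file.
class ReadCostDemo {

    // One native read per byte: the same access pattern RandomAccessFile.readLine() uses.
    static long countNewlinesByteByByte(String pathname) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(pathname, "r")) {
            long count = 0;
            int c;
            while ((c = raf.read()) != -1) {
                if (c == '\n') {
                    count++;
                }
            }
            return count;
        }
    }

    // One native read per 64 KB block; the bytes are then examined in memory.
    static long countNewlinesInBlocks(String pathname) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(pathname, "r")) {
            byte[] block = new byte[64 * 1024];
            long count = 0;
            int n;
            while ((n = raf.read(block)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (block[i] == '\n') {
                        count++;
                    }
                }
            }
            return count;
        }
    }
}
```

Timing the two methods with System.nanoTime() on a file of a few hundred megabytes should make the gap visible; the exact numbers depend on the machine and the operating system's file cache.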

 

2.2 Multithreading, version 2.0

So what can we do? There is a class called BufferedRandomAccessFile. It is not part of the JDK, so you have to find the source code yourself:

    package com.kfit.bloomfilter;

    /**
     * Licensed to the Apache Software Foundation (ASF) under one
     * or more contributor license agreements. See the NOTICE file
     * distributed with this work for additional information
     * regarding copyright ownership. The ASF licenses this file
     * to you under the Apache License, Version 2.0 (the
     * "License"); you may not use this file except in compliance
     * with the License. You may obtain a copy of the License at
     *
     * http://www.apache.org/licenses/LICENSE-2.0
     *
     * Unless required by applicable law or agreed to in writing, software
     * distributed under the License is distributed on an "AS IS" BASIS,
     * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     * See the License for the specific language governing permissions and
     * limitations under the License.
     */

    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.Arrays;

    /**
     * A <code>BufferedRandomAccessFile</code> is like a
     * <code>RandomAccessFile</code>, but it uses a private buffer so that most
     * operations do not require a disk access.
     * <P>
     *
     * Note: The operations on this class are unmonitored. Also, the correct
     * functioning of the <code>RandomAccessFile</code> methods that are not
     * overridden here relies on the implementation of those methods in the
     * superclass.
     */
    public final class BufferedRandomAccessFile extends RandomAccessFile
    {
        static final int LogBuffSz_ = 16; // 64K buffer
        public static final int BuffSz_ = (1 << LogBuffSz_);
        static final long BuffMask_ = ~(((long) BuffSz_) - 1L);

        private String path_;

        /*
         * This implementation is based on the buffer implementation in Modula-3's
         * "Rd", "Wr", "RdClass", and "WrClass" interfaces.
         */
        private boolean dirty_; // true iff unflushed bytes exist
        private boolean syncNeeded_; // dirty_ can be cleared by e.g. seek, so track sync separately
        private long curr_; // current position in file
        private long lo_, hi_; // bounds on characters in "buff"
        private byte[] buff_; // local buffer
        private long maxHi_; // this.lo + this.buff.length
        private boolean hitEOF_; // buffer contains last file block?
        private long diskPos_; // disk position

        /*
         * To describe the above fields, we introduce the following abstractions for
         * the file "f":
         *
         * len(f)    the length of the file
         * curr(f)   the current position in the file
         * c(f)      the abstract contents of the file
         * disk(f)   the contents of f's backing disk file
         * closed(f) true iff the file is closed
         *
         * "curr(f)" is an index in the closed interval [0, len(f)]. "c(f)" is a
         * character sequence of length "len(f)". "c(f)" and "disk(f)" may differ if
         * "c(f)" contains unflushed writes not reflected in "disk(f)". The flush
         * operation has the effect of making "disk(f)" identical to "c(f)".
         *
         * A file is said to be *valid* if the following conditions hold:
         *
         * V1. The "closed" and "curr" fields are correct:
         *
         * f.closed == closed(f)
         * f.curr == curr(f)
         *
         * V2. The current position is either contained in the buffer, or just past
         * the buffer:
         *
         * f.lo <= f.curr <= f.hi
         *
         * V3. Any (possibly) unflushed characters are stored in "f.buff":
         *
         * (forall i in [f.lo, f.curr): c(f)[i] == f.buff[i - f.lo])
         *
         * V4. For all characters not covered by V3, c(f) and disk(f) agree:
         *
         * (forall i in [f.lo, len(f)): i not in [f.lo, f.curr) => c(f)[i] ==
         * disk(f)[i])
         *
         * V5. "f.dirty" is true iff the buffer contains bytes that should be
         * flushed to the file; by V3 and V4, only part of the buffer can be dirty.
         *
         * f.dirty == (exists i in [f.lo, f.curr): c(f)[i] != f.buff[i - f.lo])
         *
         * V6. this.maxHi == this.lo + this.buff.length
         *
         * Note that "f.buff" can be "null" in a valid file, since the range of
         * characters in V3 is empty when "f.lo == f.curr".
         *
         * A file is said to be *ready* if the buffer contains the current position,
         * i.e., when:
         *
         * R1. !f.closed && f.buff != null && f.lo <= f.curr && f.curr < f.hi
         *
         * When a file is ready, reading or writing a single byte can be performed
         * by reading or writing the in-memory buffer without performing a disk
         * operation.
         */

        /**
         * Open a new <code>BufferedRandomAccessFile</code> on <code>file</code>
         * in mode <code>mode</code>, which should be "r" for reading only, or
         * "rw" for reading and writing.
         */
        public BufferedRandomAccessFile(File file, String mode) throws IOException
        {
            this(file, mode, 0);
        }

        public BufferedRandomAccessFile(File file, String mode, int size) throws IOException
        {
            super(file, mode);
            path_ = file.getAbsolutePath();
            this.init(size);
        }

        /**
         * Open a new <code>BufferedRandomAccessFile</code> on the file named
         * <code>name</code> in mode <code>mode</code>, which should be "r" for
         * reading only, or "rw" for reading and writing.
         */
        public BufferedRandomAccessFile(String name, String mode) throws IOException
        {
            this(name, mode, 0);
        }

        public BufferedRandomAccessFile(String name, String mode, int size) throws FileNotFoundException
        {
            super(name, mode);
            path_ = name;
            this.init(size);
        }

        private void init(int size)
        {
            this.dirty_ = false;
            this.lo_ = this.curr_ = this.hi_ = 0;
            this.buff_ = (size > BuffSz_) ? new byte[size] : new byte[BuffSz_];
            this.maxHi_ = (long) BuffSz_;
            this.hitEOF_ = false;
            this.diskPos_ = 0L;
        }

        public String getPath()
        {
            return path_;
        }

        public void sync() throws IOException
        {
            if (syncNeeded_)
            {
                flush();
                getChannel().force(true);
                syncNeeded_ = false;
            }
        }

        // public boolean isEOF() throws IOException
        // {
        //     assert getFilePointer() <= length();
        //     return getFilePointer() == length();
        // }

        public void close() throws IOException
        {
            this.flush();
            this.buff_ = null;
            super.close();
        }

        /**
         * Flush any bytes in the file's buffer that have not yet been written to
         * disk. If the file was created read-only, this method is a no-op.
         */
        public void flush() throws IOException
        {
            this.flushBuffer();
        }

        /* Flush any dirty bytes in the buffer to disk. */
        private void flushBuffer() throws IOException
        {
            if (this.dirty_)
            {
                if (this.diskPos_ != this.lo_)
                    super.seek(this.lo_);
                int len = (int) (this.curr_ - this.lo_);
                super.write(this.buff_, 0, len);
                this.diskPos_ = this.curr_;
                this.dirty_ = false;
            }
        }

        /*
         * Read at most "this.buff.length" bytes into "this.buff", returning the
         * number of bytes read. If the return result is less than
         * "this.buff.length", then EOF was read.
         */
        private int fillBuffer() throws IOException
        {
            int cnt = 0;
            int rem = this.buff_.length;
            while (rem > 0)
            {
                int n = super.read(this.buff_, cnt, rem);
                if (n < 0)
                    break;
                cnt += n;
                rem -= n;
            }
            if ( (cnt < 0) && (this.hitEOF_ = (cnt < this.buff_.length)) )
            {
                // make sure buffer that wasn't read is initialized with -1
                Arrays.fill(this.buff_, cnt, this.buff_.length, (byte) 0xff);
            }
            this.diskPos_ += cnt;
            return cnt;
        }

        /*
         * This method positions <code>this.curr</code> at position <code>pos</code>.
         * If <code>pos</code> does not fall in the current buffer, it flushes the
         * current buffer and loads the correct one.<p>
         *
         * On exit from this routine <code>this.curr == this.hi</code> iff <code>pos</code>
         * is at or past the end-of-file, which can only happen if the file was
         * opened in read-only mode.
         */
        public void seek(long pos) throws IOException
        {
            if (pos >= this.hi_ || pos < this.lo_)
            {
                // seeking outside of current buffer -- flush and read
                this.flushBuffer();
                this.lo_ = pos & BuffMask_; // start at BuffSz boundary
                this.maxHi_ = this.lo_ + (long) this.buff_.length;
                if (this.diskPos_ != this.lo_)
                {
                    super.seek(this.lo_);
                    this.diskPos_ = this.lo_;
                }
                int n = this.fillBuffer();
                this.hi_ = this.lo_ + (long) n;
            }
            else
            {
                // seeking inside current buffer -- no read required
                if (pos < this.curr_)
                {
                    // if seeking backwards, we must flush to maintain V4
                    this.flushBuffer();
                }
            }
            this.curr_ = pos;
        }

        public long getFilePointer()
        {
            return this.curr_;
        }

        public long length() throws IOException
        {
            // max accounts for the case where we have written past the old file length, but not yet flushed our buffer
            return Math.max(this.curr_, super.length());
        }

        public int read() throws IOException
        {
            if (this.curr_ >= this.hi_)
            {
                // test for EOF
                // if (this.hi < this.maxHi) return -1;
                if (this.hitEOF_)
                    return -1;
                // slow path -- read another buffer
                this.seek(this.curr_);
                if (this.curr_ == this.hi_)
                    return -1;
            }
            byte res = this.buff_[(int) (this.curr_ - this.lo_)];
            this.curr_++;
            return ((int) res) & 0xFF; // convert byte -> int
        }

        public int read(byte[] b) throws IOException
        {
            return this.read(b, 0, b.length);
        }

        public int read(byte[] b, int off, int len) throws IOException
        {
            if (this.curr_ >= this.hi_)
            {
                // test for EOF
                // if (this.hi < this.maxHi) return -1;
                if (this.hitEOF_)
                    return -1;
                // slow path -- read another buffer
                this.seek(this.curr_);
                if (this.curr_ == this.hi_)
                    return -1;
            }
            len = Math.min(len, (int) (this.hi_ - this.curr_));
            int buffOff = (int) (this.curr_ - this.lo_);
            System.arraycopy(this.buff_, buffOff, b, off, len);
            this.curr_ += len;
            return len;
        }

        public void write(int b) throws IOException
        {
            if (this.curr_ >= this.hi_)
            {
                if (this.hitEOF_ && this.hi_ < this.maxHi_)
                {
                    // at EOF -- bump "hi"
                    this.hi_++;
                }
                else
                {
                    // slow path -- write current buffer; read next one
                    this.seek(this.curr_);
                    if (this.curr_ == this.hi_)
                    {
                        // appending to EOF -- bump "hi"
                        this.hi_++;
                    }
                }
            }
            this.buff_[(int) (this.curr_ - this.lo_)] = (byte) b;
            this.curr_++;
            this.dirty_ = true;
            syncNeeded_ = true;
        }

        public void write(byte[] b) throws IOException
        {
            this.write(b, 0, b.length);
        }

        public void write(byte[] b, int off, int len) throws IOException
        {
            while (len > 0)
            {
                int n = this.writeAtMost(b, off, len);
                off += n;
                len -= n;
                this.dirty_ = true;
                syncNeeded_ = true;
            }
        }

        /*
         * Write at most "len" bytes to "b" starting at position "off", and return
         * the number of bytes written.
         */
        private int writeAtMost(byte[] b, int off, int len) throws IOException
        {
            if (this.curr_ >= this.hi_)
            {
                if (this.hitEOF_ && this.hi_ < this.maxHi_)
                {
                    // at EOF -- bump "hi"
                    this.hi_ = this.maxHi_;
                }
                else
                {
                    // slow path -- write current buffer; read next one
                    this.seek(this.curr_);
                    if (this.curr_ == this.hi_)
                    {
                        // appending to EOF -- bump "hi"
                        this.hi_ = this.maxHi_;
                    }
                }
            }
            len = Math.min(len, (int) (this.hi_ - this.curr_));
            int buffOff = (int) (this.curr_ - this.lo_);
            System.arraycopy(b, off, this.buff_, buffOff, len);
            this.curr_ += len;
            return len;
        }
    }

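Then simply replace the RandomAccessFile used above with BufferedRandomAccessFile.

Let's test it.

First, the earlier single-threaded method:

TestReadFile.readFileByBufferedRandomAccessFile(pathname2);

6.5 GB, elapsed time: 32 seconds.

Compared with before, when the read never seemed to finish, this is much better, but it is still slower than NIO.

Now the multithreaded version:

6.5 GB, elapsed time: 2 threads 20 seconds, 3 threads 16 seconds, 4 threads 14 seconds, 5 threads 11 seconds, 6 threads 8 seconds, 7 threads 8 seconds, 8 threads 9 seconds.

My Mac has a 6-core processor, so performance peaks at 6 threads. With more threads than that, context switching between threads wastes time, so the elapsed time starts to climb again. Even so, it still does not seem to match the file-splitting approach above.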

 

2.3 Multithreading, version 3.0

Since JDK 1.4, most of what RandomAccessFile does has been superseded by NIO memory-mapped files (MappedByteBuffer). Feel free to try that yourself; this article does not go into it.
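For readers who want a starting point, here is a minimal sketch of reading a file through memory mapping. It is not the author's code: the class MappedLineCounter, its methods and the 256 MB window size are assumptions chosen for illustration. A single mapping cannot exceed Integer.MAX_VALUE bytes, so a large file has to be mapped one window at a time.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example class: counts newline bytes via memory-mapped windows.
class MappedLineCounter {

    // Assumed default window size; any value up to Integer.MAX_VALUE works.
    private static final long DEFAULT_WINDOW = 256L * 1024 * 1024;

    static long countNewlines(Path file) throws IOException {
        return countNewlines(file, DEFAULT_WINDOW);
    }

    static long countNewlines(Path file, long window) throws IOException {
        if (window <= 0 || window > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("window must be in 1..Integer.MAX_VALUE");
        }
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            long count = 0;
            for (long pos = 0; pos < size; pos += window) {
                long len = Math.min(window, size - pos);
                // Map only this window of the file into memory.
                MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (map.hasRemaining()) {
                    if (map.get() == '\n') {
                        count++;
                    }
                }
            }
            return count;
        }
    }
}
```

Counting newline bytes works across window boundaries without extra care. Splitting into actual lines would additionally need to carry an unfinished line from one window to the next.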

 

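3. Wuqian's summary

Master: This article is a bit hard, a bit of an eyesore and a head-scratcher, so today let your master sum it up for you.

Apprentice: Master, this is too hard for me. I'm about to fall asleep listening.

Master: File handling is complicated by nature, and in a project not everyone ends up writing I/O stream code.

To sum up, this article covered two main points.

(1) First: reading a large file by splitting it, combined with NIO, speeds things up. The core idea is to use the split command on macOS/Linux to cut the large file into several smaller files and then read each small file in its own thread. 13.56 GB split into 6 files took 8 seconds; 26 GB took 16 seconds. At that rate, reading 100 GB would take about 1 minute. Of course, the actual time depends heavily on how you process the data you read. If you print everything with System.out, for example, it will take far longer.

(2) Second: reading one large file with multiple threads. The core idea is to divide the file into n segments by length and start multiple threads, each of which uses the seek method of RandomAccessFile to start reading directly at its own position. 13.56 GB with 6 threads took 23 seconds.

Also, a single-threaded read with NIO's FileChannel is actually quite fast too: 13.56 GB took 15 seconds. As mentioned before, Java handles large files naturally. That's Java: not just write once, but write happy.

Finally, note that what you get from a ByteBuffer is a block containing many lines, not one line at a time.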
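One possible way to turn those blocks back into lines is to scan each block for \n and carry the unfinished tail over to the next read. The sketch below illustrates the idea under stated assumptions: the class ChunkedLineReader and the method forEachLine are hypothetical names, and the text is assumed to be UTF-8 (where the byte 0x0A never occurs inside a multi-byte character, so splitting on it is safe).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Consumer;

// Hypothetical example class: reads a file in blocks and hands out whole lines.
class ChunkedLineReader {

    static void forEachLine(Path file, int bufferSize, Consumer<String> handler) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
            // Bytes of a line that started in an earlier block and is not finished yet.
            ByteArrayOutputStream pending = new ByteArrayOutputStream();
            while (channel.read(buffer) != -1) {
                buffer.flip();
                while (buffer.hasRemaining()) {
                    byte b = buffer.get();
                    if (b == '\n') {
                        emit(pending, handler);
                    } else {
                        pending.write(b);
                    }
                }
                buffer.clear();
            }
            if (pending.size() > 0) {
                emit(pending, handler); // last line without a trailing newline
            }
        }
    }

    private static void emit(ByteArrayOutputStream pending, Consumer<String> handler) {
        byte[] bytes = pending.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') {
            len--; // tolerate Windows-style \r\n line endings
        }
        handler.accept(new String(bytes, 0, len, StandardCharsets.UTF_8));
        pending.reset();
    }
}
```

A call such as ChunkedLineReader.forEachLine(path, 64 * 1024, line -> process(line)) would then see complete lines regardless of where the block boundaries fall; process here stands for your own handler.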

I am who I am, fireworks of a different color.
I am who I am, a little apple unlike all the rest.
