Split a text file in half (or any percentage) on Ubuntu Linux

If you have an unwieldy text file to process, splitting it into sections can cut processing time, especially if you plan to import the file into a spreadsheet. Or you might just want to retrieve a particular set of lines from a file.
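For the second case, pulling out a particular range of lines, here is a minimal sketch using sed, with head and tail as an alternative. The file demo.txt and the line numbers are made up for illustration; it runs in a throwaway directory so nothing real is touched.

```shell
# Build a 100-line sample file in a temporary directory (demo.txt is a made-up name).
workdir=$(mktemp -d)
seq 1 100 > "$workdir/demo.txt"

# Print lines 10 through 12 only; -n suppresses sed's default output.
range=$(sed -n '10,12p' "$workdir/demo.txt")

# The same range with head and tail: keep the first 12 lines, then the last 3 of those.
range2=$(head -n 12 "$workdir/demo.txt" | tail -n 3)

printf '%s\n' "$range"
```

Both approaches read the file from the top, so on very large files the cost grows with how far into the file the range sits.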

Enter split, wc, tail, cat, and grep (and don’t forget sed and awk). Linux contains a rich set of utilities for working with text files on the command line. For our task today we will use split and wc.

First, let’s take a look at our log file:

> ls -l
-rw-r--r-- 1 thegeek ggroup 42046520 2006-09-19 11:42 access.log

The file is about 42MB. That’s fairly big, but how many lines are we dealing with? To import it into Excel, we would need to keep it under 65k lines (older versions of Excel cap a worksheet at 65,536 rows).

Let’s check the number of lines in the file using the wc utility, which stands for “word count”. The -l option tells it to count lines instead of words.

> wc -l access.log
146330 access.log

We’re way over our limit. We’ll need to split this into 3 segments. We’ll use the split utility to do this.
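The figure of 3 segments comes from dividing the line count by the chunk size and rounding up. Here is a minimal sketch of that arithmetic, assuming a chunk size of 60000 lines (comfortably under the 65k limit); the variable names are only for illustration.

```shell
lines=146330   # line count reported by wc -l above
limit=60000    # lines per piece (an assumed chunk size under the 65k limit)

# Integer ceiling division: add (limit - 1) before dividing so any remainder rounds up.
pieces=$(( (lines + limit - 1) / limit ))

# Lines left over for the last piece.
last=$(( lines - (pieces - 1) * limit ))

echo "$pieces pieces, last one has $last lines"
# prints: 3 pieces, last one has 26330 lines
```

With the chunk size settled, the split itself is a single command: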

> split -l 60000 access.log
> ls -l

total 79124
-rw-rw-r-- 1 thegeek ggroup 40465200 2006-09-19 12:00 access.log
-rw-rw-r-- 1 thegeek ggroup 16598163 2006-09-19 12:05 xaa
-rw-rw-r-- 1 thegeek ggroup 16596545 2006-09-19 12:05 xab
-rw-rw-r-- 1 thegeek ggroup 7270492 2006-09-19 12:05 xac
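To confirm that split did what we expect, the pieces can be counted and stitched back together for comparison. The sketch below does this on a generated stand-in file (demo.log is a made-up name, filled with the same number of lines as access.log) inside a temporary directory.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 146330 > demo.log      # stand-in with the same line count as access.log

split -l 60000 demo.log      # writes xaa, xab, xac into the current directory

wc -l xa?                    # per-piece line counts plus a total

# The shell expands xa? in sorted order, so cat rebuilds the original sequence.
if cat xa? | cmp -s - demo.log; then
    echo "pieces match the original"
fi
```

Because split only cuts at line boundaries here, concatenating the pieces in name order should reproduce the original file byte for byte.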

We’ve now split our text file into 3 separate files, each containing no more than 60000 lines, which seemed like a good number to choose. The first two hold exactly 60000 lines each, and the last contains the leftover 26330. By default, split names its output files xaa, xab, xac, and so on. If you wanted to cut this particular file in half instead, you’d do this:

> split -l 73165 access.log

And that’s all there is to it.
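For a two-way cut at some other percentage, split -l is a less natural fit, because it keeps producing chunks of the same size until the input runs out (a 30% chunk size would yield four files, not two). One alternative, sketched here with head and tail rather than split, computes the cut point from the line count. The file name demo.log, the output names part1 and part2, and the 30% figure are all made up for illustration.

```shell
# Throwaway demo file; demo.log is a made-up name standing in for access.log.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 1000 > demo.log

percent=30                                  # size of the first piece, as a percentage
total=$(wc -l < demo.log)                   # reading from stdin makes wc print only the number
first=$(( (total * percent + 99) / 100 ))   # integer math, rounded up

head -n "$first" demo.log > part1               # the first 30% of the lines
tail -n "+$(( first + 1 ))" demo.log > part2    # everything after the cut point

wc -l part1 part2
```

With 1000 lines and percent=30, part1 ends up with 300 lines and part2 with the remaining 700.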

The Geek is the founder of How-To Geek and a geek enthusiast. This article was written on 09/19/06 and tagged with: Shell Scripts, Ubuntu

Comments (1)

  1. Simon C. Ion

    If you have bc and sed installed, why not do this to calculate the halfway point of the file and perform the split?

    split -l $(echo $(cat access.log | wc -l)/2 | bc -l | sed -e 's/\..*//') access.log

    NB: with the -l flag, bc produces floating-point output (without it, bc does integer division by default). The sed invocation effectively acts as a call to floor(3), stripping away the digits after the decimal point and making my version of split happy. I guess that the sed expression would need to be changed to 's/,.*//' for locales that use ‘,’ as their decimal separator.
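    A variation on the same idea, sketched here as an alternative and not as the commenter's method: the shell's own arithmetic expansion already works in integers, so bc and sed can be left out. Rounding the halfway point up also avoids a stray third file holding a single line when the line count is odd. The file demo.log is a made-up stand-in for access.log.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 101 > demo.log         # odd line count on purpose

total=$(wc -l < demo.log)    # reading from stdin makes wc print only the number
half=$(( (total + 1) / 2 ))  # integer division truncates; adding 1 first rounds up

split -l "$half" demo.log    # xaa gets 51 lines, xab gets the remaining 50
wc -l xaa xab
```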



Copyright © 2006-2009 HowToGeek.com. All Rights Reserved.