Split a text file in half (or any percentage) on Ubuntu Linux

If you have an unwieldy text file that you are trying to process, splitting it into sections can sometimes help processing time, especially if you are going to import the file into a spreadsheet. Or you might just want to retrieve a particular set of lines from a file.

Enter split, wc, tail, cat, and grep (don't forget sed and awk). Linux contains a rich set of utilities for working with text files on the command line. For our task today we will use split and wc.

First we take a look at our log file…

> ls -l
-rw-r--r-- 1 thegeek ggroup 42046520 2006-09-19 11:42 access.log

We see that the file size is 42MB. That's kinda big… but how many lines are we dealing with? If we wanted to import this into Excel, we would need to keep it under 65k lines.

Let's check the number of lines in the file using the wc utility, which stands for “word count”.

> wc -l access.log
146330 access.log
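As an aside, the line count can also be captured in a shell variable to work out how many pieces a given per-file limit would produce. The following is a minimal sketch, not part of the original walkthrough: the demo file is a hypothetical stand-in generated with seq so the snippet runs on its own, and the variable names (workdir, limit, lines, pieces) are illustrative.

```shell
#!/bin/sh
# Hypothetical stand-in for access.log: 146330 numbered lines in a temp dir.
workdir=$(mktemp -d)
seq 1 146330 > "$workdir/access.log"

limit=60000                               # maximum lines wanted per piece
lines=$(wc -l < "$workdir/access.log")    # reading from stdin omits the filename

# Ceiling division with integer arithmetic: round up so the remainder
# gets its own piece.
pieces=$(( (lines + limit - 1) / limit ))
echo "$lines lines at $limit per piece needs $pieces pieces"
```

Redirecting the file into wc (rather than passing it as an argument) is what keeps the filename out of the output, so the result can be used directly in arithmetic.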

We're way over our limit. We'll need to split this into 3 segments. We'll use the split utility to do this.

> split -l 60000 access.log
> ls -l

total 79124
-rw-rw-r-- 1 thegeek ggroup 40465200 2006-09-19 12:00 access.log
-rw-rw-r-- 1 thegeek ggroup 16598163 2006-09-19 12:05 xaa
-rw-rw-r-- 1 thegeek ggroup 16596545 2006-09-19 12:05 xab
-rw-rw-r-- 1 thegeek ggroup 7270492 2006-09-19 12:05 xac
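The listing shows byte sizes rather than line counts. One way to double-check the result is to count the lines in each piece and confirm that the pieces, joined back together, match the original. This is a rough sketch under the assumption that the default xaa, xab, xac output names are in use; the demo file is a hypothetical stand-in generated with seq.

```shell
#!/bin/sh
# Hypothetical stand-in for access.log so the example is self-contained.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 146330 > access.log

# Split into pieces of at most 60000 lines (default names: xaa, xab, ...).
split -l 60000 access.log

# Line count of each piece, plus a total.
wc -l xa?

# The glob expands in alphabetical order, so cat reassembles the pieces
# in the order split wrote them; cmp compares that stream to the original.
if cat xa? | cmp -s - access.log; then
    echo "pieces match the original"
fi
```

With 146330 lines and a limit of 60000, the first two pieces hold 60000 lines each and the last holds the remaining 26330.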

We've now split our text file into 3 separate files, each containing no more than 60000 lines, which seemed like a good number to choose. The last file contains the leftover amount. If you were going to cut this particular file in half, you'd have done this:

> split -l 73165 access.log
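The 73165 figure is simply half of 146330. Rather than computing it by hand, the halfway point (or any other percentage) can be derived from wc in the shell. The sketch below is one possible approach, not the article's own method: the half_ prefix, the first.txt and rest.txt names, and the pct variable are all illustrative, and the demo file is a hypothetical stand-in. For a two-way cut at a percentage it swaps in head and tail, because split would keep cutting every N lines and produce more than two pieces.

```shell
#!/bin/sh
# Hypothetical stand-in for access.log so the example is self-contained.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 146330 > access.log

lines=$(wc -l < access.log)

# Half, rounded up so an odd line count still yields exactly two pieces.
half=$(( (lines + 1) / 2 ))
split -l "$half" access.log half_     # writes half_aa and half_ab

# Two-way cut at an arbitrary percentage (30% here, purely illustrative).
pct=30
cut_at=$(( (lines * pct + 99) / 100 ))              # round up to a whole line
head -n "$cut_at" access.log > first.txt            # first 30% of the lines
tail -n +"$(( cut_at + 1 ))" access.log > rest.txt  # everything after that

echo "half=$half cut_at=$cut_at"
```

The optional last argument to split is a prefix for the output names, which keeps these pieces from colliding with any earlier xaa, xab files in the same directory.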

And that's all there is to it.

This article was originally written on 09/19/06. Tagged with: Shell Scripts, Ubuntu

Comments (1)

  1. Simon C. Ion

    If you have bc and sed installed, why not do this to calculate the halfway point of the file and perform the split?

    split -l $(echo $(cat tmp.txt | wc -l)/2 | bc -l | sed -e 's/\..*//') access.log

    NB: bc seems to default to a floating-point output. The sed invocation effectively acts as a call to floor(3), stripping away the numbers after the decimal, and making my version of split happy. I guess that the sed expression would need to be changed to 's/,.*//' for locales that use ',' as their “numbers after the decimal” indicator.
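A possible simplification of the comment's idea: the fractional output comes from the -l flag, which loads bc's math library and sets scale to 20; without -l, bc defaults to scale 0 and truncates. POSIX shell arithmetic is integer-only as well, so neither bc nor sed is strictly needed. Note also that the command as posted counts lines in tmp.txt while splitting access.log. The sketch below uses a single file throughout; it is an illustrative alternative with a hypothetical 11-line demo file, not the commenter's own command.

```shell
#!/bin/sh
# Hypothetical demo file with an odd number of lines (11).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 11 > access.log

# $(( )) performs integer division, so the result is already truncated
# and no bc or sed post-processing is required.
half=$(( $(wc -l < access.log) / 2 ))
echo "half=$half"

# Split using the truncated halfway point (default names: xaa, xab, ...).
split -l "$half" access.log
ls xa?
```

One caveat of rounding down: with an odd line count the split yields three files, the third holding the single leftover line. Rounding up instead, for example with $(( (n + 1) / 2 )), keeps it to two.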


Copyright © 2006-2009 HowToGeek.com. All Rights Reserved.