
Split a text file in half (or any percentage) on Ubuntu Linux

If you have an unwieldy text file that you are trying to process, splitting it into sections can sometimes reduce processing time, especially if you are going to import the file into a spreadsheet. Or you might just want to retrieve a particular set of lines from a file.

Enter split, wc, tail, cat, and grep (and don't forget sed and awk). Linux contains a rich set of utilities for working with text files on the command line. For our task today we will use split and wc.

First, let's take a look at our log file:

> ls -l
-rw-r--r-- 1 thegeek ggroup 42046520 2006-09-19 11:42 access.log

We see that the file size is about 42 MB. That's fairly big, but how many lines are we dealing with? If we wanted to import this into Excel, we would need to keep each file under roughly 65,000 lines (worksheets in Excel 2003 and earlier top out at 65,536 rows).

Let's check the number of lines in the file using the wc utility, which stands for "word count". The -l option tells it to count lines instead of words.

> wc -l access.log
146330 access.log

We're way over our limit, so we'll need to split the file into 3 segments. We'll use the split utility to do this:

> split -l 60000 access.log
> ls -l

total 79124
-rw-rw-r-- 1 thegeek ggroup 40465200 2006-09-19 12:00 access.log
-rw-rw-r-- 1 thegeek ggroup 16598163 2006-09-19 12:05 xaa
-rw-rw-r-- 1 thegeek ggroup 16596545 2006-09-19 12:05 xab
-rw-rw-r-- 1 thegeek ggroup 7270492 2006-09-19 12:05 xac
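The number of pieces can be predicted before running split, and the result can be checked afterward. The following is a minimal sketch, not part of the original article: it works in a scratch directory on a generated file named sample.log (a hypothetical stand-in for access.log with the same line count), and the variable names are illustrative.

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for access.log: 146330 numbered lines.
seq 1 146330 > sample.log

chunk=60000
lines=$(wc -l < sample.log)

# Ceiling division: how many pieces split should produce.
pieces=$(( (lines + chunk - 1) / chunk ))
echo "expecting $pieces pieces"

split -l "$chunk" sample.log

# Per-piece line counts; the "total" row should equal the original count.
wc -l xa?

# Concatenating the pieces in name order should reproduce the original.
cat xa? | cmp - sample.log && echo "pieces match the original"
```

With 146330 lines and a chunk size of 60000, the first two pieces hold 60000 lines each and the last holds the remaining 26330.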

We've now split our text file into 3 separate files, each containing no more than 60,000 lines, which seemed like a good number to choose. By default split names its output files xaa, xab, xac, and so on. The last file contains the leftover lines. If you wanted to cut this particular file in half, you'd do this instead:

> split -l 73165 access.log

And that's all there is to it.
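For a split at some other percentage, split -l is less convenient, because it keeps producing equal-sized pieces until the file runs out. One possible approach, sketched here under assumptions, is to compute the cut line and use head and tail instead. The function name split_at_percent and the .part1/.part2 suffixes are made up for this example, and the demo runs on generated data rather than the real access.log.

```shell
# split_at_percent FILE PERCENT  (hypothetical helper, not a standard tool)
# Writes the first PERCENT% of FILE's lines to FILE.part1, the rest to FILE.part2.
split_at_percent() {
    file=$1
    percent=$2
    total=$(wc -l < "$file")
    # Integer arithmetic rounds down, so part1 never exceeds its share.
    cut=$(( total * percent / 100 ))
    head -n "$cut" "$file" > "$file.part1"
    # tail -n +N starts output at line N, so begin one line past the cut.
    tail -n +"$(( cut + 1 ))" "$file" > "$file.part2"
}

# Demo on generated data (a stand-in for a real log file).
workdir=$(mktemp -d)
seq 1 1000 > "$workdir/sample.log"
split_at_percent "$workdir/sample.log" 30
wc -l "$workdir/sample.log.part1" "$workdir/sample.log.part2"
```

Unlike split, this always yields exactly two files, whatever percentage is chosen.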

This article was originally written on 09/19/06. Tagged with: Shell Scripts, Ubuntu


Comments (1)

  1. Simon C. Ion

    If you have bc and sed installed, why not do this to calculate the halfway point of the file and perform the split?

    split -l $(echo $(cat access.log | wc -l)/2 | bc -l | sed -e 's/\..*//') access.log

    NB: bc seems to default to a floating-point output. The sed invocation effectively acts as a call to floor(3), stripping away the digits after the decimal point and making my version of split happy. I guess that the sed expression would need to be changed to 's/,.*//' for locales that use ',' as their decimal separator.
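As an aside to the comment above, the same halfway point can probably be computed without bc or sed, since the shell's own $(( )) arithmetic works on integers. This is a sketch under assumptions rather than the commenter's method: it runs on a generated file (sample.log, standing in for access.log) with an odd number of lines, and it rounds up instead of down.

```shell
# Demo on generated data, in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 146331 > sample.log    # odd line count on purpose

lines=$(wc -l < sample.log)

# $(( )) does integer division, so no bc or sed is needed.
# Adding 1 before dividing rounds up, which keeps an odd-length
# file to two pieces instead of leaving a one-line third piece.
half=$(( (lines + 1) / 2 ))

split -l "$half" sample.log
ls xa*
```

Rounding down on an odd line count would leave one line over, and split would put it in a third file.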



Copyright © 2006-2009 HowToGeek.com. All Rights Reserved.