Read to Second Underscore of Filename Sed
Linux sed command
Updated: 11/06/2021 by Computer Hope
On Unix-like operating systems, sed is a stream editor: it filters and transforms text.
This page covers the GNU/Linux version of sed.
Description
The sed stream editor performs basic text transformations on an input stream (a file, or input from a pipeline). While in some ways similar to an editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient. But it is sed's ability to filter text in a pipeline which particularly distinguishes it from other types of editors.
Syntax
sed OPTIONS... [SCRIPT] [INPUTFILE...]
If you do not specify INPUTFILE, or if INPUTFILE is "-", sed filters the contents of the standard input. The script is actually the first non-option parameter, which sed specially considers a script and not an input file if and only if none of the other options specifies a script to be executed (that is, if neither of the -e and -f options is specified).
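As a minimal sketch of the two ways to pass a script (the sample text and the variable names `first` and `second` are only illustrative):

```shell
# Script given as the first non-option argument; input comes from standard input.
first=$(printf 'hello world\n' | sed 's/world/sed/')
echo "$first"

# With -e, every non-option argument is an input file; "-" means standard input.
second=$(printf 'hello world\n' | sed -e 's/hello/goodbye/' -e 's/world/sed/' -)
echo "$second"
```

In the second call, both -e scripts are applied in order to each input line.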
Options
-n, --quiet, --silent | Suppress automatic printing of pattern space. |
-e script, --expression=script | Add the script script to the commands to be executed. |
-f script-file, --file=script-file | Add the contents of script-file to the commands to be executed. |
--follow-symlinks | Follow symlinks when processing in place. |
-i[SUFFIX], --in-place[=SUFFIX] | Edit files in place (this makes a backup with file extension SUFFIX, if SUFFIX is supplied). |
-l N, --line-length=N | Specify the desired line-wrap length, N, for the "l" command. |
--posix | Disable all GNU extensions. |
-r, --regexp-extended | Use extended regular expressions in the script. |
-s, --separate | Consider files as separate rather than as a single continuous long stream. |
-u, --unbuffered | Load minimal amounts of data from the input files and flush the output buffers more often. |
--help | Display a help message, and exit. |
--version | Output version information, and exit. |
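A few of these options can be tried on throwaway files. The sketch below assumes GNU sed; the file names demo.txt and more.txt and the temporary directory are made up for the illustration:

```shell
tmpdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmpdir/demo.txt"
printf 'four\nfive\n' > "$tmpdir/more.txt"

# -n suppresses automatic printing, so only the explicit p command produces output.
quiet=$(sed -n '2p' "$tmpdir/demo.txt")

# Without -s, $ is the last line of the combined input; with -s, of each file.
last_combined=$(sed -n '$p' "$tmpdir/demo.txt" "$tmpdir/more.txt")
last_each=$(sed -s -n '$p' "$tmpdir/demo.txt" "$tmpdir/more.txt")

# -i.bak edits demo.txt in place and keeps the original as demo.txt.bak.
sed -i.bak 's/two/2/' "$tmpdir/demo.txt"
edited=$(cat "$tmpdir/demo.txt")
backup=$(cat "$tmpdir/demo.txt.bak")

printf '%s\n' "$quiet" "$last_combined" "$last_each" "$edited"
rm -r "$tmpdir"
```

Supplying a suffix to -i is a cheap safeguard: if the script does something unexpected, the untouched original is still on disk.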
Sed programs
A sed program consists of one or more sed commands, passed in by one or more of the -e, -f, --expression, and --file options, or the first non-option argument if none of these options are used. This documentation frequently refers to "the" sed script; this should be understood to mean the in-order catenation of all of the scripts and script-files passed in.
Commands within a script or script-file can be separated by semicolons (";") or newlines (ASCII code 10). Some commands, due to their syntax, cannot be followed by semicolons working as command separators and thus should be terminated with newlines or be placed at the end of a script or script-file. Commands can also be preceded with optional non-significant whitespace characters.
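The three spellings below are a small illustration of command separation; each should select the same lines (the input and variable names are arbitrary):

```shell
# Two commands separated by a semicolon.
with_semicolon=$(printf 'a\nb\nc\n' | sed -n '1p;3p')

# The same two commands separated by a newline inside the script.
with_newline=$(printf 'a\nb\nc\n' | sed -n '1p
3p')

# The same commands split across two -e options; sed joins them in order.
with_e=$(printf 'a\nb\nc\n' | sed -n -e '1p' -e '3p')

echo "$with_semicolon"
```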
Each sed command consists of an optional address or address range (for example, line numbers specifying what part of the file to operate on; see selecting lines for details), followed by a one-character command name and any additional command-specific code.
How sed works
sed maintains two data buffers: the active pattern space, and the auxiliary hold space. Both are initially empty.
sed operates by performing the following cycle on each line of input: first, sed reads one line from the input stream, removes any trailing newline, and places it in the pattern space. Then commands are executed; each command can have an address associated to it: addresses are a kind of condition code, and a command is only executed if the condition is verified before the command is to be executed.
When the end of the script is reached, unless the -n option is in use, the contents of pattern space are printed out to the output stream, adding back the trailing newline if it was removed. Then the next cycle starts for the next input line.
Unless special commands (like 'D') are used, the pattern space is deleted between two cycles. The hold space, on the other hand, keeps its data between cycles (see commands 'h', 'H', 'x', 'g', 'G' to move data between both buffers).
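The cycle can be observed with a few one-liners. This is only a sketch; the last command is a commonly used idiom for reversing line order, chosen here because it depends on the hold space surviving between cycles:

```shell
# Auto-print plus an explicit p: every line appears twice.
doubled=$(printf 'x\ny\n' | sed 'p')

# -n turns auto-print off, so p alone prints each line once.
single=$(printf 'x\ny\n' | sed -n 'p')

# 1!G appends the hold space to the pattern space (except on line 1),
# h copies the result back to the hold space, and $p prints it on the last line.
reversed=$(printf 'a\nb\nc\n' | sed -n '1!G;h;$p')

printf '%s\n' "$reversed"
```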
Selecting lines with sed
Addresses in a sed script can be in any of the following forms:
number | Specifying a line number will match only that line in the input. (Note that sed counts lines continuously across all input files unless -i or -s options are specified.) |
first~step | This GNU extension of sed matches every stepth line starting with line first. In particular, lines will be selected when there exists a non-negative n such that the current line-number equals first + (n * step). Thus, to select the odd-numbered lines, one would use 1~2; to pick every third line starting with the second, '2~3' would be used; to pick every fifth line starting with the tenth, use '10~5'; and '50~0' is another way of saying 50. |
$ | This address matches the last line of the last file of input, or the last line of each file when the -i or -s options are specified. |
/regexp/ | This selects any line which matches the regular expression regexp. If regexp itself includes any "/" characters, each must be escaped by a backslash ("\"). The empty regular expression '//' repeats the last regular expression match (the same holds if the empty regular expression is passed to the s command). Note that modifiers to regular expressions are evaluated when the regular expression is compiled, thus it is invalid to specify them together with the empty regular expression. |
\%regexp% | (The % may be replaced by any other single character.) This also matches the regular expression regexp, but allows one to use a different delimiter than "/". This is particularly useful if the regexp itself contains a lot of slashes, since it avoids the tedious escaping of every "/". If regexp itself includes any delimiter characters, each must be escaped by a backslash ("\"). |
/regexp/I \%regexp%I | The I modifier to regular-expression matching is a GNU extension which causes the regexp to be matched in a case-insensitive (as opposed to case-sensitive) fashion. |
/regexp/M \%regexp%M | The M modifier to regular-expression matching is a GNU sed extension which causes ^ and $ to match respectively (in addition to the normal behavior) the empty string after a newline, and the empty string before a newline. There are special character sequences ("\`" and "\'") which always match the beginning or the end of the buffer. M stands for multi-line. |
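The address forms in the table can be exercised one at a time. The following is a sketch that assumes GNU sed (first~step and the I modifier are GNU extensions); the helper function `numbers` and the sample paths are invented for the example:

```shell
# Helper that prints the numbers 1 to 10, one per line.
numbers() { printf '%s\n' 1 2 3 4 5 6 7 8 9 10; }

third=$(numbers | sed -n '3p')     # a single line number
odd=$(numbers | sed -n '1~2p')     # first~step: every 2nd line starting at line 1
last=$(numbers | sed -n '$p')      # the last line of input

# \%regexp% avoids escaping the slashes that appear in the pattern.
path=$(printf 'usr/bin\nusr/lib\netc\n' | sed -n '\%usr/lib%p')

# The I modifier makes the match case-insensitive.
nocase=$(printf 'Foo\nbar\n' | sed -n '/foo/Ip')

printf '%s\n' "$third" "$last" "$path" "$nocase"
```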
If no addresses are given, then all lines are matched; if one address is given, then only lines matching that address are matched.
An address range can be specified by specifying two addresses separated by a comma (","). An address range matches lines starting from where the first address matches, and continues until the second address matches (inclusively).
If the second address is a regexp, then checking for the ending match starts with the line following the line which matched the first address: a range always spans at least two lines (except of course if the input stream ends).
If the second address is a number less than (or equal to) the line matching the first address, then only the one line is matched.
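The range rules above can be checked on a small input. In this sketch the START and END markers and the helper `input` are placeholders chosen for the illustration:

```shell
# Helper that prints a five-line sample with START and END markers.
input() { printf 'a\nSTART\nb\nEND\nc\n'; }

by_number=$(input | sed -n '2,4p')            # lines 2 through 4, inclusive
by_regexp=$(input | sed -n '/START/,/END/p')  # from START through the next END

# Second address is a number smaller than the first: only line 4 is matched.
backwards=$(input | sed -n '4,2p')

# The search for the ending regexp starts on the line after the opening match,
# so the first END does not close its own range.
two_lines=$(printf 'END\nmid\nEND\nafter\n' | sed -n '/END/,/END/p')

printf '%s\n' "$by_regexp"
```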
GNU sed also supports some special two-address forms; all these are GNU extensions:
0,/regexp/ | A line number of 0 can be used in an address specification like 0,/regexp/ so that sed will try to match regexp in the first input line too. In other words, 0,/regexp/ is similar to 1,/regexp/, except that if addr2 matches the very first line of input the 0,/regexp/ form will consider it to end the range, whereas the 1,/regexp/ form will match the beginning of its range and hence make the range span up to the second occurrence of the regular expression. Note that this is the only place where the 0 address makes sense; there is no "0th" line, and commands that are given the 0 address in any other way give an error. |
addr1,+N | Matches addr1 and the N lines following addr1. |
addr1,~N | Matches addr1 and the lines following addr1 until the next line whose input line number is a multiple of N. |
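The difference between 0,/regexp/ and 1,/regexp/ only shows when the very first line matches. A sketch assuming GNU sed, with made-up input and helper names:

```shell
# Helper whose first line already matches the regexp foo.
words() { printf 'foo\nbar\nfoo\nbaz\n'; }

# 1,/foo/ starts looking for foo on line 2, so the range runs through line 3.
from_one=$(words | sed '1,/foo/d')

# 0,/foo/ lets line 1 itself end the range, so only line 1 is deleted.
from_zero=$(words | sed '0,/foo/d')

# Helper that prints the numbers 1 to 10, one per line.
numbers() { printf '%s\n' 1 2 3 4 5 6 7 8 9 10; }

plus=$(numbers | sed -n '2,+2p')      # line 2 and the 2 lines after it
multiple=$(numbers | sed -n '5,~4p')  # line 5 up to the next multiple of 4

printf '%s\n' "$from_zero"
```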
Appending the ! character to the end of an address specification negates the sense of the match. That is, if the ! character follows an address range, then only lines which do not match the address range will be selected. This also works for singleton addresses, and, perhaps perversely, for the null address.
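Negation can be illustrated with a three-line input; the variable names are arbitrary:

```shell
# Print every line except line 2.
not_second=$(printf 'a\nb\nc\n' | sed -n '2!p')

# Delete every line that does not match /b/, keeping only matching lines.
only_b=$(printf 'a\nb\nc\n' | sed '/b/!d')

# Negating a range selects the lines outside it.
outside=$(printf 'a\nb\nc\n' | sed -n '1,2!p')

printf '%s\n' "$not_second"
```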
Overview of regular expression syntax
To know how to use sed, understand regular expressions ("regexp" for short). A regular expression is a pattern that is matched against a subject string from left to right. Most characters are ordinary: they stand for themselves in a pattern, and match the corresponding characters in the subject. As a simple example, the pattern
The quick brown fox
...matches a portion of a subject string that is identical to itself. The power of regular expressions comes from the ability to include alternatives and repetitions in the pattern. These are encoded in the pattern by the use of special characters, which do not stand for themselves but instead are interpreted in some special way. Here is a brief description of regular expression syntax as used in sed:
char | A single ordinary character matches itself. |
* | Matches a sequence of zero or more instances of matches for the preceding regular expression, which must be an ordinary character, a special character preceded by "\", a ".", a grouped regexp (see below), or a bracket expression. As a GNU extension, a postfixed regular expression can also be followed by "*"; for example, a** is equivalent to a*. POSIX 1003.1-2001 says that * stands for itself when it appears at the start of a regular expression or subexpression, but many non-GNU implementations do not support this, and portable scripts should instead use "\*" in these contexts. |
\+ | Like *, but matches one or more. It is a GNU extension. |
\? | Like *, but only matches zero or one. It is a GNU extension. |
\{i\} | Like *, but matches exactly i sequences (i is a decimal integer; for portability, keep it between 0 and 255, inclusive). |
\{i,j\} | Matches between i and j, inclusive, sequences. |
\{i,\} | Matches more than or equal to i sequences. |
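\(regexp\) | Groups the inner regexp as a whole; this is used to apply postfix operators (for example, \(abc\)* matches a sequence of zero or more "abc" strings) and to use back references (see below). |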
. | Matches any character, including a newline. |
^ | Matches the null string at beginning of the pattern space, i.e., what appears after the ^ must appear at the beginning of the pattern space. In most scripts, pattern space is initialized to the content of each line. So, it is a useful simplification to think of ^#include as matching only lines where '#include' is the first thing on the line; if there are spaces before, for example, the match fails. This simplification is valid as long as the original content of pattern space is not modified, for example with an s command. ^ acts as a special character only at the beginning of the regular expression or subexpression (that is, after \( or \|). Portable scripts should avoid ^ at the beginning of a subexpression, though, as POSIX allows implementations that treat ^ as an ordinary character in that context. |
$ | It is the same as ^, but refers to end of pattern space. $ also acts as a special character only at the end of the regular expression or subexpression (that is, before \) or \|), and its use at the end of a subexpression is not portable. |
[list] [^list] | Matches any single character in list: for example, [aeiou] matches all vowels. A list may include sequences like char1-char2, which matches any character between char1 and char2. For example, [b-e] matches any of the characters b, c, d, or e. A leading ^ reverses the meaning of list, so that it matches any single character not in list. To include ] in the list, make it the first character (after the ^ if needed); to include - in the list, make it the first or last; to include ^ put it after the first character. The characters $, *, ., [, and \ are normally not special within list. For example, [\*] matches either '\' or '*', because the \ is not special here. However, strings like [.ch.], [=a=], and [:space:] are special within list and represent collating symbols, equivalence classes, and character classes, respectively, and [ is therefore special within list when it is followed by ., =, or :. Also, when not in POSIXLY_CORRECT mode, special escapes like \n and \t are recognized within list. See escapes for more information. |
regexp1\|regexp2 | Matches either regexp1 or regexp2. Use parentheses to use complex alternative regular expressions. The matching process tries each alternative in turn, from left to right, and the first one that succeeds is used. It is a GNU extension. |
regexp1regexp2 | Matches the concatenation of regexp1 and regexp2. Concatenation binds more tightly than \|, ^, and $, but less tightly than the other regular expression operators. |
\digit | Matches the digit-th \(...\) parenthesized subexpression in the regular expression. This is called a back reference. Subexpressions are implicitly numbered by counting occurrences of \( left-to-right. |
\n | Matches the newline character. |
\char | Matches char, where char is one of $, *, ., [, \, or ^. Note that the only C-like backslash sequences that you can portably assume to be interpreted are \n and \\; in particular \t is not portable, and matches a 't' under most implementations of sed, rather than a tab character. |
Note that the regular expression matcher is greedy, i.e., matches are attempted from left to right and, if two or more matches are possible starting at the same character, it selects the longest.
For example:
abcdef | Matches "abcdef". |
a*b | Matches zero or more "a" characters, followed by a single "b". For example, "b" or "aaaaaaab". |
a\?b | Matches "b" or "ab". |
a\+b\+ | Matches one or more "a" characters followed by one or more "b"s. "ab" is the shortest possible match, but other examples are "aaaaab", "abbbbbb", or "aaaaaabbbbbbb". |
.* or .\+ | Either of these expressions will match all of the characters in a non-empty string, but only .* will match the empty string. |
^main.*(.*) | This matches a string starting with "main", followed by an opening and closing parenthesis. The "n", "(" and ")" need not be adjacent. |
^# | This matches a string beginning with "#". |
\\$ | This matches a string ending with a single backslash. The regexp contains two backslashes for escaping. |
\$ | This matches a string consisting of a single dollar sign. |
[a-zA-Z0-9] | In the C locale, this matches any ASCII letters or digits. |
[^ tab]\+ | (Here tab stands for a single tab character.) This matches a string of one or more characters that does not contain a space or a tab. Usually this means a word. |
^\(.*\)\n\1$ | This matches a string consisting of two equal substrings separated by a newline. |
.\{9\}A$ | This matches nine characters followed by an 'A' at the end of the pattern space. |
^.\{15\}A | This matches the start of a string that contains 16 characters, the last of which is 'A'. |
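A few of these patterns can be tried directly from the shell. The sketch below assumes GNU sed (\+ is a GNU extension); the sample strings are invented:

```shell
# a\+b\+ anchored to the whole line: one or more a's, then one or more b's.
ab=$(printf 'ab\naabbb\nb\n' | sed -n '/^a\+b\+$/p')

# Greedy matching: (.*) runs from the first "(" to the last ")".
greedy=$(echo 'a(b)c(d)' | sed 's/(.*)/X/')

# Interval expression: exactly three a's on the line.
three=$(printf 'aaa\naa\n' | sed -n '/^a\{3\}$/p')

# Back reference: \1 must repeat whatever the first group matched.
dedup=$(echo 'hello hello world' | sed 's/\(hello\) \1/\1/')

printf '%s\n' "$greedy" "$dedup"
```

The greedy case is the one that most often surprises: to stop at the first closing parenthesis, a bracket expression such as ([^)]*) is a common alternative.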
Frequently-used commands
If you use sed at all, you will probably want to know these commands.
# | (No addresses allowed with this command.) The # character begins a comment; the comment continues until the next newline. If you are concerned about portability, be aware that some implementations of sed (which are not POSIX conformant) may only support a single one-line comment, and then only when the very first character of the script is a #. Warning: if the first two characters of the sed script are #n, then the -n (no-autoprint) option is forced. If you want to put a comment in the first line of your script and that comment begins with the letter 'n' and you do not want this behavior, then either use a capital 'N', or place at least one space before the 'n'. |
q [exit-code] | This command only accepts a single address. Exit sed without processing any more commands or input. Note that the current pattern space is printed if auto-print is not disabled with the -n option. The ability to return an exit code from the sed script is a GNU sed extension. |
d | Delete the pattern space; immediately start next cycle. |
p | Print out the pattern space (to the standard output). This command is usually only used in conjunction with the -n command-line option. |
n | If auto-print is not disabled, print the pattern space, then, regardless, replace it with the next line of input. If there is no more input then sed exits without processing any more commands. |
{ commands } | A group of commands may be enclosed between { and } characters. This is particularly useful when you want a group of commands to be triggered by a single address (or address-range) match. |
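The commands above can be sketched as follows. The exit-code form of q assumes GNU sed, and all input data and variable names are illustrative:

```shell
# q: print lines up to and including line 2, then stop reading input.
head2=$(printf '1\n2\n3\n' | sed '2q')

# d: drop comment lines.
no_comments=$(printf '#c\nx\n' | sed '/^#/d')

# n: fetch the next line, then p prints it, so only even-numbered lines appear.
even=$(printf '1\n2\n3\n4\n' | sed -n 'n;p')

# { }: run two commands only on lines that start with b.
grouped=$(printf 'a=1\nb=2\n' | sed -n '/^b/{s/2/two/;p;}')

# q with an exit code (GNU extension); the status is captured without aborting.
status=0
printf 'x\n' | sed 'q5' > /dev/null || status=$?

printf '%s\n' "$head2" "$grouped" "$status"
```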
The s command
The syntax of the s command (which stands for "substitute") is: 's/regexp/replacement/flags'. The / characters may be uniformly replaced by any other single character within any given s command. The / character (or whatever other character is used in its stead) can appear in the regexp or replacement only if it's preceded by a \ character.
The s command is probably the most important in sed and has a lot of different options. Its basic concept is simple: the s command attempts to match the pattern space against the supplied regexp; if the match is successful, then that portion of the pattern space which was matched is replaced with replacement.
The replacement can contain \n (n being a number from 1 to 9, inclusive) references, which refer to the portion of the match that is contained between the nth \( and its matching \). Also, the replacement can contain unescaped & characters which reference the whole matched portion of the pattern space. Finally, as a GNU sed extension, you can include a special sequence made of a backslash and one of the letters L, l, U, u, or E. The meaning is as follows:
\L | Turn the replacement to lowercase until a \U or \E is found |
\l | Turn the next character to lowercase |
\U | Turn the replacement to uppercase until a \L or \E is found |
\u | Turn the next character to uppercase |
\E | Stop case conversion started by \L or \U |
To include a literal \, &, or newline in the final replacement, precede the desired \, &, or newline in the replacement with a \.
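The replacement features described above can be sketched as follows. The case-conversion sequences assume GNU sed; names and sample strings are made up:

```shell
# \1 and \2 refer to the first and second \( ... \) groups.
swapped=$(echo 'John Smith' | sed 's/\(.*\) \(.*\)/\2, \1/')

# An unescaped & stands for the whole matched text.
wrapped=$(echo 'word' | sed 's/.*/[&]/')

# An escaped \& is a literal ampersand.
literal=$(echo 'a' | sed 's/a/\&/')

# GNU case conversion: \U ... \E uppercases a span, \u only the next character.
cased=$(echo 'hello world' | sed 's/\(hello\) \(world\)/\U\1\E \u\2/')

# A different delimiter avoids escaping slashes in paths.
moved=$(echo '/usr/bin' | sed 's|/usr|/opt|')

printf '%s\n' "$swapped" "$wrapped" "$cased" "$moved"
```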
The s command can be followed by zero or more of the following flags:
g | Apply the replacement to all matches to the regexp, not just the first. |
number | Only replace the numberth match of the regexp. Note: the POSIX standard does not specify what should happen when you mix the g and number modifiers, and currently there is no widely agreed upon meaning across sed implementations. For GNU sed, the interaction is defined to be: ignore matches before the numberth, and then match and replace all matches from the numberth on. |
p | If the substitution was made, then print the new pattern space. Note: when both the p and e options are specified, the relative ordering of the two produces very different results. In general, ep (evaluate then print) is what you want, but operating the other way round can be useful for debugging. For this reason, the current version of GNU sed interprets specially the presence of p options both before and after e, printing the pattern space before and after evaluation, while in general flags for the s command show their effect once. This behavior, although documented, might change in future versions. |
w file | If the substitution was made, then write out the result to the named file. As a GNU sed extension, two special values of file are supported: /dev/stderr, which writes the result to the standard error, and /dev/stdout, which writes to the standard output. |
e | This command allows one to pipe input from a shell command into pattern space. If a substitution was made, the command found in pattern space is executed and pattern space is replaced with its output. A trailing newline is suppressed; results are undefined if the command to be executed contains a NUL character. This is a GNU sed extension. |
I, i | The I modifier to regular-expression matching is a GNU extension which makes sed match regexp in a case-insensitive manner. |
M, m | The M modifier to regular-expression matching is a GNU sed extension which causes ^ and $ to match respectively (in addition to the normal behavior) the empty string after a newline, and the empty string before a newline. There are special character sequences (\` and \') which always match the beginning or the end of the buffer. M stands for multi-line. |
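The effect of the most common flags can be compared side by side. This sketch assumes GNU sed for the combined 2g form and for the I and M modifiers:

```shell
first_only=$(echo 'a a a a' | sed 's/a/b/')     # no flag: first match only
all=$(echo 'a a a a' | sed 's/a/b/g')           # g: every match
second=$(echo 'a a a a' | sed 's/a/b/2')        # number: only the 2nd match
from_second=$(echo 'a a a a' | sed 's/a/b/2g')  # GNU: the 2nd match onward

# p with -n prints only the lines where a substitution happened.
changed=$(printf 'x\ny\n' | sed -n 's/x/X/p')

# I ignores case while matching.
nocase=$(echo 'Hello' | sed 's/hello/bye/I')

# M lets ^ and $ match around the embedded newline created by N.
multi=$(printf 'a\nb\n' | sed 'N;s/^b$/X/M')

printf '%s\n' "$first_only" "$all" "$second" "$from_second"
```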
Less frequently-used commands
Though perhaps less frequently used than those in the previous section, some very small yet useful sed scripts can be built with these commands.
y/source-chars/dest-chars/ | (The / characters may be uniformly replaced by any other single character within any given y command.) Transliterate any characters in the pattern space which match any of the source-chars with the corresponding character in dest-chars. Instances of the / (or whatever other character is used instead), \, or newlines can appear in the source-chars or dest-chars lists, provided that each instance is escaped by a \. The source-chars and dest-chars lists must contain the same number of characters (after de-escaping). |
a\ text | As a GNU extension, this command accepts two addresses. Queue the lines of text which follow this command (each but the last ending with a \, which are removed from the output) to be output at the end of the current cycle, or when the next input line is read. Escape sequences in text are processed, so use \\ in text to print a single backslash. As a GNU extension, if between the a and the newline there is other than a whitespace-\ sequence, then the text of this line, starting at the first non-whitespace character after the a, is taken as the first line of the text block. (This enables a simplification in scripting a one-line add.) This extension also works with the i and c commands. |
i\ text | As a GNU extension, this command accepts two addresses. Immediately output the lines of text which follow this command (each but the last ending with a \, which are removed from the output). |
c\ text | Delete the lines matching the address or address-range, and output the lines of text which follow this command (each but the last ending with a \, which are removed from the output) in place of the last line (or in place of each line, if no addresses were specified). A new cycle is started after this command is done, since the pattern space will be deleted. |
= | As a GNU extension, this command accepts two addresses. Print out the current input line number (with a trailing newline). |
l n | Print the pattern space in an unambiguous form: non-printable characters (and the \ character) are printed in C-style escaped form; long lines are split, with a trailing \ character to indicate the split; the end of each line is marked with a $. n specifies the desired line-wrap length; a length of 0 (zero) means to never wrap long lines. If omitted, the default as specified on the command line is used. The n parameter is a GNU sed extension. |
r filename | As a GNU extension, this command accepts two addresses. Queue the contents of filename to be read and inserted into the output stream at the end of the current cycle, or when the next input line is read. Note that if filename cannot be read, it is treated as if it were an empty file, without any error indication. As a GNU sed extension, the special value /dev/stdin is supported for the file name, which reads the contents of the standard input. |
w filename | Write the pattern space to filename. As a GNU sed extension, two special values of filename are supported: /dev/stderr, which writes the result to the standard error, and /dev/stdout, which writes to the standard output. The file is created (or truncated) before the first input line is read; all w commands (including instances of the w flag on successful s commands) which refer to the same filename are output without closing and reopening the file. |
D | If pattern space contains no newline, start a normal new cycle as if the d command was issued. Otherwise, delete text in the pattern space up to the first newline, and restart cycle with the resultant pattern space, without reading a new line of input. |
N | Add a newline to the pattern space, then append the next line of input to the pattern space. If there is no more input then sed exits without processing any more commands. |
P | Print out the portion of the pattern space up to the first newline. |
h | Replace the contents of the hold space with the contents of the pattern space. |
H | Append a newline to the contents of the hold space, and then append the contents of the pattern space to that of the hold space. |
g | Replace the contents of the pattern space with the contents of the hold space. |
G | Append a newline to the contents of the pattern space, and then append the contents of the hold space to that of the pattern space. |
x | Exchange the contents of the hold and pattern spaces. |
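Several of these commands make useful one-liners. The sketch below uses the GNU one-line forms of a, i, and c; the input data and variable names are arbitrary:

```shell
# y: transliterate a, b, c to their uppercase counterparts.
upper=$(echo 'aabbcc cab' | sed 'y/abc/ABC/')

# i and a (GNU one-line form): add text before and after line 1.
framed=$(printf 'x\n' | sed -e '1i before' -e '1a after')

# c: replace line 2 entirely.
replaced=$(printf 'a\nb\n' | sed '2c replaced')

# =: print the line number of the last line, which is the line count.
count=$(printf 'a\nb\n' | sed -n '$=')

# N: join each pair of lines, turning the embedded newline into a comma.
pairs=$(printf '1\n2\n3\n4\n' | sed 'N;s/\n/,/')

# G: append the (empty) hold space after every line, double-spacing the output.
spaced=$(printf 'a\nb\n' | sed 'G')

printf '%s\n' "$upper" "$framed" "$pairs"
```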
Commands for sed gurus
In most cases, use of these commands indicates that you are probably better off programming in something like awk or Perl. But occasionally one is committed to sticking with sed, and these commands can enable one to write quite convoluted scripts.
: label | [No addresses allowed with this command.] Specify the location of label for branch commands. In all other respects, a no-op (no operation performed). |
b label | Unconditionally branch to label. The label may be omitted, in which case the next cycle is started. |
t label | Branch to label only if there was a successful substitution since the last input line was read or conditional branch was taken. The label may be omitted, in which case the next cycle is started. |
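Two small loops illustrate the branch commands. Both are common idioms rather than anything specific to this page, and each command is passed with its own -e so that no label is followed by a semicolon:

```shell
# b: loop back to :a until the last line, collecting the whole input with N,
# then replace every embedded newline with a space.
joined=$(printf 'a\nb\nc\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g')

# t: repeat the substitution for as long as it keeps succeeding,
# inserting one thousands separator per pass.
grouped=$(echo '1234567' | sed -e ':a' -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/' -e 'ta')

printf '%s\n' "$joined" "$grouped"
```

The t loop ends on its own: once no digit is followed by three more digits, the substitution fails and the branch is not taken.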
Commands specific to GNU sed
These commands are specific to GNU sed, so you must use them with care and only when you are sure that the script doesn't need to be ported. They allow you to check for GNU sed extensions or do tasks that are required quite often, yet are unsupported by standard seds.
e [command] | This command allows one to pipe input from a shell command into pattern space. Without parameters, the e command executes the command found in the pattern space and replaces the pattern space with the output; a trailing newline is suppressed. If a parameter is specified, instead, the e command interprets it as a command and sends its output to the output stream (like r does). The command can run across multiple lines, all but the last ending with a backslash. In both cases, the results are undefined if the command to be executed contains a NUL character. |
F | Print out the file name of the current input file (with a trailing newline). |
L n | This GNU sed extension fills and joins lines in pattern space to produce output lines of (at most) n characters, like fmt does; if n is omitted, the default as specified on the command line is used. This command is considered a failed experiment and unless there is enough request (which seems unlikely) will be removed in future versions. |
Q [exit-code] | This command only accepts a single address. This command is the same as q, but will not print the contents of pattern space. Like q, it provides the ability to return an exit code to the caller. This command can be useful because the only alternative ways to accomplish this apparently trivial function are to use the -n option (which can unnecessarily complicate your script) or resorting to the following snippet, which wastes time by reading the whole file without any visible effect: ':eat' (label); '$d' (quit silently on the last line); 'N' (read another line, silently); 'g' (overwrite pattern space each time to save memory); 'b eat'. |
R filename | Queue a line of filename to be read and inserted into the output stream at the end of the current cycle, or when the next input line is read. Note that if filename cannot be read, or if its end is reached, no line is appended, without any error indication. As with the r command, the special value /dev/stdin is supported for the file name, which reads a line from the standard input. |
T label | Branch to label only if there have been no successful substitutions since the last input line was read or conditional branch was taken. The label may be omitted, in which case the next cycle is started. |
v version | This command does nothing, but makes sed fail if GNU sed extensions are not supported, because other versions of sed do not implement it. Also, you can specify the version of sed your script requires, such as 4.0.5. The default is 4.0 because that is the first version that implemented this command. This command enables all GNU extensions even if POSIXLY_CORRECT is set in the environment. |
W filename | Write to the given filename the portion of the pattern space up to the first newline. Everything said under the w command about file handling holds here too. |
z | This command empties the content of pattern space. It is usually the same as 's/.*//', but is more efficient and works in the presence of invalid multibyte sequences in the input stream. POSIX mandates that such sequences are not matched by '.', so that there is no portable way to clear sed's buffers in the middle of the script in most multibyte locales (including UTF-8 locales). |
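A brief sketch of Q, T, z, and F, assuming a reasonably recent GNU sed; the temporary file exists only so that F has a name to report:

```shell
# Q: quit at line 2 without printing it (q would have printed it first).
before_two=$(printf '1\n2\n3\n' | sed '2Q')

# T: when the substitution fails, skip the rest of the script for that line.
tagged=$(printf 'apple\nbanana\n' | sed -e 's/apple/APPLE/' -e 'T' -e 's/$/ (changed)/')

# z: empty the pattern space on line 1, leaving a blank line in the output.
zapped=$(printf 'a\nb\n' | sed '1z')

# F: print the name of the current input file.
tmpfile=$(mktemp)
printf 'x\n' > "$tmpfile"
shown=$(sed -n '1F' "$tmpfile")
rm "$tmpfile"

printf '%s\n' "$before_two" "$tagged"
```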
GNU extensions for escapes in regular expressions
Until now (on this page, anyway), we have only encountered escapes of the form '\^', for example, which tell sed not to interpret the circumflex (caret) as a special character, but rather to take it literally. For another example, '\*' matches a single asterisk rather than zero or more backslashes.
This section introduces another kind of escape, that is, escapes that are applied to a character or sequence of characters that ordinarily are taken literally, and that sed replaces with a special character. This provides a way of encoding non-printable characters in patterns in a visible manner. There is no restriction on the appearance of non-printing characters in a sed script, but when a script is being prepared in the shell or by text editing, it is usually easier to use one of the following escape sequences than the binary character it represents:
\a | Produces or matches a bel character, that is an "alert" (ASCII seven). |
\f | Produces or matches a form feed (ASCII 12). |
\northward | Produces or matches a newline (ASCII 10). |
\r | Produces or matches a carriage return (ASCII 13). |
\t | Produces or matches a horizontal tab (ASCII 9). |
\v | Produces or matches a so-called "vertical tab" (ASCII 11). |
\cx | Produces or matches Control-x, where x is any character. The precise effect of '\cx' is as follows: if x is a lowercase letter, it is converted to uppercase. Then bit 6 of the character (hex 40) is inverted. Thus '\cz' becomes hex 1A, but '\c{' becomes hex 3B, while '\c;' becomes hex 7B. |
\dxxx | Produces or matches a character whose decimal ASCII value is xxx. |
\oxxx | Produces or matches a character whose octal ASCII value is xxx. |
\xHH | Produces or matches a character whose hexadecimal ASCII value is HH. |
'\b' (backspace) was omitted because of the conflict with the existing "word boundary" meaning.
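As a small illustration of these escapes, the sketch below uses '\t' on the matching side and '\x41' on the producing side. It assumes GNU sed, since these escapes are GNU extensions; the function names tabs_to_commas and mark_lines are made up for this example.

```shell
# Assumes GNU sed: \t and \xHH are GNU extensions.

# \t in the regular expression matches a horizontal tab (ASCII 9).
tabs_to_commas() {
  sed 's/\t/,/g'
}

# \x41 in the replacement produces the character with hex value 41 ("A").
mark_lines() {
  sed 's/^/\x41: /'
}

printf 'one\ttwo\tthree\n' | tabs_to_commas   # prints: one,two,three
printf 'first\n' | mark_lines                 # prints: A: first
```

Writing the tab as '\t' keeps the script readable, whereas a literal tab typed into the script would be indistinguishable from spaces.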
Other escapes match a particular character class and are valid only in regular expressions:
\w | Matches any "word" character. A "word" character is any letter or digit or the underscore character. |
\W | Matches any "non-word" character. |
\b | Matches a word boundary; that is, it matches if the character to the left is a "word" character and the character to the right is a "non-word" character, or vice versa. |
\B | Matches everywhere but on a word boundary; that is, it matches if the character to the left and the character to the right are either both "word" characters or both "non-word" characters. |
\` | Matches only at the start of pattern space. This is different from ^ in multi-line mode. |
\' | Matches only at the end of pattern space. This is different from $ in multi-line mode. |
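The following sketch shows '\b' and '\w' in use. It assumes GNU sed (both escapes, and '\+', are GNU extensions in basic regular expressions); replace_whole_word and bracket_words are hypothetical helper names.

```shell
# Assumes GNU sed.

# \b anchors the match at word boundaries, so the "cat" inside
# "concatenate" is left alone.
replace_whole_word() {
  sed 's/\bcat\b/dog/g'
}

# \w\+ matches a run of word characters (letters, digits, underscore).
bracket_words() {
  sed 's/\w\+/[&]/g'
}

printf 'cat concatenate cat\n' | replace_whole_word   # prints: dog concatenate dog
printf 'foo_bar baz-9\n' | bracket_words              # prints: [foo_bar] [baz]-[9]
```

Note that the hyphen is a "non-word" character, so it splits 'baz-9' into two words, while the underscore does not split 'foo_bar'.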
Sample scripts
Here are some sed scripts to guide you in the art of mastering sed.
Sample script: centering lines
This script centers all lines of a file on an 80-column width. To change that width, the number in \{...\} must be replaced, and the number of added spaces must also be changed.
Note how the buffer commands are used to separate parts in the regular expressions to be matched, which is a common technique.
#!/usr/bin/sed -f

# Put 80 spaces in the buffer (10 spaces, repeated 8 times)
1 {
  x
  s/^$/          /
  s/^.*$/&&&&&&&&/
  x
}

# delete leading and trailing spaces (tabs are first turned into spaces)
y/\t/ /
s/^ *//
s/ *$//

# add a newline and 80 spaces to end of line
G

# keep first 81 chars (80 + a newline)
s/^\(.\{81\}\).*$/\1/

# \2 matches half of the spaces, which are moved to the beginning
s/^\(.*\)\n\(.*\)\2/\2\1/
Sample script: increment a number
This script is one of a few that demonstrate how to do arithmetic in sed. This is indeed possible, but must be done manually.
To increment a number you add 1 to the last digit, replacing it with the following digit. There is one exception: when the digit is a nine, the previous digits must also be incremented until you reach one that is not a nine.
This solution is clever because it uses a single buffer; if you don't have this limitation, the algorithm used in the numbering lines script is faster. It works by replacing trailing nines with an underscore, then using multiple s commands to increment the last digit, and then substituting the underscores with zeros.
#!/usr/bin/sed -f

/[^0-9]/ d

# replace all trailing 9s by _ (any other character except digits could
# be used)
:d
s/9\(_*\)$/_\1/
td

# incr last digit only.  The first line adds a most-significant
# digit of 1 if we have to add a digit.
#
# The tn commands are not necessary, but make the thing
# faster

s/^\(_*\)$/1\1/; tn
s/8\(_*\)$/9\1/; tn
s/7\(_*\)$/8\1/; tn
s/6\(_*\)$/7\1/; tn
s/5\(_*\)$/6\1/; tn
s/4\(_*\)$/5\1/; tn
s/3\(_*\)$/4\1/; tn
s/2\(_*\)$/3\1/; tn
s/1\(_*\)$/2\1/; tn
s/0\(_*\)$/1\1/; tn

:n
y/_/0/
Sample script: rename files to lowercase
This script is a pretty strange use of sed. We transform text into shell commands, then feed them to the shell. Don't worry, even worse hacks are done when using sed. Scripts have even been written to convert the output of date into a bc program, so stranger things have happened.
The main body of this is the sed script, which remaps the name from lower to upper (or vice versa) and even checks whether the remapped name is the same as the original name. Note how the script is parameterized using shell variables and proper quoting.
#! /bin/sh
# rename files to lower/upper case...
#
# usage:
#    move-to-lower *
#    move-to-upper *
# or
#    move-to-lower -R .
#    move-to-upper -R .
#

help()
{
        cat << eof
Usage: $0 [-n] [-R] [-h] files...

-n      do nothing, only see what would be done
-R      recursive (use find)
-h      this message
files   files to remap to lower case

Examples:
       $0 -n *        (see if everything is ok, then...)
       $0 *

       $0 -R .

eof
}

apply_cmd='sh'
finder='echo "$@" | tr " " "\n"'
files_only=

while :
do
    case "$1" in
        -n) apply_cmd='cat' ;;
        -R) finder='find "$@" -type f';;
        -h) help ; exit 1 ;;
        *) break ;;
    esac
    shift
done

if [ -z "$1" ]; then
        echo Usage: $0 [-h] [-n] [-R] files...
        exit 1
fi

LOWER='abcdefghijklmnopqrstuvwxyz'
UPPER='ABCDEFGHIJKLMNOPQRSTUVWXYZ'

case `basename $0` in
        *upper*) TO=$UPPER; FROM=$LOWER ;;
        *)       FROM=$UPPER; TO=$LOWER ;;
esac

eval $finder | sed -n '

# remove all trailing slashes
s/\/*$//

# add ./ if there is no path, only a file name
/\//! s/^/.\//

# save path+file name
h

# remove path
s/.*\///

# do conversion only on file name
y/'$FROM'/'$TO'/

# now line contains original path+file, while
# hold space contains the new file name
x

# add converted file name to line, which now contains
# path/file-name\nconverted-file-name
G

# check if converted file name is equal to original file name,
# if it is, do not print anything
/^.*\/\(.*\)\n\1/b

# now, transform path/fromfile\n, into
# mv path/fromfile path/tofile and print it
s/^\(.*\/\)\(.*\)\n\(.*\)$/mv "\1\2" "\1\3"/p

' | $apply_cmd
Sample script: print bash environment
This script strips the definitions of the shell functions from the output of the set command in the Bourne-Again shell (bash).
#!/bin/bash

set | sed -n '
:x

# if no occurrence of "=()" print and load next line
/=()/! { p; b; }
/ () $/! { p; b; }

# possible start of functions section
# save the line in case this is a var like FOO="() "
h

# if the next line has a brace, we quit because
# nothing comes after functions
n
/^{/ q

# print the old line
x; p

# work on the new line now
x; bx
'
Sample script: reverse characters of lines
This script can reverse the position of characters in lines. The technique moves two characters at a time, hence it is faster than more intuitive implementations.
Note the tx command before the definition of the label. This command is often needed to reset the flag that is tested by the t command.
#!/usr/bin/sed -f

/../! b

# Reverse a line.  Begin by embedding the line between two newlines
s/^.*$/\
&\
/

# Move first character at the end.  The regexp matches until
# there are zero or one characters between the markers
tx
:x
s/\(\n.\)\(.*\)\(.\n\)/\3\2\1/
tx

# Remove the newline markers
s/\n//g
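To try the technique without creating a script file, the same commands can be passed with -e options, as sketched below. This assumes GNU sed (it relies on '\n' in the replacement producing a newline); the wrapper name reverse_chars is hypothetical.

```shell
# Same commands as the script, passed as -e fragments (assumes GNU sed).
reverse_chars() {
  sed -e '/../!b' \
      -e 's/^.*$/\n&\n/' \
      -e 'tx' \
      -e ':x' \
      -e 's/\(\n.\)\(.*\)\(.\n\)/\3\2\1/' \
      -e 'tx' \
      -e 's/\n//g'
}

printf 'hello\n' | reverse_chars   # prints: olleh
```

Each pass of the loop swaps the outermost pair of characters between the two newline markers and moves the markers inward, so a line of n characters needs about n/2 substitutions.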
Sample script: reverse lines of files
This one begins a series of totally useless (yet interesting) scripts emulating various Unix commands. This, in particular, is a tac workalike.
Note that on implementations other than GNU sed this script might easily overflow internal buffers.
#!/usr/bin/sed -nf

# reverse all lines of input, i.e., first line becomes last, ...

# from the second line, the buffer (which contains all previous lines)
# is *appended* to current line, so the order will be reversed
1! G

# on the last line we're done -- print everything
$ p

# store everything on the buffer again
h
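The three commands are short enough to run as a one-liner. The sketch below wraps them in a shell function; the name reverse_lines is only for illustration.

```shell
# The tac workalike as a one-liner: G appends the hold space (all
# previous lines, already reversed) after the current line, h saves
# the result, and $p prints it once the last line has been read.
reverse_lines() {
  sed -n '1!G; $p; h'
}

printf 'one\ntwo\nthree\n' | reverse_lines
# prints:
# three
# two
# one
```

Because the whole input accumulates in the hold space, memory use grows with the size of the input.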
Sample script: numbering lines
This script replaces 'cat -n'; in fact it formats its output exactly like GNU cat does.
Of course this is completely useless for two reasons: first, because somebody else did it in C (the cat command), and second, because the following Bourne-shell script could be used for the same purpose and would be much faster:
#! /bin/sh
sed -e "=" "$@" | sed -e '
  s/^/      /
  N
  s/^ *\(......\)\n/\1  /
'
It uses sed to print the line number, then groups lines two by two using N. Of course, this script does not teach as much as the one presented below.
The algorithm used for incrementing uses both buffers, so the line is printed as soon as possible and then discarded. The number is split so that changing digits go in one buffer and unchanged ones go in the other; the changing digits are modified in a single step (using a y command). The line number for the next line is then composed and stored in the hold space, to be used in the next iteration.
#!/usr/bin/sed -nf

# Prime the pump on the first line
x
/^$/ s/^.*$/1/

# Add the correct line number before the pattern
G
h

# Format it and print it
s/^/      /
s/^ *\(......\)\n/\1  /p

# Get the line number from hold space; add a zero
# if we're going to add a digit on the next line
g
s/\n.*$//
/^9*$/ s/^/0/

# separate changing/unchanged digits with an x
s/.9*$/x&/

# keep changing digits in hold space
h
s/^.*x//
y/0123456789/1234567890/
x

# keep unchanged digits in pattern space
s/x.*$//

# compose the new number, remove the newline implicitly added by G
G
s/\n//
h
Sample script: numbering non-blank lines
Emulating 'cat -b' is almost the same as 'cat -n': we only have to select which lines are to be numbered and which are not.
The part that is common to this script and the previous one is not commented to show how important it is to comment sed scripts properly...
#!/usr/bin/sed -nf

/^$/ {
  p
  b
}

# Same as cat -n from now
x
/^$/ s/^.*$/1/
G
h
s/^/      /
s/^ *\(......\)\n/\1  /p
x
s/\n.*$//
/^9*$/ s/^/0/
s/.9*$/x&/
h
s/^.*x//
y/0123456789/1234567890/
x
s/x.*$//
G
s/\n//
h
Sample script: counting characters
This script shows another way to do arithmetic with sed. In this case, we have to add possibly large numbers, so implementing this by successive increments would not be feasible (and possibly even more complicated to contrive than this script).
The approach is to map numbers to letters, kind of an abacus implemented with sed. 'a's are units, 'b's are tens and so on: we add the number of characters on the current line as units, and then propagate the carry to tens, hundreds, and so on.
As usual, running totals are kept in hold space.
On the last line, we convert the abacus form back to decimal. For the sake of variety, this is done with a loop rather than with some 80 s commands: first we convert units, removing 'a's from the number; then we rotate letters so that tens become 'a's, and so on until no more letters remain.
#!/usr/bin/sed -nf

# Add n+1 a's to hold space (+1 is for the newline)
s/./a/g
H
x
s/\n/a/

# Do the carry.  The t's and b's are not necessary,
# but they do speed up the thing
t a
: a;  s/aaaaaaaaaa/b/g; t b; b done
: b;  s/bbbbbbbbbb/c/g; t c; b done
: c;  s/cccccccccc/d/g; t d; b done
: d;  s/dddddddddd/e/g; t e; b done
: e;  s/eeeeeeeeee/f/g; t f; b done
: f;  s/ffffffffff/g/g; t g; b done
: g;  s/gggggggggg/h/g; t h; b done
: h;  s/hhhhhhhhhh//g

: done
$! {
  h
  b
}

# On the last line, convert back to decimal

: loop
/a/! s/[b-h]*/&0/
s/aaaaaaaaa/9/
s/aaaaaaaa/8/
s/aaaaaaa/7/
s/aaaaaa/6/
s/aaaaa/5/
s/aaaa/4/
s/aaa/3/
s/aa/2/
s/a/1/

: next
y/bcdefgh/abcdefg/
/[a-h]/ b loop
p
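The carry step is easier to follow in isolation. The sketch below is only an illustration of that one step, not part of the script: it builds 23 units and applies the first carry substitution. The variable names units and carried are made up for the example.

```shell
# Build a string of 23 a's (23 units): print 23 zeros, then turn
# each zero into an "a".
units=$(printf '%023d\n' 0 | sed 's/0/a/g')

# The first carry step from the script: every ten a's become one b.
carried=$(printf '%s\n' "$units" | sed 's/aaaaaaaaaa/b/g')

printf '%s\n' "$carried"   # prints: bbaaa (two tens, three units)
```

Converting back works in the opposite direction: the three remaining 'a's become the digit 3, the letters are rotated so the two 'b's become 'a's, and those become the digit 2, giving 23.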
Sample script: counting words
This script is almost the same as the previous one, once each of the words on the line is converted to a single 'a' (in the previous script each letter was changed to an 'a').
It is interesting that real wc programs have optimized loops for 'wc -c', so they are much slower at counting words than at counting characters. This script's bottleneck, instead, is arithmetic, and hence the word-counting one is faster (it has to manage smaller numbers).
Again, the common parts are not commented to show the importance of commenting sed scripts.
#!/usr/bin/sed -nf

# Convert words to a's (runs of spaces and tabs separate words)
s/[ \t][ \t]*/ /g
s/^/ /
s/ [^ ][^ ]*/a /g
s/ //g

# Append them to hold space
H
x
s/\n//

# From here on it is the same as in wc -c.
/aaaaaaaaaa/! bx;   s/aaaaaaaaaa/b/g
/bbbbbbbbbb/! bx;   s/bbbbbbbbbb/c/g
/cccccccccc/! bx;   s/cccccccccc/d/g
/dddddddddd/! bx;   s/dddddddddd/e/g
/eeeeeeeeee/! bx;   s/eeeeeeeeee/f/g
/ffffffffff/! bx;   s/ffffffffff/g/g
/gggggggggg/! bx;   s/gggggggggg/h/g
s/hhhhhhhhhh//g
:x
$! { h; b; }
:y
/a/! s/[b-h]*/&0/
s/aaaaaaaaa/9/
s/aaaaaaaa/8/
s/aaaaaaa/7/
s/aaaaaa/6/
s/aaaaa/5/
s/aaaa/4/
s/aaa/3/
s/aa/2/
s/a/1/
y/bcdefgh/abcdefg/
/[a-h]/ by
p
Sample script: counting lines
sed gives us 'wc -l' functionality for free. Here is the code:
#!/usr/bin/sed -nf
$=
Sample script: printing the first lines
This script is probably the simplest useful sed script. It displays the first 10 lines of input; the number of displayed lines is right before the q command.
#!/usr/bin/sed -f
10q
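Since the line count is simply the address in front of q, it can be supplied as a parameter. The sketch below assumes the count is a positive integer; the function name first_lines is hypothetical.

```shell
# first_lines N: print the first N lines of standard input.
# Assumes N is a positive integer; sed prints line N and then quits.
first_lines() {
  sed "${1}q"
}

printf '1\n2\n3\n4\n5\n' | first_lines 2
# prints:
# 1
# 2
```

Because q stops reading as soon as line N has been printed, the rest of the input is never processed.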
Sample script: printing the last lines
Printing the last n lines rather than the first is more complex but indeed possible. The n is encoded in the second line, before the bang ("!") character.
This script is similar to the tac script (above) in that it keeps the final output in the hold space and prints it at the end:
#!/usr/bin/sed -nf

1! {; H; g; }
1,10 !s/[^\n]*\n//
$p
h
Mainly, the script keeps a window of 10 lines and slides it by adding a line and deleting the oldest (the substitution command on the second line works like a D command but does not restart the loop).
The "sliding window" technique is a very powerful way to write efficient and complex sed scripts, because commands like P would require a lot of work if implemented manually.
To introduce the technique, which is fully demonstrated in the rest of this chapter and is based on the N, P and D commands, here is an implementation of tail using a simple "sliding window."
This looks complicated but in fact the working concept is the same as the last script: after we have kicked in the appropriate number of lines, however, we stop using the hold space to keep inter-line state, and instead use N and D to slide pattern space by one line:
#!/usr/bin/sed -f

1h
2,10 {; H; g; }
$q
1,9d
N
D
Note how the first, second and fourth lines are inactive after the first ten lines of input. After that, all the script does is exit on the last line of input, append the next input line to pattern space, and remove the first line.
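The window size appears in two places in the script (10 and 9). The sketch below parameterizes both so the behavior can be checked on a short input. It assumes GNU sed and a window of at least 2 lines; the function name last_lines is made up for this example.

```shell
# last_lines N: print the last N lines of standard input using the
# sliding-window technique. Assumes GNU sed and N >= 2.
last_lines() {
  n=$1
  sed -e '1h' \
      -e "2,${n}{H;g}" \
      -e '$q' \
      -e "1,$((n - 1))d" \
      -e 'N' \
      -e 'D'
}

printf '1\n2\n3\n4\n5\n' | last_lines 3
# prints:
# 3
# 4
# 5
```

With a window of 3, the hold space is used only for the first three lines; from then on each cycle appends one line with N and drops the oldest with D until $q prints the window and exits.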
Sample script: make duplicate lines unique
This script is an example of the art of using the N, P and D commands, probably the most difficult to master.
#!/usr/bin/sed -f
h

:b
# On the last line, print and exit
$b
N
/^\(.*\)\n\1$/ {
    # The two lines are identical.  Undo the effect of
    # the N command.
    g
    bb
}

# If the N command had added the last line, print and exit
$b

# The lines are different; print the first and go
# back working on the second.
P
D
As you can see, we maintain a two-line window using P and D. This technique is often used in advanced sed scripts.
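The same two-line window can be written more compactly. The sketch below is a commonly seen shorter variant, not the script above: it skips the hold space and simply avoids printing the first line of the window when both lines match. The function name squeeze_repeats is hypothetical.

```shell
# A compact two-line-window variant (not the script above):
#   $!N   append the next line, except on the last line
#   P     print the first line of the window unless both lines match
#   D     drop the first line and restart with what remains
squeeze_repeats() {
  sed '$!N; /^\(.*\)\n\1$/!P; D'
}

printf 'a\na\nb\nb\nc\n' | squeeze_repeats
# prints:
# a
# b
# c
```

Like uniq, this only collapses adjacent duplicates; a line that reappears later in the input is printed again.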
Sample script: print duplicated lines of input
This script prints only duplicated lines, like 'uniq -d'.
#!/usr/bin/sed -nf

$b
N
/^\(.*\)\n\1$/ {
    # Print the first of the duplicated lines
    s/.*\n//
    p

    # Loop until we get a different line
    :b
    $b
    N
    /^\(.*\)\n\1$/ {
        s/.*\n//
        bb
    }
}

# The last line cannot be followed by duplicates
$b

# Found a different one.  Leave it alone in the pattern space
# and go back to the top, hunting its duplicates
D
Sample script: remove all duplicated lines
This script prints only unique lines, like 'uniq -u'.
#!/usr/bin/sed -f

# Search for a duplicate line --- until that, print what you find.
$b
N
/^\(.*\)\n\1$/ ! {
    P
    D
}

:c
# Got two equal lines in pattern space.  At the
# end of the file we simply exit
$d

# Else, we keep reading lines with N until we
# find a different one
s/.*\n//
N
/^\(.*\)\n\1$/ {
    bc
}

# Remove the last instance of the duplicate line
# and go back to the top
D
Sample script: squeezing blank lines
As a final example, here are three scripts, of increasing complexity and speed, that implement the same function as 'cat -s', that is, squeezing blank lines.
The first leaves a blank line at the beginning and end if there are some already.
#!/usr/bin/sed -f

# on empty lines, join with next
# Note there is a star in the regexp
:x
/^\n*$/ {
N
bx
}

# now, squeeze all '\n', this can be also done by:
# s/^\(\n\)*/\1/
s/\n*/\
/
This one is a bit more complex and removes all empty lines at the beginning. It does leave a single blank line at the end if one was there.
#!/usr/bin/sed -f

# delete all leading empty lines
1,/^./{
/./!d
}

# on an empty line we remove it and all the following
# empty lines, but one
:x
/./!{
N
s/^\n$//
tx
}
This removes leading and trailing blank lines. It is also the fastest. Note that loops are completely done with n and b, without relying on sed to restart the script automatically at the end of a line.
#!/usr/bin/sed -nf

# delete all (leading) blanks
/./!d

# get here: so there is a non empty
:x
# print it
p
# get next
n
# got chars? print it again, etc...
/./bx

# no, don't have chars: got an empty line
:z
# get next, if last line we finish here so no trailing
# empty lines are written
n
# also empty? then ignore it, and get next... this will
# remove ALL empty lines
/./!bz

# all empty lines were deleted/ignored, but we have a non empty.  As
# what we want to do is to squeeze, insert a blank line artificially
i\

bx
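For comparison, the squeezing idea can also be expressed with N and D in a single line. The sketch below is a commonly seen alternative, not one of the three scripts above, and the function name squeeze_blank is hypothetical. Like the first script, it keeps one blank line wherever a run of blank lines occurs.

```shell
# A compact alternative (not one of the three scripts above):
#   /^$/N    on an empty line, append the next line
#   /\n$/D   if that next line is empty too, drop the first empty
#            line and restart with the remaining one
squeeze_blank() {
  sed '/^$/N; /\n$/D'
}

printf 'a\n\n\n\nb\n' | squeeze_blank
# prints "a", one empty line, then "b"
```

Each D removes one surplus blank line and restarts the cycle without reading new input, so a run of any length collapses to a single blank line.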
GNU sed's limitations (and non-limitations)
For those who want to write portable sed scripts, be aware that some implementations are known to limit line lengths (for the pattern and hold spaces) to be no more than 4000 bytes. The POSIX standard specifies that conforming sed implementations shall support at least 8192 byte line lengths. GNU sed has no built-in limit on line length; as long as it can allocate more (virtual) memory, you can feed or construct lines as long as you like.
However, recursion is used to handle subpatterns and indefinite repetition. This means that the available stack space may limit the size of the buffer that can be processed by certain patterns.
Extended regular expressions
The only difference between basic and extended regular expressions is in the behavior of a few characters: '?', '+', parentheses, and braces ('{}'). While basic regular expressions require these to be escaped if you want them to behave as special characters, when using extended regular expressions you must escape them if you want them to match a literal character.
For example:
abc? | Becomes 'abc\?' when using extended regular expressions. It matches the literal string 'abc?'. |
c\+ | Becomes 'c+' when using extended regular expressions. It matches one or more 'c's. |
a\{3,\} | Becomes 'a{3,}' when using extended regular expressions. It matches three or more 'a's. |
\(abc\)\{2,3\} | Becomes '(abc){2,3}' when using extended regular expressions. It matches either 'abcabc' or 'abcabcabc'. |
\(abc*\)\1 | Becomes '(abc*)\1' when using extended regular expressions. Backreferences must still be escaped when using extended regular expressions. |
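The rows above can be checked directly by running the same substitution in both syntaxes. The sketch below assumes GNU sed and uses the -r option from the options table; the function names are made up for the example.

```shell
# Basic syntax: the braces must be escaped to act as an interval.
squeeze_bre() {
  sed 's/a\{3,\}/X/'
}

# Extended syntax (-r): the same interval without backslashes.
squeeze_ere() {
  sed -r 's/a{3,}/X/'
}

# In extended syntax an escaped \? is a literal question mark.
literal_question() {
  sed -r 's/abc\?/matched/'
}

printf 'baaaad\n' | squeeze_bre           # prints: bXd
printf 'baaaad\n' | squeeze_ere           # prints: bXd
printf 'abc? abc\n' | literal_question    # prints: matched abc
```

In the last call only the first 'abc?' is replaced, because the pattern requires the literal question mark and the s command has no g flag.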
Examples
sed G myfile.txt > newfile.txt
Double-spaces the contents of file myfile.txt, and writes the output to the file newfile.txt.
sed = myfile.txt | sed 'N;s/\n/\. /'
Prefixes each line of myfile.txt with a line number, a period, and a space, and displays the output.
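To see the effect without a file, the same pipeline can read from standard input. In this sketch the function name number_lines is hypothetical; the two sed commands are the ones from the example.

```shell
# The first sed prints each line number on its own line before the
# line itself; the second joins each number/line pair with ". ".
number_lines() {
  sed = | sed 'N;s/\n/\. /'
}

printf 'alpha\nbeta\n' | number_lines
# prints:
# 1. alpha
# 2. beta
```

The second sed always sees an even number of lines, so N never runs out of input.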
sed 's/test/example/g' myfile.txt > newfile.txt
Searches for the word "test" in myfile.txt, replaces every occurrence with the word "example", and writes the result to newfile.txt.
sed -n '$=' myfile.txt
Counts the number of lines in myfile.txt and displays the results.
Related commands
awk - Interpreter for the AWK text processing programming language.
ed - A simple text editor.
grep - Filter text which matches a regular expression.
replace - A string-replacement utility.
Source: https://www.computerhope.com/unix/used.htm