Friday, February 18, 2011

Reporting Memory Utilization and the Memory Cap

# rcapstat -g
id project nproc vm rss cap at avgat pg avgpg
376565 rcap 0 0K 0K 10G 0K 0K 0K 0K
physical memory utilization: 55% cap enforcement threshold: 0%
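To watch the global figure from a script, you can pull the "physical memory utilization" value out of `rcapstat -g` output. This is a minimal sketch that assumes the summary line format shown above. The function name `rcap_mem_util` is hypothetical.

```shell
#!/bin/sh
# rcap_mem_util: read rcapstat -g output on stdin and print the
# "physical memory utilization" percentage (digits only).
# Assumes the summary line format shown in the sample output above.
rcap_mem_util() {
    sed -n 's/.*physical memory utilization: *\([0-9][0-9]*\)%.*/\1/p'
}

# Example on Solaris (one report after a 5-second interval):
#   rcapstat -g 5 1 | rcap_mem_util
```
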

Before using rcapstat, you need to enable the resource capping daemon, rcapd.

Enable the resource capping daemon in one of the following ways:
■ Turn on resource capping using the svcadm command.
# svcadm enable rcap
■ To enable the resource capping daemon so that it starts now and also starts each time the
system is booted, type:
# rcapadm -E
■ Enable the resource capping daemon at boot without starting it now by also specifying the
-n option:
# rcapadm -n -E
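If you script the enablement, for example as a post-install step, you can make it safe to re-run by checking the SMF state first. This is a minimal sketch that assumes Solaris 10 or later with SMF and the FMRI svc:/system/rcap:default. The function name `ensure_rcap` is hypothetical.

```shell
#!/bin/sh
# ensure_rcap: enable the rcap SMF service only if it is not already online.
# Assumes Solaris 10+ (SMF) and the FMRI svc:/system/rcap:default.
ensure_rcap() {
    # -H drops the header line, -o state prints only the state column
    state=$(svcs -H -o state svc:/system/rcap:default 2>/dev/null)
    if [ "$state" = "online" ]; then
        echo "rcapd already online"
        return 0
    fi
    # -s makes svcadm wait until the service reaches the online state
    svcadm enable -s svc:/system/rcap:default
}
```
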

Thursday, February 17, 2011

How To : Sun Routing Support Document/FAQ

Sun Routing Support Document/FAQ [ID 1010268.1]

--------------------------------------------------------------------------------

Modified 27-DEC-2010 Type HOWTO Migrated ID 214095 Status PUBLISHED
Steps to Follow
The following Document assumes a PATH variable that contains /usr/sbin and
/usr/bin. For commands or files that normally exist outside these common
directories, full paths are specified.
Synopsis: ip_forwarding should be configurable via /etc/system (http://sunsolve.sun.com/search/document.do?assetkey=1-1-1113271-1)
Bug Id: 4135171 Synopsis: 2.6 routing issue should be documented
Bug Id: 4052347 Synopsis: Man page for ip needs updated for new ip_forwarding values in Solaris 2.6

To force ip_forwarding on, add the following ndd command to a startup script:
ndd -set /dev/ip ip_forwarding 1
The Netra implements an S99routing script
(/opt/netra/networking/routing/bin/boot.conf)
that turns it on:
#!/bin/sh
ndd -set /dev/ip ip_forwarding 1
ndd -set /dev/ip ip_forward_src_routed 0
---
2.7 ip man page update:
"...The IP layer will normally act as a router (forwarding datagrams that are not addressed to it, among other things)
when the machine has two or more interfaces that are up.
... When the IP module is loaded, ip_forwarding is 0 and remains so if:

o only one non-DHCP-managed interface is up (the most common case)
o the file /etc/notrouter exists and DHCP does not say that IP forwarding is on
o the file /etc/defaultrouter exists and DHCP does not say IP forwarding is on

Otherwise, ip_forwarding will be set to 1."
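The three rules in the quoted man page can be written as a small decision function, which helps when reasoning about why a given host does or does not forward. This is a hypothetical helper and a literal transcription of the quoted text only. It is not Solaris's actual boot logic. The caller supplies each fact as a 0/1 argument. To read the real value on a running system, use `ndd /dev/ip ip_forwarding`.

```shell
#!/bin/sh
# predict_ip_forwarding: apply the rules quoted from the ip man page and
# print the expected initial value of ip_forwarding (0 or 1).
# Arguments (supplied by the caller):
#   $1 number of non-DHCP-managed interfaces that are up
#   $2 1 if /etc/notrouter exists, else 0
#   $3 1 if /etc/defaultrouter exists, else 0
#   $4 1 if DHCP says IP forwarding is on, else 0
predict_ip_forwarding() {
    up=$1 notrouter=$2 defrouter=$3 dhcp_fwd=$4
    # Rule 1: only one non-DHCP-managed interface is up
    if [ "$up" -eq 1 ]; then echo 0; return; fi
    # Rules 2 and 3: notrouter or defaultrouter exists and DHCP does not say forwarding is on
    if [ "$dhcp_fwd" -ne 1 ] && { [ "$notrouter" -eq 1 ] || [ "$defrouter" -eq 1 ]; }; then
        echo 0; return
    fi
    # Otherwise ip_forwarding is set to 1
    echo 1
}
```
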

4.2: Basic Routing Problems

This section outlines some very common routing problems. The techniques described in Section 2.0 will be needed to resolve more
difficult routing problems.

Q: Why can I only ping machines on my subnet?
A1: You are using in.routed (this is how a machine ships by default), but are not propagating RIP packets on your network. You should
define a default router, as described in Section 3.1.

A2: The netmask is set incorrectly on the machine that cannot ping. Sections 3.8 and 3.9 describe how to define a permanent custom
netmask.
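To check whether a wrong netmask explains the symptom, you can compare two addresses under the configured mask. If the result does not match where the target host really sits, the mask is suspect. This is a minimal POSIX sh sketch. The function names `ip2int` and `same_subnet` are hypothetical, and the functions assume dotted-quad input without leading zeros.

```shell
#!/bin/sh
# ip2int ADDR: convert a dotted-quad IPv4 address to an integer
# (assumes no leading zeros, which shell arithmetic would read as octal)
ip2int() {
    oldifs=$IFS; IFS=.
    set -- $1          # split the address into its four octets
    IFS=$oldifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# same_subnet ADDR1 ADDR2 NETMASK: succeed if both addresses fall in the
# same subnet under NETMASK, i.e. they can talk without a router
same_subnet() {
    a=$(ip2int "$1"); b=$(ip2int "$2"); m=$(ip2int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}

# Example:
#   same_subnet 192.168.1.10 192.168.1.200 255.255.255.0 && echo same
```
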

A3: You have a default router defined, but that router is not forwarding the IP packets. If that router is a Sun machine, make
sure that it has _not_ been modified as described in Section 3.6 or 3.7. If it is a non-Sun machine, SunService cannot provide
further assistance in determining why it might not be forwarding IP packets.

4.3: in.routed Errors

Q: Why does in.routed constantly generate the following error:

"packet from unknown router, x.x.x.x"

A: This occurs because your machine is receiving broadcast packets from a router that is on a different subnet. Since broadcast
packets will not typically cross subnet boundaries, this usually means that you have machines from two subnets on the same physical
wire.

Often this is a mistake. If you see this error and do not expect to have multiple networks on the same wire, you should track down
the source machine (x.x.x.x) and fix its IP address.

However, there are cases where this setup might be intentional, as outlined in RFC 1597. This would imply that you had several
networks all using the same physical wire. Unfortunately, this is not currently supported correctly.

If you are on a SunOS machine, just install patch 100283 and this problem will go away.

Under Solaris, a fix for this issue is still pending. If the errors become too frequent, you should revert to using a default router, as
described in Section 3.1, until the problem is corrected.

Q: Why does in.routed keep bringing my PPP interface up?
A: in.routed automatically sends out RIP packets every 30 seconds. This will keep your PPP interface up. To prevent this, you can put
the following entry in your /etc/gateways file:

# cat /etc/gateways
norip ipdptp0
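If you manage /etc/gateways from a script, you can add the entry only when it is missing so that repeated runs do not duplicate it. This is a minimal sketch, and the function name `add_norip` is hypothetical. It uses a plain `grep` with an anchored pattern rather than `grep -q -x` because the older Solaris /usr/bin/grep may not support those options. The pattern assumes the interface name contains no regex metacharacters.

```shell
#!/bin/sh
# add_norip IFACE [FILE]: add "norip IFACE" to the in.routed gateways file
# (default /etc/gateways) unless an identical line is already present.
add_norip() {
    iface=$1
    file=${2:-/etc/gateways}
    line="norip $iface"
    # anchored match on the whole line; assumes IFACE has no regex metacharacters
    if [ -f "$file" ] && grep "^${line}\$" "$file" >/dev/null 2>&1; then
        return 0
    fi
    echo "$line" >> "$file"
}

# Example:
#   add_norip ipdptp0
```
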

This lets you continue running in.routed without having it keep your PPP interface constantly up. Another Tip Sheet exists
for PPP that explains many routing problems associated with a PPP interface.

4.4: netmask and broadcast Problems

Q: Why is my netmasks entry being ignored?

A1: The network address listed in the netmasks file is a subnet instead of the base network. Remember that the network address
should just be # if you are on a class A, #.# if you are on a class B, and #.#.# if you are on a class C. Section 1.3 has a listing
of which IP addresses belong to which classes.
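The rule above can be applied mechanically by deriving the classful base network from any address on the network. This is a minimal sketch, and the function name `classful_net` is hypothetical. It prints the abbreviated form described above (#, #.#, or #.#.#). Class boundaries follow the usual first-octet ranges: A below 128, B below 192, C below 224.

```shell
#!/bin/sh
# classful_net ADDR: print the classful base network for ADDR in the
# abbreviated form #, #.#, or #.#.#. Class D/E addresses (224+) are rejected.
classful_net() {
    oldifs=$IFS; IFS=.
    set -- $1          # split the address into its four octets
    IFS=$oldifs
    if [ "$1" -lt 128 ]; then echo "$1"            # class A
    elif [ "$1" -lt 192 ]; then echo "$1.$2"       # class B
    elif [ "$1" -lt 224 ]; then echo "$1.$2.$3"    # class C
    else
        echo "classful_net: $1 is not class A, B or C" >&2
        return 1
    fi
}

# Example: an address on subnet 172.16.5.0 belongs to base network 172.16
#   classful_net 172.16.5.9
```
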

A2: Although you have added the netmask to /etc/netmasks, you have not put it in NIS. Once NIS is running, your /etc/netmasks
file is no longer consulted, so make sure to duplicate any netmasks entries in your NIS maps.

Q: Why do I get segmentation faults on boot after changing my netmasks file?
A: This is a known bug in some Solaris systems. If you examine your /etc/netmasks file, you will find that it contains blank lines.
Remove them, reboot, and your system will come up clean.
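Removing the blank lines can be done with sed while keeping a backup of the original file. This is a minimal sketch, and the function name `strip_blank_lines` is hypothetical. It treats lines containing only whitespace as blank too, which is an assumption beyond the advice above.

```shell
#!/bin/sh
# strip_blank_lines FILE: remove empty or whitespace-only lines from FILE,
# keeping the original as FILE.bak (e.g. for /etc/netmasks).
strip_blank_lines() {
    file=$1
    cp -p "$file" "$file.bak" || return 1
    # writing back through ">" keeps the original file's owner and mode
    sed '/^[[:space:]]*$/d' "$file.bak" > "$file"
}

# Example:
#   strip_blank_lines /etc/netmasks
```
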

Q: Does Sun support variable-length subnetting, per RFC 1219?
A: Solaris 2.6 or above supports VLSM and CIDR.

Q1: Why can't I add a netmask for a remote subnet?
Q2: If I add a netmask for a subnet not directly connected to my machine, it is ignored. Why?
A: Sun systems only support netmasks for networks that are directly connected to a machine. If you try to add a netmask for a
network that is not directly connected to your machine, it will fail. If you want to route to a remote network that is subnetted,
the correct method is to route to a machine that is directly connected to the subnetted network, and then allow that machine to
route to the appropriate subnetwork.

When RFC 1219 is implemented on Suns, this functionality may become available.

Q: How do I make the SunOS 4.x broadcast address all 1's, like Solaris 2?
A: Manually specify the broadcast address.
---------
example:
# ifconfig le0
le0: flags=63
inet 207.48.123.87 netmask ffffff00 broadcast 207.48.123.0
ether 8:0:20:1a:f:83

# ifconfig le0 broadcast 207.48.123.255

# ifconfig le0
le0: flags=63
inet 207.48.123.87 netmask ffffff00 broadcast 207.48.123.255
ether 8:0:20:1a:f:83

---------
To make the change take effect at boot, add an ifconfig line to /etc/rc.local:
...
# set the netmask from NIS if running, or /etc/netmasks for all ether interfaces
ifconfig -a netmask + broadcast + > /dev/null
ifconfig le0 broadcast 207.48.123.255
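The all-1's broadcast address is the host address with every host bit set, where the host bits are the bits that are 0 in the netmask. This is a minimal sketch for computing it, and the function name `bcast` is hypothetical. It expects the netmask in dotted form (255.255.255.0), not the hex form (ffffff00) that ifconfig prints.

```shell
#!/bin/sh
# bcast ADDR NETMASK: print the broadcast address with all host bits set to 1
bcast() {
    oldifs=$IFS; IFS=.
    set -- $1 $2       # $1..$4 address octets, $5..$8 netmask octets
    IFS=$oldifs
    # OR each address octet with the inverted netmask octet
    echo "$(( $1 | (255 - $5) )).$(( $2 | (255 - $6) )).$(( $3 | (255 - $7) )).$(( $4 | (255 - $8) ))"
}

# Example:
#   ifconfig le0 broadcast "$(bcast 207.48.123.87 255.255.255.0)"
```
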


4.5: traceroute information and Problems

Sun does not support the public domain traceroute package.

New:
Solaris 2.7 software bundles the popular traceroute utility. The traceroute utility is used to trace the route an IP packet follows
to an Internet host. Traceroute uses the IP protocol ttl (time to live) field and attempts to elicit an ICMP TIME_EXCEEDED response
from each gateway along the path, and PORT_UNREACHABLE (or ECHO_REPLY) from the destination host. The traceroute utility starts
sending probes with a ttl of 1, and increases it by one until it reaches the intended host or has passed through a maximum number of
intermediate hosts.

The traceroute utility is especially useful for determining routing configuration problems and routing path failures. If a particular
host is unreachable, the traceroute utility can be used to see what path the packet follows to the intended host and where possible
failures occur. The traceroute utility also displays the round-trip time for each gateway along the path to the target host. This
information can be useful for analyzing where traffic is slow between the two hosts.

For more information, see TCP/IP and Data Communications Administration Guide.

The questions below note some problems that we have run into with traceroute, and the simple steps that can be taken to
correct them. Further problems or errors with traceroute should be directed to the traceroute author.

Q: Why does traceroute fail on every other packet under Solaris?

# traceroute psi
traceroute to psi (150.101.16.28), 30 hops max, 40 byte packets
1 psi (150.101.16.28) 6 ms * 2 ms


A: Solaris introduced a variable called 'ip_icmp_err_interval' which enforces a minimum time between ICMP error messages.
Traceroute depends on ICMP error messages and sends its probes in quick succession, so it asks for those errors very fast. As a
result, if ip_icmp_err_interval is set to the default value on a Solaris machine, traceroute's second packet will always get dropped,
as shown above. You can disable this Solaris feature by adjusting the ndd variable:

# ndd -set /dev/ip ip_icmp_err_interval 0

If you want this change to be permanent, you should add it to the file /etc/rc2.d/S69inet.

However, since every single Solaris machine between you and your destination must change this variable, it is often more reasonable
just to let every second packet get dropped, and not worry about it.

4.6: arp Problems

Q: Why does arp fail with the following message:

# arp -s system 0:8:20:1:2:3 pub
"system: No such device or address"

A: This occurs if you try to use arp to add a machine that is not on a local network. You can only add arp entries for hosts on
subnets that are directly connected to your machine.

5.0 References

5.1: Important Man Pages

arp
in.rdisc (Solaris only)
netmasks
route
routed
routing

5.2: Sunsolve Documents

There are a number of Sunsolve documents concerning routing, subnetting and netmasks. The ones listed below are simply those which
contain some amount of information which is not already in this document.

5.2.1: Technical instruction

Technical Instruction < Solution: 211537 > Routing and delivery of datagrams

5.2.2: Problem Resolutions

4713 ARP Thrashing in Large Bridged Networks
Problem Resolution Document: 1008617.1 booting diskless client fails with ARP/RARP timeout
5986 Subnetting

5.3: Third Party Documentation

_TCP/IP Illustrated, Volume 1_, by W. Richard Stevens, published by Addison-Wesley Publishing Company, ISBN 0-201-63345-9

An excellent text explaining how the various TCP/IP protocols work. The chapters on arp, ping, traceroute, IP routing, dynamic
routing, broadcasting and multicasting are all very useful in gaining an understanding of the way that the routing protocols
work.

_Internetworking With TCP/IP Volume 1_, by Douglas Comer

Another book with good information on networking.

5.4: RFCs

RFCs are the Internet documents that define the specifications of many common networking protocols and programs. RFCs can be
retrieved from http://www.faqs.org/rfcs.

A very large number of RFCs describe routing over the internet. Included below are just some of the RFCs which most directly cover
the topics described in this document.

917 Internet subnets

Original specs for subnets and netmasks.

950 Internet Standard Subnetting Procedure

Additional notes on subnetting.

1219 On the Assignment of Subnet Numbers

Suggestions for additions to the subnet specs, which have not yet been incorporated by Sun.

1256 ICMP router discovery messages

Specs for the Router discovery protocol, as implemented in in.rdisc.

6.0: Supportability

Sun is not responsible for the initial configuration of your network routing. In addition, we cannot help resolve routing
problems caused by your routers, bridges or other machines.

SunService can help resolve problems where your Sun is not sending packets correctly to its next-hop router, or where one of your
Sun's routing daemons is not responding correctly, but in such cases, the contact must be a system administrator who has a good
understanding of the network layout, and its routing design.

7.0: Additional Support

For overall routing configuration, please contact your local SunService office for possible consulting offerings. Sun's Customer
Relations organization can put you in touch with your local SunIntegration or Sales office. You can reach Customer Relations at
800-821-4643.


Product
Solaris

How to : Check the SAN Disk Connection Status

How to : Check the SAN Disk Connection Status [ID 1012427.1]
________________________________________

Applies to:
Solaris SPARC Operating System - Version: 8.0 and later [Release: 8.0 and later ]
All Platforms
Goal
The objective of this document is to provide general guidelines on using the luxadm and cfgadm commands to check the status of SAN devices.
Solution
Steps to Follow:

To simplify the scenario, interconnecting devices (switches or directors) are not considered here. From the server side, the following items are checked:

1. HBA connection status
2. LUN connection status

HBA connection status:
=====================
* Check /var/adm/messages after system boot
Example:
Lun=0 for target=41500 disappeared or FCP: target=41500 reported NO Luns
path /pci@1f,4000/SUNW,qlc@4/fp@0,0 (fp6) to target address:
50020f2300002a7b,0 is offline

Remark:
1. status should be "online"
* luxadm -e port

Example:
/devices/pci@1f,4000/SUNW,qlc@4/fp@0,0:devctl CONNECTED
/devices/pci@1f,4000/SUNW,qlc@4,1/fp@0,0:devctl CONNECTED

Remark:
1. A dual-port HBA should show 2 entries
2. Status should be "CONNECTED"
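The two remarks above can be checked from a script by reading `luxadm -e port` output. This is a minimal sketch, and the function name `check_hba_ports` is hypothetical. It assumes each line is a device path followed by either CONNECTED or NOT CONNECTED. It prints the number of connected ports and fails if any port is not connected.

```shell
#!/bin/sh
# check_hba_ports: read `luxadm -e port` output on stdin, report ports
# that are not CONNECTED, print the CONNECTED count, and return non-zero
# if any port is not connected.
check_hba_ports() {
    awk '
        NF < 2 { next }
        # field 2 is "CONNECTED", or "NOT" for "NOT CONNECTED"
        $2 == "CONNECTED" { ok++; next }
        { print "not connected: " $1; bad = 1 }
        END { printf "%d port(s) CONNECTED\n", ok; exit bad }
    '
}

# Example on Solaris:
#   luxadm -e port | check_hba_ports
```
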

Note: A CONNECTED state in the output of "luxadm -e port" does not always
mean that there are LUNs mapped to the target device. This can occur, for
example, with HBAs connected to a Hitachi storage array where the target
device is presented but no LUNs are mapped. Additionally, some arrays or
storage devices present LUNs used for control or monitoring functions
(such as SES, SCSI Enclosure Services).

LUN connection status:
=====================

* cfgadm -al
Example:
c5 fc-fabric connected unconfigured unknown
c5::50020f2300000cab disk connected configured unknown
c5::50020f2300002a7b disk connected configured unknown

* cfgadm -al -o show_FCP_dev
c5 fc-fabric connected unconfigured unknown
c5::50020f2300000cab,0 disk connected configured unknown
c5::50020f2300000cab,1 disk connected configured unknown
c5::50020f2300000cab,2 disk connected configured unknown
c5::50020f2300002a7b,0 disk connected configured unknown
c5::50020f2300002a7b,1 disk connected configured unknown
c5::50020f2300002a7b,2 disk connected configured unknown

Remarks:
1. Receptacle should be "connected"
2. Occupant should be "configured"
3. Note the LUNs mapped to each target device
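The first two remarks above can be checked automatically on the `cfgadm -al -o show_FCP_dev` output. This is a minimal sketch, and the function name `check_fc_luns` is hypothetical. It assumes the standard column order (Ap_Id, Type, Receptacle, Occupant, Condition) and looks only at entries of type "disk", so the fc-fabric controller line and the header are skipped.

```shell
#!/bin/sh
# check_fc_luns: read `cfgadm -al -o show_FCP_dev` output on stdin and
# report disk entries whose receptacle is not "connected" or whose
# occupant is not "configured". Returns non-zero if any are found.
check_fc_luns() {
    awk '
        $2 == "disk" {
            n++
            if ($3 != "connected" || $4 != "configured") {
                print "problem: " $1 " " $3 " " $4
                bad = 1
            }
        }
        END { printf "%d disk LUN(s) checked\n", n; exit bad }
    '
}

# Example on Solaris:
#   cfgadm -al -o show_FCP_dev | check_fc_luns
```
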

* luxadm probe
Example:
luxadm probe
Node WWN:50020f2000000caa Device Type:Disk device
Logical Path:/dev/rdsk/c10t60020F2000000CAA3DABB7B10001D2D4d0s2

Remark:
1. The Universal device name "c10t60020F2000000CAA3DABB7B10001D2D4d0s2"
is generated based on the WWN of the storage device

* luxadm disp
Example:
luxadm disp /dev/rdsk/c10t60020F2000000CAA3DABB7B10001D2D4d0s2
DEVICE PROPERTIES for disk:
/dev/rdsk/c5t600015D0002109000000000000000104d0s2
Status(Port A): O.K.
Status(Port B): O.K.
Vendor: SUN
Product ID: SE6920
WWN(Node): 230000015d210900
WWN(Port A): 234100015d210900
WWN(Port B): 233100015d210900
Revision: 0201
Serial Num: 08349695
Unformatted capacity: 10240.000 MBytes
Read Cache: Enabled
Minimum prefetch: 0x0
Maximum prefetch: 0xffff
Device Type: Disk device
Path(s):
/dev/rdsk/c5t600015D0002109000000000000000104d0s2
/devices/scsi_vhci/ssd@g600015d0002109000000000000000104:c,raw
Controller /devices/pci@1f,0/pci@1/SUNW,qlc@2/fp@0,0
Device Address 234100015d210900,0
Host controller port WWN 210000e08b0fb16b
Class primary
State ONLINE
Controller /devices/pci@1f,0/pci@1/SUNW,qlc@2,1/fp@0,0
Device Address 233100015d210900,0
Host controller port WWN 210100e08b2fb16b
Class primary
State ONLINE

Remarks:
1. The device path can be found with "luxadm probe"
2. If there are 2 "Controller" entries for this device, it has a dual-path connection
3. If MPxIO is enabled and the storage device/array is symmetric, then the
class of the Controller for both paths should be "primary", and the state should
be "ONLINE". If the storage device/array is asymmetric, then one controller's
class should be "primary" and the other controller's class should be "secondary"
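The path checks above can be summarized from `luxadm display` output by pairing each "Class" line with the "State" line that follows it. This is a minimal sketch, and the function name `path_summary` is hypothetical. It assumes a symmetric array, where every path should be ONLINE. On an asymmetric array a secondary path may legitimately show another state, so treat a failure there as a prompt to look rather than an error.

```shell
#!/bin/sh
# path_summary: read `luxadm display` output on stdin, print the class and
# state of each path, then the path count. Returns non-zero if any path is
# not ONLINE (assumes a symmetric array).
path_summary() {
    awk '
        $1 == "Class" { cls = $2 }
        $1 == "State" {
            n++
            print "path " n ": class=" cls " state=" $2
            if ($2 != "ONLINE") bad = 1
        }
        END { printf "%d path(s)\n", n; exit bad }
    '
}

# Example on Solaris:
#   luxadm display /dev/rdsk/c5t600015D0002109000000000000000104d0s2 | path_summary
```
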

After a reboot, a node in a VERITAS Cluster Server (VCS) environment is in an ADMIN_WAIT state or in a STALE_ADMIN_WAIT state

Problem

After a reboot, a node in a VERITAS Cluster Server (VCS) environment is in an ADMIN_WAIT state or in a STALE_ADMIN_WAIT state
Solution

Below are descriptions of the states that a Cluster Server node could end up in after a reboot, as seen in the output of the following command:

# hastatus
attempting to connect....connected

group resource system message
--------------- -------------------- --------------- --------------------
sptsunvcs3 STALE ADMIN WAIT: all system stale
sptsunvcs4 STALE ADMIN WAIT: all system stale


ADMIN_WAIT state:

If VCS is started on a system with a valid configuration file, and if other systems are in the ADMIN_WAIT state, the new system transitions to the ADMIN_WAIT state.
INITING===>CURRENT_DISCOVER_WAIT===>ADMIN_WAIT

If VCS is started on a system with a stale configuration file, and if other systems are in the ADMIN_WAIT state, the new system transitions to the ADMIN_WAIT state.
INITING===>STALE_DISCOVER_WAIT===>ADMIN_WAIT

STALE_ADMIN_WAIT state:

If VERITAS Cluster Server is started on a system with a stale configuration file, and if all other systems are in the STALE_ADMIN_WAIT state, the system transitions to the STALE_ADMIN_WAIT state as shown below. A system stays in this state until another system with a valid configuration file is started, or until the command hasys -force is issued.
INITING===>STALE_DISCOVER_WAIT===>STALE_ADMIN_WAIT
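The three transitions above can be summarized as a lookup on this node's configuration file and the state of the other nodes. This is a hypothetical helper that encodes only the cases listed in this note. It is not VCS's full state machine.

```shell
#!/bin/sh
# vcs_join_state CONFIG OTHERS: print the transition path a starting node
# follows, per the three cases described in this note.
#   CONFIG: "valid" or "stale" (this node's main.cf)
#   OTHERS: state of the other nodes, "ADMIN_WAIT" or "STALE_ADMIN_WAIT"
vcs_join_state() {
    case "$1:$2" in
        valid:ADMIN_WAIT)       echo "INITING===>CURRENT_DISCOVER_WAIT===>ADMIN_WAIT" ;;
        stale:ADMIN_WAIT)       echo "INITING===>STALE_DISCOVER_WAIT===>ADMIN_WAIT" ;;
        stale:STALE_ADMIN_WAIT) echo "INITING===>STALE_DISCOVER_WAIT===>STALE_ADMIN_WAIT" ;;
        *) echo "vcs_join_state: case not covered by this note" >&2; return 1 ;;
    esac
}
```
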

Resolution:

If all systems are in STALE_ADMIN_WAIT or ADMIN_WAIT, first validate the configuration file (/etc/VRTSvcs/conf/config/main.cf) on all systems in the cluster: run the 'hacf -verify .' command to check for syntax errors (make sure this command is run in the directory containing the main.cf file), and review its contents for proper resource and service group definitions.
Then enter the following command on the system with the correct configuration file to force start VCS.
# hasys -force system_name


This starts Cluster Server on that node and on all other nodes in the ADMIN_WAIT or STALE_ADMIN_WAIT state.
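You can wrap the validate-then-force sequence so that hasys -force only runs when hacf -verify passes. This is a minimal sketch that assumes the VCS binaries (hacf and hasys, normally under /opt/VRTSvcs/bin) are on PATH. The function name `force_start_vcs` is hypothetical. Reviewing the resource and service group definitions remains a manual step.

```shell
#!/bin/sh
# force_start_vcs SYSTEM [CONFDIR]: syntax-check main.cf with hacf, then
# force-start VCS from SYSTEM only if the check passes.
force_start_vcs() {
    sys=$1
    confdir=${2:-/etc/VRTSvcs/conf/config}
    # hacf -verify is run from the directory that holds main.cf
    ( cd "$confdir" && hacf -verify . ) || {
        echo "main.cf in $confdir failed hacf -verify; not forcing" >&2
        return 1
    }
    hasys -force "$sys"
}

# Example, run on the node with the correct configuration:
#   force_start_vcs system_name
```
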

One of the most common causes of a node being in one of these states is the existence of /etc/VRTSvcs/conf/config/.stale. This file is typically left behind if Cluster Server is stopped while the configuration is still open, i.e. someone has forgotten to save changes made to a running main.cf configuration. The .stale file is deleted automatically if changes are correctly saved and will therefore not force the relevant node into an ADMIN state when it next has to restart Cluster Server. As indicated earlier, the file can be safely removed if the main.cf file is known to be ok.
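Before forcing a start, it can help to check each node for the .stale marker. This is a minimal sketch, and the function name `check_stale` is hypothetical. It only reports whether the file exists and does not remove it, because removal is only safe once main.cf is known to be good.

```shell
#!/bin/sh
# check_stale [CONFDIR]: report whether the .stale marker exists in the
# VCS configuration directory. Succeeds if the marker is present.
check_stale() {
    confdir=${1:-/etc/VRTSvcs/conf/config}
    if [ -f "$confdir/.stale" ]; then
        echo "stale marker present: $confdir/.stale"
        return 0
    fi
    echo "no .stale marker in $confdir"
    return 1
}
```
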